CN116186534B - Pre-training model updating method and device and electronic equipment - Google Patents

Pre-training model updating method and device and electronic equipment

Info

Publication number
CN116186534B
CN116186534B (application CN202211665706.0A)
Authority
CN
China
Prior art keywords
bias
updating
layer
value
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211665706.0A
Other languages
Chinese (zh)
Other versions
CN116186534A (en)
Inventor
柴业坤
王硕寰
孙宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202211665706.0A priority Critical patent/CN116186534B/en
Publication of CN116186534A publication Critical patent/CN116186534A/en
Application granted granted Critical
Publication of CN116186534B publication Critical patent/CN116186534B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Electrically Operated Instructional Devices (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method and a device for updating a pre-training model and electronic equipment, and relates to the field of computer technology, in particular to artificial intelligence technologies such as natural language processing and deep learning. The method comprises the following steps: acquiring a first bias and a second bias obtained by updating the first bias in a first update direction; updating bias parameters of a first designated layer in the pre-training model based on the first bias and the second bias to obtain a first updated model and a second updated model; acquiring a first reward value corresponding to the first bias and a second reward value corresponding to the second bias; generating a third bias accordingly; and, based on the third bias, returning to the operation of acquiring updated models until a target bias of the first designated layer is acquired. In this way, the target bias can be determined quickly through forward inference alone, and the amount of data involved is small, so that computing resources are saved, the time needed to determine the target bias is reduced, efficiency is improved, and conditions are provided for industrial deployment.

Description

Pre-training model updating method and device and electronic equipment
Technical Field
The disclosure relates to the field of computer technology, in particular to artificial intelligence technologies such as natural language processing and deep learning, and more particularly to a method and a device for updating a pre-training model and electronic equipment.
Background
With the development of computer technology, natural language processing applications are becoming more and more widespread.
In the related art, in order to quickly acquire a target model matched with a current task, a pre-training model may be updated based on training data corresponding to the current task, so as to acquire the target model corresponding to the current task. Therefore, how to reduce the amount of computation for updating the pre-training model to save the computing resources becomes an important research direction.
Disclosure of Invention
The disclosure provides a method and a device for updating a pre-training model and electronic equipment.
In one aspect of the present disclosure, a method for updating a pre-training model is provided, including:
acquiring a first bias and a second bias obtained by updating the first bias in a first update direction;
updating bias parameters of a first designated layer in the pre-training model based on the first bias and the second bias respectively to obtain a first updating model and a second updating model;
inputting sample data corresponding to a current task into the first updating model and the second updating model respectively, so as to obtain a first reward value corresponding to the first bias and a second reward value corresponding to the second bias;
Generating a third bias according to the first reward value, the second reward value, the first bias, the second bias and the first updating direction;
and based on the third bias, returning to execute the operation of acquiring the updated model until the target bias of the bias parameter of the first designated layer under the sample data is acquired.
In another aspect of the present disclosure, there is provided an updating apparatus for a pre-training model, including:
the first acquisition module is used for acquiring a first bias and a second bias updated by the first bias based on a first updating direction;
the second acquisition module is used for updating the bias parameters of a first designated layer in the pre-training model based on the first bias and the second bias respectively so as to acquire a first updating model and a second updating model;
the third acquisition module is used for respectively inputting sample data corresponding to the current task into the first updating model and the second updating model so as to acquire a first reward value corresponding to the first bias and a second reward value corresponding to the second bias;
the generating module is used for generating a third bias according to the first reward value, the second reward value, the first bias, the second bias and the first updating direction;
And a fourth obtaining module, configured to return to performing the operation of obtaining the updated model based on the third bias until a target bias of the bias parameter of the first designated layer under the sample data is obtained.
In another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of updating a pre-training model as described in the embodiments of the above aspect.
In another aspect of the disclosure, a non-transitory computer readable storage medium storing computer instructions for causing a computer to execute the method for updating a pre-training model according to the embodiment of the above aspect is provided.
In another aspect of the disclosure, a computer program product is provided, including a computer program, which when executed by a processor implements the method for updating a pre-training model according to the embodiment of the above aspect.
According to the method, the device and the electronic equipment for updating the pre-training model, a first bias can be acquired firstly, a second bias after the first bias is updated based on a first updating direction is acquired, then bias parameters of a first designated layer in the pre-training model are updated based on the first bias and the second bias respectively to acquire the first updating model and the second updating model, sample data corresponding to a current task are input into the first updating model and the second updating model respectively to acquire a first rewarding value corresponding to the first bias and a second rewarding value corresponding to the second bias, a third bias is generated according to the first rewarding value, the second rewarding value, the first bias, the second bias and the first updating direction, and finally the operation of acquiring the updating model is executed back based on the third bias until the target bias of the bias parameters of the first designated layer under the sample data is acquired. Therefore, the target bias corresponding to the bias parameters of the linear processing layer in the pre-training model can be rapidly determined through forward inference, and the related data volume is small, so that the calculation resources are saved, the time for determining the target bias is saved, the efficiency is improved, and the condition is provided for industrialized deployment.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flowchart of a method for updating a pre-training model according to an embodiment of the disclosure;
FIG. 2 is a flowchart of a method for updating a pre-training model according to an embodiment of the disclosure;
FIG. 3 is a schematic structural diagram of an apparatus for updating a pre-training model according to another embodiment of the present disclosure;
FIG. 4 is a block diagram of an electronic device for implementing a method of updating a pre-trained model according to an embodiment of the disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Artificial intelligence is the discipline of studying the process of making a computer mimic certain mental processes and intelligent behaviors (e.g., learning, reasoning, thinking, planning, etc.) of a person, both hardware-level and software-level techniques. Artificial intelligence hardware technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; the artificial intelligence software technology mainly comprises a computer vision technology, a voice recognition technology, a natural language processing technology, machine learning, deep learning, a big data processing technology, a knowledge graph technology and the like.
Natural language processing is the processing, understanding, and use of human language (e.g., Chinese, English, etc.) by computers; it is an interdisciplinary field of computer science and linguistics and is often referred to as computational linguistics. Natural language is a fundamental characteristic that distinguishes humans from other animals, and human thinking is inseparable from language; natural language processing therefore embodies the highest tasks and boundaries of artificial intelligence, that is, machines achieve true intelligence only when computers have the ability to process natural language.
Deep learning refers to multi-layer artificial neural networks and the methods used to train them. A neural network takes a large number of matrices as input, weights them through nonlinear activation functions, and produces another data set as output. By organizing an appropriate number of such matrices into multiple connected layers, a neural network "brain" is formed that performs precise and complex processing, much as people recognize objects and label pictures.
The following describes a method and a device for updating a pre-training model and an electronic device according to an embodiment of the disclosure with reference to the accompanying drawings.
The method for updating the pre-training model provided by the embodiment of the disclosure can be performed by the device for updating the pre-training model provided by the embodiment of the disclosure, and the device can be configured in electronic equipment.
Fig. 1 is a flowchart of a method for updating a pre-training model according to an embodiment of the present disclosure.
As shown in fig. 1, the method for updating the pre-training model may include the following steps:
step 101, acquiring a first bias and a second bias after the first bias is updated based on the first updating direction.
The first bias may be a vector initialized randomly, or may be any vector, etc., which is not limited in this disclosure.
Alternatively, a non-gradient optimization algorithm may be used to sample a set of low-dimensional vectors as the first bias. Alternatively, the vector may be a vector initialized randomly, or may be an arbitrary vector, or the like, which is not limited in the present disclosure.
The first update direction may be a direction in which the number increases or a direction in which the number decreases.
Alternatively, a preset offset corresponding to the first update direction may be added to the first offset to obtain the second offset. Alternatively, a non-gradient optimization algorithm may be used to determine the second bias based on the first update direction.
The non-gradient optimization algorithm may include evolutionary algorithms such as Natural Evolution Strategies (NES) and Covariance Matrix Adaptation (CMA), reinforcement learning algorithms such as Policy Gradient, and the like. The present disclosure is not limited in this regard.
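As a concrete illustration of step 101, the following Python sketch shows the simple "preset offset" variant described above; the vector dimension and the step size are illustrative assumptions rather than values taken from the disclosure, and a non-gradient optimizer such as NES or CMA could equally be used to propose the second bias.

```python
import numpy as np

def init_first_bias(dim, rng):
    # The first bias may simply be a randomly initialized (possibly low-dimensional) vector.
    return rng.standard_normal(dim).astype(np.float32)

def make_second_bias(first_bias, direction, step=0.01):
    # direction = +1: the direction of increasing values; direction = -1: decreasing values.
    # The second bias is the first bias shifted by a preset offset in that direction.
    return first_bias + direction * step * np.ones_like(first_bias)

rng = np.random.default_rng(0)
first_bias = init_first_bias(dim=16, rng=rng)
second_bias = make_second_bias(first_bias, direction=+1)
```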
Step 102, updating bias parameters of a first designated layer in the pre-training model based on the first bias and the second bias respectively to obtain a first updated model and a second updated model.
The first designated layer may be a linear processing layer in the pre-training model, such as a fully connected layer, which is not limited in this disclosure.
In addition, the pre-training model may be any type of pre-training model, such as BERT (Bidirectional Encoder Representations from Transformers), ELMo (Embeddings from Language Models), a Transformer model, and the like, which is not limited in this disclosure.
Specifically, the first bias is added to the bias parameters of the first designated layer to obtain updated bias parameters, so as to obtain a first updated model, and the second bias is added to the bias parameters of the first designated layer to obtain updated bias parameters, so as to obtain a second updated model.
Alternatively, since the linear processing layer closest to the output layer in the pre-training model has a larger influence on the model, the specified number of linear processing layers closest to the output layer in the pre-training model may be determined as the first specified layer. Therefore, the linear processing layer with larger influence can be processed first, and the efficiency of acquiring the target model corresponding to the current task can be improved.
The specified number may be 1, 2, 3, and so on. For example, if the specified number is 3, the 3 linear processing layers closest to the output layer may be determined as the first designated layer.
Optionally, the first bias and the second bias corresponding to the first designated layer may also be determined according to the number of layers of the first designated layer in the pre-training model.
In addition, the dimensions of the first bias and the second bias may be determined according to the dimensions of the bias parameters in the first specified layer. The dimensions of the first bias and the second bias may be the same as or smaller than the dimensions of the bias parameters. The present disclosure is not limited in this regard.
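A minimal sketch of step 102 is given below, assuming a PyTorch model whose first designated layer is a linear (fully connected) layer. The random projection used when the search vector has a lower dimension than the layer's bias parameter is an illustrative assumption, since the disclosure only states that the two dimensions may be equal or the bias may be smaller; all names and sizes are hypothetical.

```python
import copy
import torch

def apply_bias(model, layer, search_bias, projection=None):
    """Return a copy of `model` whose designated layer's bias is shifted by the search bias."""
    updated = copy.deepcopy(model)
    # Locate the corresponding layer inside the copy by its module name.
    name = next(n for n, m in model.named_modules() if m is layer)
    target = dict(updated.named_modules())[name]
    # Map a low-dimensional search vector up to the bias dimension if a projection is given.
    delta = search_bias if projection is None else projection @ search_bias
    with torch.no_grad():
        target.bias += delta  # add the (projected) search bias to the existing bias parameter
    return updated

# Usage: build the first and the second updated model from the same pre-trained model.
pretrained = torch.nn.Sequential(torch.nn.Linear(32, 32), torch.nn.ReLU(), torch.nn.Linear(32, 4))
designated = pretrained[2]                    # linear processing layer closest to the output
projection = torch.randn(4, 16) / 16 ** 0.5   # fixed projection: 16-dim search vector -> 4-dim bias
first_updated_model = apply_bias(pretrained, designated, torch.randn(16), projection)
second_updated_model = apply_bias(pretrained, designated, torch.randn(16), projection)
```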
Step 103, the sample data corresponding to the current task is respectively input into the first updated model and the second updated model to obtain a first reward value corresponding to the first bias and a second reward value corresponding to the second bias.
It can be appreciated that the method for updating the pre-training model provided by the present disclosure may be suitable for any scenario in which fine adjustment is performed on the pre-training model to obtain a target model corresponding to a task. For example, the present task may be applied to text classification, generation of question-answer pairs, text understanding, image processing tasks, and so forth, which is not limited by the present disclosure.
It will be appreciated that the current task is different, as is the corresponding sample data. For example, the type of the sample data may be various, for example, may be text data, or may also be image data, audio data, or the like.
Optionally, the sample data may be input into the first updated model and the second updated model respectively, so that a first prediction label output by the first updated model and a second prediction label output by the second updated model are obtained through the processing of the two models. The first prediction label and the second prediction label may then each be matched against the annotation label corresponding to the sample data, so as to determine the difference between the first prediction label and the annotation label and the difference between the second prediction label and the annotation label. The first reward value corresponding to the first bias is determined according to the difference between the first prediction label and the annotation label, and the second reward value corresponding to the second bias is determined according to the difference between the second prediction label and the annotation label.
For example, a loss function may be used to determine a first loss value between the first prediction label and the annotation label and a second loss value between the second prediction label and the annotation label; the corresponding first reward value is then determined according to the first loss value, and the corresponding second reward value is determined according to the second loss value. Alternatively, accuracy, a comprehensive evaluation index, or the like may be determined from the difference between the first prediction label and the annotation label and the difference between the second prediction label and the annotation label, respectively, and used as the corresponding reward values, which is not limited in this disclosure.
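The reward computation of step 103 can be sketched as follows, assuming the reward is simply the negative of a cross-entropy loss between the updated model's predictions and the annotation labels, so that a lower loss yields a higher reward; accuracy or another evaluation metric could be substituted, as noted above.

```python
import torch
import torch.nn.functional as F

def reward_value(updated_model, inputs, labels):
    """Reward of one updated model on the sample data: the negative cross-entropy loss."""
    updated_model.eval()
    with torch.no_grad():                          # forward inference only, no gradients needed
        logits = updated_model(inputs)
        loss = F.cross_entropy(logits, labels)
    return -loss.item()                            # lower loss -> higher reward

# first_reward = reward_value(first_updated_model, sample_inputs, sample_labels)
# second_reward = reward_value(second_updated_model, sample_inputs, sample_labels)
```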
Step 104, generating a third bias according to the first reward value, the second reward value, the first bias, the second bias, and the first update direction.
Optionally, when the specified number is 1, the reward value of the first update direction may be determined according to the difference between the second reward value and the first reward value; a reference bias and a second update direction are then determined from the first bias and the second bias according to the reward value of the first update direction; and a third bias is finally generated based on the reference bias and the second update direction. In this way, the reference bias and the second update direction are determined based on the change in the reward value, and the third bias is determined based on the second update direction and the reference bias, so that the generated third bias is more accurate.
The reward value of the first update direction reflects whether the change from the first bias to the second bias is a positive or a negative stimulus for the pre-training model: if the reward value of the first update direction is greater than 0, the change is a positive stimulus; if it is less than 0, the change is a negative stimulus.
If the reward value of the first update direction is greater than 0, the second bias is of better quality than the first bias and better matches the current task, and the reference bias is therefore determined to be the second bias. If the reward value of the first update direction is less than 0, the first bias is of better quality than the second bias and better matches the current task, and the reference bias is therefore determined to be the first bias.
If the reward value corresponding to the first update direction is greater than 0, the second update direction is the same as the first update direction: if the first update direction is the direction of increasing values, the second update direction is also the direction of increasing values, and if the first update direction is the direction of decreasing values, the second update direction is also the direction of decreasing values.
If the reward value corresponding to the first update direction is less than 0, the second update direction is opposite to the first update direction: if the first update direction is the direction of increasing values, the second update direction is the direction of decreasing values, and if the first update direction is the direction of decreasing values, the second update direction is the direction of increasing values.
Optionally, after the reference bias and the second update direction are determined, they may be input into a preset bias generation model to obtain the third bias. Alternatively, an offset corresponding to each element of the reference bias in the second update direction may be determined, and the third bias may then be generated based on the offset corresponding to each element and the reference bias. The present disclosure is not limited in this regard.
The preset bias generation model may be a model obtained based on a non-gradient learning algorithm. Optionally, the non-gradient learning algorithm may include evolutionary algorithms such as Natural Evolution Strategies (NES) and Covariance Matrix Adaptation (CMA), reinforcement learning algorithms such as Policy Gradient, and the like. The present disclosure is not limited in this regard.
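For the single-layer case, the rule described above can be sketched as follows; the fixed step used to move from the reference bias in the second update direction is an illustrative stand-in for the preset bias generation model, not a component specified by the disclosure.

```python
import numpy as np

def third_bias_single_layer(first_bias, second_bias, first_reward, second_reward,
                            first_direction, step=0.01):
    # Reward value of the first update direction.
    direction_reward = second_reward - first_reward
    if direction_reward > 0:
        # Positive stimulus: keep the second bias and keep moving in the same direction.
        reference, second_direction = second_bias, first_direction
    else:
        # Negative stimulus: fall back to the first bias and reverse the direction.
        reference, second_direction = first_bias, -first_direction
    third = reference + second_direction * step * np.ones_like(reference)
    return third, reference, second_direction
```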
Optionally, when the specified number is greater than 1, the reward value of the first update direction corresponding to each linear processing layer in the first designated layer is determined according to the difference between the second reward value and the first reward value and the distance between that linear processing layer and the output layer. A reference bias and a second update direction corresponding to each linear processing layer are then determined, from the first bias and the second bias corresponding to that layer, according to the reward value of the first update direction corresponding to that layer. Finally, a third bias corresponding to each linear processing layer is generated based on the reference bias and the second update direction corresponding to that layer.
Alternatively, the weight corresponding to each linear processing layer may be determined according to the distance between that layer and the output layer, and the reward value of the first update direction corresponding to each linear processing layer in the first designated layer may then be determined as the product of this weight and the difference between the second reward value and the first reward value.
A linear processing layer closer to the output layer has a larger influence on the pre-training model; therefore, the weight corresponding to a linear processing layer closer to the output layer is larger.
In the embodiment of the disclosure, the bias parameters corresponding to the plurality of linear processing layers can be updated at the same time, and the reward value of the first update direction corresponding to each linear processing layer can be determined according to the distance between each linear processing layer and the output layer, so that the reference bias and the second update direction corresponding to each linear processing layer in the plurality of linear processing layers which are processed at the same time can be determined according to the influence degree of each linear processing layer, and the third bias corresponding to the linear processing layer can be determined more accurately, thereby further improving the efficiency of updating the pre-training model to acquire the target model corresponding to the current task.
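A possible per-layer weighting for the multi-layer case is sketched below; the exact weighting function is not specified in the disclosure, and 1 / (1 + distance) is only one plausible choice that gives larger weights to layers closer to the output layer.

```python
def per_layer_direction_rewards(first_reward, second_reward, distances_to_output):
    """Weighted reward value of the first update direction for each designated linear layer."""
    diff = second_reward - first_reward
    weights = [1.0 / (1.0 + d) for d in distances_to_output]   # closer to the output -> larger weight
    return [w * diff for w in weights]

# Example: three designated layers that are 0, 1 and 2 layers away from the output layer.
layer_rewards = per_layer_direction_rewards(0.1, 0.4, distances_to_output=[0, 1, 2])
```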
Step 105, based on the third bias, the operation of acquiring the update model is returned to be executed until the target bias of the bias parameter of the first designated layer under the sample data is acquired.
The target bias can be more accurate target bias corresponding to the bias parameter of the first designated layer under the sample data, and the target bias is utilized to update the bias parameter of the first designated layer to generate an update model, so that more accurate and reliable processing can be performed on the sample data.
For example, after the third bias is determined, the bias parameters of the first designated layer in the pre-training model may be updated based on the third bias to obtain a third updated model; the sample data is then input into the third updated model to obtain a third reward value corresponding to the third bias; a fourth bias is determined according to the third reward value, the third bias, the reference bias and its corresponding reward value; and the operation of acquiring the updated model is then performed again based on the fourth bias, and so on, until the target bias of the bias parameters of the first designated layer under the sample data is acquired. The present disclosure is not limited in this regard.
Alternatively, the operation of acquiring the reward value may be stopped when a specified number of training steps is reached, or after a specified training period is reached, and the target bias may then be determined from the plurality of biases obtained during training, which is not limited in this disclosure.
Optionally, in the case that the differences between the n-th reward value corresponding to the n-th bias and the preceding L reward values are all smaller than a first threshold, the (n-L)-th bias is determined to be the target bias of the bias parameters of the first designated layer under the sample data, where n is a positive integer and L is a positive integer smaller than n.
The first threshold may be a preset value used to determine whether the n-th reward value is similar to the preceding L reward values, and thus whether to stop the operation of acquiring updated models based on the n-th bias. For example, the first threshold may be 0.1, 0.2, and so on. The present disclosure is not limited in this regard.
For example, when n is 6, the L reward values preceding the 6th reward value may be obtained, where L may be 1, 2, 3, 4, 5, and so on. If L is 3, the 3rd, 4th and 5th reward values are obtained, and if the differences between the 6th reward value and each of the 3rd, 4th and 5th reward values are all smaller than the first threshold, the 3rd bias is determined to be the target bias, which is not limited in this disclosure.
It will be appreciated that when the differences between the n-th reward value corresponding to the n-th bias and the preceding L reward values are all smaller than the first threshold, the bias has iterated to an optimum and the n-th bias is similar or even identical to the preceding L biases; therefore, the (n-L)-th bias can be determined to be the target bias of the first designated layer under the sample data. Alternatively, any one of the (n-L)-th to n-th biases may be determined as the target bias. The present disclosure is not limited in this regard.
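The stopping rule can be sketched as follows, with L and the first threshold chosen purely for illustration.

```python
def find_target_bias(biases, rewards, L=3, first_threshold=0.1):
    """Return the (n-L)-th bias once the n-th reward is within the threshold of the preceding L rewards."""
    n = len(rewards)
    if n <= L:
        return None                                   # not enough history yet
    recent, previous = rewards[-1], rewards[-1 - L:-1]
    if all(abs(recent - r) < first_threshold for r in previous):
        return biases[n - 1 - L]                      # the (n-L)-th bias, using 1-indexed counting
    return None                                       # keep iterating
```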
It can be appreciated that the method for updating the pre-training model provided by the present disclosure may be applicable to any pre-training model updating scenario, for example, may be applied to text classification, generation of question-answer pairs, text understanding, and the like, which is not limited in this disclosure.
The following is a brief description of the update process of the pre-training model provided by the present disclosure, taking as an example the application to text classification.
It can be understood that, taking text classification as an example, a first bias and a second bias obtained by updating the first bias in a first update direction are acquired first. The bias parameters of the first designated layer in the pre-training model are then updated based on the first bias and the second bias respectively to obtain a first updated model and a second updated model, and sample data corresponding to the text classification task is input into the first updated model and the second updated model respectively to obtain a first reward value corresponding to the first bias, determined from the first updated model, and a second reward value corresponding to the second bias, determined from the second updated model. A third bias is generated according to the first reward value, the second reward value, the first bias, the second bias and the first update direction, the bias parameters of the first designated layer in the pre-training model are updated based on the third bias to obtain a third updated model, and sample data corresponding to the text classification task is input into the third updated model to obtain a third reward value corresponding to the third bias. A fourth bias is then generated based on the third reward value, the larger of the first reward value and the second reward value, the bias corresponding to that larger reward value, and the third bias, and the operation of acquiring updated models is performed again based on the fourth bias until the target bias of the bias parameters of the first designated layer under the sample data is acquired.
It should be noted that the above examples are only illustrative, and should not be taken as limiting the update process of the pre-training model in the embodiments of the present disclosure.
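Putting the pieces together, the following sketch shows one possible forward-inference search loop for a text classification task. It reuses the helper functions from the earlier sketches (make_second_bias, apply_bias, reward_value, third_bias_single_layer, find_target_bias), and it simplifies the iteration by always forming a new bias pair from the latest bias, so it should be read as an illustration of the overall flow under stated assumptions rather than as the patented procedure itself.

```python
import torch

def search_target_bias(pretrained, designated, inputs, labels, dim=16,
                       projection=None, max_steps=100):
    # `dim` must equal the designated layer's bias dimension unless a projection is supplied.
    first_bias = torch.randn(dim)
    direction = +1                                    # first update direction: increasing values
    biases, rewards = [], []
    for _ in range(max_steps):
        second_bias = torch.as_tensor(make_second_bias(first_bias.numpy(), direction))
        m1 = apply_bias(pretrained, designated, first_bias, projection)
        m2 = apply_bias(pretrained, designated, second_bias, projection)
        r1 = reward_value(m1, inputs, labels)         # first reward value
        r2 = reward_value(m2, inputs, labels)         # second reward value
        third, reference, direction = third_bias_single_layer(
            first_bias.numpy(), second_bias.numpy(), r1, r2, direction)
        biases.append(reference)
        rewards.append(max(r1, r2))
        target = find_target_bias(biases, rewards)
        if target is not None:
            return torch.as_tensor(target)            # target bias of the first designated layer
        first_bias = torch.as_tensor(third)           # return to the operation of acquiring updated models
    return torch.as_tensor(biases[-1])
```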
In the embodiment of the disclosure, a first bias and a second bias obtained by updating the first bias in a first update direction are acquired first; the bias parameters of a first designated layer in the pre-training model are then updated based on the first bias and the second bias respectively to obtain a first updated model and a second updated model; sample data corresponding to the current task is input into the first updated model and the second updated model respectively to obtain a first reward value corresponding to the first bias and a second reward value corresponding to the second bias; a third bias is generated according to the first reward value, the second reward value, the first bias, the second bias and the first update direction; and finally, based on the third bias, the operation of acquiring updated models is performed again until the target bias of the bias parameters of the first designated layer under the sample data is acquired. In this way, the target bias corresponding to the bias parameters of a linear processing layer in the pre-training model can be determined rapidly through forward inference alone, and the amount of data involved is small, so that computing resources are saved, the time needed to determine the target bias is reduced, efficiency is improved, and conditions are provided for industrial deployment.
Fig. 2 is a flowchart of a method for updating a pre-training model according to an embodiment of the present disclosure.
As shown in fig. 2, the method for updating the pre-training model may include the following steps:
Step 201, acquiring a first bias and a second bias obtained by updating the first bias in a first update direction.
Step 202, updating bias parameters of a first designated layer in the pre-training model based on the first bias and the second bias respectively to obtain a first updated model and a second updated model.
Step 203, the sample data corresponding to the current task is input into the first updated model and the second updated model respectively, so as to obtain a first reward value corresponding to the first bias and a second reward value corresponding to the second bias.
Step 204, generating a third bias according to the first reward value, the second reward value, the first bias, the second bias, and the first update direction.
Step 205, based on the third bias, the operation of obtaining the update model is performed back until the target bias of the bias parameter of the first designated layer under the sample data is obtained.
The specific implementation manner of step 201 to step 205 may refer to the detailed descriptions in other embodiments of the disclosure, and will not be described in detail herein.
And 206, updating the bias parameters of the first designated layer based on the target bias to acquire a first target model.
It may be appreciated that after determining the target bias, the bias parameters of the first designated layer in the pre-training model may be updated, specifically, the target bias and the bias parameters of the first designated layer may be added to obtain updated bias parameters corresponding to the first designated layer, so that the first target model may be obtained.
Step 207, inputting the verification data corresponding to the current task into the first target model to obtain a first output result corresponding to the verification data.
The verification data is data used to verify whether the first target model, generated from the updated bias parameters corresponding to the first designated layer, can accurately predict the results corresponding to the verification data.
It will be appreciated that the current task is different and the corresponding verification data is also different. For example, if the current task is text classification, the corresponding verification data may be text data. If the current task is text recognition, the corresponding verification data may be image data including text, or the like. The present disclosure is not limited in this regard.
It should be noted that the verification data may be the same as the sample data, or the verification data may include the sample data, or the verification data may not include the sample data. The present disclosure is not limited in this regard.
And step 208, determining a second designated layer of the bias parameter to be updated in the first target model under the condition that the first matching degree between the first output result and the labeling result corresponding to the verification data is smaller than a second threshold value.
The second threshold may be 80%, 50%, and so on, which is not limited by the present disclosure. When the first matching degree is smaller than the second threshold, the performance of the generated first target model is poor; therefore, the updated bias parameters corresponding to the first designated layer can be fixed, and the bias parameters corresponding to other linear processing layers in the first target model can continue to be updated.
Alternatively, a cross-entropy loss function may be used to determine the first matching degree between the first output result and the labeling result corresponding to the verification data, or a loss-value calculation method matched with the current task may be adopted to determine the first matching degree. The present disclosure is not limited in this regard.
Alternatively, the second designated layer may be one or more linearly processed layers adjacent to the first designated layer. Alternatively, the second designated layer may be a linear processing layer having a predetermined interval from the first designated layer.
For example, if the pre-training model has 24 linear processing layers, the first designated layer is the 24th layer, the preset interval is 2, and the number of second designated layers is 3, then the second designated layers are the 19th, 20th and 21st layers.
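One way to compute the indices of the second designated layers from the preset interval, consistent with the example above, is sketched below; how layers are actually indexed inside a given pre-training model is an assumption, and the helper works purely on 1-indexed layer numbers.

```python
def next_designated_layers(previous_lowest_layer, interval, count):
    """1-indexed layer numbers of the next designated layers, below the previously tuned block."""
    top = previous_lowest_layer - interval - 1        # skip `interval` layers below the previous block
    return list(range(top - count + 1, top + 1))      # `count` consecutive layers ending at `top`

# Reproduces the example above: 24 layers, first designated layer 24, interval 2, count 3.
assert next_designated_layers(previous_lowest_layer=24, interval=2, count=3) == [19, 20, 21]
```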
And step 209, the operation of acquiring the first bias and the second bias is performed based on the layer number difference of the second designated layer and the first designated layer in the pre-training model, until the target bias of the bias parameter of the second designated layer under the sample data is determined.
For example, after the second designated layer is determined based on the layer-number difference between the second designated layer and the first designated layer in the pre-training model, a first bias and a second bias corresponding to the second designated layer are generated, and the target bias of the second designated layer under the sample data is determined. The bias parameters corresponding to the second designated layer are then updated based on that target bias to obtain a second target model, and the verification data corresponding to the current task is input into the second target model to obtain a corresponding output result. When the matching degree between this output result and the labeling result corresponding to the verification data is smaller than the second threshold, a third designated layer in the second target model is determined, and so on, until the matching degree corresponding to the verification data is larger than the second threshold and the corresponding model is determined to be the target model corresponding to the current task.
It can be appreciated that the method for updating the pre-training model provided by the present disclosure may be applicable to any pre-training model updating scenario, for example, may be applied to text classification, generation of question-answer pairs, text understanding, and the like, which is not limited in this disclosure.
In the embodiment of the disclosure, after the target bias of the bias parameters of the first designated layer under the sample data is obtained, the bias parameters of the first designated layer may be updated based on the target bias to obtain a first target model, and the verification data corresponding to the current task is input into the first target model to obtain a first output result corresponding to the verification data. When the first matching degree between the first output result and the labeling result corresponding to the verification data is smaller than the second threshold, a second designated layer whose bias parameters are to be updated is determined in the first target model, and the operation of acquiring the first bias and the second bias is finally performed again based on the layer-number difference between the second designated layer and the first designated layer in the pre-training model, until the target bias of the bias parameters of the second designated layer under the sample data is determined. Therefore, when the performance of the first target model is not ideal, the second designated layer in the first target model is determined and its target bias is further determined, so that the performance of the model corresponding to the current task is gradually improved. Because the target bias corresponding to each linear processing layer in the pre-training model can be determined rapidly through forward inference and the amount of data involved is small, computing resources are saved, the efficiency of generating the target model corresponding to the current task is improved, and conditions are provided for industrial deployment.
In order to implement the above embodiment, the present disclosure further provides an updating device for a pre-training model.
Fig. 3 is a schematic structural diagram of an apparatus for updating a pre-training model according to an embodiment of the present disclosure.
As shown in fig. 3, the pre-training model updating device 300 includes:
a first obtaining module 310, configured to obtain a first bias and a second bias obtained by updating the first bias in a first update direction;
the second obtaining module 320 is configured to update bias parameters of a first designated layer in the pre-training model based on the first bias and the second bias, so as to obtain a first updated model and a second updated model;
the third obtaining module 330 is configured to input sample data corresponding to a current task into a first update model and a second update model respectively, so as to obtain a first reward value corresponding to a first bias and a second reward value corresponding to a second bias;
a generating module 340, configured to generate a third bias according to the first reward value, the second reward value, the first bias, the second bias, and the first update direction;
the fourth obtaining module 350 is configured to return to performing an operation of obtaining the update model based on the third bias until a target bias of the bias parameter of the first specified layer under the sample data is obtained.
Optionally, the method further comprises:
and the first determining module is used for determining a specified number of linear processing layers closest to the output layer in the pre-training model as a first specified layer.
Optionally, the generating module 340 is specifically configured to:
determining a reward value in a first updating direction according to a difference value between the second reward value and the first reward value under the condition that the value of the designated number is 1;
determining a reference bias and a second updating direction from the first bias and the second bias according to the reward value of the first updating direction;
a third bias is generated based on the reference bias and the second update direction.
Optionally, the generating module 340 is specifically configured to:
determining the reward value of the first updating direction corresponding to each linear processing layer in the first designated layer according to the difference value between the second reward value and the first reward value and the distance between each linear processing layer in the first designated layer and the output layer when the value of the designated number is larger than 1;
determining a reference bias and a second updating direction corresponding to each linear processing layer from the first bias and the second bias corresponding to each linear processing layer according to the rewarding value of the first updating direction corresponding to each linear processing layer;
And generating a third bias corresponding to each linear processing layer based on the reference bias corresponding to each linear processing layer and the second updating direction.
Optionally, the fourth obtaining module 350 is specifically configured to:
and determining that the n-L bias is a target bias of the bias parameters of the first designated layer under the sample data under the condition that the difference value between the n-th reward value corresponding to the n-th bias and the adjacent first L reward values is smaller than a first threshold value, wherein n is a positive integer, and L is a positive integer smaller than n.
Optionally, the method further comprises:
the fifth acquisition module is used for updating the bias parameters of the first designated layer based on the target bias to acquire a first target model;
the sixth acquisition module is used for inputting the verification data corresponding to the current task into the first target model so as to acquire a first output result corresponding to the verification data;
the second determining module is used for determining a second designated layer of the bias parameter to be updated in the first target model under the condition that the first matching degree between the first output result and the labeling result corresponding to the verification data is smaller than a second threshold value;
and the third determining module is used for returning to execute the operation of acquiring the first bias and the second bias based on the layer number difference of the second designated layer and the first designated layer in the pre-training model until the target bias of the bias parameter of the second designated layer under the sample data is determined.
The functions and specific implementation principles of the foregoing modules in the embodiments of the present disclosure may refer to the foregoing method embodiments, and are not repeated herein.
According to the pre-training model updating device, a first bias can be acquired firstly, a second bias is updated based on a first updating direction, then bias parameters of a first designated layer in the pre-training model are updated based on the first bias and the second bias respectively to acquire the first updating model and the second updating model, sample data corresponding to a current task are input into the first updating model and the second updating model respectively to acquire a first rewarding value corresponding to the first bias and a second rewarding value corresponding to the second bias, a third bias is generated according to the first rewarding value, the second rewarding value, the first bias, the second bias and the first updating direction, finally, based on the third bias, the operation of acquiring the updating model is executed again until the target bias of the bias parameters of the first designated layer under the sample data is acquired. Therefore, the target bias corresponding to the bias parameters of the linear processing layer in the pre-training model can be rapidly determined through forward inference, and the related data volume is small, so that the calculation resources are saved, the time for determining the target bias is saved, the efficiency is improved, and the condition is provided for industrialized deployment.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 4 illustrates a schematic block diagram of an example electronic device 400 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 4, the apparatus 400 includes a computing unit 401 that can perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 402 or a computer program loaded from a storage unit 408 into a Random Access Memory (RAM) 403. In RAM 403, various programs and data required for the operation of device 400 may also be stored. The computing unit 401, ROM 402, and RAM 403 are connected to each other by a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
The various components in the device 400 are connected to the I/O interface 405, including: an input unit 406 such as a keyboard, a mouse, etc.; an output unit 407 such as various types of displays, speakers, and the like; a storage unit 408 such as a magnetic disk, an optical disk, etc.; and a communication unit 409 such as a network card, a modem, a wireless communication transceiver, etc. The communication unit 409 allows the device 400 to exchange information/data with other devices via a computer network such as the Internet and/or various telecommunication networks.
The computing unit 401 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 401 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 401 performs the respective methods and processes described above, for example, the update method of the pre-training model. For example, in some embodiments, the method of updating the pre-training model may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 408. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 400 via the ROM 402 and/or the communication unit 409. When the computer program is loaded into RAM 403 and executed by computing unit 401, one or more steps of the method of updating a pre-trained model described above may be performed. Alternatively, in other embodiments, the computing unit 401 may be configured to perform the update method of the pre-training model in any other suitable way (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs, where the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the defects of high management difficulty and weak service expansibility in traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system or a server combined with a blockchain.
According to the technical scheme of the present disclosure, a first bias is acquired, together with a second bias obtained by updating the first bias based on a first updating direction. The bias parameters of a first designated layer in the pre-training model are then updated based on the first bias and the second bias respectively, so as to obtain a first updating model and a second updating model. Sample data corresponding to the current task are input into the first updating model and the second updating model respectively, so as to obtain a first reward value corresponding to the first bias and a second reward value corresponding to the second bias. A third bias is generated according to the first reward value, the second reward value, the first bias, the second bias and the first updating direction, and the operation of obtaining the updating models is executed again based on the third bias until the target bias of the bias parameters of the first designated layer under the sample data is obtained. In this way, the target bias corresponding to the bias parameters of the linear processing layer in the pre-training model can be determined quickly through forward inference alone; the amount of data involved is small, which saves computing resources, shortens the time needed to determine the target bias, improves efficiency, and provides the conditions for industrial deployment.
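For illustration only, the minimal PyTorch-style sketch below organizes the first bias, second bias, reward comparison, direction selection and third bias into a single forward-inference loop. The helper name `reward_fn`, the random choice of the first updating direction, and the default step size, window and threshold values are assumptions made for this example and are not prescribed by the present disclosure.

```python
import copy

import torch


@torch.no_grad()
def search_target_bias(model, layer_name, sample_batch, reward_fn,
                       step=1e-3, max_iters=200, window=5, threshold=1e-4):
    """Forward-inference-only search for the target bias of one designated
    linear layer. `reward_fn(model, batch)` is an assumed helper that runs
    the model on the task samples and returns a scalar reward."""
    base = dict(model.named_parameters())[f"{layer_name}.bias"].detach().clone()

    def with_bias(delta):
        # clone the frozen pre-trained model and overwrite only the designated bias
        m = copy.deepcopy(model)
        dict(m.named_parameters())[f"{layer_name}.bias"].data = base + delta
        return m

    direction = torch.randn_like(base)        # first updating direction (assumed random init)
    direction = direction / direction.norm()
    first = torch.zeros_like(base)            # first bias
    history, rewards = [first], []

    for _ in range(max_iters):
        second = first + step * direction                 # second bias
        r1 = reward_fn(with_bias(first), sample_batch)    # first reward value
        r2 = reward_fn(with_bias(second), sample_batch)   # second reward value
        if r2 - r1 > 0:                       # the direction's reward is positive:
            reference = second                #   keep the direction, start from the second bias
        else:
            reference, direction = first, -direction      # otherwise reverse the direction
        rewards.append(max(r1, r2))
        first = reference + step * direction              # third bias, fed back as the next first bias
        history.append(first)
        # stop once the latest reward stays within `threshold` of the previous `window` rewards
        if len(rewards) > window and all(
                abs(rewards[-1] - r) < threshold for r in rewards[-window - 1:-1]):
            return history[-window - 1]       # the bias from before the reward flattened out
    return history[-1]
```

In practice the bias tensor would be swapped in place rather than deep-copying the whole model for every candidate, but the loop structure, comparing two forward passes, keeping or flipping the direction, and stopping once the reward stops changing, mirrors the scheme described above.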
It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solution of the present disclosure can be achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (14)

1. A method of updating a pre-training model, comprising:
acquiring a first bias and a second bias obtained by updating the first bias based on a first updating direction;
updating bias parameters of a first designated layer in the pre-training model based on the first bias and the second bias respectively to obtain a first updating model and a second updating model;
respectively inputting sample data corresponding to a current task into the first updating model and the second updating model to obtain a first reward value corresponding to the first bias and a second reward value corresponding to the second bias, wherein the sample data is one or more of text data, image data and audio data;
generating a third bias according to the first reward value, the second reward value, the first bias, the second bias and the first updating direction;
and based on the third bias, returning to execute the operation of acquiring the updated model until the target bias of the bias parameter of the first designated layer under the sample data is acquired.
2. The method of claim 1, further comprising:
and determining a specified number of linear processing layers closest to an output layer in the pre-training model as the first designated layer.
3. The method of claim 2, wherein the generating a third bias according to the first reward value, the second reward value, the first bias, the second bias and the first updating direction comprises:
determining the reward value of the first updating direction according to the difference between the second reward value and the first reward value, under the condition that the value of the specified number is 1;
determining a reference bias and a second updating direction from the first bias and the second bias according to the reward value of the first updating direction;
the third bias is generated based on the reference bias and the second update direction.
4. The method of claim 3, wherein the generating a third bias according to the first reward value, the second reward value, the first bias, the second bias and the first updating direction comprises:
determining a reward value of the first updating direction corresponding to each linear processing layer in the first designated layer according to the difference between the second reward value and the first reward value and the distance between each linear processing layer in the first designated layer and the output layer, under the condition that the value of the specified number is larger than 1;
determining a reference bias and a second updating direction corresponding to each linear processing layer from the first bias and the second bias corresponding to each linear processing layer according to the reward value of the first updating direction corresponding to each linear processing layer;
and generating a third bias corresponding to each linear processing layer based on the reference bias corresponding to each linear processing layer and the second updating direction.
5. The method of claim 1, wherein the returning to execute the operation of acquiring the updated model based on the third bias, until the target bias of the bias parameter of the first designated layer under the sample data is acquired, comprises:
determining that the (n-L)-th bias is the target bias of the bias parameter of the first designated layer under the sample data, under the condition that the difference between the n-th reward value corresponding to the n-th bias and each of the L immediately preceding reward values is smaller than a first threshold, wherein n is a positive integer and L is a positive integer smaller than n.
6. The method of any of claims 1-5, further comprising, after the returning to execute the operation of acquiring the updated model based on the third bias until the target bias of the bias parameter of the first designated layer under the sample data is acquired:
updating the bias parameters of the first designated layer based on the target bias to obtain a first target model;
inputting the verification data corresponding to the current task into the first target model to obtain a first output result corresponding to the verification data;
determining a second designated layer whose bias parameter is to be updated in the first target model, under the condition that a first matching degree between the first output result and the labeling result corresponding to the verification data is smaller than a second threshold;
and returning to execute the operation of acquiring the first bias and the second bias based on the difference in layer number between the second designated layer and the first designated layer in the pre-training model, until the target bias of the bias parameter of the second designated layer under the sample data is determined.
7. An updating device of a pre-training model, comprising:
the first acquisition module is used for acquiring a first bias and a second bias obtained by updating the first bias based on a first updating direction;
the second acquisition module is used for updating the bias parameters of a first designated layer in the pre-training model based on the first bias and the second bias respectively so as to acquire a first updating model and a second updating model;
the third acquisition module is used for respectively inputting sample data corresponding to the current task into the first updating model and the second updating model to obtain a first reward value corresponding to the first bias and a second reward value corresponding to the second bias, wherein the sample data is one or more of text data, image data and audio data;
the generating module is used for generating a third bias according to the first reward value, the second reward value, the first bias, the second bias and the first updating direction;
and a fourth obtaining module, configured to return to performing the operation of obtaining the updated model based on the third bias until a target bias of the bias parameter of the first designated layer under the sample data is obtained.
8. The apparatus of claim 7, further comprising:
and the first determining module is used for determining a specified number of linear processing layers closest to the output layer in the pre-training model as the first designated layer.
9. The apparatus of claim 8, wherein the generating module is specifically configured to:
determining the reward value of the first updating direction according to the difference between the second reward value and the first reward value, under the condition that the value of the specified number is 1;
determining a reference bias and a second updating direction from the first bias and the second bias according to the reward value of the first updating direction;
the third bias is generated based on the reference bias and the second update direction.
10. The apparatus of claim 9, wherein the generating module is specifically configured to:
determining a reward value of the first updating direction corresponding to each linear processing layer in the first designated layer according to the difference between the second reward value and the first reward value and the distance between each linear processing layer in the first designated layer and the output layer, under the condition that the value of the specified number is larger than 1;
determining a reference bias and a second updating direction corresponding to each linear processing layer from the first bias and the second bias corresponding to each linear processing layer according to the reward value of the first updating direction corresponding to each linear processing layer;
and generating a third bias corresponding to each linear processing layer based on the reference bias corresponding to each linear processing layer and the second updating direction.
11. The apparatus of claim 7, wherein the fourth obtaining module is specifically configured to:
and determining that the (n-L)-th bias is the target bias of the bias parameter of the first designated layer under the sample data, under the condition that the difference between the n-th reward value corresponding to the n-th bias and each of the L immediately preceding reward values is smaller than a first threshold, wherein n is a positive integer and L is a positive integer smaller than n.
12. The apparatus of any of claims 7-11, further comprising:
a fifth obtaining module, configured to update bias parameters of the first designated layer based on the target bias, to obtain a first target model;
a sixth obtaining module, configured to input verification data corresponding to the current task into the first target model, so as to obtain a first output result corresponding to the verification data;
the second determining module is used for determining a second designated layer whose bias parameter is to be updated in the first target model, under the condition that a first matching degree between the first output result and the labeling result corresponding to the verification data is smaller than a second threshold;
and the third determining module is used for returning to execute the operation of acquiring the first bias and the second bias based on the difference in layer number between the second designated layer and the first designated layer in the pre-training model, until the target bias of the bias parameter of the second designated layer under the sample data is determined.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-6.
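Purely as an illustration, and not as part of the claims, the sketch below shows how the validation step described in claims 6 and 12 above could drive the selection of a second designated layer. It reuses the assumed `search_target_bias` helper from the earlier sketch; `match_degree` is likewise an assumed scoring helper supplied by the caller, and the threshold default is an arbitrary example value.

```python
import torch


@torch.no_grad()
def refine_designated_layers(model, linear_layer_names, sample_batch,
                             verification_batch, labels, reward_fn,
                             match_degree, second_threshold=0.8):
    """`linear_layer_names` is assumed to be ordered from the layer closest to
    the output inward; `match_degree(model, batch, labels)` is an assumed
    helper returning a score in [0, 1] against the labeling result."""
    for layer_name in linear_layer_names:
        # search the target bias of the currently designated layer
        # (forward inference only; `search_target_bias` is the sketch above)
        target = search_target_bias(model, layer_name, sample_batch, reward_fn)
        # apply the target bias to obtain the current target model
        dict(model.named_parameters())[f"{layer_name}.bias"].data += target
        # validate the target model on held-out data for the current task
        if match_degree(model, verification_batch, labels) >= second_threshold:
            return model                  # matching degree is high enough: stop
        # otherwise designate the next layer, one step further from the output,
        # and repeat the bias search for it
    return model
```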
CN202211665706.0A 2022-12-23 2022-12-23 Pre-training model updating method and device and electronic equipment Active CN116186534B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211665706.0A CN116186534B (en) 2022-12-23 2022-12-23 Pre-training model updating method and device and electronic equipment


Publications (2)

Publication Number Publication Date
CN116186534A (en) 2023-05-30
CN116186534B (en) 2024-02-23

Family

ID=86431834

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211665706.0A Active CN116186534B (en) 2022-12-23 2022-12-23 Pre-training model updating method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN116186534B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112561077A (en) * 2020-12-14 2021-03-26 北京百度网讯科技有限公司 Training method and device of multi-task model and electronic equipment
EP4020305A1 (en) * 2020-12-22 2022-06-29 Ricoh Company, Ltd. Pre-trained language model fine-tuning method and apparatus and non-transitory computer-readable medium
CN113435520A (en) * 2021-06-30 2021-09-24 深圳市商汤科技有限公司 Neural network training method, device, equipment and computer readable storage medium
CN113642727A (en) * 2021-08-06 2021-11-12 北京百度网讯科技有限公司 Training method of neural network model and processing method and device of multimedia information
CN114037059A (en) * 2021-11-05 2022-02-11 北京百度网讯科技有限公司 Pre-training model, model generation method, data processing method and data processing device
CN114792097A (en) * 2022-05-14 2022-07-26 北京百度网讯科技有限公司 Method and device for determining prompt vector of pre-training model and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A Survey of Pre-training Techniques for Natural Language Processing; Li Zhoujun et al.; Computer Science; 2020-03-31; pp. 162-173 *


Similar Documents

Publication Publication Date Title
CN113326764A (en) Method and device for training image recognition model and image recognition
CN112784778B (en) Method, apparatus, device and medium for generating model and identifying age and sex
CN115082920B (en) Deep learning model training method, image processing method and device
CN112561060B (en) Neural network training method and device, image recognition method and device and equipment
CN113705628B (en) Determination method and device of pre-training model, electronic equipment and storage medium
CN112580733A (en) Method, device and equipment for training classification model and storage medium
CN114881129A (en) Model training method and device, electronic equipment and storage medium
CN112580732A (en) Model training method, device, equipment, storage medium and program product
CN113516185B (en) Model training method, device, electronic equipment and storage medium
CN112949433B (en) Method, device and equipment for generating video classification model and storage medium
CN113592932A (en) Training method and device for deep completion network, electronic equipment and storage medium
CN114792097B (en) Method and device for determining prompt vector of pre-training model and electronic equipment
CN113052063A (en) Confidence threshold selection method, device, equipment and storage medium
CN116186534B (en) Pre-training model updating method and device and electronic equipment
CN114067099B (en) Training method of student image recognition network and image recognition method
CN114998649A (en) Training method of image classification model, and image classification method and device
CN114882388A (en) Method, device, equipment and medium for training and predicting multitask model
CN114817476A (en) Language model training method and device, electronic equipment and storage medium
CN113961765A (en) Searching method, device, equipment and medium based on neural network model
CN116383637A (en) Method and device for updating attention matrix in pre-training model and electronic equipment
CN113361621A (en) Method and apparatus for training a model
CN116416500B (en) Image recognition model training method, image recognition device and electronic equipment
CN116468112B (en) Training method and device of target detection model, electronic equipment and storage medium
CN114844889B (en) Video processing model updating method and device, electronic equipment and storage medium
CN114926447B (en) Method for training a model, method and device for detecting a target

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant