CN112434552A - Neural network model adjusting method, device, equipment and storage medium

Info

Publication number: CN112434552A
Application number: CN202011092190.6A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: network model, neural network, parameters, model
Inventor: 熊凯 (Xiong Kai)
Original and current assignee: Guangzhou Shiyuan Electronics Thecnology Co Ltd
Application filed by Guangzhou Shiyuan Electronics Thecnology Co Ltd
Priority to CN202011092190.6A
Publication of CN112434552A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/172 Classification, e.g. identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Abstract

The embodiment of the application discloses a method, a device, equipment and a storage medium for adjusting a neural network model, relating to the technical field of neural networks. The method includes: inputting a target amount of sample data into the neural network model, and updating the model parameters of the neural network model according to a first output result of the neural network model, wherein the model parameters before updating are first parameters and the model parameters after updating are second parameters; and inputting a target amount of application data into the updated neural network model, and correcting the model parameters of the neural network model according to a second output result of the updated neural network model and the second parameters, wherein the model parameters before correction are the first parameters and the model parameters after correction are third parameters. By adopting the method, the technical problem in the prior art that the performance of a deep learning model cannot be guaranteed when it is applied to different scenes can be solved.

Description

Neural network model adjusting method, device, equipment and storage medium
Technical Field
The embodiment of the application relates to the technical field of neural networks, in particular to a method, a device, equipment and a storage medium for adjusting a neural network model.
Background
Deep learning, an important field of machine learning, has been widely applied in people's work and life. For example, in the field of face recognition, face recognition technology based on deep learning has been widely applied in scenes such as terminal face unlocking, face check-in and face payment. Generally, to implement face recognition based on deep learning, a deep learning model needs to be constructed and trained with a large amount of training data, so that face recognition is performed by the trained deep learning model.
In the process of implementing the invention, the inventor found that the prior art has the following defect: because acquiring and labeling training data is costly, the prior art adopts an open source data set as the training data set, so that the trained deep learning model cannot be applied to scenes not covered by the open source data set. For example, in face recognition, the open source data set MS1M is usually used as the training data set, but the proportion of Asian face pictures in MS1M is small, so when the trained deep learning model is applied in an Asian face recognition scene, the accuracy of the deep learning model is not high.
In summary, when the deep learning model is applied to different scenes (especially scenes not covered by the open source data set), how to ensure the performance of the deep learning model becomes a technical problem that urgently needs to be solved.
Disclosure of Invention
The embodiment of the application provides a neural network model adjusting method, device, equipment and storage medium, and aims to solve the technical problem that the performance of a deep learning model cannot be guaranteed when the deep learning model is applied to different scenes in the prior art.
In a first aspect, an embodiment of the present application provides a neural network model adjusting method, including:
inputting target amount of sample data into a neural network model, updating model parameters of the neural network model according to a first output result of the neural network model, wherein the model parameters are first parameters before updating, and the model parameters are second parameters after updating;
inputting a target amount of application data into the updated neural network model, and correcting the model parameters of the neural network model according to the second output result of the updated neural network model and the second parameters, wherein the model parameters before correction are first parameters, and the model parameters after correction are third parameters.
Further, the neural network model comprises a backbone network model and a head network model, and the backbone network model is used for extracting a feature vector of input data of the neural network model; the head network model is used for obtaining an output result of the neural network model according to the feature vector.
Further, the backbone network model comprises a first backbone network model, the head network model comprises a first head network model and a second head network model,
when the target amount of sample data is input to the first backbone network model, the first backbone network model is used for outputting a first feature vector, and when the target amount of application data is input to the first backbone network model, the first backbone network model is used for outputting a second feature vector;
the first head network model is used for obtaining the first output result according to the first feature vector;
the second head network model is used for obtaining the second output result according to the second feature vector.
Further, the first parameters include first initial parameters of the first backbone network model and second initial parameters of the first head network model, and the second parameters include first false update parameters of the first backbone network model and first true update parameters of the first head network model;
the inputting the target amount of sample data into a neural network model and updating the model parameters of the neural network model according to the first output result of the neural network model includes:
inputting the target amount of sample data into a neural network model to obtain a corresponding first output result;
calculating a first loss function of the neural network model according to the first output result;
determining a first gradient of the first initial parameter and a second gradient of the second initial parameter according to the first loss function;
updating the first initial parameter to the first false update parameter according to the first gradient and a first learning rate, and updating the second initial parameter to the first true update parameter according to the second gradient and the first learning rate.
Further, the first parameters further include third initial parameters of the second head network model, and the third parameters include second true update parameters of the first backbone network model and third true update parameters of the second head network model;
the inputting the target amount of application data into the updated neural network model, and correcting the model parameters of the neural network model according to the second output result of the updated neural network model and the second parameters includes:
inputting the target amount of application data into the updated neural network model to obtain a corresponding second output result;
calculating a second loss function of the updated neural network model according to the second output result;
determining a third gradient of the first false update parameter and a fourth gradient of the third initial parameter according to the second loss function;
and correcting the first initial parameter into the second true update parameter according to the first gradient, the third gradient and a second learning rate, and correcting the third initial parameter into the third true update parameter according to the fourth gradient and the second learning rate.
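Collecting the quantities named above, the adjustment of the shared first backbone network model in this scheme can be summarized as follows. This is a hedged reconstruction rather than the patent's literal formula: the claim only states which gradients and learning rate enter each step, and the weighting coefficient $\lambda$ used below is the hyper-parameter in (0, 1) introduced in the detailed description:

$$\theta_1 = \theta - \alpha_1 \nabla_\theta L_1 \quad \text{(false update of the first initial parameter)}$$

$$\theta^{(2)} = \theta - \alpha_2\left(\lambda\,\nabla_\theta L_1 + (1-\lambda)\,\nabla_{\theta_1} L_2\right) \quad \text{(correction to the second true update parameter)}$$

where $\theta$ is the first initial parameter, $L_1$ and $L_2$ are the first and second loss functions, and $\alpha_1$, $\alpha_2$ are the first and second learning rates.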
Further, the backbone network model includes a second backbone network model, and the head network model includes a third head network model.
Further, the first parameters include a fourth initial parameter of the second backbone network model and a fifth initial parameter of the third head network model, and the second parameters include a second false update parameter of the second backbone network model and a third false update parameter of the third head network model;
the inputting the target amount of sample data into a neural network model and updating the model parameters of the neural network model according to the first output result of the neural network model includes:
inputting the target amount of sample data into a neural network model to obtain a corresponding first output result;
calculating a third loss function of the neural network model according to the first output result;
determining a fifth gradient of the fourth initial parameter and a sixth gradient of the fifth initial parameter according to the third loss function;
updating the fourth initial parameter to the second false update parameter according to the fifth gradient and a third learning rate, and updating the fifth initial parameter to the third false update parameter according to the sixth gradient and the third learning rate.
Further, the third parameter includes a fourth true update parameter of the second backbone network model and a fifth true update parameter of the third head network model;
the inputting the target amount of application data into the updated neural network model, and correcting the model parameters of the neural network model according to the second output result of the updated neural network model and the second parameters includes:
inputting the target amount of application data into the updated neural network model to obtain a corresponding second output result;
calculating a fourth loss function of the updated neural network model according to the second output result;
determining a seventh gradient of the second false update parameter and an eighth gradient of the third false update parameter according to the fourth loss function;
and correcting the fourth initial parameter to the fourth true update parameter according to the fifth gradient, the seventh gradient and a fourth learning rate, and correcting the fifth initial parameter to the fifth true update parameter according to the sixth gradient, the eighth gradient and the fourth learning rate.
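As in the first scheme, these steps can be written compactly; the weighted combination below is again a hedged reading, with $\varphi$ standing for the parameters of the third head network model and $g_5,\dots,g_8$ for the fifth through eighth gradients:

$$\theta' = \theta - \alpha_3 g_5,\qquad \varphi' = \varphi - \alpha_3 g_6 \quad \text{(false updates)}$$

$$\theta^{true} = \theta - \alpha_4\left(\lambda g_5 + (1-\lambda) g_7\right),\qquad \varphi^{true} = \varphi - \alpha_4\left(\lambda g_6 + (1-\lambda) g_8\right) \quad \text{(true updates)}$$

with $\alpha_3$, $\alpha_4$ the third and fourth learning rates. Unlike the shared-backbone scheme, here both the backbone and the head are first falsely updated and then corrected from their initial values.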
Further, the sample data comprises k groups, each group containing the target amount of sample data, where k > 1;
the inputting the target amount of sample data into a neural network model and updating the model parameters of the neural network model according to the first output result of the neural network model includes:
performing iterative updating on the model parameters of the neural network model for k times according to k groups of sample data, wherein the model parameters are the first parameters before the iterative updating for k times, and the model parameters are the first intermediate parameters after the iterative updating for k times;
updating the model parameters of the neural network model from the first parameters to second parameters according to the first intermediate parameters and a fifth learning rate;
wherein, the one-time iteration updating process comprises the following steps:
inputting a group of sample data to the neural network model to obtain a first output result;
constructing a fifth loss function of the neural network model according to the first output result;
determining a ninth gradient of the model parameter before the iteration is updated according to the fifth loss function;
updating the model parameters according to the ninth gradient and the fifth learning rate.
Further, the application data comprises k groups, each group containing the target amount of application data;
the inputting the target amount of application data into the updated neural network model, and correcting the model parameters of the neural network model according to the second output result of the updated neural network model and the second parameters includes:
performing iterative correction on the updated model parameters of the neural network model for k times according to k groups of application data, wherein the model parameters are the second parameters before the iterative correction for k times, and the model parameters are the second intermediate parameters after the iterative correction for k times;
correcting the model parameter of the neural network model from the first parameter to a third parameter according to the second intermediate parameter, a sixth learning rate and the second parameter;
wherein, the one-time iterative correction process comprises the following steps:
inputting a set of application data to the updated neural network model to obtain a second output result;
constructing a sixth loss function of the updated neural network model according to the second output result;
determining a tenth gradient of the model parameter before the iteration correction according to the sixth loss function;
and correcting the model parameters according to the tenth gradient and the sixth learning rate.
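One consistent reading of this k-group variant, hedged because the claims name only the ingredients of each step, is a Reptile-style interpolation: k inner steps on sample data produce the first intermediate parameter $\tilde{\theta}$, k further steps on application data produce the second intermediate parameter $\tilde{\theta}'$, and

$$\theta^{(2)} = \theta^{(1)} + \alpha_5\,(\tilde{\theta} - \theta^{(1)}),\qquad \theta^{(3)} = \theta^{(1)} + \alpha_6\,(\tilde{\theta}' - \theta^{(2)})$$

where $\theta^{(1)}$, $\theta^{(2)}$, $\theta^{(3)}$ are the first, second and third parameters and $\alpha_5$, $\alpha_6$ the fifth and sixth learning rates.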
Further, the neural network model comprises a backbone network model;
the inputting the target amount of sample data into a neural network model and updating the model parameters of the neural network model according to the first output result of the neural network model includes:
inputting the target amount of sample data into the backbone network model to obtain a first output result output by the backbone network model, wherein the first output result is a third feature vector;
constructing a seventh loss function of the backbone network model according to the first output result;
determining an eleventh gradient of the first parameter according to the seventh loss function;
updating the model parameters from the first parameters to second parameters according to the eleventh gradient and a seventh learning rate;
the inputting the target amount of application data into the updated neural network model, and correcting the model parameters of the neural network model according to the second output result of the updated neural network model and the second parameters includes:
inputting a target amount of application data into the updated backbone network model to obtain a second output result output by the backbone network model, wherein the second output result is a fourth feature vector;
constructing an eighth loss function of the updated backbone network model according to the second output result;
determining a twelfth gradient of the second parameter according to the eighth loss function;
updating the model parameter from the first parameter to a third parameter according to the twelfth gradient, the eleventh gradient, and an eighth learning rate.
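For this feature-level variant the whole adjustment acts on the backbone alone. A hedged summary, reusing the weighting hyper-parameter $\lambda$ described later for combining the two gradients:

$$\theta_2 = \theta_1 - \alpha_7\, g_{11},\qquad \theta_3 = \theta_1 - \alpha_8\left(\lambda\, g_{11} + (1-\lambda)\, g_{12}\right)$$

where $\theta_1$, $\theta_2$, $\theta_3$ are the first, second and third parameters, $g_{11}$ and $g_{12}$ the eleventh and twelfth gradients, and $\alpha_7$, $\alpha_8$ the seventh and eighth learning rates.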
Further, the target amount of sample data is input to the neural network model through a first data loader;
the target amount of application data is input to the updated neural network model through a second data loader.
In a second aspect, an embodiment of the present application further provides a neural network model adjusting apparatus, including:
the updating module is used for inputting the sample data of a target quantity into the neural network model and updating the model parameters of the neural network model according to the first output result of the neural network model, wherein the model parameters are first parameters before updating, and the model parameters are second parameters after updating;
and the correction module is used for inputting the target amount of application data into the updated neural network model, correcting the model parameters of the neural network model according to the second output result of the updated neural network model and the second parameters, wherein the model parameters before correction are first parameters, and the model parameters after correction are third parameters.
In a third aspect, an embodiment of the present application further provides a neural network model adjusting apparatus, including:
one or more processors;
a memory for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the neural network model adjusting method according to the first aspect.
In a fourth aspect, the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the neural network model adjusting method according to the first aspect.
According to the neural network model adjusting method, device, equipment and storage medium described above, the model parameters of the neural network model are updated from the first parameters to the second parameters according to the first output result obtained when the neural network model processes sample data, and the model parameters are then corrected from the first parameters to the third parameters according to the second output result obtained when the neural network model processes application data and the second parameters, wherein the sample data and the application data are both of the target amount. The sample scene corresponding to the sample data and the application scene corresponding to the application data serve as the related tasks when training the neural network model, breaking the limitation of having to construct a large number of related tasks for meta-learning. Furthermore, the model parameters are updated with the sample data and corrected based on the application data, and the correction considers not only the second output result of the application data but also the first output result of the sample data, so this optimization makes the neural network model suitable for the sample scene and the application scene simultaneously, effectively avoids the forgetting and overfitting problems of traditional fine tuning, and gives the neural network model better generalization capability. Generally, the quantity of sample data is far greater than that of application data; training the neural network model on equal amounts of sample data and application data means that only part of the sample data is used, i.e., only the application data is up-sampled, so the problem of data imbalance is avoided, the performance on the original data domain (sample data) is maintained as much as possible while the performance improvement on the new data domain (application data) is pursued, and the training efficiency and training speed of the neural network model are ensured.
Drawings
Fig. 1 is a flowchart of a neural network model adjustment method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a neural network model adjustment apparatus according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a neural network model adjustment apparatus according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are for purposes of illustration and not limitation. It should be further noted that, for the convenience of description, only some of the structures related to the present application are shown in the drawings, not all of the structures.
It is to be noted that, in this document, relational terms such as first and second are used solely to distinguish one entity or action or object from another entity or action or object without necessarily requiring or implying any actual such relationship or order between such entities or actions or objects. For example, "first" and "second" of the first output result and the second output result are used to distinguish two different output results.
In the prior art, when a deep learning model is applied to different scenes, fine-tuning may be performed on the deep learning model with training data from the current application scene to ensure its performance. For example, scene A corresponds to an open source data set; after training of the deep learning model is completed on the open source data set, if the model is to be used in a specific scene B, it may be fine-tuned with training data acquired in scene B so that it adapts to scene B. However, fine-tuning in this way causes forgetting and overfitting problems. Forgetting means that after fine-tuning, the performance of the deep learning model in scene A degrades (e.g., the recognition accuracy becomes low). Overfitting means that after fine-tuning, the deep learning model performs well only in scene B and cannot generalize. That is, after fine-tuning, although the deep learning model adapts to scene B, it can no longer be applied to scene A. To ensure the performance of the deep learning model, it can instead be trained jointly on the open source data set of scene A and the training data acquired in scene B, that is, by traversing all the data several times. Although such joint training can solve the forgetting and overfitting problems, the amount of data in the open source data set is far greater than the amount of training data collected in scene B, so joint training suffers from data imbalance, which increases the training difficulty of the deep learning model.
Therefore, the embodiment of the present application provides a neural network model adjusting method, so that the trained neural network model is suitable for scene A and scene B at the same time, while avoiding the problems of forgetting, overfitting and high training difficulty.
Specifically, the neural network model adjusting method provided in the embodiment of the present application may be executed by a neural network model adjusting device, the neural network model adjusting device may be implemented in a software and/or hardware manner, and the neural network model adjusting device may be formed by two or more physical entities or may be formed by one physical entity. For example, the neural network model adjusting device may be an intelligent device with data operation and analysis capabilities, such as a computer, a mobile phone, a tablet computer, or an interactive smart tablet.
The neural network model adjusting method provided by the embodiment of the application applies meta-learning to the training process of the neural network model. Meta-learning refers to learning over a large number of related tasks, so that the meta-learned model can quickly adapt to a new task (equivalent to a new scene) based on a small amount of data in an application. That is, meta-learning uses past knowledge and experience to guide the learning of a new task, giving the model the ability to learn to learn. The key to meta-learning is constructing a large number of related tasks; however, in practical applications such as face recognition, it is difficult to construct a large number of related tasks (the amount of training data used for face recognition is large, and the practical application scenarios of face recognition are limited). Therefore, in this embodiment, when meta-learning is used, the limitation of constructing a large number of related tasks is broken, and only scene A and scene B are used as two related tasks, so as to obtain a neural network model suitable for scene A and scene B. The following technical means are specifically adopted:
fig. 1 is a flowchart of a neural network model adjustment method provided in an embodiment of the present application, and referring to fig. 1, the neural network model adjustment method specifically includes:
step 110, inputting the target amount of sample data into the neural network model, and updating the model parameters of the neural network model according to the first output result of the neural network model, wherein the model parameters before updating are first parameters, and the model parameters after updating are second parameters.
The neural network model refers to a deep learning model constructed with a neural network; the type and structure of the neural network can be set according to the actual situation, and the embodiment does not limit them. Further, the model parameters of the neural network model include weights, and optionally bias terms and the like. It can be understood that the untrained neural network model adopts initial model parameters, which may be manually set; the model parameters change as training proceeds, so as to improve the accuracy of the neural network model and thereby its performance.
In the embodiment, sample data refers to data selected from the open source data set, that is, data for scene A constructed from samples in the open source data set; for convenience of description, scene A is referred to as the sample scene in the embodiment. It can be understood that the open source data set contains a large amount of sample data of the same type; for example, when the neural network model is used for face recognition, the corresponding open source data set contains a large number of face images. Typically, a target amount of sample data is selected from the open source data set, where the target amount equals the amount of data in one batch input to the neural network model and is much smaller than the amount of sample data contained in the open source data set; the target amount of sample data may thus also be understood as one batch of sample data. In the embodiment, the number of data input to the neural network model in one batch is recorded as m, that is, the target amount is m, with m ≥ 2; the specific value can be set according to the actual situation. Optionally, the m sample data may be selected randomly from the open source data set, selected in the order of the sample data, or selected by another method.
Specifically, the target amount of sample data is input into the neural network model, whose task at this point is to learn from the target amount of sample data; the model parameters of the neural network model are then updated according to the output result of the neural network model, so as to train the neural network model. In the embodiment, the result output when the neural network model processes the sample data is recorded as the first output result. It can be understood that the meaning of the first output result depends on the function of the neural network model. For example, if the function of the neural network model is face recognition as classification, the first output result is the logits value of the class prediction, where the logits value can be understood as the unnormalized probability that the corresponding sample data belongs to the corresponding class; the multi-class cross entropy loss of the sample data can be calculated from the logits value, where the multi-class cross entropy is the loss function of the neural network model when processing a multi-classification task. For another example, if the function of the neural network model is face recognition by feature comparison, the first output result is the feature vector of the sample data. It can be understood that the first output result is a set containing the output result obtained after the neural network model processes each sample data input this time.
Furthermore, the model parameters are updated according to the first output result, so that the performance of the updated neural network model is superior to that before updating. When the model parameters are updated, a Stochastic Gradient Descent (SGD) algorithm is used: a loss function of the neural network model is constructed according to the first output result, where the type of the loss function can be set according to the actual situation. After the loss function is calculated, the partial derivative of each model parameter is calculated from the loss function to obtain the gradient of the model parameter, and the model parameter is then updated according to the gradient. It can be understood that when the difference between the first output result and the real result corresponding to the input sample data is large, the gradient of the model parameters is large and the update amplitude is large, so that the first output result of the updated neural network model is closer to the real result. In the embodiment, the model parameters before the update are recorded as the first parameters and the updated model parameters as the second parameters. Optionally, a learning rate is introduced when updating the model parameters; the learning rate may also be understood as a learning speed, used to control the training speed of the neural network model. Too small a learning rate makes the neural network model learn slowly and requires many training iterations before the loss function converges, while too large a learning rate makes the learning process of the neural network model oscillate. Therefore, in the embodiment, an appropriate learning rate may be selected in combination with the actual situation of the neural network model, and the learning rate may be adjusted accordingly.
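A minimal sketch of this single-batch SGD update in PyTorch (an assumed framework; `model`, `loss_fn` and `alpha` are illustrative names, not from the patent). The gradients are returned so that the later correction step can reuse them:

```python
import torch

def sgd_update(model, loss_fn, batch_x, batch_y, alpha):
    """One SGD step on one batch: build the loss from the output
    result, take gradients of the model parameters, and update them
    with the learning rate alpha."""
    loss = loss_fn(model(batch_x), batch_y)       # loss from the output result
    params = list(model.parameters())
    grads = torch.autograd.grad(loss, params)     # gradient of each model parameter
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= alpha * g                        # update amplitude scaled by the learning rate
    return grads
```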
It should be noted that the above-mentioned stochastic gradient descent algorithm is only an optional manner, and in practical applications, an Adaptive Moment Estimation (Adam) algorithm, a rectifiedadam (radam) algorithm, and the like may also be used to update the model parameters.
Optionally, in one training process, one group of sample data may be input, or multiple groups may be input, where each group contains the target amount of sample data. When one group of sample data is input, the model parameters are directly updated according to the corresponding first output result, that is, the first parameters are updated to the second parameters. When multiple groups of sample data are input, the model parameters can be updated iteratively based on the multiple groups, one group of sample data corresponding to one iterative update. In each iterative update, the current model parameters are updated to an intermediate parameter; after all the iterative updates are completed, the second parameters are obtained from the finally obtained intermediate parameters and the first parameters held before the iterative updates, and the model parameters are updated to the second parameters. One iterative update consists of inputting a group of sample data into the neural network model to obtain a corresponding first output result and then iteratively updating the model parameters according to that first output result, in the same manner as the model parameter update mentioned above (SGD, etc.). After the iterative update, the model parameter becomes an intermediate parameter; then another group of sample data is input to perform the next iterative update, and the model parameter changes from one intermediate parameter to another. After the iterative updates are completed in the above manner, the degree of difference between the first parameter and the final intermediate parameter is determined; this degree of difference reflects how much the iterative updates modified the model parameters. On this basis, a learning rate is introduced to update the first parameters and obtain the second parameters, i.e., the model parameters of the neural network model are updated to the second parameters.
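When multiple groups are used, the single-step update above becomes a short inner loop followed by an interpolation from the first parameters toward the final intermediate parameters, in the style of the Reptile meta-learning update; a sketch under that assumption:

```python
import torch

def k_group_update(model, loss_fn, groups, inner_lr, meta_lr):
    """k iterative updates (one per group of sample data), then an
    interpolation: theta_2 = theta_1 + meta_lr * (theta_k - theta_1)."""
    theta1 = [p.detach().clone() for p in model.parameters()]  # first parameters
    for batch_x, batch_y in groups:                  # one iterative update per group
        loss = loss_fn(model(batch_x), batch_y)
        grads = torch.autograd.grad(loss, list(model.parameters()))
        with torch.no_grad():
            for p, g in zip(model.parameters(), grads):
                p -= inner_lr * g                    # intermediate parameters
    with torch.no_grad():                            # second parameters from the difference degree
        for p, p1 in zip(model.parameters(), theta1):
            p.copy_(p1 + meta_lr * (p - p1))
```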
It should be noted that the process of training the neural network model by the sample data of the target number can also be understood as inner layer optimization of the neural network model, and the neural network model can be suitable for the sample scene corresponding to the sample data through the inner layer optimization.
And 120, inputting the target amount of application data into the updated neural network model, and correcting the model parameters of the neural network model according to the second output result of the updated neural network model and the second parameters, wherein the model parameters before correction are the first parameters, and the model parameters after correction are the third parameters.
In the embodiment, when the neural network model is trained, one training process refers to the process of modifying (updating and correcting) the model parameters of the neural network model with a target amount of sample data and a target amount of application data. The application data refers to data collected in another scene (scene B), where scene B may also be understood as a specific application scene. For example, if the application scene is performing face recognition in a certain residential community, the corresponding application data are face pictures of users collected in that community. A target amount of application data is then selected from all the collected application data; the embodiment does not limit the selection method. The target amount is the amount of data in one batch input to the neural network model, so the target amount of application data can also be understood as one batch of application data. In general, the amount of sample data contained in the open source data set is much larger than the amount of application data collected.
Specifically, the current neural network model is the updated neural network model, whose model parameters are the second parameters. The target amount of application data is input into the updated neural network model, and the model parameters of the neural network model are then adjusted according to the corresponding output result, so as to train the neural network model. In the embodiment, the result output when the neural network model processes the application data is recorded as the second output result. It can be understood that the manner of obtaining the second output result and its meaning are the same as for the first output result, and are not described here again.
After that, the model parameters of the neural network model (those before updating) are corrected according to the second output result, which may also be understood as another update of the model parameters. When the model parameters are corrected, the second parameters obtained from the sample data are taken into account and a learning rate is introduced, so that the trained neural network model adapts both to the sample scene corresponding to the sample data and to the application scene corresponding to the application data. Typically, the same algorithm is adopted as in the foregoing step; the embodiment takes the stochastic gradient descent algorithm as an example. The process of correcting the model parameters then specifically includes: calculating a loss function according to the second output result, the type of this loss function being the same as that of the loss function in the foregoing step; calculating the gradient of the second parameters according to this loss function; and correcting the first parameters in combination with the gradient of the first parameters calculated in the foregoing step and the learning rate. Concretely, a hyper-parameter with a value between 0 and 1 is introduced; the hyper-parameter and the result of subtracting it from 1 are respectively taken as the weights of the two gradients for a weighted summation, and the first parameters are then corrected to the third parameters in combination with the weighted-sum result and the learning rate. Optionally, the learning rate introduced when processing the application data and the learning rate introduced when processing the sample data may be the same or different; the embodiment does not limit this. It can be understood that, for the neural network model, there are only two related tasks in the training process, namely the task of learning application data in the application scene and the task of learning sample data in the sample scene.
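The correction described in this paragraph can be sketched as below, assuming plain SGD; `lam` is the hyper-parameter in (0, 1), and the two gradient lists are those computed on the sample data (at the first parameters) and on the application data (at the falsely updated parameters):

```python
import torch

def correct_params(theta1, grads_sample, grads_app, lam, lr):
    """theta_3 = theta_1 - lr * (lam * g_sample + (1 - lam) * g_app):
    blend the gradient from the sample data with the gradient from the
    application data, then apply the result to the first parameters."""
    with torch.no_grad():
        return [t - lr * (lam * gs + (1.0 - lam) * ga)
                for t, gs, ga in zip(theta1, grads_sample, grads_app)]
```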
Typically, the number of groups of sample data is equal to the number of groups of application data. That is, when only one group of sample data is input in one training process in the foregoing step, one group of application data is likewise input in this step; when multiple groups of sample data are input, multiple groups of application data are input as well, and the neural network model processes the multiple groups of application data in the same manner as the multiple groups of sample data.
In practical applications, the neural network model may include a plurality of submodels, and in some cases the submodel that processes the sample data and the submodel that processes the application data are not completely the same. For example, the neural network model includes a first submodel, a second submodel and a third submodel, where the first submodel outputs the feature vectors of both the sample data and the application data, the second submodel outputs the logits values corresponding to the sample data, and the third submodel outputs the logits values corresponding to the application data. Then, after the target amount of sample data is input into the neural network model, the model parameters of the first submodel and the second submodel are updated. After the target amount of application data is input into the neural network model, the model parameters of the first submodel and the third submodel are corrected. The first submodel is processed in the manner described above, that is, its model parameters are updated from the first parameters to the second parameters and then corrected from the first parameters to the third parameters in combination with the second parameters. For the second submodel, the updated second parameters can directly serve as the corrected third parameters. The third submodel is processed in the same way as the second submodel: its updated model parameters directly serve as the third parameters.
It should be noted that the process of training the neural network model by the target amount of application data may also be understood as outer layer optimization of the neural network model, and the neural network model may be adapted to the application scenario corresponding to the application data by the outer layer optimization.
After the model parameters are corrected to the third parameters, one training process can be considered finished. The training process then starts again, i.e., the flow returns to step 110, and the third parameters obtained in the previous training process serve as the first parameters of the current training process. Training is repeated in this way (i.e., the inner-layer optimization and outer-layer optimization are iterated) until the loss function of the neural network model converges or a termination condition is reached (e.g., a preset number of training iterations is reached, or each application data has been traversed N times, N ≥ 1). Optionally, after training ends, a target amount of sample data and a target amount of application data may be constructed again to test the trained neural network model. If the output result of the neural network model is stable and the accuracy reaches expectations, the neural network model has been well optimized: on the premise that its performance in the sample scene is basically unchanged, it also performs well in the application scene and can be put into use. Otherwise, the neural network model may continue to be trained until it passes the test. Generally, when training of the neural network model ends according to the technical scheme provided above, all the application data has been traversed (input into the neural network model) at least once, while only part of the sample data has been used for training.
Optionally, in the foregoing process, the target amount of sample data is input into the neural network model through a first data loader, and the target amount of application data is input into the updated neural network model through a second data loader. A data loader can be understood as a program that loads data into the neural network model; its performance affects the performance of the neural network model, because if the data loader cannot load data into the neural network model accurately, the neural network model cannot learn correct data and its accuracy cannot be guaranteed. In the embodiment, different data loaders are adopted to load the sample data and the application data: the data loader that loads the sample data is recorded as the first data loader and the one that loads the application data as the second data loader, and the two data loaders ensure the accuracy of data loading.
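A sketch of the two data loaders, assuming PyTorch `DataLoader`s with the batch size set to the target amount m; the toy datasets are placeholders, not from the patent:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

m = 64  # target amount: the number of data in one batch

# toy stand-ins for the open source data set (large) and the
# application data collected in scene B (small); 128-d inputs, class labels
sample_set = TensorDataset(torch.randn(5000, 128), torch.randint(0, 100, (5000,)))
app_set = TensorDataset(torch.randn(320, 128), torch.randint(0, 10, (320,)))

sample_loader = DataLoader(sample_set, batch_size=m, shuffle=True)  # first data loader
app_loader = DataLoader(app_set, batch_size=m, shuffle=True)        # second data loader

# one batch of sample data and one batch of application data per training step;
# zip stops with the smaller loader, so only part of the sample data is used,
# matching the up-sampling behaviour described in the text
for (xs, ys), (xa, ya) in zip(sample_loader, app_loader):
    pass  # run one update + correction step here
```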
In the above technical solution, the model parameters of the neural network model are updated from the first parameters to the second parameters according to the first output result obtained when the neural network model processes sample data, and the model parameters are then corrected from the first parameters to the third parameters according to the second output result obtained when the neural network model processes application data and the second parameters, where the sample data and the application data are both of the target amount; this solves the technical problem in the prior art that the performance of a deep learning model cannot be guaranteed when it is applied to different scenes. The sample scene corresponding to the sample data and the application scene corresponding to the application data serve as the related tasks when training the neural network model, breaking the limitation of having to construct a large number of related tasks for meta-learning. Furthermore, the model parameters are updated with the sample data and corrected based on the application data, and the final correction considers not only the second output result of the application data but also the first output result of the sample data, so this optimization makes the neural network model suitable for the sample scene and the application scene simultaneously, effectively avoids the forgetting and overfitting problems of traditional fine tuning, and gives the neural network model better generalization capability. Generally, the quantity of sample data is far greater than that of application data; training the neural network model on equal amounts of sample data and application data means that only part of the sample data is used, i.e., only the application data is up-sampled, so the problem of data imbalance is avoided, the performance on the original data domain (sample data) is maintained as much as possible while the performance improvement on the new data domain (application data) is pursued, and the training efficiency and training speed of the neural network model are ensured.
On the basis of the above embodiment, when the neural network models adopt different structures, the rules (such as loss function types, update rules, and the like) for adjusting the model parameters according to the output results are different.
In one embodiment, the neural network model comprises a backbone network model and a head network model, wherein the backbone network model is used for extracting the feature vector of the input data of the neural network model, and the head network model is used for obtaining the output result of the neural network model according to the feature vector.
Illustratively, take the neural network model used for multi-classification face recognition as an example; in this case, the neural network model includes a backbone network model and a head network model. The backbone network model is a model formed by a backbone network (Backbone) and is used for extracting features of the data input into the neural network model and outputting feature vectors; during face recognition, the backbone network model extracts the features of a face image and outputs a feature vector. The type of the backbone network model is not limited; for example, it may be a residual network, a face recognition network running on a mobile terminal (MobileFaceNet), or the like. When the sample data or application data is input into the neural network model, it is specifically input into the backbone network model, which extracts the features of the data and outputs feature vectors. In the embodiment, there is one backbone network model, shared by the sample scene and the application scene; sharing means that both the sample data and the application data are input into the backbone network model, so that the backbone network model learns the data of both scenes and adapts to the two scenes simultaneously.
The head network model refers to a model formed by a head network (Head); the embodiment does not limit its type. The head network model takes as input the feature vector output by the backbone network model and obtains the output result according to the feature vector; in the embodiment, the output result of the head network model is the logits value of face recognition. Optionally, the two scenes may share the head network model, that is, there is one head network model, which learns the feature vectors of both scenes so as to adapt to the two scenes simultaneously. Optionally, each scene corresponds to one head network model; in this case each head network model learns the feature vectors of only one scene so as to adapt to the corresponding scene. With two head network models, in one training process the model parameters of the backbone network model are adjusted twice, according to the first output result and the second output result, while the model parameters of each head network model are adjusted only once, according to the output result of the corresponding scene.
As can be seen from the above, when the number of head network models differs, the number of times the model parameters are adjusted differs, and hence the adjustment manner differs. Accordingly, depending on the number of head network models, the embodiment includes the following schemes:
according to the first scheme, the backbone network model comprises a first backbone network model, the head network model comprises a first head network model and a second head network model, when sample data of a target quantity is input to the first backbone network model, the first backbone network model is used for outputting a first feature vector, and when application data of the target quantity is input to the first backbone network model, the first backbone network model is used for outputting a second feature vector; the first head network model is used for obtaining a first output result according to the first feature vector; the second head network model is used for obtaining a second output result according to the second feature vector.
In the scheme, the number of the backbone network models is one and is recorded as a first backbone network model, and the number of the head network models is two and is respectively recorded as a first head network model and a second head network model. The first head network model corresponds to a sample scene and the second head network model corresponds to an application scene.
Specifically, the target amount of sample data is input to the first backbone network model, and the first backbone network model outputs the corresponding feature vectors, which are recorded as the first feature vector. It will be appreciated that the first feature vector is a set containing the feature vector of each input sample data. The first feature vector is input to the first head network model, so that the first head network model outputs the corresponding first output result. Likewise, the feature vectors corresponding to the target amount of application data are recorded as the second feature vector, which is a set containing the feature vector of each input application data; the second feature vector is input to the second head network model, so that the second head network model outputs the corresponding second output result according to the second feature vector. Optionally, the first backbone network model may determine whether the currently input data is sample data or application data through the data loader used when inputting the data, and then input the obtained feature vectors to the corresponding head network model. Optionally, different labels are used for the sample data and the application data, so that the first backbone network model distinguishes the input data and inputs the obtained feature vectors to the corresponding head network model.
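A sketch of this structure, assuming PyTorch modules; the linear classification heads and the `scene` routing flag are illustrative choices, not prescribed by the patent:

```python
import torch.nn as nn

class SharedBackboneTwoHeads(nn.Module):
    """First scheme: one shared backbone, one head per scene."""

    def __init__(self, backbone, feat_dim, n_sample_classes, n_app_classes):
        super().__init__()
        self.backbone = backbone                                   # first backbone network model
        self.head_sample = nn.Linear(feat_dim, n_sample_classes)   # first head network model
        self.head_app = nn.Linear(feat_dim, n_app_classes)         # second head network model

    def forward(self, x, scene):
        feat = self.backbone(x)            # first or second feature vector
        if scene == "sample":              # routed by data label / data loader
            return self.head_sample(feat)  # first output result
        return self.head_app(feat)         # second output result
```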
Typically, when updating the model parameters of the neural network model according to the first output result, the first parameters include first initial parameters of the first backbone network model and second initial parameters of the first head network model, and the second parameters include first false update parameters of the first backbone network model and first true update parameters of the first head network model, and accordingly, inputting the target amount of sample data into the neural network model, and updating the model parameters of the neural network model according to the first output result of the neural network model includes: inputting the target amount of sample data into a neural network model to obtain a corresponding first output result; calculating a first loss function of the neural network model according to the first output result; determining a first gradient of the first initial parameter and a second gradient of the second initial parameter according to the first loss function; the first initial parameter is updated to a first false update parameter according to the first gradient and the first learning rate, and the second initial parameter is updated to a first true update parameter according to the second gradient and the first learning rate.
Specifically, since the neural network model includes the first backbone network model and the first head network model, before the model parameters are updated the first parameters comprise the model parameters of the first backbone network model and the model parameters of the first head network model, where the model parameters of the first backbone network model are recorded as the first initial parameters, expressed as $\theta$, and the model parameters of the first head network model are recorded as the second initial parameters, expressed as $\hat{\theta}$.
Illustratively, m sample data are selected from the open source data set and denoted as $\{x_1, x_2, \ldots, x_m\}$. The m sample data are then loaded as one batch into the first backbone network model through the first data loader. Recording the function of the first backbone network model as $f_\theta(\cdot)$, the first feature vector output by the first backbone network model is $\{f_\theta(x_1), f_\theta(x_2), \ldots, f_\theta(x_m)\}$. The first feature vector is then input to the first head network model, whose function is recorded as $g_{\hat{\theta}}(\cdot)$; the first output result output by the first head network model is $\{g_{\hat{\theta}}(f_\theta(x_1)), g_{\hat{\theta}}(f_\theta(x_2)), \ldots, g_{\hat{\theta}}(f_\theta(x_m))\}$.
Further, after the first output result is obtained, a loss function of the neural network model is constructed according to the first output result; in the embodiment, this loss function is recorded as the first loss function. The type of the first loss function can be set according to the actual situation, for example a softmax loss function, an ArcFace loss function, a CosFace loss function, or an AirFace loss function. In the embodiment, the first loss function is the softmax loss function, which is in essence a cross entropy. Denoting the logits of sample $x_i$ as $z_i = g_{\hat{\theta}}(f_\theta(x_i))$ and its class label as $y_i$ ($1 \le i \le m$), the first loss function $L_{in1}$ is

$$L_{in1} = -\frac{1}{m}\sum_{i=1}^{m}\log\frac{e^{z_{i,y_i}}}{\sum_{j} e^{z_{i,j}}}.$$
After the first loss function is obtained, the model parameters (the first initial parameter and the second initial parameter) of the first backbone network model and the first head network model can be updated according to the first loss function. The SGD algorithm is adopted during updating; that is, by taking partial derivatives, the gradients of the first initial parameter and the second initial parameter are respectively calculated according to the first loss function, and the update magnitudes of the first initial parameter and the second initial parameter are determined according to the gradients. The gradient corresponding to the first initial parameter $\theta$ is recorded as the first gradient, which can be expressed as $\nabla_\theta L_{in1}$; the gradient corresponding to the second initial parameter $\phi$ is recorded as the second gradient, which can be expressed as $\nabla_\phi L_{in1}$.
After that, the first initial parameter is updated according to the first gradient. Since the first backbone network model also needs to extract features of the application data in one training process, and the current update of the first initial parameter is only for sample data, the current update process may be considered a fake update, where the fake update may also be understood as an intermediate update rather than a final update. The model parameters obtained by the fake update are recorded as first false update parameters; that is, the first backbone network model is updated from the first initial parameters to the first false update parameters. Since the neural network model is trained with the idea of meta-learning, a learning rate is introduced when the model parameters are updated; in the embodiment, the learning rate used when the sample data is processed in the present scheme is recorded as the first learning rate. The model parameters are then updated based on the first learning rate and the first gradient, which can be formally expressed as:

$$\theta_1 \leftarrow \theta - \alpha_1 \nabla_\theta L_{in1},$$

where $\alpha_1$ denotes the first learning rate and $\theta_1$ denotes the first false update parameter.
Similarly, the second initial parameter is updated according to the second gradient. Since the first head network model only processes sample data in one training process, the update process may be regarded as a true update, where the true update may also be understood as a final update. The model parameters obtained after the true update are recorded as first true update parameters; that is, the first head network model is updated from the second initial parameters to the first true update parameters. The first learning rate is introduced during updating, i.e., the model parameters are updated based on the first learning rate and the second gradient, which can be formally expressed as:

$$\phi_1 \leftarrow \phi - \alpha_1 \nabla_\phi L_{in1},$$

where $\phi_1$ represents the first true update parameter.
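To make the difference between the two update types concrete, consider a one-dimensional toy example with assumed numbers: let $\theta = 1.0$, $\phi = 2.0$, $\alpha_1 = 0.1$, $\nabla_\theta L_{in1} = 0.5$ and $\nabla_\phi L_{in1} = 0.4$. The fake update yields $\theta_1 = 1.0 - 0.1 \times 0.5 = 0.95$, but $\theta = 1.0$ is retained, since the true update of the backbone described below starts again from $\theta$; the true update of the head yields $\phi_1 = 2.0 - 0.1 \times 0.4 = 1.96$, which directly becomes the head's final parameter for this training pass.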
And updating the first initial parameter and the second initial parameter to obtain an updated neural network model, wherein the updated neural network model comprises an updated first trunk network model, an updated first head network model and an un-updated second head network model. And then, inputting a target amount of application data into the updated neural network model to modify the model parameters into third parameters. At this time, it is set that the first parameter further includes a third initial parameter of the second head network model, and the third parameter includes a second true update parameter of the first backbone network model and a third true update parameter of the second head network model. Correspondingly, inputting the target amount of application data into the updated neural network model, and correcting the model parameters of the neural network model according to the second output result of the updated neural network model and the second parameters includes: inputting the target amount of application data into the updated neural network model to obtain a corresponding second output result; calculating a second loss function of the updated neural network model according to the second output result; determining a third gradient of the first false update parameter and a fourth gradient of the third initial parameter according to the second loss function; and correcting the first initial parameter into a second true update parameter according to the first gradient, the third gradient and the second learning rate, and correcting the third initial parameter into a third true update parameter according to the fourth gradient and the second learning rate.
Specifically, the first parameters should also include the model parameters of the second head network model before updating; in the embodiment, these model parameters before updating are recorded as third initial parameters and expressed as $\psi$. Illustratively, since the sample data and the application data input to the neural network model are the same in quantity, in the present scheme m application data are selected from all the application data collected in the application scene and denoted as $\{x'_1, x'_2, \ldots, x'_m\}$. The m application data are then loaded in batches into the updated first backbone network model by the second data loader. Since the model parameter of the updated first backbone network model is the first false update parameter $\theta_1$, the function of the updated first backbone network model is recorded as $f_{\theta_1}(\cdot)$, and the second feature vector output by the first backbone network model is $\{f_{\theta_1}(x'_1), f_{\theta_1}(x'_2), \ldots, f_{\theta_1}(x'_m)\}$. After that, the second feature vector is input to the second head network model, whose function is recorded as $h_\psi(\cdot)$, and the second output result output by the second head network model is $\{h_\psi(f_{\theta_1}(x'_1)), h_\psi(f_{\theta_1}(x'_2)), \ldots, h_\psi(f_{\theta_1}(x'_m))\}$.
Specifically, the second loss function is of the same type as the first loss function; in the embodiment, the second loss function is also a softmax loss function, denoted by $L_{out1}$:

$$L_{out1} = -\frac{1}{m}\sum_{i=1}^{m} \log\big(\mathrm{softmax}(h_\psi(f_{\theta_1}(x'_i)))_{y'_i}\big), \quad 1 \le i \le m,$$

where $y'_i$ denotes the label of application data $x'_i$. After the second loss function is obtained, the model parameters (the first initial parameter and the third initial parameter) of the first backbone network model and the second head network model can be corrected according to the second loss function. The SGD algorithm is used in the correction. Because the updated first backbone network model and the second head network model are used in this scheme, when the gradients are calculated, the gradients of the first false update parameter and the third initial parameter are respectively calculated according to the second loss function: the gradient corresponding to the first false update parameter $\theta_1$ is recorded as the third gradient, which can be expressed as $\nabla_{\theta_1} L_{out1}$, and the gradient corresponding to the third initial parameter $\psi$ is recorded as the fourth gradient, which can be expressed as $\nabla_\psi L_{out1}$.
After that, the model parameters of the first backbone network model and the second head network model are modified, where the modification process can also be understood as an update process of the model parameters. Specifically, when the model parameter of the first backbone network model is corrected, the first initial parameter is corrected to obtain the final model parameters, that is, the model parameters at the end of the current training process; therefore, the modification process can also be understood as a true update, and the modified model parameters are recorded as the second true update parameters. Typically, since the first backbone network model has undergone one fake update, the first gradient in the fake update process needs to be considered when the current true update is performed, so that the modified neural network model maintains its performance in the sample scene as much as possible and the problem of forgetting is avoided. In the present scheme, the learning rate introduced when the application data is processed is recorded as the second learning rate, which may be the same as or different from the first learning rate. At this time, the specific process of modifying the first initial parameter into the second true update parameter can be formally expressed as:

$$\theta \leftarrow \theta - \beta_1\big(\nabla_\theta L_{in1} + \lambda_1 \nabla_{\theta_1} L_{out1}\big),$$

where $\beta_1$ denotes the second learning rate, $\nabla_\theta L_{in1} + \lambda_1 \nabla_{\theta_1} L_{out1}$ can be understood as a fusion gradient, and $\lambda_1$ denotes a hyper-parameter that can be set according to the actual situation and balances the importance of the two gradients: when $\lambda_1$ is larger, the first backbone network model pays more attention to the application data, and when $\lambda_1$ is smaller, the first backbone network model pays more attention to the sample data. $\theta$ on the left of the arrow represents the second true update parameter, and $\theta$ on the right of the arrow represents the first initial parameter. As can be seen from the above formal representation, the forgetting problem is avoided by adding the first gradient, and the first backbone network model is made suitable for the application scenario by adding the third gradient, that is, the first backbone network model has better performance in the application scenario. It should be noted that, since the third gradient is computed at the first false update parameter rather than at the first initial parameter, this is in essence the idea of learning to learn in meta-learning: the first backbone network model is expected to have better performance in the application scenario on the basis of being suitable for the sample scenario. It can also be understood that the third gradient amounts to fine-tuning with the fake-updated first backbone network model as the reference, which avoids the overfitting problem encountered in the prior art when fine-tuning directly.
Likewise, the third initial parameter is updated according to the fourth gradient. Since the second head network model only processes application data in one training process, the update process may be regarded as a true update. The model parameters obtained after the true update are recorded as third true update parameters; that is, the second head network model is updated from the third initial parameters to the third true update parameters. The second learning rate is introduced during updating, and the specific process can be formally expressed as:

$$\psi \leftarrow \psi - \beta_1 \nabla_\psi L_{out1},$$

where $\psi$ on the left of the arrow indicates the third true update parameter and $\psi$ on the right of the arrow indicates the third initial parameter.
It can be understood that, since the first head network model is only true updated once, the updated first true update parameter may be directly used as a final model parameter of the first head network model in the current training process, and at this time, the third parameter should include the first true update parameter in addition to the second true update parameter and the third true update parameter.
Optionally, after the current training is finished, the third parameter is used as the first parameter and training is performed again, until the loss functions (the first loss function and the second loss function) converge, a set number of training passes is reached, or the training is interrupted manually. Convergence of the loss functions means that the loss values remain within a certain range over a set number of consecutive training passes.
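Putting the two passes together, one full training pass of scheme one could be sketched as below. This is an assumed functional-style PyTorch sketch, not the patent's own code: `backbone_fn(x, params)` and the head functions are hypothetical, parameters are kept as lists of tensors with `requires_grad=True`, and the gradient at the fake-updated parameters is taken first-order, matching the formulas above.

```python
import torch
import torch.nn.functional as F

def scheme_one_step(backbone_fn, head1_fn, head2_fn,
                    theta, phi, psi,
                    sample_batch, app_batch,
                    alpha1, beta1, lam1):
    """One full training pass of scheme one (illustrative sketch)."""
    x, y = sample_batch        # target amount of sample data with labels
    xp, yp = app_batch         # target amount of application data with labels

    # Sample pass: first loss, first and second gradients.
    loss_in1 = F.cross_entropy(head1_fn(backbone_fn(x, theta), phi), y)
    g_theta = torch.autograd.grad(loss_in1, theta, retain_graph=True)   # 1st gradient
    g_phi = torch.autograd.grad(loss_in1, phi)                          # 2nd gradient

    # Fake update of the backbone (theta -> theta_1), true update of head 1.
    theta1 = [(p - alpha1 * g).detach().requires_grad_()
              for p, g in zip(theta, g_theta)]
    phi_new = [(p - alpha1 * g).detach().requires_grad_()
               for p, g in zip(phi, g_phi)]

    # Application pass through the fake-updated backbone and head 2.
    loss_out1 = F.cross_entropy(head2_fn(backbone_fn(xp, theta1), psi), yp)
    g_theta1 = torch.autograd.grad(loss_out1, theta1, retain_graph=True)  # 3rd gradient
    g_psi = torch.autograd.grad(loss_out1, psi)                           # 4th gradient

    # True updates: fusion gradient for the backbone, plain SGD step for head 2.
    theta_new = [(p - beta1 * (gi + lam1 * go)).detach().requires_grad_()
                 for p, gi, go in zip(theta, g_theta, g_theta1)]
    psi_new = [(p - beta1 * g).detach().requires_grad_()
               for p, g in zip(psi, g_psi)]
    return theta_new, phi_new, psi_new
```

A training loop would call `scheme_one_step` repeatedly, feeding fresh batches of sample and application data, until the two losses converge or the set number of training passes is reached.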
And in a second scheme, the backbone network model comprises a second backbone network model, and the head network model comprises a third head network model.
In this scheme, there is one backbone network model, recorded as the second backbone network model, and one head network model, recorded as the third head network model. The target amount of sample data is input to the second backbone network model, and in the embodiment the feature vectors corresponding to the target amount of sample data are marked as fifth feature vectors; the fifth feature vectors are then input to the third head network model, and the third head network model outputs the corresponding first output result according to the fifth feature vectors. Similarly, the target amount of application data is input to the second backbone network model, so that the second backbone network model outputs the corresponding feature vectors, which are processed by the third head network model in the same way. Optionally, the second backbone network model may determine, through the data loader used when the data is input, whether the currently input data is sample data or application data. Optionally, the sample data and the application data use different labels, so that the second backbone network model distinguishes the input data.
Typically, when the model parameters of the neural network model are updated according to the first output result, the first parameters include a fourth initial parameter of the second backbone network model and a fifth initial parameter of the third head network model, and the second parameters include a second false update parameter of the second backbone network model and a third false update parameter of the third head network model. Correspondingly, inputting the sample data of the target quantity into the neural network model, and updating the model parameters of the neural network model according to the first output result of the neural network model includes: inputting the target amount of sample data into a neural network model to obtain a corresponding first output result; calculating a third loss function of the neural network model according to the first output result; determining a fifth gradient of the fourth initial parameter and a sixth gradient of the fifth initial parameter according to a third loss function; and updating the fourth initial parameter to a second false update parameter according to the fifth gradient and the third learning rate, and updating the fifth initial parameter to a third false update parameter according to the sixth gradient and the third learning rate.
Specifically, in the present embodiment, the model parameter of the second backbone network model before being updated is recorded as a fourth initial parameter, expressed as $\theta_2$, and the model parameters of the third head network model are recorded as fifth initial parameters, expressed as $\phi_2$.
Illustratively, m sample data are selected from the open source data set and denoted as $\{x_1, x_2, \ldots, x_m\}$, and the m sample data are then input in batches into the second backbone network model by the first data loader, where the function of the second backbone network model is recorded as $f_{\theta_2}(\cdot)$. The fifth feature vector output by the second backbone network model is recorded as $\{f_{\theta_2}(x_1), f_{\theta_2}(x_2), \ldots, f_{\theta_2}(x_m)\}$. The fifth feature vector is then input to the third head network model, whose function is recorded as $g_{\phi_2}(\cdot)$, and the first output result output by the third head network model is $\{g_{\phi_2}(f_{\theta_2}(x_1)), \ldots, g_{\phi_2}(f_{\theta_2}(x_m))\}$.
Further, after the first output result is obtained, a loss function of the neural network model is constructed according to the first output result; in the embodiment, this loss function is recorded as a third loss function, whose type can be set according to the actual situation. In this embodiment, taking the third loss function as the softmax loss function as an example, the third loss function is denoted by $L_{in2}$:

$$L_{in2} = -\frac{1}{m}\sum_{i=1}^{m} \log\big(\mathrm{softmax}(g_{\phi_2}(f_{\theta_2}(x_i)))_{y_i}\big), \quad 1 \le i \le m.$$
After the third loss function is obtained, the model parameters (the fourth initial parameter and the fifth initial parameter) of the second backbone network model and the third head network model can be updated according to the third loss function. The SGD algorithm is adopted during updating; that is, by taking partial derivatives, the gradients of the fourth initial parameter and the fifth initial parameter are respectively calculated according to the third loss function, so as to determine their update magnitudes. The gradient corresponding to the fourth initial parameter $\theta_2$ is recorded as the fifth gradient, which can be expressed as $\nabla_{\theta_2} L_{in2}$; the gradient corresponding to the fifth initial parameter $\phi_2$ is recorded as the sixth gradient, which can be expressed as $\nabla_{\phi_2} L_{in2}$.
Then, the fourth initial parameter is updated according to the fifth gradient. Since the second backbone network model also needs to extract features of the application data in one training process, and the current update of the fourth initial parameter is only for sample data, the update process can be regarded as a fake update. The model parameters obtained after the fake update are recorded as second false update parameters; that is, the second backbone network model is updated from the fourth initial parameters to the second false update parameters. Since the neural network model is trained with the idea of meta-learning, a learning rate is introduced when the model parameters are updated; in this scheme, the learning rate used when the sample data is processed is recorded as the third learning rate. The model parameters are then updated based on the third learning rate and the fifth gradient, which can be formally expressed as:

$$\theta_3 \leftarrow \theta_2 - \alpha_2 \nabla_{\theta_2} L_{in2},$$

where $\alpha_2$ denotes the third learning rate and $\theta_3$ denotes the second false update parameter.
Similarly, the fifth initial parameter is updated according to the sixth gradient. Since the third head network model also needs to process application data in one training process, and the current update of the fifth initial parameter is only for sample data, the current update process may be considered a fake update. The model parameters obtained after the fake update are recorded as third false update parameters; that is, the third head network model is updated from the fifth initial parameters to the third false update parameters. The third learning rate is introduced during updating, i.e., the model parameters are updated based on the third learning rate and the sixth gradient, which can be formally expressed as:

$$\phi_3 \leftarrow \phi_2 - \alpha_2 \nabla_{\phi_2} L_{in2},$$

where $\phi_3$ denotes the third false update parameter.
The fourth initial parameter and the fifth initial parameter are updated to obtain an updated neural network model, where the updated neural network model comprises an updated second backbone network model and an updated third head network model. Then, the target amount of application data is input into the updated neural network model to obtain the third parameters. At this time, it is set that the third parameters include a fourth true update parameter of the second backbone network model and a fifth true update parameter of the third head network model. Correspondingly, inputting the target amount of application data into the updated neural network model, and correcting the model parameters of the neural network model according to the second output result of the updated neural network model and the second parameters includes: inputting the target amount of application data into the updated neural network model to obtain a corresponding second output result; calculating a fourth loss function of the updated neural network model according to the second output result; determining a seventh gradient of the second false update parameter and an eighth gradient of the third false update parameter according to the fourth loss function; correcting the fourth initial parameter into the fourth true update parameter according to the fifth gradient, the seventh gradient and the fourth learning rate; and correcting the fifth initial parameter into the fifth true update parameter according to the sixth gradient, the eighth gradient and the fourth learning rate.
Illustratively, since the sample data and the application data input to the neural network model are the same in quantity, in the present scheme m application data are selected from all the application data collected in the application scene and denoted as $\{x'_1, x'_2, \ldots, x'_m\}$. The m application data are then input in batches into the updated second backbone network model by the second data loader. Since the model parameter of the updated second backbone network model is the second false update parameter $\theta_3$, the function of the updated second backbone network model is recorded as $f_{\theta_3}(\cdot)$, and the sixth feature vector output by the second backbone network model is $\{f_{\theta_3}(x'_1), f_{\theta_3}(x'_2), \ldots, f_{\theta_3}(x'_m)\}$. The sixth feature vector is then input to the updated third head network model; since the model parameter of the updated third head network model is the third false update parameter $\phi_3$, the function of the updated third head network model is recorded as $g_{\phi_3}(\cdot)$, and the second output result output by the third head network model is $\{g_{\phi_3}(f_{\theta_3}(x'_1)), \ldots, g_{\phi_3}(f_{\theta_3}(x'_m))\}$.
Specifically, a loss function of the neural network model is constructed according to the second output result and recorded as a fourth loss function in the present scheme, where the type of the fourth loss function is the same as that of the third loss function; in the present scheme, the fourth loss function is also a softmax loss function, denoted by $L_{out2}$:

$$L_{out2} = -\frac{1}{m}\sum_{i=1}^{m} \log\big(\mathrm{softmax}(g_{\phi_3}(f_{\theta_3}(x'_i)))_{y'_i}\big), \quad 1 \le i \le m.$$

After the fourth loss function is obtained, the first parameters (the fourth initial parameter and the fifth initial parameter) of the second backbone network model and the third head network model may be corrected according to the fourth loss function. The SGD algorithm is used in the correction. Since the updated second backbone network model and the updated third head network model are used in this step, when the gradients are calculated, the gradients of the second false update parameter and the third false update parameter are calculated according to the fourth loss function: the gradient corresponding to the second false update parameter $\theta_3$ is recorded as the seventh gradient, which can be expressed as $\nabla_{\theta_3} L_{out2}$, and the gradient corresponding to the third false update parameter $\phi_3$ is recorded as the eighth gradient, which can be expressed as $\nabla_{\phi_3} L_{out2}$.
After that, the first parameters are corrected, where the correction process can also be understood as an update process of the model parameters. When the fourth initial parameter of the second backbone network model is corrected, the final model parameters are obtained, that is, the sample data and the application data have each been trained once; therefore, the current correction process can also be understood as a true update, and the corrected model parameter is recorded as the fourth true update parameter. Typically, since the second backbone network model has undergone one fake update, the fifth gradient in the fake update process needs to be considered during the true update, so that the finally obtained second backbone network model is suitable for both the sample scenario and the application scenario. A learning rate is also introduced in the true update; the learning rate used when the application data is processed is recorded as the fourth learning rate in the present scheme, which may be the same as or different from the third learning rate. At this time, the specific process of correcting the fourth initial parameter into the fourth true update parameter can be formally expressed as:

$$\theta_2 \leftarrow \theta_2 - \beta_2\big(\nabla_{\theta_2} L_{in2} + \lambda_2 \nabla_{\theta_3} L_{out2}\big),$$

where $\beta_2$ denotes the fourth learning rate, $\nabla_{\theta_2} L_{in2} + \lambda_2 \nabla_{\theta_3} L_{out2}$ can be understood as a fusion gradient, and $\lambda_2$ denotes a hyper-parameter that can be set according to the actual situation; its function is the same as that of $\lambda_1$ and is not repeated here. $\theta_2$ on the left of the arrow represents the fourth true update parameter, and $\theta_2$ on the right of the arrow represents the fourth initial parameter. It can be understood that the problem solved and the beneficial effects achieved when the second backbone network model is truly updated are the same as those when the first backbone network model is truly updated, and are not repeated here.
Likewise, the fifth initial parameter is corrected according to the eighth gradient, where the correction process can also be understood as an update process of the model parameters. When the fifth initial parameter of the third head network model is corrected, the final model parameters are obtained, that is, the sample data and the application data have each been trained once; therefore, the correction process can also be understood as a true update, and the corrected model parameter is recorded as the fifth true update parameter. Typically, since the third head network model has undergone one fake update, the sixth gradient in the fake update process needs to be considered during the true update, so that the finally obtained third head network model is suitable for both the sample scene and the application scene. The fourth learning rate is also introduced in the true update, and the specific process of correcting the fifth initial parameter into the fifth true update parameter can be formally expressed as:

$$\phi_2 \leftarrow \phi_2 - \beta_2\big(\nabla_{\phi_2} L_{in2} + \lambda_2 \nabla_{\phi_3} L_{out2}\big),$$

where $\nabla_{\phi_2} L_{in2} + \lambda_2 \nabla_{\phi_3} L_{out2}$ can be understood as a fusion gradient, and $\lambda_2$ denotes the hyper-parameter, which can be specified according to the actual situation. $\phi_2$ on the left of the arrow indicates the fifth true update parameter, and $\phi_2$ on the right of the arrow indicates the fifth initial parameter. It can be understood that the problem solved and the beneficial effects achieved when the third head network model is truly updated are the same as those when the second backbone network model is truly updated, and are not repeated here.
After the end of the current training, the third parameter is used as the first parameter, and the training is performed again until the loss functions (the third loss function and the fourth loss function) converge, or the set training number is reached, or the training is manually interrupted.
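For comparison with the scheme-one sketch, one pass of scheme two, where a single head network is shared by both scenes and both the backbone and the head receive a fusion-gradient correction, might look as follows. The same assumptions apply: PyTorch, functional-style parameter lists, and hypothetical function names.

```python
import torch
import torch.nn.functional as F

def scheme_two_step(backbone_fn, head_fn, theta2, phi2,
                    sample_batch, app_batch, alpha2, beta2, lam2):
    """One pass of scheme two: shared head, fusion correction on both parts (sketch)."""
    x, y = sample_batch
    xp, yp = app_batch

    # Sample pass: third loss, fifth and sixth gradients.
    loss_in2 = F.cross_entropy(head_fn(backbone_fn(x, theta2), phi2), y)
    g_t = torch.autograd.grad(loss_in2, theta2, retain_graph=True)   # 5th gradient
    g_p = torch.autograd.grad(loss_in2, phi2)                        # 6th gradient

    # Fake updates of both backbone and head (theta_2 -> theta_3, phi_2 -> phi_3).
    theta3 = [(p - alpha2 * g).detach().requires_grad_() for p, g in zip(theta2, g_t)]
    phi3 = [(p - alpha2 * g).detach().requires_grad_() for p, g in zip(phi2, g_p)]

    # Application pass: fourth loss, seventh and eighth gradients.
    loss_out2 = F.cross_entropy(head_fn(backbone_fn(xp, theta3), phi3), yp)
    g_t3 = torch.autograd.grad(loss_out2, theta3, retain_graph=True)  # 7th gradient
    g_p3 = torch.autograd.grad(loss_out2, phi3)                       # 8th gradient

    # True updates with fusion gradients (4th and 5th true update parameters).
    theta2_new = [(p - beta2 * (gi + lam2 * go)).detach().requires_grad_()
                  for p, gi, go in zip(theta2, g_t, g_t3)]
    phi2_new = [(p - beta2 * (gi + lam2 * go)).detach().requires_grad_()
                for p, gi, go in zip(phi2, g_p, g_p3)]
    return theta2_new, phi2_new
```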
It can be understood that the target amount of sample data may be regarded as one group of data, and similarly, the target amount of application data may be regarded as one group of data; in the training processes of the first scheme and the second scheme, one group of sample data and one group of application data are input. In practical application, multiple groups of data may also be adopted, that is, in one training process, multiple groups of data are used, the fusion gradients corresponding to each group of data are averaged, and the averaged fusion gradient is used as the gradient adopted in the true update, as the sketch below illustrates.
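A possible helper for that averaging, assuming each group's fusion gradients are collected as a list of per-parameter tensors (the name `average_fusion_gradients` is hypothetical and not from the patent):

```python
def average_fusion_gradients(per_group_grads):
    """Average per-parameter fusion gradients over several groups of data.

    per_group_grads: list of length n_groups; each element is a list of
    gradient tensors, one per model parameter, in the same order.
    """
    n = len(per_group_grads)
    # zip(*...) regroups the gradients per parameter; sum then averages them.
    return [sum(grads) / n for grads in zip(*per_group_grads)]
```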
And a third scheme is that the backbone network model comprises a second backbone network model, and the head network model comprises a third head network model.
In the present scheme, the fake update and the true update of the first parameters are each performed through multiple iterations: the first parameters are first fake-updated by k iterative updates over k groups of sample data, and are then truly updated by k iterative corrections over k groups of application data.
Specifically, the sample data has k groups, each group containing a target amount of sample data, with k > 1. Correspondingly, inputting the target amount of sample data into the neural network model, and updating the model parameters of the neural network model according to the first output result of the neural network model includes: performing k iterative updates on the model parameters of the neural network model according to the k groups of sample data, where the model parameters before the k iterative updates are the first parameters and the model parameters after the k iterative updates are the first intermediate parameters; and updating the model parameters of the neural network model from the first parameters to the second parameters according to the first intermediate parameters and the fifth learning rate. One iterative update process includes: inputting a group of sample data into the neural network model to obtain a first output result; constructing a fifth loss function of the neural network model according to the first output result; determining a ninth gradient of the model parameters before this iterative update according to the fifth loss function; and updating the model parameters according to the ninth gradient and the fifth learning rate.
Specifically, k iterative updates are performed on the model parameters of the neural network model by using the k groups of sample data; the model parameters before the k iterative updates are the first parameters, and the model parameters after the k iterative updates are the first intermediate parameters. Typically, taking the first iterative update as an example, the first parameters include the model parameter $\theta_4$ of the second backbone network model and the model parameter $\phi_4$ of the third head network model. Specifically, m sample data are selected from the open source data set, denoted as $\{x_1, x_2, \ldots, x_m\}$, and input into the second backbone network model, whose function is recorded as $f_{\theta_4}(\cdot)$; the feature vector output by the second backbone network model is recorded as $\{f_{\theta_4}(x_1), \ldots, f_{\theta_4}(x_m)\}$. The feature vector is then input to the third head network model, whose function is recorded as $g_{\phi_4}(\cdot)$; accordingly, the first output result output by the third head network model is $\{g_{\phi_4}(f_{\theta_4}(x_1)), \ldots, g_{\phi_4}(f_{\theta_4}(x_m))\}$.
Further, a loss function of the neural network model is constructed according to the first output result; in this scheme, the constructed loss function is recorded as a fifth loss function, whose type can be set according to the actual situation. Taking the fifth loss function as the softmax loss function as an example, the fifth loss function at the first iterative update is denoted by $L_0$:

$$L_0 = -\frac{1}{m}\sum_{i=1}^{m}\log\big(\mathrm{softmax}(g_{\phi_4}(f_{\theta_4}(x_i)))_{y_i}\big), \quad 1 \le i \le m,$$

where $y_i$ denotes the label of sample $x_i$. After the fifth loss function is obtained, the gradients of the first parameters ($\theta_4$ and $\phi_4$) are calculated according to the fifth loss function; in the present scheme, the gradient corresponding to the first parameter is recorded as a ninth gradient. At this time, the ninth gradient of $\theta_4$ is expressed as $\nabla_{\theta_4} L_0$, and the ninth gradient of $\phi_4$ is expressed as $\nabla_{\phi_4} L_0$. Then $\theta_4$ is updated according to $\nabla_{\theta_4} L_0$, and a learning rate is introduced during updating; in this scheme, the learning rate used when the sample data is processed is recorded as a fifth learning rate. The specific process of updating $\theta_4$ can be formally expressed as:

$$\theta_4^{(1)} \leftarrow \theta_4 - \alpha_3 \nabla_{\theta_4} L_0,$$

where $\alpha_3$ denotes the fifth learning rate and $\theta_4^{(1)}$ denotes the model parameter after this iterative update. In the same way, $\phi_4$ is updated according to $\nabla_{\phi_4} L_0$, which can be formally expressed as:

$$\phi_4^{(1)} \leftarrow \phi_4 - \alpha_3 \nabla_{\phi_4} L_0,$$

where $\phi_4^{(1)}$ denotes the updated model parameter. Thus, one iterative update is completed. Thereafter, m sample data are again selected from the open source data set and denoted as $\{x_1, x_2, \ldots, x_m\}$; it can be understood that the two groups of sample data may be completely or partially different. The m sample data are then input into the second backbone network model, whose model parameters are now $\theta_4^{(1)}$ and whose function is recorded as $f_{\theta_4^{(1)}}(\cdot)$; the feature vector output by the second backbone network model is recorded as $\{f_{\theta_4^{(1)}}(x_1), \ldots, f_{\theta_4^{(1)}}(x_m)\}$. The feature vector is then input to the third head network model, whose model parameters are $\phi_4^{(1)}$ and whose function is $g_{\phi_4^{(1)}}(\cdot)$; accordingly, the first output result output by the third head network model is $\{g_{\phi_4^{(1)}}(f_{\theta_4^{(1)}}(x_1)), \ldots, g_{\phi_4^{(1)}}(f_{\theta_4^{(1)}}(x_m))\}$. Then, a fifth loss function is constructed according to the first output result; at this iterative update the fifth loss function is denoted by $L_1$:

$$L_1 = -\frac{1}{m}\sum_{i=1}^{m}\log\big(\mathrm{softmax}(g_{\phi_4^{(1)}}(f_{\theta_4^{(1)}}(x_i)))_{y_i}\big), \quad 1 \le i \le m.$$

After the fifth loss function is obtained, the ninth gradient of $\theta_4^{(1)}$, expressed as $\nabla_{\theta_4^{(1)}} L_1$, and the ninth gradient of $\phi_4^{(1)}$, expressed as $\nabla_{\phi_4^{(1)}} L_1$, are calculated according to the fifth loss function. Then $\theta_4^{(1)}$ is updated according to $\nabla_{\theta_4^{(1)}} L_1$, with the fifth learning rate introduced during the update; the specific process can be formally expressed as:

$$\theta_4^{(2)} \leftarrow \theta_4^{(1)} - \alpha_3 \nabla_{\theta_4^{(1)}} L_1,$$

where $\theta_4^{(2)}$ denotes the updated model parameter. In the same way, $\phi_4^{(1)}$ is updated according to $\nabla_{\phi_4^{(1)}} L_1$, which can be formally expressed as:

$$\phi_4^{(2)} \leftarrow \phi_4^{(1)} - \alpha_3 \nabla_{\phi_4^{(1)}} L_1,$$

where $\phi_4^{(2)}$ denotes the updated model parameter. Iterative updating then continues until k iterative updates have been performed. For the second backbone network model, the k-th iterative update can be formally expressed as:

$$\theta_4^{(k)} \leftarrow \theta_4^{(k-1)} - \alpha_3 \nabla_{\theta_4^{(k-1)}} L_{k-1},$$

where $\theta_4^{(k)}$ denotes the model parameter after k iterative updates, $\theta_4^{(k-1)}$ denotes the model parameter after k−1 iterative updates, and $L_{k-1}$ denotes the fifth loss function calculated at the k-th iterative update. Similarly, for the third head network model, the k-th iterative update can be formally expressed as:

$$\phi_4^{(k)} \leftarrow \phi_4^{(k-1)} - \alpha_3 \nabla_{\phi_4^{(k-1)}} L_{k-1},$$

where $\phi_4^{(k)}$ denotes the model parameter after k iterative updates and $\phi_4^{(k-1)}$ denotes the model parameter after k−1 iterative updates. At this time, $\theta_4^{(k)}$ and $\phi_4^{(k)}$ are the first intermediate parameters obtained after the k iterative updates.
And then, the first parameters of the neural network model are updated according to the first intermediate parameters to obtain the second parameters, with the fifth learning rate introduced during updating. Specifically, for the model parameter $\theta_4$ of the second backbone network model, the update can be formally expressed as:

$$\theta_5 \leftarrow \theta_4 - \alpha_3\,(\theta_4 - \theta_4^{(k)}),$$

and for the model parameter $\phi_4$ of the third head network model, the update can be formally expressed as:

$$\phi_5 \leftarrow \phi_4 - \alpha_3\,(\phi_4 - \phi_4^{(k)}).$$

$\theta_5$ and $\phi_5$ constitute the second parameters. As can be seen from the above formal representation, when the first parameters are updated to the second parameters, the variation of the model parameters during the k iterative updates (i.e. $\theta_4 - \theta_4^{(k)}$ and $\phi_4 - \phi_4^{(k)}$) is used as a gradient, and an update of the model parameters is then effected. It can be understood that the above process only involves sample data; therefore, the present update process may be considered a fake update. After that, a true update based on the application data is required.
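The k iterative updates and the variation-as-gradient fake update just described might be sketched as follows, under the same PyTorch assumptions as the earlier sketches; `sample_groups` is a hypothetical iterable of k `(x, y)` batches, and `theta4`, `phi4` are lists of tensors with `requires_grad=True`.

```python
import torch
import torch.nn.functional as F

def k_step_fake_update(backbone_fn, head_fn, theta4, phi4, sample_groups, alpha3):
    """k iterative updates on k groups of sample data, then the fake update
    using the total parameter variation as a gradient (illustrative sketch)."""
    th, ph = theta4, phi4
    for x, y in sample_groups:                    # one group of m samples per step
        loss = F.cross_entropy(head_fn(backbone_fn(x, th), ph), y)   # 5th loss
        g_t = torch.autograd.grad(loss, th, retain_graph=True)       # 9th gradient
        g_p = torch.autograd.grad(loss, ph)
        th = [(p - alpha3 * g).detach().requires_grad_() for p, g in zip(th, g_t)]
        ph = [(p - alpha3 * g).detach().requires_grad_() for p, g in zip(ph, g_p)]
    # th, ph now hold the first intermediate parameters after k iterations.
    # Fake update: the variation (initial - intermediate) acts as the gradient.
    theta5 = [(p - alpha3 * (p - q)).detach().requires_grad_()
              for p, q in zip(theta4, th)]
    phi5 = [(p - alpha3 * (p - q)).detach().requires_grad_()
            for p, q in zip(phi4, ph)]
    return theta5, phi5
```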
Accordingly, the application data has k groups, each group containing a target amount of application data. Correspondingly, inputting the target amount of application data into the updated neural network model, and correcting the model parameters of the neural network model according to the second output result of the updated neural network model and the second parameters includes: performing k iterative corrections on the model parameters of the updated neural network model according to the k groups of application data, where the model parameters before the k iterative corrections are the second parameters and the model parameters after the k iterative corrections are the second intermediate parameters; and correcting the model parameters of the neural network model from the first parameters to the third parameters according to the second intermediate parameters, the sixth learning rate and the second parameters. One iterative correction process includes: inputting the current group of application data into the updated neural network model to obtain a second output result; constructing a sixth loss function of the neural network model according to the second output result; determining a tenth gradient of the model parameters before this iterative correction according to the sixth loss function; and correcting the model parameters of the updated neural network model according to the tenth gradient and the sixth learning rate.
Specifically, k iterative corrections are performed on the model parameters of the updated neural network model by using the k groups of application data, where the k iterative corrections can also be understood as k iterative updates: the model parameters before the k iterative corrections are the second parameters, and the model parameters after the k iterative corrections are the second intermediate parameters. Note that the specific means of one iterative correction is the same as that of one iterative update; only the learning rate introduced in an iterative correction is recorded as the sixth learning rate, and the sixth learning rate and the fifth learning rate may be the same or different. In an iterative correction, the calculated loss function is recorded as a sixth loss function and the calculated gradient as a tenth gradient. For the updated second backbone network model, whose second parameter is $\theta_5$, the k iterative corrections can be formally expressed as:

$$\theta_5^{(1)} \leftarrow \theta_5 - \beta_3 \nabla_{\theta_5} L'_0, \qquad \theta_5^{(2)} \leftarrow \theta_5^{(1)} - \beta_3 \nabla_{\theta_5^{(1)}} L'_1, \qquad \ldots, \qquad \theta_5^{(k)} \leftarrow \theta_5^{(k-1)} - \beta_3 \nabla_{\theta_5^{(k-1)}} L'_{k-1},$$

where $\beta_3$ denotes the sixth learning rate, $L'_0$ denotes the sixth loss function calculated at the first iterative correction, $\nabla_{\theta_5} L'_0$ denotes the tenth gradient corresponding to $\theta_5$, $\theta_5^{(1)}$ denotes the model parameter of the second backbone network model after the first iterative correction, $\theta_5^{(2)}$ the model parameter after the second iterative correction, and so on, with $L'_{k-1}$ denoting the sixth loss function calculated at the k-th iterative correction and $\theta_5^{(k)}$ the model parameter of the second backbone network model after the k-th iterative correction. Similarly, for the updated third head network model, whose second parameter is $\phi_5$, the k iterative corrections can be formally expressed as:

$$\phi_5^{(1)} \leftarrow \phi_5 - \beta_3 \nabla_{\phi_5} L'_0, \qquad \phi_5^{(2)} \leftarrow \phi_5^{(1)} - \beta_3 \nabla_{\phi_5^{(1)}} L'_1, \qquad \ldots, \qquad \phi_5^{(k)} \leftarrow \phi_5^{(k-1)} - \beta_3 \nabla_{\phi_5^{(k-1)}} L'_{k-1},$$

where $\phi_5^{(1)}$ denotes the model parameter of the third head network model after the first iterative correction, $\phi_5^{(2)}$ the model parameter after the second iterative correction, and $\phi_5^{(k)}$ the model parameter after the k-th iterative correction. At this time, the second intermediate parameters include $\theta_5^{(k)}$ and $\phi_5^{(k)}$.
and then, correcting the first parameter of the neural network model according to the second intermediate parameter to obtain a third parameter, and introducing a sixth learning rate during correction. Wherein, in the modification, the second parameter is also considered, so that the modified neural network model is adapted to the sample scene and the application scene. Typically, the modification is determined by a second intermediate parameterAnd taking the parameter variable quantity of the second parameter as the gradient of the second parameter, similarly, determining the parameter variable quantity when the first parameter is updated to the second parameter and taking the parameter variable quantity of the second parameter as the gradient of the first parameter, and then correcting the first parameter to be a third parameter according to the two gradients and a third learning rate, wherein the correction formalization of the model parameter of the second trunk network model is expressed as that of the second trunk network model
Figure BDA00027224952700002520
Wherein theta used to the left of the arrow4Theta used to the right of the arrow representing the modified third parameter4Representing the first parameter before modification. As is clear from the above formal representation, by adding θ45Avoid the forgetting problem by adding
Figure BDA00027224952700002521
Therefore, the second backbone network model is suitable for application scenarios, namely, the second backbone network model has better performance in the application scenarios. It is required to explain that
Figure BDA00027224952700002522
The gradient obtained based on the false update is not the gradient obtained based on the first parameter, so that the method is essentially based on the idea of learning in the meta-learning, and the second backbone network model is expected to have better performance in the application scene on the basis of being suitable for the sample scene, and can also be understood as
Figure BDA0002722495270000261
And the updated first trunk network model is taken as a reference for fine tuning, so that the overfitting problem in the prior art when the fine tuning is directly carried out can be avoided. Similarly, for the third head network model, the modified formal representation of the model parameters is as follows
Figure BDA0002722495270000262
Wherein the arrow is used to the left
Figure BDA0002722495270000263
Indicating the modified third parameter, used to the right of the arrow
Figure BDA0002722495270000264
The first parameter before modification is represented, and the specific solved problem and the achieved beneficial effect thereof are the same as those of the second backbone network model when the second backbone network model is really updated, and details are not repeated herein.
After the training is finished, the third parameter is used as the first parameter, and the training is performed again until the loss functions (the fifth loss function and the sixth loss function) converge or the set training number is reached.
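Under the same assumptions as the fake-update sketch, the application-data half of scheme three — k iterative corrections followed by the final correction that combines the two parameter variations — could look like this. Note the plain sum of the two variations follows the formal representation above; no extra balancing hyper-parameter is assumed, though one could be inserted as in schemes one and two.

```python
import torch
import torch.nn.functional as F

def k_step_true_update(backbone_fn, head_fn, theta4, phi4, theta5, phi5,
                       app_groups, beta3):
    """k iterative corrections on application data, then the final correction
    combining both parameter variations (illustrative sketch)."""
    th, ph = theta5, phi5
    for x, y in app_groups:                       # k groups of application data
        loss = F.cross_entropy(head_fn(backbone_fn(x, th), ph), y)   # 6th loss
        g_t = torch.autograd.grad(loss, th, retain_graph=True)       # 10th gradient
        g_p = torch.autograd.grad(loss, ph)
        th = [(p - beta3 * g).detach().requires_grad_() for p, g in zip(th, g_t)]
        ph = [(p - beta3 * g).detach().requires_grad_() for p, g in zip(ph, g_p)]
    # Final correction: (theta4 - theta5) preserves the sample scene,
    # (theta5 - th_k) adapts the model to the application scene.
    theta_new = [(a - beta3 * ((a - b) + (b - c))).detach().requires_grad_()
                 for a, b, c in zip(theta4, theta5, th)]
    phi_new = [(a - beta3 * ((a - b) + (b - c))).detach().requires_grad_()
               for a, b, c in zip(phi4, phi5, ph)]
    return theta_new, phi_new
```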
It should be noted that the neural network model structure in the present scheme is consistent with the neural network model structure in the second scheme. In practical applications, the neural network model structure of the first scheme may also be used; at this time, the processing manner of the first backbone network model is the same as that of the second backbone network model, and the processing manner of the first head network model and the second head network model may refer to the fake update process of the third head network model.
In another embodiment, the neural network model comprises a backbone network model. Correspondingly, inputting the sample data of the target quantity into the neural network model, and updating the model parameters of the neural network model according to the first output result of the neural network model includes: inputting the target amount of sample data into the backbone network model to obtain a first output result output by the backbone network model, wherein the first output result is a third feature vector; constructing a seventh loss function of the backbone network model according to the first output result; determining an eleventh gradient of the first parameter according to a seventh loss function; and updating the model parameters from the first parameters to the second parameters according to the eleventh gradient and the seventh learning rate. Inputting the target amount of application data into the updated neural network model, and correcting the model parameters of the neural network model according to the second output result of the updated neural network model and the second parameters includes: inputting the target amount of application data into the updated backbone network model to obtain a second output result output by the backbone network model, wherein the second output result is a fourth feature vector; constructing an eighth loss function of the backbone network model according to the second output result; determining a twelfth gradient of the second parameter according to the eighth loss function; and updating the model parameters from the first parameters to the third parameters according to the twelfth gradient, the eleventh gradient and the eighth learning rate.
For example, take a neural network model for single-class face recognition. In this case, since the logits value does not need to be calculated, the neural network model may include only a backbone network model, which is used to extract the features of the data input to the neural network model and to output feature vectors. The embodiment does not limit the type of the backbone network model; for example, the backbone network model may be a residual network, a face recognition network running on a mobile terminal (MobileFaceNet), and the like. In the embodiment, the number of backbone network models is one, and the sample scene and the application scene share the backbone network model. Specifically, a target amount of sample data is selected from the open source data set and input to the backbone network model, so that the backbone network model outputs a first output result, where the first output result is the feature vector output by the backbone network, recorded in the embodiment as a third feature vector. Then, a loss function of the backbone network model is constructed according to the third feature vector; in the embodiment, this loss function is recorded as a seventh loss function, whose type may be set according to the actual situation, for example, a contrastive loss function, a triplet loss function, or the like. Then, the gradient of the first parameter is determined according to the seventh loss function by taking partial derivatives; in the embodiment, this gradient is recorded as an eleventh gradient, whose calculation is the same as that of each gradient in the above embodiments and is not repeated here. After that, the first parameter is updated according to the eleventh gradient, with a seventh learning rate used in the update; it can be understood that this update is a fake update, and the specific update manner is the same as the fake update manner adopted in the first and second schemes of the above embodiment, or another fake update manner may be adopted.

Then, a target amount of application data is selected from all the currently acquired application data and input into the updated backbone network model to obtain a corresponding second output result, where the second output result is the feature vector output by the backbone network, recorded in the embodiment as a fourth feature vector. Then, a loss function of the backbone network model is constructed according to the second output result; in the embodiment, this loss function is recorded as an eighth loss function, which is of the same type as the seventh loss function. Then, the gradient of the second parameter is determined according to the eighth loss function by taking partial derivatives; in the embodiment, this gradient is recorded as a twelfth gradient, which is calculated in the same manner as the eleventh gradient.
And then, the first parameter is corrected according to the eleventh gradient, the twelfth gradient and the second parameter, with an eighth learning rate used in the correction; the eighth learning rate and the seventh learning rate may be the same or different. It can be understood that the current correction process is a true update process, and the specific correction manner is the same as the true update manner adopted in the second scheme, or another true update manner may be adopted.
After the training is finished, the third parameter is used as the first parameter, and the training is performed again until the loss functions (the seventh loss function and the eighth loss function) converge or the set training times is reached.
It can be understood that, in practical application, k iterative updates may also be performed by using k groups of sample data and k iterative corrections may also be performed by using k groups of application data; the specific process may refer to scheme three, with the descriptions related to the third head network model in scheme three removed.
It should be noted that, when the first parameter is updated to the third parameter, the eleventh gradient is introduced, so that the performance of the neural network model in the sample scene can be maintained as much as possible, and the problem of forgetting is avoided.
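For this backbone-only embodiment, a sketch with an assumed triplet loss (one of the loss types the text names) is given below; `backbone_fn`, the batch layout, and the learning-rate names are hypothetical, and a balancing hyper-parameter could be inserted in the final update as in schemes one and two.

```python
import torch
import torch.nn.functional as F

def backbone_only_step(backbone_fn, theta, sample_triplets, app_triplets,
                       alpha7, beta8):
    """Fake update on sample data, true update on application data, for a
    backbone-only model trained with a triplet loss (illustrative sketch).

    theta: list of tensors with requires_grad=True;
    alpha7 / beta8: seventh and eighth learning rates.
    """
    def triplet_loss(batch, params):
        anchor, pos, neg = batch
        fa, fp, fn_ = (backbone_fn(t, params) for t in (anchor, pos, neg))
        return F.triplet_margin_loss(fa, fp, fn_)

    loss7 = triplet_loss(sample_triplets, theta)            # 7th loss function
    g11 = torch.autograd.grad(loss7, theta)                 # 11th gradient
    theta_fake = [(p - alpha7 * g).detach().requires_grad_()
                  for p, g in zip(theta, g11)]              # fake update

    loss8 = triplet_loss(app_triplets, theta_fake)          # 8th loss function
    g12 = torch.autograd.grad(loss8, theta_fake)            # 12th gradient

    # True update from the original theta; keeping g11 preserves the sample
    # scene (a balancing hyper-parameter could weight g12 here).
    theta_new = [(p - beta8 * (a + b)).detach().requires_grad_()
                 for p, a, b in zip(theta, g11, g12)]
    return theta_new
```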
Fig. 2 is a schematic structural diagram of a neural network model adjusting apparatus according to an embodiment of the present disclosure, and referring to fig. 2, the neural network model adjusting apparatus includes: an update module 201 and a correction module 202.
The updating module 201 is configured to input target amount of sample data to a neural network model, and update a model parameter of the neural network model according to a first output result of the neural network model, where the model parameter is a first parameter before updating, and the model parameter is a second parameter after updating; a correcting module 202, configured to input a target amount of application data to the updated neural network model, and correct a model parameter of the neural network model according to a second output result of the updated neural network model and the second parameter, where the model parameter before the correction is a first parameter, and the model parameter after the correction is a third parameter.
On the basis of the embodiment, the neural network model comprises a trunk network model and a head network model, wherein the trunk network model is used for extracting the characteristic vector of the input data of the neural network model; the head network model is used for obtaining an output result of the neural network model according to the feature vector.
On the basis of the above embodiment, the backbone network model includes a first backbone network model, the head network model includes a first head network model and a second head network model, the first backbone network model is configured to output a first feature vector when the target amount of sample data is input to the first backbone network model, and the first backbone network model is configured to output a second feature vector when the target amount of application data is input to the first backbone network model; the first head network model is used for obtaining the first output result according to the first feature vector; the second head network model is used for obtaining the second output result according to the second feature vector.
On the basis of the above embodiment, the first parameters include first initial parameters of the first backbone network model and second initial parameters of the first head network model, and the second parameters include first false update parameters of the first backbone network model and first true update parameters of the first head network model. The update module 201 includes: the first input unit is used for inputting the sample data of the target quantity into the neural network model so as to obtain a corresponding first output result; a first loss calculation unit, configured to calculate a first loss function of the neural network model according to the first output result; a first gradient calculation unit for determining a first gradient of the first initial parameter and a second gradient of the second initial parameter according to the first loss function; a first parameter updating unit, configured to update the first initial parameter to the first false update parameter according to the first gradient and a first learning rate, and update the second initial parameter to the first true update parameter according to the second gradient and the first learning rate.
On the basis of the above embodiment, the first parameters further include third initial parameters of the second head network model, and the third parameters include second true update parameters of the first backbone network model and third true update parameters of the second head network model. Accordingly, the modification module 202 includes: the second input unit is used for inputting the target amount of application data into the updated neural network model to obtain a corresponding second output result; a second loss calculation unit, configured to calculate a second loss function of the updated neural network model according to the second output result; a second gradient calculation unit for determining a third gradient of the first false update parameter and a fourth gradient of the third initial parameter according to the second loss function; a first parameter modification unit, configured to modify the first initial parameter into the second true update parameter according to the first gradient, the third gradient, and a second learning rate, and modify the third initial parameter into the third true update parameter according to the fourth gradient and the second learning rate.
On the basis of the above embodiments, the backbone network model includes a second backbone network model, and the head network model includes a third head network model.
On the basis of the foregoing embodiment, the first parameters include fourth initial parameters of the second backbone network model and fifth initial parameters of the third head network model, and the second parameters include second false update parameters of the second backbone network model and third false update parameters of the third head network model. The update module 201 includes: a third input unit, configured to input the target amount of sample data into the neural network model to obtain the corresponding first output result; a third loss calculation unit, configured to calculate a third loss function of the neural network model from the first output result; a third gradient calculation unit, configured to determine a fifth gradient of the fourth initial parameters and a sixth gradient of the fifth initial parameters from the third loss function; and a second parameter updating unit, configured to update the fourth initial parameters to the second false update parameters according to the fifth gradient and a third learning rate, and to update the fifth initial parameters to the third false update parameters according to the sixth gradient and the third learning rate.
On the basis of the above embodiment, the third parameters include fourth true update parameters of the second backbone network model and fifth true update parameters of the third head network model. The correction module 202 includes: a fourth input unit, configured to input the target amount of application data into the updated neural network model to obtain the corresponding second output result; a fourth loss calculation unit, configured to calculate a fourth loss function of the updated neural network model from the second output result; a fourth gradient calculation unit, configured to determine a seventh gradient of the second false update parameters and an eighth gradient of the third false update parameters from the fourth loss function; and a second parameter correction unit, configured to correct the fourth initial parameters into the fourth true update parameters according to the fifth gradient, the seventh gradient and a fourth learning rate, and to correct the fifth initial parameters into the fifth true update parameters according to the sixth gradient, the eighth gradient and the fourth learning rate.
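In this variant, both the backbone and the head receive a false update on sample data and are then corrected with the two gradients combined. A compact sketch follows; the summed combination is again our assumption, and the names in the usage comment are hypothetical.

```python
import torch

def correct_with_both_gradients(params, grads_sample, grads_app, lr):
    """Turn initial parameters into true update parameters (sketch)."""
    with torch.no_grad():
        for p, gs, ga in zip(params, grads_sample, grads_app):
            p -= lr * (gs + ga)   # the summed combination is an assumption

# hypothetical usage over all parameters of the second backbone and third head:
# correct_with_both_gradients(
#     list(backbone2.parameters()) + list(head3.parameters()),
#     g_sample, g_app, lr4)
```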
On the basis of the above embodiment, the sample data comprises k groups, each group containing the target amount of sample data, with k > 1. The update module 201 includes: an iterative updating unit, configured to iteratively update the model parameters of the neural network model k times according to the k groups of sample data, where the model parameters before the k iterative updates are the first parameters and the model parameters after the k iterative updates are first intermediate parameters; and a third parameter updating unit, configured to update the model parameters of the neural network model from the first parameters to the second parameters according to the first intermediate parameters and a fifth learning rate. One iterative update proceeds as follows: inputting the current group of sample data into the neural network model to obtain the first output result; constructing a fifth loss function of the neural network model from the first output result; determining a ninth gradient of the model parameters before this iteration from the fifth loss function; and updating the model parameters according to the ninth gradient and the fifth learning rate. A sketch of this k-step schedule is given after the next paragraph.
On the basis of the above embodiment, the application data likewise comprises k groups, each group containing the target amount of application data, and the correction module 202 includes: an iterative correction unit, configured to iteratively correct the updated model parameters of the neural network model k times according to the k groups of application data, where the model parameters before the k iterative corrections are the second parameters and the model parameters after the k iterative corrections are second intermediate parameters; and a third parameter correction unit, configured to correct the model parameters of the neural network model from the first parameters to the third parameters according to the second intermediate parameters, a sixth learning rate and the second parameters. One iterative correction proceeds as follows: inputting the current group of application data into the updated neural network model to obtain the second output result; constructing a sixth loss function of the updated neural network model from the second output result; determining a tenth gradient of the model parameters before this iteration from the sixth loss function; and correcting the model parameters according to the tenth gradient and the sixth learning rate.
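The k-group schedule reads like a Reptile-style interpolation toward the k-th intermediate parameters; that reading, and the SGD inner loop, are our assumptions. A minimal sketch of one such phase follows: for the update phase, groups yields the k sample-data groups and the model starts from the first parameters, while the correction phase runs the same loop over the k application-data groups starting from the second parameters.

```python
import copy
import torch

def k_step_phase(model, loss_fn, groups, inner_lr, outer_lr):
    """One k-group phase (sketch; a Reptile-like interpolation is assumed)."""
    start = copy.deepcopy(model)                     # parameters before the phase
    opt = torch.optim.SGD(model.parameters(), lr=inner_lr)
    for x, y in groups:                              # k iterative updates/corrections
        opt.zero_grad()
        loss_fn(model(x), y).backward()              # fifth/sixth loss function
        opt.step()                                   # step along the ninth/tenth gradient
    # outer step: move the starting parameters toward the intermediate parameters
    with torch.no_grad():
        for p, p0 in zip(model.parameters(), start.parameters()):
            p.copy_(p0 + outer_lr * (p - p0))
```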
On the basis of the above embodiment, the neural network model consists of a backbone network model alone. The update module 201 includes: a fifth input unit, configured to input the target amount of sample data into the backbone network model to obtain the first output result output by the backbone network model, where the first output result is a third feature vector; a fifth loss calculation unit, configured to construct a seventh loss function of the backbone network model from the first output result; a fifth gradient calculation unit, configured to determine an eleventh gradient of the first parameters from the seventh loss function; and a fourth parameter updating unit, configured to update the model parameters from the first parameters to the second parameters according to the eleventh gradient and a seventh learning rate. The correction module 202 includes: a sixth input unit, configured to input the target amount of application data into the updated backbone network model to obtain the second output result output by the backbone network model, where the second output result is a fourth feature vector; a sixth loss calculation unit, configured to construct an eighth loss function of the updated backbone network model from the second output result; a sixth gradient calculation unit, configured to determine a twelfth gradient of the second parameters from the eighth loss function; and a fourth parameter correction unit, configured to update the model parameters from the first parameters to the third parameters according to the twelfth gradient, the eleventh gradient and an eighth learning rate, as sketched below.
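For the backbone-only embodiment the output result is the feature vector itself, so the loss must be defined directly on embeddings. The sketch below uses a triplet loss as an illustrative choice; the embodiment does not name a particular loss, and the anchor/positive/negative batch format is our assumption.

```python
import torch
import torch.nn.functional as F

def embedding_update(backbone, anchor, positive, negative, lr):
    """Update step for the backbone-only variant (sketch; triplet loss assumed)."""
    za = backbone(anchor)                            # third feature vector
    zp, zn = backbone(positive), backbone(negative)
    loss = F.triplet_margin_loss(za, zp, zn)         # seventh loss function
    g11 = torch.autograd.grad(loss, list(backbone.parameters()))  # eleventh gradient
    with torch.no_grad():
        for p, g in zip(backbone.parameters(), g11):
            p -= lr * g                              # first -> second parameters
    return g11                                       # reused in the correction step
```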
On the basis of the above embodiment, the target amount of sample data is input into the neural network model through a first data loader, and the target amount of application data is input into the updated neural network model through a second data loader, for example as sketched below.
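A minimal sketch of the two loaders, where the batch size plays the role of the target amount; the datasets here are random placeholders rather than real sample or application data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

target_amount = 32                                   # "target amount" as batch size
# placeholder datasets standing in for sample data and application data
sample_ds = TensorDataset(torch.randn(256, 3, 64, 64), torch.randint(0, 10, (256,)))
app_ds = TensorDataset(torch.randn(256, 3, 64, 64), torch.randint(0, 10, (256,)))

first_loader = DataLoader(sample_ds, batch_size=target_amount, shuffle=True)
second_loader = DataLoader(app_ds, batch_size=target_amount, shuffle=True)
```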
The neural network model adjusting apparatus described above can be used to execute the neural network model adjusting method provided in any of the above embodiments, and has the corresponding functions and beneficial effects.
It should be noted that, in the above embodiment of the neural network model adjusting apparatus, the included units and modules are divided according to functional logic only; other divisions are possible as long as the corresponding functions can be realized. In addition, the specific names of the functional units are chosen only to distinguish them from one another and do not limit the protection scope of the present invention.
Fig. 3 is a schematic structural diagram of a neural network model adjusting device according to an embodiment of the present application. As shown in Fig. 3, the device includes a processor 30, a memory 31, an input device 32 and an output device 33; the number of processors 30 in the device may be one or more, and one processor 30 is taken as an example in Fig. 3. The processor 30, the memory 31, the input device 32 and the output device 33 may be connected by a bus or in other ways; connection by a bus is taken as the example in Fig. 3.
The memory 31, as a computer-readable storage medium, can be used to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the neural network model adjusting method in the embodiments of the present invention (for example, the updating module 201 and the correction module 202 in the neural network model adjusting apparatus). The processor 30 executes the various functional applications and data processing of the device by running the software programs, instructions and modules stored in the memory 31, thereby implementing the neural network model adjusting method described above.
The memory 31 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created through use of the device, and the like. Further, the memory 31 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 31 may further include memory located remotely from the processor 30, which may be connected to the device via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 32 may be used to receive input numeric or character information and generate key signal inputs relating to user settings and function control of the neural network model adjusting apparatus. The output device 33 may include a display device such as a display screen.
The neural network model adjusting device described above includes the neural network model adjusting apparatus of the foregoing embodiment, can be used to execute any of the neural network model adjusting methods described above, and has the corresponding functions and beneficial effects.
In addition, the present invention also provides a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform operations related to the neural network model adjustment method provided in any of the embodiments of the present application, and have corresponding functions and advantages.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product.
Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions executed via the processor of the computer or other programmable data processing apparatus create means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the functions so specified. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions executed on the computer or other programmable apparatus provide steps for implementing the functions so specified.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (15)

1. A neural network model adjustment method is characterized by comprising the following steps:
inputting a target amount of sample data into a neural network model, and updating the model parameters of the neural network model according to a first output result of the neural network model, wherein the model parameters before updating are first parameters and the model parameters after updating are second parameters;
inputting a target amount of application data into the updated neural network model, and correcting the model parameters of the neural network model according to a second output result of the updated neural network model and the second parameters, wherein the model parameters before correction are the first parameters and the model parameters after correction are third parameters.
2. The neural network model adjusting method according to claim 1, wherein the neural network model comprises a backbone network model and a head network model, the backbone network model being used to extract a feature vector from the input data of the neural network model, and the head network model being used to obtain the output result of the neural network model from the feature vector.
3. The neural network model adjustment method according to claim 2, wherein the backbone network model includes a first backbone network model, the head network model includes a first head network model and a second head network model,
when the target amount of sample data is input to the first backbone network model, the first backbone network model is used for outputting a first feature vector, and when the target amount of application data is input to the first backbone network model, the first backbone network model is used for outputting a second feature vector;
the first head network model is used for obtaining the first output result according to the first feature vector;
the second head network model is used for obtaining the second output result according to the second feature vector.
4. The neural network model adjustment method according to claim 3, wherein the first parameters include first initial parameters of the first backbone network model and second initial parameters of the first head network model, and the second parameters include first false update parameters of the first backbone network model and first true update parameters of the first head network model;
wherein inputting the target amount of sample data into the neural network model and updating the model parameters of the neural network model according to the first output result of the neural network model comprises:
inputting the target amount of sample data into the neural network model to obtain the corresponding first output result;
calculating a first loss function of the neural network model according to the first output result;
determining a first gradient of the first initial parameter and a second gradient of the second initial parameter according to the first loss function;
updating the first initial parameter to the first false update parameter according to the first gradient and a first learning rate, and updating the second initial parameter to the first true update parameter according to the second gradient and the first learning rate.
5. The neural network model adjustment method of claim 4, wherein the first parameters further include third initial parameters of the second head network model, the third parameters including second true update parameters of the first backbone network model and third true update parameters of the second head network model;
wherein inputting the target amount of application data into the updated neural network model and correcting the model parameters of the neural network model according to the second output result of the updated neural network model and the second parameters comprises:
inputting the target amount of application data into the updated neural network model to obtain a corresponding second output result;
calculating a second loss function of the updated neural network model according to the second output result;
determining a third gradient of the first false update parameter and a fourth gradient of the third initial parameter according to the second loss function;
and correcting the first initial parameter into the second true update parameter according to the first gradient, the third gradient and a second learning rate, and correcting the third initial parameter into the third true update parameter according to the fourth gradient and the second learning rate.
6. The neural network model adjustment method according to claim 2, wherein the backbone network model includes a second backbone network model, and the head network model includes a third head network model.
7. The neural network model adjustment method according to claim 6, wherein the first parameters include a fourth initial parameter of the second backbone network model and a fifth initial parameter of the third head network model, and the second parameters include a second false update parameter of the second backbone network model and a third false update parameter of the third head network model;
wherein inputting the target amount of sample data into the neural network model and updating the model parameters of the neural network model according to the first output result of the neural network model comprises:
inputting the target amount of sample data into the neural network model to obtain the corresponding first output result;
calculating a third loss function of the neural network model according to the first output result;
determining a fifth gradient of the fourth initial parameter and a sixth gradient of the fifth initial parameter according to the third loss function;
updating the fourth initial parameter to the second false update parameter according to the fifth gradient and a third learning rate, and updating the fifth initial parameter to the third false update parameter according to the sixth gradient and the third learning rate.
8. The neural network model adjustment method of claim 7, wherein the third parameters include a fourth true update parameter of the second backbone network model and a fifth true update parameter of the third head network model;
wherein inputting the target amount of application data into the updated neural network model and correcting the model parameters of the neural network model according to the second output result of the updated neural network model and the second parameters comprises:
inputting the target amount of application data into the updated neural network model to obtain a corresponding second output result;
calculating a fourth loss function of the updated neural network model according to the second output result;
determining a seventh gradient of the second false update parameter and an eighth gradient of the third false update parameter according to the fourth loss function;
and correcting the fourth initial parameter to the fourth true update parameter according to the fifth gradient, the seventh gradient and a fourth learning rate, and correcting the fifth initial parameter to the fifth true update parameter according to the sixth gradient, the eighth gradient and the fourth learning rate.
9. The neural network model adjustment method according to claim 3 or 6, wherein the sample data comprises k groups, each group of sample data containing the target amount of sample data, and k > 1;
wherein inputting the target amount of sample data into the neural network model and updating the model parameters of the neural network model according to the first output result of the neural network model comprises:
iteratively updating the model parameters of the neural network model k times according to the k groups of sample data, wherein the model parameters before the k iterative updates are the first parameters, and the model parameters after the k iterative updates are first intermediate parameters;
updating the model parameters of the neural network model from the first parameters to second parameters according to the first intermediate parameters and a fifth learning rate;
wherein one iterative update comprises the following steps:
inputting a group of sample data to the neural network model to obtain a first output result;
constructing a fifth loss function of the neural network model according to the first output result;
determining a ninth gradient of the model parameters before this iterative update according to the fifth loss function;
updating the model parameters according to the ninth gradient and the fifth learning rate.
10. The neural network model adjustment method of claim 9, wherein the application data comprises k groups, each group of application data containing the target amount of application data,
wherein inputting the target amount of application data into the updated neural network model and correcting the model parameters of the neural network model according to the second output result of the updated neural network model and the second parameters comprises:
iteratively correcting the updated model parameters of the neural network model k times according to the k groups of application data, wherein the model parameters before the k iterative corrections are the second parameters, and the model parameters after the k iterative corrections are second intermediate parameters;
correcting the model parameter of the neural network model from the first parameter to a third parameter according to the second intermediate parameter, a sixth learning rate and the second parameter;
wherein one iterative correction comprises the following steps:
inputting a set of application data to the updated neural network model to obtain a second output result;
constructing a sixth loss function of the updated neural network model according to the second output result;
determining a tenth gradient of the model parameters before this iterative correction according to the sixth loss function;
and correcting the model parameters according to the tenth gradient and the sixth learning rate.
11. The neural network model adjustment method according to claim 1, wherein the neural network model includes a backbone network model;
wherein inputting the target amount of sample data into the neural network model and updating the model parameters of the neural network model according to the first output result of the neural network model comprises:
inputting the target amount of sample data into the backbone network model to obtain the first output result output by the backbone network model, wherein the first output result is a third feature vector;
constructing a seventh loss function of the backbone network model according to the first output result;
determining an eleventh gradient of the first parameter according to the seventh loss function;
updating the model parameters from the first parameters to second parameters according to the eleventh gradient and a seventh learning rate;
wherein inputting the target amount of application data into the updated neural network model and correcting the model parameters of the neural network model according to the second output result of the updated neural network model and the second parameters comprises:
inputting the target amount of application data into the updated backbone network model to obtain the second output result output by the backbone network model, wherein the second output result is a fourth feature vector;
constructing an eighth loss function of the updated backbone network model according to the second output result;
determining a twelfth gradient of the second parameter according to the eighth loss function;
updating the model parameters from the first parameters to third parameters according to the twelfth gradient, the eleventh gradient, and an eighth learning rate.
12. The neural network model adjustment method according to claim 1, wherein the target amount of sample data is input to the neural network model through a first data loader;
the target amount of application data is input to the updated neural network model through a second data loader.
13. A neural network model adjustment apparatus, comprising:
an updating module, configured to input a target amount of sample data into a neural network model and update the model parameters of the neural network model according to a first output result of the neural network model, wherein the model parameters before updating are first parameters and the model parameters after updating are second parameters; and
a correction module, configured to input a target amount of application data into the updated neural network model and correct the model parameters of the neural network model according to a second output result of the updated neural network model and the second parameters, wherein the model parameters before correction are the first parameters and the model parameters after correction are third parameters.
14. A neural network model adjustment apparatus, characterized by comprising:
one or more processors; and
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the neural network model adjusting method of any one of claims 1-12.
15. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out a neural network model adjusting method according to any one of claims 1 to 12.
CN202011092190.6A 2020-10-13 2020-10-13 Neural network model adjusting method, device, equipment and storage medium Pending CN112434552A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011092190.6A CN112434552A (en) 2020-10-13 2020-10-13 Neural network model adjusting method, device, equipment and storage medium


Publications (1)

Publication Number Publication Date
CN112434552A true CN112434552A (en) 2021-03-02

Family

ID=74690582

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011092190.6A Pending CN112434552A (en) 2020-10-13 2020-10-13 Neural network model adjusting method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112434552A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112633290A (en) * 2021-03-04 2021-04-09 北京世纪好未来教育科技有限公司 Text recognition method, electronic device and computer readable medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376615A (en) * 2018-09-29 2019-02-22 苏州科达科技股份有限公司 For promoting the method, apparatus and storage medium of deep learning neural network forecast performance
CN110298262A (en) * 2019-06-06 2019-10-01 华为技术有限公司 Object identification method and device
CN111062495A (en) * 2019-11-28 2020-04-24 深圳市华尊科技股份有限公司 Machine learning method and related device
CN111368672A (en) * 2020-02-26 2020-07-03 苏州超云生命智能产业研究院有限公司 Construction method and device for genetic disease facial recognition model
CN111460958A (en) * 2020-03-26 2020-07-28 暗物智能科技(广州)有限公司 Object detector construction method, object detection method and object detection system
WO2020200030A1 (en) * 2019-04-02 2020-10-08 京东方科技集团股份有限公司 Neural network training method, image processing method, image processing device, and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination