CN113837379A - Neural network training method and device, and computer readable storage medium - Google Patents

Neural network training method and device, and computer readable storage medium

Info

Publication number
CN113837379A
Authority
CN
China
Prior art keywords
task
knowledge
meta
data set
training
Prior art date
Legal status
Pending
Application number
CN202111083941.2A
Other languages
Chinese (zh)
Inventor
林少波
曾星宇
陈大鹏
赵瑞
Current Assignee
Shanghai Sensetime Intelligent Technology Co Ltd
Original Assignee
Shanghai Sensetime Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Sensetime Intelligent Technology Co Ltd
Priority to CN202111083941.2A
Publication of CN113837379A
Priority to PCT/CN2022/071370 (published as WO2023040147A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The embodiments of the present disclosure provide a neural network training method and apparatus, and a computer-readable storage medium. The method includes: performing iterative training on an initial network with a sample of each task in a first data set, based on initial meta-knowledge, initial task knowledge of each task, and a task loss function, to obtain the task knowledge of each task, the first data set being a source domain data set containing initial task categories; performing iterative training on the initial network with the samples corresponding to the various tasks in the first data set, based on the task knowledge of each task, the initial meta-knowledge, and a meta-loss function, to obtain optimal meta-knowledge; and performing iterative training on the network corresponding to the optimal meta-knowledge with a sample of each task in a second data set, based on the optimal meta-knowledge, the task knowledge of each task, and the task loss function, to obtain the optimal task knowledge of each task and a target network corresponding to the optimal task knowledge. The present disclosure improves the generalization of few-sample training.

Description

Neural network training method and device, and computer readable storage medium
Technical Field
The present disclosure relates to computer vision technologies, and in particular, to a neural network training method and apparatus, and a computer-readable storage medium.
Background
Few-sample learning is an important emerging research area in computer vision. Owing to the rapid development of Convolutional Neural Networks (CNNs), many CNN-based object detectors have achieved great success. Few-sample learning aims to develop the capability of deep learning models in scenarios with only a small number of samples. The main few-sample learning methods include: meta-learning models based on the Episode training method, and transfer learning methods based on pre-training and fine-tuning (Pre-train finetune).
However, although the two types of current few-sample learning methods are very similar in their optimization processes, each has its own implementation; the two are not compatible with each other and each has limitations.
Disclosure of Invention
The embodiments of the present disclosure provide a neural network training method and apparatus, and a computer-readable storage medium, which improve the generalization of few-sample training.
The technical solutions of the present disclosure are implemented as follows:
the embodiment of the present disclosure provides a training method of a neural network, the method including:
performing iterative training on an initial network by adopting a sample of each task in a first data set based on initial meta-knowledge, initial task knowledge of each task and a task loss function to obtain the task knowledge of each task; the first dataset is a source domain dataset comprising an initial task category;
based on the task knowledge, the initial meta knowledge and the meta loss function of each task, respectively performing iterative training on the initial network by adopting samples corresponding to each task in the first data set to obtain optimal meta knowledge;
performing iterative training on a network corresponding to the optimal meta-knowledge by adopting a sample of each task in a second data set based on the optimal meta-knowledge, the task knowledge of each task and the task loss function to obtain the optimal task knowledge of each task and a target network corresponding to the optimal task knowledge; wherein the second data set is a target domain data set of a target task class, and the sample size of each task class in the initial task class is larger than the sample size of each task class in the target task class.
In the above method, the iteratively training the initial network by using the sample of each task in the first data set based on the initial meta-knowledge, the initial task knowledge of each task, and the task loss function to obtain the task knowledge of each task includes:
performing iterative training on the initial network by adopting a sample of each task in a first data set based on initial meta-knowledge and initial task knowledge of each task to obtain a first training result;
and determining the task knowledge of each task when the sum of the task losses corresponding to each sample is determined to be minimum based on the task loss function and the first training result.
In the above method, the iteratively training the initial network by using samples corresponding to various tasks in the first data set based on the task knowledge of each task, the initial meta knowledge, and the meta loss function to obtain the optimal meta knowledge includes:
based on the task knowledge and the initial meta-knowledge of each task, respectively performing iterative training on the initial network by adopting samples corresponding to each task in the first data set to obtain a second training result;
and determining the optimal meta-knowledge when determining that the sum of the meta-losses corresponding to various tasks is minimum based on the meta-loss function and the second training result.
In the above method, during the training process based on a small number of samples of the segment, the first data set includes a query data set and a support data set corresponding to the segment; the support data set is used in the training process of task knowledge; the query data set is used in the training process of the meta-knowledge; wherein a segment characterizes a task;
the query data set comprises M tasks, and each task corresponds to N samples; the support data set comprises H tasks, and each task corresponds to K samples; the M tasks and the H tasks are different in types; m is a positive integer greater than or equal to 1, and N is a positive integer greater than or equal to 1; h is a positive integer greater than or equal to 1, and K is a positive integer greater than or equal to 1;
during testing of a small number of samples based on a segment, the second data set is a data set of a target task category different from the initial task category of the first data set; the second data set comprises a total of Q tasks, and each task corresponds to a small number of samples; Q is a positive integer greater than or equal to 1.
In the above method, in the pre-training process based on the pre-training-fine-tuning small number of samples, the first data set includes: m tasks are adopted, and each task corresponds to N samples; m is a positive integer greater than or equal to 1, and N is a positive integer greater than or equal to 1;
in a pre-training-fine-tuning-based fine tuning process of a small number of samples, the second data set is a data set of a target task category different from the initial task category of the first data set; the second data set comprises a total of Q tasks, and each task corresponds to a small number of samples; Q is a positive integer greater than or equal to 1.
In the above method, the iteratively training the initial network by using a sample of each task in the first data set based on the initial meta-knowledge, the initial task knowledge of each task, and the task loss function to obtain the task knowledge of each task includes:
and performing iterative training on the initial network by adopting a sample of each task in the first data set based on the initial meta-knowledge, the initial task knowledge of each task and a task loss function in combination with parameter compression processing to obtain the task knowledge of each task.
In the above method, the iteratively training the initial network by using a sample of each task in the first data set based on the initial meta-knowledge, the initial task knowledge of each task, and the task loss function in combination with parameter compression processing to obtain the task knowledge of each task includes:
training the initial network by adopting a sample of each task in a first data set based on the initial meta-knowledge and the initial task knowledge of each task to obtain a first sub-training result;
obtaining a first sub-loss and a first sub-network based on the first sub-training result and the task loss function;
under the condition that the first sub-loss is larger than a preset loss threshold value, performing meta-random perturbation processing on the first sub-network to obtain a first perturbation sub-network; the first perturbation sub-network is a network obtained by compressing parameters of the initial network;
and continuing to perform iterative training on the first perturbation sub-network by adopting the sample of each task in the first data set until the obtained loss is less than or equal to the preset loss threshold value, so as to obtain task knowledge of each task.
In the above method, the iteratively training the initial network by using samples corresponding to various tasks in the first data set based on the task knowledge of each task, the initial meta knowledge, and the meta loss function to obtain the optimal meta knowledge includes:
based on the task knowledge, the initial meta knowledge and the meta loss function of each task, combining with parameter compression processing, and adopting samples corresponding to each task in the first data set to respectively carry out iterative training on the initial network, so as to obtain the optimal meta knowledge.
In the above method, the iteratively training the initial network by using samples corresponding to various tasks in the first data set based on the task knowledge of each task, the initial meta knowledge, and the meta loss function in combination with parameter compression processing to obtain the optimal meta knowledge includes:
training the initial network by adopting the data of each task in the first data set based on the initial meta-knowledge and the task knowledge of each task to obtain a second sub-training result;
obtaining a second sub-loss and a second sub-network based on the second sub-training result and the meta-loss function;
under the condition that the second sub-loss is larger than a preset loss threshold value, performing meta-random perturbation processing on the second sub-network to obtain a second perturbation sub-network; the second perturbation sub-network is a network obtained by compressing parameters of the initial network;
and continuing to perform iterative training on the second perturbation sub-network by adopting the sample of each task in the first data set until the obtained loss is less than or equal to the preset loss threshold value, so as to obtain the optimal meta-knowledge.
In the method, the parameter compression processing mode includes at least one of the following:
discarding the network neurons corresponding to each task;
discarding the area structure of the network corresponding to each task;
and discarding the minimum structural area of the network corresponding to each task.
The embodiment of the present disclosure provides a training device for a neural network, the device including:
the training unit is used for carrying out iterative training on the initial network by adopting a sample of each task in the first data set based on the initial meta-knowledge, the initial task knowledge of each task and the task loss function to obtain the task knowledge of each task; the first dataset is a source domain dataset comprising an initial task category; based on the task knowledge, the initial meta knowledge and the meta loss function of each task, respectively performing iterative training on the initial network by adopting samples corresponding to each task in the first data set to obtain optimal meta knowledge; performing iterative training on a network corresponding to the optimal meta-knowledge by adopting a sample of each task in a second data set based on the optimal meta-knowledge, the task knowledge of each task and the task loss function to obtain the optimal task knowledge of each task and a target network corresponding to the optimal task knowledge; wherein the second data set is a target domain data set comprising target task categories, and the sample size of each task category in the initial task category is greater than the sample size of each task category in the target task category.
In the above apparatus, the apparatus further comprises: a determination unit;
the training unit is further used for performing iterative training on the initial network by adopting a sample of each task in the first data set based on the initial meta-knowledge and the initial task knowledge of each task to obtain a first training result;
and the determining unit is used for determining the task knowledge of each task when the task loss sum corresponding to each sample is determined to be minimum based on the task loss function and the first training result.
In the above apparatus, the apparatus further comprises: a determination unit;
the training unit is further configured to perform iterative training on the initial network by using samples corresponding to various tasks in the first data set based on the task knowledge and the initial meta knowledge of each task, so as to obtain a second training result;
and the determining unit is used for determining the optimal meta-knowledge when the sum of the meta-losses corresponding to various tasks is determined to be minimum based on the meta-loss function and the second training result.
In the above apparatus, during training based on a small number of samples of a segment, the first data set includes a query data set and a support data set corresponding to the segment; the support data set is used in the training process of task knowledge; the query data set is used in the training process of the meta-knowledge; wherein a segment characterizes a task;
the query data set comprises M tasks, and each task corresponds to N samples; the support data set comprises H tasks, and each task corresponds to K samples; the M tasks and the H tasks are different in types; m is a positive integer greater than or equal to 1, and N is a positive integer greater than or equal to 1; h is a positive integer greater than or equal to 1, and K is a positive integer greater than or equal to 1;
during testing of a small number of samples based on a segment, the second data set is a data set of a target task category different from the initial task category of the first data set; the second data set comprises a total of Q tasks, and each task corresponds to a small number of samples; Q is a positive integer greater than or equal to 1.
In the above apparatus, in the pre-training process based on the pre-training-fine-tuning small number of samples, the first data set includes: m tasks are adopted, and each task corresponds to N samples; m is a positive integer greater than or equal to 1, and N is a positive integer greater than or equal to 1;
in a pre-training-fine-tuning-based fine tuning process of a small number of samples, the second data set is a data set of a target task category different from the initial task category of the first data set; the second data set comprises a total of Q tasks, and each task corresponds to a small number of samples; Q is a positive integer greater than or equal to 1.
In the above apparatus, the training unit is further configured to perform iterative training on the initial network by using a sample of each task in the first data set based on the initial meta-knowledge, the initial task knowledge of each task, and a task loss function in combination with parameter compression processing, so as to obtain the task knowledge of each task.
In the above apparatus, the apparatus further comprises: an acquisition unit and a parameter compression unit;
the training unit is further configured to train the initial network by using a sample of each task in a first data set based on the initial meta-knowledge and the initial task knowledge of each task to obtain a first sub-training result;
the obtaining unit is further configured to obtain a first sub-loss and a first sub-network based on the first sub-training result and the task loss function;
the parameter compression unit is configured to perform meta-random perturbation processing on the first sub-network to obtain a first perturbation sub-network when the first sub-loss is greater than a preset loss threshold; the first perturbation sub-network is a network obtained by compressing parameters of an initial network;
the training unit is further configured to continue to perform iterative training on the first perturbation sub-network by using a sample of each task in the first data set until the obtained loss is less than or equal to the preset loss threshold, so as to obtain task knowledge of each task.
In the above apparatus, the training unit is further configured to perform iterative training on the initial network by using samples corresponding to various tasks in the first data set, based on the task knowledge of each task, the initial meta knowledge, and the meta loss function, in combination with parameter compression processing, to obtain the optimal meta knowledge.
In the above apparatus, the apparatus further comprises: an acquisition unit and a parameter compression unit;
the training unit is further configured to train the initial network with a sample of each task in the first data set based on the initial meta-knowledge and the task knowledge of each task to obtain a second sub-training result;
the obtaining unit is configured to obtain a second sub-loss and a second sub-network based on the second sub-training result and the meta-loss function;
the parameter compression unit is configured to perform meta-random perturbation processing on the second sub-network to obtain a second perturbation sub-network when the second sub-loss is greater than a preset loss threshold; the second perturbation sub-network is a network obtained by compressing parameters of an initial network;
the training unit is further configured to continue to perform iterative training on the second perturbation sub-network by using the sample of each task in the first data set until the obtained loss is less than or equal to the preset loss threshold, so as to obtain the optimal meta-knowledge.
In the above apparatus, the parameter compression processing mode includes at least one of:
discarding the network neurons corresponding to each task;
discarding the area structure of the network corresponding to each task;
and discarding the minimum structure of the network corresponding to each task.
The embodiment of the present disclosure provides a training apparatus for a neural network, including:
a memory for storing a computer program;
and the processor is used for realizing the training method of the neural network when executing the computer program stored in the memory.
The embodiment of the present disclosure provides a computer-readable storage medium, on which a computer program is stored, for implementing the above-mentioned training method of the neural network when being executed by a processor.
The neural network training method and apparatus and the computer-readable storage medium provided by the embodiments of the present disclosure perform iterative training on the initial network with a sample of each task in the first data set, based on the initial meta-knowledge, the initial task knowledge of each task, and the task loss function, to obtain the task knowledge of each task, the first data set being a source domain data set containing the initial task categories; perform iterative training on the initial network with the samples corresponding to the various tasks in the first data set, based on the task knowledge of each task, the initial meta-knowledge, and the meta-loss function, to obtain the optimal meta-knowledge; and perform iterative training on the network corresponding to the optimal meta-knowledge with a sample of each task in the second data set, based on the optimal meta-knowledge, the task knowledge of each task, and the task loss function, to obtain the optimal task knowledge of each task and the target network corresponding to the optimal task knowledge; the second data set is a target domain data set comprising the target task categories, and the sample size of each task category in the initial task categories is larger than that of each task category in the target task categories. With this scheme, neural network training can be realized under a unified few-sample learning framework: the task knowledge is determined by training on the sample distribution of each task, and the samples of all the various tasks in the first data set are then trained on, based on the task knowledge and the task distribution, to obtain the optimal meta-knowledge, which enables subsequent training on a training data set with a small number of samples and, further, yields the target network corresponding to the optimal task knowledge for testing and inference on a specific task. Therefore, no matter which few-sample learning method is adopted, few-sample training can be realized under this framework, and the generalization of few-sample training is improved.
Drawings
FIG. 1 is a schematic diagram of an exemplary target task training method provided by an embodiment of the present disclosure;
FIG. 2 is a flow chart of an alternative neural network training method provided by an embodiment of the present disclosure;
FIG. 3 is a flow chart of another alternative neural network training method provided by an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of an alternative training apparatus for neural networks according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of another alternative training apparatus for a neural network according to an embodiment of the present disclosure.
Detailed Description
To make the objectives, technical solutions, and advantages of the present disclosure clearer, the present disclosure is described in further detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present disclosure, and all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present disclosure.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
In the following description, the terms "first/second/third" are only used to distinguish similar objects and do not denote a particular order; it should be understood that "first/second/third" may be interchanged in a specific order or sequence where permissible, so that the embodiments of the disclosure described herein can be implemented in an order other than that shown or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. The terminology used herein is for the purpose of describing embodiments of the disclosure only and is not intended to be limiting of the disclosure.
Today, Deep Neural Network (DNN) models have achieved tremendous success in many computer vision tasks. However, the performance of these deep neural network models relies heavily on large amounts of data and expensive manual annotation. When annotated data is scarce, DNNs do not generalize well to the test data, especially when the test data belong to categories different from those of the training data. It is therefore desirable to learn to recognize, or to quickly detect, new objects using only a small number of labeled data. Since some object categories have few samples, or their annotations are extremely difficult to obtain, the generalization capability of conventional neural networks is insufficient, and approaches that obtain better generalization capability through few-sample learning have therefore emerged.
In an embodiment of the present disclosure, the few-sample learning methods include: meta-learning models based on the Episode training method, and transfer learning methods based on pre-training and fine-tuning (Pre-train finetune). The meta-learning method comprises two stages, namely meta-training and meta-testing: the meta-training stage trains a model on a large amount of base-category data, and the meta-testing stage introduces a small number of samples of new categories on the basis of the base categories for further training. Both stages adopt the Episode-based training method. One Episode means that a small N-way, K-shot training set (N categories, K samples per category) is randomly acquired, and a certain number of samples are randomly extracted from the corresponding categories as a small test set; the whole training process consists of a plurality of Episodes. The Pre-train finetune method is a common domain transfer method and can be applied to the field of few-sample learning. The pre-training stage performs basic training on a large number of base categories and provides a good initialization model, and fine-tuning refers to adjusting the parameters on the few-sample task and transferring them to the corresponding task.
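To make the Episode construction concrete, the following sketch (plain Python; the function name, data layout, and parameter names are illustrative assumptions, not taken from the patent) shows how one N-way, K-shot episode might be sampled from base-category data: N categories are drawn at random, K samples per category form the small training (support) set, and a further number of samples from the same categories form the small test (query) set.

```python
import random
from collections import defaultdict

def sample_episode(dataset, n_way, k_shot, q_query, rng=random):
    """Sample one Episode: an N-way, K-shot support set plus a small query set.

    `dataset` is assumed to be a list of (sample, category) pairs over the
    base categories; the helper name and data layout are illustrative only."""
    by_category = defaultdict(list)
    for sample, category in dataset:
        by_category[category].append(sample)

    # N-way: randomly pick N base categories for this episode.
    categories = rng.sample(sorted(by_category), n_way)

    support, query = [], []
    for c in categories:
        picked = rng.sample(by_category[c], k_shot + q_query)
        # K-shot: K samples per category form the small training (support) set.
        support += [(x, c) for x in picked[:k_shot]]
        # The remaining drawn samples of the same categories form the small test (query) set.
        query += [(x, c) for x in picked[k_shot:]]
    return support, query

# The whole training process then consists of many such episodes, e.g.:
# for _ in range(num_episodes):
#     support, query = sample_episode(base_data, n_way=5, k_shot=1, q_query=15)
```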
As shown in FIG. 1, different source tasks (1, 2, ..., N) carry task knowledge and transferable knowledge, and the target task is obtained through generalization. The generalization ability across the different sources is the key to few-sample learning, in which the transferable knowledge is adapted to the target task.
In the embodiments of the present disclosure, both of the above neural network training methods are implemented based on a meta-learning framework. Based on meta-learning, a unified mathematical framework is constructed to explain the two mainstream few-sample learning methods, Episode-based and Pre-train-finetune-based. The framework consists of a newly reconstructed meta-training phase and a meta-testing phase. In the meta-training phase, the framework jointly considers the distribution of tasks and the distribution of data, and highlights the common elements of few-sample learning, including meta-knowledge, task knowledge, meta-loss, task loss, and the distribution of data sets, so that effective strategies can be introduced to improve the overall generalization capability of the model.
Meanwhile, with the rapid development of Convolutional Neural Networks (CNNs), the dependence of current CNN models on large amounts of data has become very obvious; if only a small amount of training data is available, the model often over-fits.
That is, in both methods, during the first stage of basic training (meta-training), there is a significant over-fitting phenomenon on the base-category samples, which prevents the second-stage model from achieving the optimal effect on the few-sample categories (new categories). For the over-fitting problem in the meta-training stage, the embodiments of the present disclosure further provide a simple and general meta-learning strategy (Meta-dropout) to alleviate over-fitting on the base categories, thereby improving the generalization of the model. Meta-dropout improves model accuracy on few-sample object detection and image classification tasks and is highly general. That is, Meta-dropout (meta-random perturbation) is applied to the transferable knowledge of the deep neural network model. By using meta-random perturbation, the model shows great advantages over current few-sample learning methods on few-sample object detection and few-sample image classification tasks.
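The following is a minimal sketch of the meta-random perturbation idea described above, assuming the transferable knowledge is held in a weight matrix and that perturbation is applied only while the loss on the base categories is still above a threshold. The function, the drop probability, and the two drop modes shown (whole neurons, or a contiguous region of the structure) are illustrative assumptions rather than the patent's exact implementation.

```python
import numpy as np

def meta_random_perturbation(weights, mode="neuron", p=0.1, rng=np.random.default_rng()):
    """Compress/perturb transferable knowledge by randomly dropping parts of it.

    `weights` is a 2-D array of a layer holding the transferable (meta) knowledge.
    mode="neuron": drop whole output neurons (columns set to zero).
    mode="region": drop a contiguous rectangular region of the weight matrix."""
    w = weights.copy()
    if mode == "neuron":
        drop = rng.random(w.shape[1]) < p          # choose neurons to discard
        w[:, drop] = 0.0
    elif mode == "region":
        h = max(1, int(w.shape[0] * p))            # region height
        k = max(1, int(w.shape[1] * p))            # region width
        r = rng.integers(0, w.shape[0] - h + 1)
        c = rng.integers(0, w.shape[1] - k + 1)
        w[r:r + h, c:c + k] = 0.0
    return w

# Training-loop usage (schematic): perturb only while the loss is still too high.
# if loss > loss_threshold:
#     meta_weights = meta_random_perturbation(meta_weights, mode="neuron")
```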
The embodiments of the present disclosure provide a neural network training method and apparatus and a computer-readable storage medium, which can improve the generalization of few-sample training and prevent the over-fitting problem. An exemplary application of the neural network training apparatus provided in the embodiments of the present disclosure is described below. The neural network training apparatus provided in the embodiments of the present disclosure may be various types of user terminals such as an electronic device, a laptop, a tablet computer, a desktop computer, a set-top box, or a mobile device, and may also be implemented as a server, which is not limited by the embodiments of the present disclosure.
In some embodiments, the server may be an independent physical server, may also be a server cluster or a distributed system formed by a plurality of physical servers, and may also be a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, and a big data and artificial intelligence platform.
Referring to fig. 2, fig. 2 is an alternative flow chart of a training method of a neural network provided in an embodiment of the present disclosure, which will be described with reference to the steps shown in fig. 2.
S101, performing iterative training on an initial network by adopting a sample of each task in a first data set based on initial meta-knowledge, initial task knowledge of each task and a task loss function to obtain task knowledge of each task; the first data set is a source domain data set that includes an initial task category.
S102, based on task knowledge, initial meta knowledge and a meta loss function of each task, performing iterative training on the initial network by adopting samples corresponding to various tasks in the first data set to obtain optimal meta knowledge.
In the embodiments of the present disclosure, the neural network training method is implemented based on a meta-learning framework. Based on meta-learning, a unified mathematical framework is constructed to explain the two mainstream few-sample learning methods, Episode-based and Pre-train-finetune-based. The framework consists of a newly reconstructed meta-training phase and a meta-testing phase.
It should be noted that meta-learning aims at learning a general learning algorithm, which can be generalized across tasks and enables each new task to learn better.
In the disclosed embodiments, the overall goal of the few-sample learning framework in the meta-training process may be: for the distribution of all source tasks (corresponding to the initial task categories), assuming that the number of tasks is infinite, with each task denoted as task t, and that the amount of data in each task is also infinite, it is expected to learn a meta-knowledge for which the loss over the tasks and their data is minimized, so that every task can be represented well.
For example, the overall optimization process of meta-training can be shown as formula (1):
$\min_{\omega}\; \mathbb{E}_{t \sim p(T)}\, \mathbb{E}_{x \sim p(D_t)}\big[\, L(x;\, \theta_t,\, \omega) \,\big]$   (1)
where p(T) represents the distribution of source tasks, in which T consists of an infinite number of basic tasks; p(D_t) is the distribution of the training data set D_t of task t; E denotes expectation; L is a loss function; ω is the meta-knowledge; θ_t is the task knowledge of task t; and x is a sample (data).
It should be noted that the amount of training data when learning the meta-knowledge is assumed to be infinite. ω is used to represent meta-knowledge that can be transferred between different tasks. Meta-knowledge is a broad concept and can be any of a number of components of the learning process, such as a parameter initialization, a black-box model, or a network structure; the embodiments of the present disclosure are not limited in this respect. θ (or θ_t) is task-specific task knowledge.
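As a toy illustration of the most common form of meta-knowledge mentioned above (a parameter initialization), the sketch below initializes the task-specific parameters θ_t from ω and adapts them with a few gradient steps on the task's own samples. The least-squares toy task, step size, and helper name are assumptions made for the example only.

```python
import numpy as np

def adapt_task(omega, task_samples, lr=0.1, steps=5):
    """Treat the meta-knowledge omega as a parameter initialization: theta_t
    starts from omega and is adapted with a few gradient steps on the task's
    own samples (here a simple least-squares toy task; all details assumed)."""
    theta = np.array(omega, dtype=float)           # theta_t initialized from omega
    for _ in range(steps):
        grad = np.zeros_like(theta)
        for x, y in task_samples:
            grad += 2.0 * (theta @ x - y) * x      # d/dtheta of (theta.x - y)^2
        theta -= lr * grad / len(task_samples)
    return theta                                   # task-specific knowledge theta_t

# Example usage with two toy samples:
# omega = np.zeros(2)
# task = [(np.array([1.0, 0.0]), 1.0), (np.array([0.0, 1.0]), 2.0)]
# theta_t = adapt_task(omega, task)
```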
It should be noted that, based on the above overall optimization objective, the optimization objective of the meta-training phase may be determined as: starting from the initial tasks and from the initial state of the initial meta-knowledge, obtain the optimal meta-knowledge through an iterative process.
For example, the meta-training phase of meta-training can be optimized as shown in equation (2):
$\omega^{*} = \arg\min_{\omega}\; \mathbb{E}_{T \sim p(T)}\, \mathbb{E}_{x \sim p(D_T)}\big[\, L(x;\, \theta_T,\, \omega) \,\big]$   (2)
where ω* is the optimal meta-knowledge, ω is the initial meta-knowledge, and T is a task.
It should be noted that ω* is obtained by learning based on both tasks and data: based on the initial meta-knowledge and the initial task knowledge (the information related to T), the task knowledge of each task is learned first, and the initial meta-knowledge is then optimized based on the learned task knowledge, finally yielding the optimal meta-knowledge.
In the embodiment of the disclosure, the training device of the neural network may perform iterative training on the initial network by using a sample of each task in the first data set based on the initial meta knowledge and the initial task knowledge of each task, determine a loss based on a task loss function, and obtain the task knowledge of each task based on iterative update of the loss. Wherein the first data set is a source domain data set comprising an initial task category.
In some embodiments of the present disclosure, a training apparatus of a neural network iteratively trains an initial network using a sample of each task in a first data set based on initial meta-knowledge and initial task knowledge of each task to obtain a first training result; and determining task knowledge of each task when the sum of task losses corresponding to each sample is determined to be minimum based on the task loss function and the first training result.
It should be noted that the initial task category may be a basic category, and the disclosed embodiment is not limited thereto.
In an embodiment of the present disclosure, during the meta-training phase, the initial network or model is trained on a dataset D (corresponding to the first dataset), which represents the source domain dataset containing a large number of base class samples with a sufficient number.
It should be noted that the training device of the neural network may implement training of the initial network for the sample of each task in the first data set according to the task knowledge model, so as to obtain the task knowledge of each task.
For example, the task knowledge model may be shown in equation (3):
$\theta^{*(i)}(\omega) = \arg\min_{\theta_i} \sum_{j} L_{task}\!\left(D_{ij};\, \theta_i,\, \omega\right)$   (3)
where θ*(i)(ω) is the task knowledge of the i-th task, D_ij is the j-th sample of the i-th task in data set D (corresponding to the first data set), L_task is the task loss function, θ_i is the initial task knowledge of the i-th task, and ω is the initial meta-knowledge.
It should be noted that, based on the principle of formula (3), the training apparatus of the neural network trains the initial network separately with each sample of a given task in the first data set, according to the initial meta-knowledge and the initial task knowledge of that task, to obtain a first training result corresponding to each sample; determines the loss of each sample based on the actual value of the sample, the respective first training result, and the task loss function; sums the losses of all samples corresponding to the task to obtain the current task loss (i.e., the sum of the sample losses of the task); and continues to learn the task knowledge in the next round until the task loss is determined to be minimum. The training process of the task knowledge follows the same principle for every task.
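A hedged sketch of the task-knowledge stage just described: for one task, the task losses over all of its samples are summed and θ_i is updated by gradient descent while the meta-knowledge ω is held fixed, as in equation (3). The quadratic toy loss, its hand-written gradient, and all function names are assumptions for illustration, not the patent's concrete loss.

```python
import numpy as np

def task_loss(x, y, theta, omega):
    """Assumed toy form of L_task(D_ij; theta_i, omega): squared error plus a
    small pull towards the meta-knowledge (the patent leaves L_task generic)."""
    return (theta @ x - y) ** 2 + 0.01 * np.sum((theta - omega) ** 2)

def learn_task_knowledge(samples, omega, theta_init, lr=0.05, iters=100):
    """Equation (3) in iterative form: minimize the sum of task losses over the
    samples of one task by gradient descent on theta_i, with omega held fixed."""
    theta = np.array(theta_init, dtype=float)
    for _ in range(iters):
        grad = np.zeros_like(theta)
        for x, y in samples:
            # hand-written gradient of the toy task_loss above w.r.t. theta
            grad += 2.0 * (theta @ x - y) * x + 0.02 * (theta - omega)
        theta -= lr * grad
    return theta  # theta^{*(i)}(omega): task knowledge of this task
```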
In the embodiment of the present disclosure, the training device of the neural network may perform iterative training on the initial network respectively for samples corresponding to various tasks in the first data set according to the obtained task knowledge of each task in combination with the initial meta-knowledge, so as to obtain the optimal meta-knowledge.
In some embodiments of the present disclosure, the training device of the neural network may perform iterative training on the initial network by using samples corresponding to various tasks in the first data set based on task knowledge and initial meta knowledge of each task, to obtain a second training result; and determining the optimal meta-knowledge when the sum of the meta-losses corresponding to the various tasks is determined to be minimum based on the meta-loss function and the second training result.
In this embodiment of the present disclosure, the training device of the neural network may perform iterative training on the initial network according to the obtained task knowledge of each task in combination with the initial meta-knowledge, compute the meta-loss of each task and sum the meta-losses of all tasks, continue the iterative training to obtain the sum of the meta-losses for each round, and take the meta-knowledge determined when the sum of the meta-losses over all tasks is minimum as the optimal meta-knowledge.
It should be noted that the training device of the neural network may implement the training of the initial network, according to the meta-knowledge model, using the task knowledge of all tasks and the initial meta-knowledge, on the samples of the various tasks in the first data set, thereby obtaining the optimal meta-knowledge.
For example, the meta-knowledge model may be shown in equation (4):
$\omega^{*} = \arg\min_{\omega}\; \mathbb{E}_{i \sim p(T)}\, \mathbb{E}_{D_i \sim p(D_i)}\big[\, L_{meta}\!\left(D_i;\, \theta^{*(i)}(\omega),\, \omega\right) \,\big]$   (4)

where ω* is the optimal meta-knowledge, ω is the initial meta-knowledge, p(T) represents the distribution of the source tasks in which T consists of an infinite number of basic tasks, p(D_i) is the distribution of the training data set D_i of task i, E denotes expectation, L_meta is the meta-loss function, θ*(i)(ω) is the task knowledge of the i-th task, and i is one of the tasks in p(T).
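Continuing the toy example, the sketch below mirrors the meta-optimization stage of equation (4): each task's knowledge θ*(i)(ω) is produced by an inner loop, the meta-losses of all tasks are summed, and ω is updated. A finite-difference gradient is used only to keep the sketch dependency-free; a real implementation would backpropagate through the inner loop. All names and optimization details are assumptions.

```python
import numpy as np

def meta_objective(omega, tasks, meta_loss, learn_task_knowledge):
    """Sum of meta-losses over all tasks, each evaluated at the task knowledge
    theta^{*(i)}(omega) produced by the inner loop (cf. equation (4))."""
    total = 0.0
    for samples in tasks:                        # tasks ~ p(T), samples ~ p(D_i)
        theta_i = learn_task_knowledge(samples, omega, theta_init=omega)
        total += sum(meta_loss(x, y, theta_i, omega) for x, y in samples)
    return total

def learn_meta_knowledge(omega0, tasks, meta_loss, learn_task_knowledge,
                         lr=0.1, iters=50, eps=1e-4):
    """Iteratively update omega towards omega*; the finite-difference gradient
    is a simplification made purely for this self-contained sketch."""
    omega = np.array(omega0, dtype=float)
    for _ in range(iters):
        grad = np.zeros_like(omega)
        base = meta_objective(omega, tasks, meta_loss, learn_task_knowledge)
        for d in range(omega.size):
            bumped = omega.copy()
            bumped[d] += eps
            grad[d] = (meta_objective(bumped, tasks, meta_loss,
                                      learn_task_knowledge) - base) / eps
        omega -= lr * grad
    return omega  # omega*: the optimal meta-knowledge
```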
In some embodiments of the disclosure, the task knowledge model and the meta knowledge model may also be represented in the form of integrals, and embodiments of the disclosure are not limited.
Illustratively, the common task knowledge model (equation (5)) and the meta knowledge model (equation (6)) are proposed in the form of integrals, as follows:
$\theta^{*(i)}(\omega) = \arg\min_{\theta_i} \int L_{task}\!\left(x;\, \theta_i,\, \omega\right)\, \mathrm{d}p(D_i)$   (5)

$\omega^{*} = \arg\min_{\omega} \int\!\!\int L_{meta}\!\left(x;\, \theta^{*(i)}(\omega),\, \omega\right)\, \mathrm{d}p(D_i)\, \mathrm{d}p(T)$   (6)
it should be noted that ω is the initial meta knowledge, and ω is the best meta knowledge learned during the task iteration. ThetaiRepresenting task knowledge, θ*(i)Is the task knowledge obtained by learning and is used for optimizing omega to obtain the optimal omega. L istaskIs used to optimize task specific information, called task-loss is an internal objective, which represents which model or network is appropriate for a particular task; l ismetaIs used to generate the best meta knowledge, LmetaIs meta-loss, is an external target, indicating which w is good.
In the embodiment of the present disclosure, the meta-training phase has two optimization stages, task optimization and meta-optimization, and the meta-optimization is performed depending on the result of the task optimization. In equations (3) and (5), based on L_task, one iteration of training can be performed for the i-th task, and task knowledge is obtained once per iteration until the task knowledge with the minimum sum of task losses is obtained, i.e., the task knowledge goes through the process θ_0 → θ_1 → θ_2 → ... → θ*. Similarly, in equations (4) and (6), based on L_meta, one iteration of training can be performed over all samples of all tasks, and one meta-knowledge is obtained per iteration until the optimal meta-knowledge with the minimum sum of meta-losses is obtained, i.e., ω goes through the process ω_0 → ω_1 → ω_2 → ... → ω*.
In some embodiments of the present disclosure, the training apparatus of the neural network may implement the Episode-based few-sample training and the Pre-train-finetune-based few-sample training process.
In this disclosure, the data set used by the training apparatus of the neural network includes a support data set and a query data set, and the support data set and the query data set are used as the first data set to implement the meta-training phase of the Episode-based few-sample training method; the meta-testing phase may be consistent with the meta-testing phase described later, which is not limited by this disclosure. The meta-training phase of the Episode-based few-sample training method is mainly described below. The support data set and the query data set are each constructed based on a data set containing the base categories (i.e., the initial task categories).
In some embodiments of the present disclosure, during training based on a small number of samples of a fragment, the first dataset includes a query dataset and a support dataset corresponding to the fragment; the support data set is used in the training process of task knowledge; the query dataset is used in a training process for meta-knowledge. Wherein a segment characterizes a task.
In the embodiment of the disclosure, the query dataset comprises M tasks, and each task corresponds to N samples; the support data set comprises H tasks, and each task corresponds to K samples; the types of the M tasks are different from those of the H tasks; m is a positive integer greater than or equal to 1, and N is a positive integer greater than or equal to 1; h is a positive integer greater than or equal to 1, and K is a positive integer greater than or equal to 1.
It should be noted that the types of tasks in the support dataset are different from the types or types of tasks in the query dataset, and each task corresponds to multiple samples. The number of the M tasks may be the same as or different from that of the H tasks, and the embodiment of the present disclosure is not limited. The number of N samples and the number of K samples may be the same or different, and the embodiments of the present disclosure are not limited. M and H denote the limited number of tasks and N and K denote the number of samples under the respective tasks, respectively.
In the case where the support data set is constructed using a new data set, some samples of M categories may be randomly selected, while samples of H categories different from those of the support data set are selected from the remaining data to construct the query data set.
For example, assuming M is equal to H, that is, the support data set and the query data set each contain M tasks (the tasks corresponding to the support data set and the query data set being different), and each task corresponds to N samples, the relationship among the support data set, the query data set, and the source data set may be represented by formula (7):
$D^{sup} = \big\{ (x^{sup}_{ij})_{j=1}^{N} \big\}_{i=1}^{M}, \qquad D^{qry} = \big\{ (x^{qry}_{ij})_{j=1}^{N} \big\}_{i=1}^{M}, \qquad D^{sup} \cup D^{qry} \subseteq D_{source}$   (7)

where D_source is the new data set, D^sup is the support data set, D^qry is the query data set, and M is the number of tasks in each of the support data set and the query data set.
In the disclosed embodiment, the training device of the neural network uses the support data set in the training process of the task knowledge; and the query data set is used in the training process of the meta-knowledge, and the Episode-based few-sample training method is realized.
For example, assuming M = H and N = K, equation (5) is solved by substituting the support data set, and the task knowledge in the Episode-based few-sample training is obtained, as shown in equation (8).
$\theta^{*(i)}(\omega) = \arg\min_{\theta_i} \sum_{j=1}^{N} L_{task}\!\left(x^{sup}_{ij};\, \theta_i,\, \omega\right)$   (8)

where θ*(i)(ω) is the task knowledge learned for the i-th of the M tasks (the number of tasks in the support data set), L_task is the task loss function, x^sup_ij is the j-th sample corresponding to the i-th task in the support data set, θ_i is the initial task knowledge, and ω is the initial meta-knowledge.
That is, in the embodiment of the present disclosure, the training device of the neural network may perform iterative training on the initial network by using a sample of each task in the support data set based on the initial meta-knowledge and the initial task knowledge of each task, determine a loss based on the task loss function, and obtain the task knowledge of each task based on iterative updates of the loss. The support data set is a limited data set containing base-category samples.
In some embodiments of the present disclosure, the training apparatus of the neural network iteratively trains the initial network with a sample of each task in the support data set, based on the initial meta-knowledge and the initial task knowledge of each task, to obtain a first training result; and determines the task knowledge of each task when the sum of the task losses corresponding to each sample is determined to be minimum, based on the task loss function and the first training result.
In the embodiment of the present disclosure, the training device of the neural network may perform iterative training on the initial network respectively for samples corresponding to all types of tasks in the query dataset according to the obtained task knowledge of each type of task in combination with the initial meta knowledge, so as to obtain the optimal meta knowledge.
In some embodiments of the present disclosure, the training device of the neural network may perform iterative training on the initial network by using samples corresponding to various tasks (M tasks) in the query dataset based on task knowledge and initial meta knowledge of each task, to obtain a second training result; and determining the optimal meta-knowledge when the sum of the meta-losses corresponding to the various tasks is determined to be minimum based on the meta-loss function and the second training result.
In the embodiment of the present disclosure, the training device of the neural network may perform iterative training on the initial network according to the obtained task knowledge of each task in combination with the initial meta-knowledge, compute the meta-loss of each task and sum the meta-losses of all tasks, continue the iterative training to obtain the sum of the meta-losses for each round, and take the meta-knowledge determined when the sum of the meta-losses over all tasks is minimum as the optimal meta-knowledge.
For example, assuming M = H and N = K, the query data set and the task knowledge obtained from equation (8) are substituted into equation (6) and solved, so as to obtain the optimal meta-knowledge in the Episode-based few-sample training, as shown in equation (9).
$\omega^{*} = \arg\min_{\omega} \sum_{i=1}^{M} \sum_{j=1}^{N} L_{meta}\!\left(x^{qry}_{ij};\, \theta^{*(i)}(\omega),\, \omega\right)$   (9)

where ω* is the optimal meta-knowledge, ω is the initial meta-knowledge, θ*(i) is the task knowledge learned by the i-th of the M tasks (the number of tasks in the query data set), L_meta is the meta-loss function, and x^qry_ij is the j-th sample corresponding to the i-th task in the query data set.
It can be understood that, by using the support data set and the query data set under a unified few-sample learning framework, the training device of the neural network can first determine the task knowledge by training on the sample distribution of each task in the support data set, and then train on the samples of all tasks in the query data set based on the task knowledge and the task distribution to obtain the optimal meta-knowledge, thereby enabling the subsequent training on a training data set with a small number of samples based on the optimal meta-knowledge. The Episode-based few-sample training method is thus realized under this framework.
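Putting the two stages together for the Episode-based case (equations (8) and (9)), a schematic driver might look as follows; the callbacks are the hypothetical building blocks sketched earlier, and the structure (support set for task knowledge, query set for the meta-update) follows the description above rather than any verified implementation.

```python
def episode_meta_training(episodes, omega0, learn_task_knowledge, meta_step):
    """Schematic Episode-based meta-training (cf. equations (8) and (9)):
    the support set of each episode yields the task knowledge, and the query
    set of the same episode drives one update of the meta-knowledge.

    `episodes` is an iterable of (support_tasks, query_tasks) pairs, each being
    a list of per-task sample lists; `learn_task_knowledge(samples, omega)` and
    `meta_step(omega, query_tasks, thetas)` are hypothetical callbacks standing
    in for the inner and outer optimization stages."""
    omega = omega0
    for support_tasks, query_tasks in episodes:
        # Stage 1: task knowledge theta^{*(i)}(omega) learned from the support set.
        thetas = [learn_task_knowledge(samples, omega) for samples in support_tasks]
        # Stage 2: one meta-update of omega driven by the query set and those thetas.
        omega = meta_step(omega, query_tasks, thetas)
    return omega  # approximates omega*: the optimal meta-knowledge
```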
In the embodiment of the present disclosure, in the Pre-train finetune-based small sample training process, the used data set is a limited data set of the initial task category, and the limited data set includes: m tasks are adopted, and each task corresponds to N samples.
In some embodiments of the present disclosure, in the pre-training-fine-tuning-based small sample training process, the first data set comprises: m tasks are adopted, and each task corresponds to N samples; m is a positive integer greater than or equal to 1, and N is a positive integer greater than or equal to 1.
It should be noted that, in the embodiments of the present disclosure, the limited data set may be denoted by D_source. Assuming the data set D_source is sufficiently large (approaching infinite), a good initial model can be provided for the meta-test phase. Therefore, increasing M and N is very important for improving the model accuracy in the embodiments of the present disclosure.
In an embodiment of the present disclosure, the Pre-train finetune method includes a Pre-training phase and a fine-tuning phase. In the meta-learning framework of the present disclosure, the pre-training phase is presented by the meta-training phase and the fine-tuning phase is presented by the meta-testing phase. The implementation of the pre-training phase is mainly described below.
In the disclosed embodiments, the pre-training optimization process may be derived from the general framework of equations (5) and (6) by specifying the limited data set size with M and N and replacing the infinite data set D with the complete data set D_source, thereby obtaining the optimization framework of Pre-train finetune, i.e., the optimal meta-knowledge finally obtained by the Pre-train finetune method.
In some embodiments of the present disclosure, the training apparatus of the neural network performs, based on the initial meta-knowledge and the initial task knowledge of each task, iterative training on the initial network with a sample of each of the M tasks in D_source to obtain a first training result; and determines the task knowledge of each task when the sum of the task losses corresponding to the samples of each task is minimum, based on the task loss function and the first training result.
In the embodiments of the disclosure, the training device of the neural network may, according to the obtained task knowledge of each of the M tasks, in combination with the initial meta-knowledge and the meta-loss function, perform iterative training on the initial network with the samples corresponding to the M tasks in D_source, respectively, to obtain the optimal meta-knowledge.
Illustratively, by solving equations (5) and (6) in combination with D_source, the process of deriving the task knowledge is shown in equation (10), and the process of deriving the optimal meta-knowledge is shown in equation (11).
$\theta^{*(i)}(\omega) = \arg\min_{\theta_i} \sum_{j=1}^{N} L_{task}\!\left(D_{source(ij)};\, \theta_i,\, \omega\right)$   (10)

where θ*(i)(ω) is the task knowledge learned by the i-th of the M tasks, L_task is the task loss function, D_source(ij) is the j-th sample corresponding to the i-th task in the limited data set, θ_i is the initial task knowledge, and ω is the initial meta-knowledge.
$\omega^{*} = \arg\min_{\omega} \sum_{i=1}^{M} \sum_{j=1}^{N} L_{meta}\!\left(D_{source(ij)};\, \theta^{*(i)}(\omega),\, \omega\right)$   (11)

where ω* is the optimal meta-knowledge, ω is the initial meta-knowledge, θ*(i) is the task knowledge learned by the i-th of the M learned tasks, L_meta is the meta-loss function, and D_source(ij) is the j-th sample corresponding to the i-th task in the limited data set.
It can be understood that, by using the limited data set under the unified few-sample learning framework, the training device of the neural network can first determine the task knowledge by training on the sample distribution of each of the M tasks, and then train on the samples of all tasks based on the task knowledge and the task distribution to obtain the optimal meta-knowledge, thereby enabling the subsequent training on a training data set with a small number of samples based on the optimal meta-knowledge. The Pre-train-finetune-based few-sample training method is thus implemented under this framework.
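For the Pre-train-finetune instantiation (equations (10) and (11)), the same two stages operate on the single limited data set D_source of M tasks with N samples each. A schematic sketch under the same hypothetical helpers as before:

```python
def pretrain_meta_training(d_source, omega0, learn_task_knowledge, meta_step):
    """Schematic pre-training phase in the unified framework (cf. equations
    (10) and (11)): D_source is a finite collection of M tasks, each given as
    a list of N samples, and both stages run on this same complete data set.

    `learn_task_knowledge` and `meta_step` are the same hypothetical callbacks
    as in the Episode-based sketch."""
    omega = omega0
    # Stage 1 (eq. (10)): learn task knowledge for every one of the M tasks.
    thetas = [learn_task_knowledge(task_samples, omega) for task_samples in d_source]
    # Stage 2 (eq. (11)): optimize omega over all M tasks and their N samples.
    omega = meta_step(omega, d_source, thetas)
    return omega
```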
S103, performing iterative training on a network corresponding to the optimal meta-knowledge by adopting a sample of each task in a second data set based on the optimal meta-knowledge, the task knowledge of each task and a task loss function to obtain the optimal task knowledge of each task and a target network corresponding to the optimal task knowledge; the second data set is a target domain data set containing a target task category, and the sample size of each task category in the initial task category is larger than that of each task category in the target task category.
In the embodiment of the present disclosure, after the training device of the neural network completes the meta-training phase, the meta-testing phase may be performed. The meta-testing phase includes two stages: a training stage performed with a small number of samples, and a data inference stage. The data inference stage is a stage of testing with a test data set, or a stage of forward inference with actual data. The training stage with a small number of samples is mainly described below.
In the embodiment of the present disclosure, the training apparatus of the neural network performs the few-sample training process using a second data set, where the second data set is a few-sample training data set, namely a target domain data set containing the target task categories, and the sample size of each task category in the initial task categories of the first data set is greater than the sample size of each task category in the target task categories.
It should be noted that the second data set can be constructed from a new data set with a small number of classes and samples. The test data set may also be constructed from the new data set. The new data set is built from few-sample training data and is used for learning the parameter θ**(i) that best fits the task. The categories in the new data set may be new categories different from the base categories (the initial task categories).
In this disclosure, the initial task categories corresponding to the first data set and the target task categories in the second data set may be entirely different, partially different, or the same, and the embodiments of the present disclosure are not limited in this respect.
During the Episode-based few-sample testing process and the pre-training-fine-tuning-based few-sample fine-tuning process, the second data set is a data set of target task categories different from the initial task categories of the first data set; the second data set includes a total of Q tasks, each task corresponding to a small number of samples (i.e., the sample size of each task category in the initial task categories is larger than that of each task category in the target task categories); Q is a positive integer greater than or equal to 1.
Illustratively, suppose $D_{target}$ is the new data set, $D_{target}^{train(i)}$ is the second data set (i.e., the training data set) and $D_{target}^{test(i)}$ is the test data set; the relationship among the three can then be expressed by equation (12).
$D_{target} = \bigcup_{i=1}^{Q}\left(D_{target}^{train(i)} \cup D_{target}^{test(i)}\right)$ (12)
where Q is the total number of tasks and i denotes one of the Q tasks.
It should be noted that, the second data sets corresponding to the Q tasks may be constructed from new data sets, and the embodiments of the present disclosure are not limited.
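For illustration only, the sketch below shows one possible way to assemble the Q few-sample tasks of the second data set from a new data set, splitting each task into a training part and a held-out test part in the spirit of equation (12). The function name, the way/shot counts and the random split are assumptions, not details from the patent.

```python
# Illustrative construction of Q few-sample tasks from a new data set,
# each split into a small training part and a test part.
import random

def build_target_tasks(new_dataset, q_tasks=10, n_way=5, k_shot=1, k_test=15):
    """new_dataset: dict mapping class name -> list of samples of that class.
    Each class is assumed to contain at least k_shot + k_test samples."""
    tasks = []
    for _ in range(q_tasks):
        classes = random.sample(list(new_dataset), n_way)
        train_split, test_split = [], []
        for label, cls in enumerate(classes):
            samples = random.sample(new_dataset[cls], k_shot + k_test)
            train_split += [(s, label) for s in samples[:k_shot]]   # few samples per class
            test_split += [(s, label) for s in samples[k_shot:]]    # held-out evaluation data
        tasks.append({"train": train_split, "test": test_split})
    return tasks
```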
For a specific task, the training device of the neural network may use the data in the second data set and, with the optimal meta-knowledge and the task knowledge of the specific task as parameters, combine the task loss function to iteratively train the network corresponding to the optimal meta-knowledge, thereby obtaining the optimal task knowledge of the specific task and the target network corresponding to that optimal task knowledge.
For example, the training device of the neural network may derive the optimal task knowledge based on the task model and in combination with the optimal meta-knowledge and the task knowledge of the specific task. Wherein the task model may characterize a training or optimization process that yields optimal task knowledge for a task, as shown in equation (13).
$\theta^{**(i)} = \arg\min_{\theta}\; L_{task}\left(D_{target}^{train(i)};\ \theta,\ \omega^{*}\right)$ (13)
where θ**(i) is the optimal task knowledge of the i-th task, and D_target^train(i) is the second data set corresponding to the i-th task.
Note that the data in the second data set is annotated data.
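A minimal sketch of the fine-tuning step in equation (13) is shown below, assuming the optimal meta-knowledge is available as a trained network; the names (finetune_task, d_target_train_i) and the optimizer settings are illustrative assumptions.

```python
# Minimal sketch of equation (13): adapt the network carrying the optimal
# meta-knowledge omega* to the i-th target task using its few annotated samples.
import copy
import torch

def finetune_task(meta_network, d_target_train_i, task_loss, lr=1e-3, steps=50):
    target_net = copy.deepcopy(meta_network)   # start from omega*
    opt = torch.optim.SGD(target_net.parameters(), lr=lr)
    for _ in range(steps):
        for x, y in d_target_train_i:          # few annotated samples of task i
            opt.zero_grad()
            task_loss(target_net(x), y).backward()
            opt.step()
    return target_net                          # target network with theta**(i)
```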
It can be understood that the training method of the neural network can be realized under a unified few-sample learning framework: the task knowledge is determined by training on the sample distribution of each task, the samples of all tasks in the first data set are then trained based on the task knowledge and the task distribution to obtain the optimal meta-knowledge, subsequent training on a few-sample training data set is performed based on the optimal meta-knowledge, and the target network corresponding to the optimal task knowledge is further obtained for testing and inference on the specific task. Therefore, whichever school of few-sample learning is adopted, few-sample training can be realized under this framework, and the generalization of few-sample training is improved.
In some embodiments of the present disclosure, referring to fig. 3, fig. 3 is an optional flowchart of a training method of a neural network provided in an embodiment of the present disclosure, and will be described with reference to the steps shown in fig. 3.
S201, performing iterative training on the initial network by adopting a sample of each task in the first data set based on the initial meta-knowledge, the initial task knowledge of each task and the task loss function in combination with parameter compression processing to obtain the task knowledge of each task.
S202, based on task knowledge, initial meta knowledge and a meta loss function of each task, combining parameter compression processing, and respectively performing iterative training on an initial network by adopting samples corresponding to various tasks in a first data set to obtain optimal meta knowledge.
In the embodiment of the present disclosure, in each training pass of the meta-training phase of meta-learning, the training device of the neural network may apply random perturbation (parameter compression) to the intermediately obtained network to drop part of its structure and then continue training, thereby obtaining the task knowledge and the optimal meta-knowledge with meta-random perturbation (Meta-dropout) added.
It should be noted that, based on the implementation of the neural network training method provided in the above embodiment, the neural network training device may perform meta-random perturbation during task knowledge training, or perform meta-random perturbation during meta knowledge training, or add meta-random perturbation during both task knowledge and meta knowledge training, which is not limited in the embodiment of the present disclosure.
Based on this, the overall goal of the few-sample learning framework in the meta-training process may be stated as follows: for the distribution of all source tasks, assuming that the number of tasks is infinite and that the amount of data in the i-th task is also infinite, a meta-knowledge is learned, with meta-random perturbation added, that minimizes the loss over the data of the various tasks, so that the meta-knowledge performs well on each task.
Illustratively, Meta-dropout is added on top of the optimization or training goal exemplified by equation (1), as shown in equation (14).
$\min_{\omega}\; \mathbb{E}_{t \sim p(\mathcal{T})}\; \mathbb{E}_{x \sim p(D_t)}\; L\left(x;\ \theta_t,\ O(\omega)\right)$ (14)
where p(T) represents the distribution of the source tasks, T consisting of an infinite number of basic tasks; p(D_t) is the distribution of the training data set D_t for task t; E represents the expectation; L is a loss function; ω is the meta-knowledge; θ_t is the task knowledge of task t; x is a sample or data item; and O(ω) represents the meta-random perturbation.
It should be noted that, based on the above overall optimization objective, the optimization objective of the meta-training phase can be determined: starting from the initial meta-knowledge and proceeding task by task, the optimal meta-knowledge is obtained through an iterative process combined with meta-random perturbation.
Illustratively, Meta-dropout is added on top of the optimization process of the meta-training phase, i.e., equation (2), as shown in equation (15):
$\omega^{*} = \arg\min_{\omega}\; \mathbb{E}_{T \sim p(\mathcal{T})}\; L_{meta}\left(T;\ O(\omega)\right)$ (15)
where ω* is the optimal meta-knowledge, ω is the initial meta-knowledge, T is a task, and O(ω) represents the meta-random perturbation.
In the disclosed embodiment, during the meta-training phase, the initial network or model is trained on a data set D (corresponding to the first data set), which represents a base-class data set containing a sufficiently large number of base-class samples.
It should be noted that the training device of the neural network may implement training of the initial network by combining meta-stochastic perturbation with respect to the sample of each task in the first data set according to the task knowledge model, so as to obtain task knowledge of each task.
For example, the task knowledge model may be obtained by adding Meta-dropout to equation (3), as shown in equation (16):
$\theta^{*(i)}(\omega) = \arg\min_{\theta_i} \sum_{j} L_{task}\left(D_{ij};\ \theta_i,\ O(\omega)\right)$ (16)
where θ*(i)(ω) is the task knowledge of the i-th task, D_ij is the j-th sample corresponding to the i-th task in data set D (corresponding to the first data set), L_task is the task loss function, θ_i is the initial task knowledge of the i-th task, ω is the initial meta-knowledge, and O(ω) represents the meta-random perturbation.
It should be noted that the training device of the neural network may implement training of the initial network according to the meta-knowledge model, based on the task knowledge of all tasks and the initial meta-knowledge, in combination with meta-random perturbation, on the samples of the various tasks in the first data set, thereby obtaining the optimal meta-knowledge.
For example, the meta-knowledge model may be obtained by adding Meta-dropout to equation (4), as shown in equation (17):
$\omega^{*} = \arg\min_{\omega}\; \mathbb{E}_{i \sim p(\mathcal{T})}\; \mathbb{E}_{D_i \sim p(D_i)}\; L_{meta}\left(D_i;\ \theta^{*(i)}(\omega),\ O(\omega)\right)$ (17)
where ω* is the optimal meta-knowledge, ω is the initial meta-knowledge, p(T) represents the distribution of the source tasks, T consisting of an infinite number of basic tasks, p(D_i) is the distribution of the training data set D_i for task i, E represents the expectation, L_meta is the meta-loss function, θ*(i)(ω) is the task knowledge of the i-th task, i is a task drawn from p(T), and O(ω) represents the meta-random perturbation.
Based on this, when the Episode-based few-sample training process and the Pre-train finetune-based few-sample training process are implemented, Meta-dropout can likewise be added, with O(ω) replacing ω in the training of the task knowledge and the meta-knowledge, which is not detailed here.
In some embodiments of the present disclosure, in the training process based on task knowledge, the training device of the neural network may train the initial network with a sample of each task in the first data set based on the initial meta-knowledge and the initial task knowledge of each task to obtain a first sub-training result; obtaining a first sub-loss and a first sub-network based on the first sub-training result and the task loss function; under the condition that the first sub-loss is larger than a preset loss threshold value, performing element-random disturbance processing on the first sub-network to obtain a first disturbance sub-network; the first perturbation sub-network is a network obtained by compressing parameters of an initial network; and continuing to adopt the sample of each task in the first data set to carry out iterative training on the first perturbation sub-network until the obtained loss is less than or equal to a preset loss threshold value, and obtaining task knowledge of each task.
In the training process based on the meta-knowledge, the training device of the neural network can train the initial network by adopting a sample of each task in the first data set based on the initial meta-knowledge and the task knowledge of each task to obtain a second sub-training result; obtaining a second sub-loss and a second sub-network based on the second sub-training result and the meta-loss function; under the condition that the second sub-loss is larger than a preset loss threshold value, performing element-random disturbance processing on a second sub-network to obtain a second disturbance sub-network; the second perturbation sub-network is a network obtained by compressing parameters of the initial network; and continuing to adopt the samples of each task in the first data set to carry out iterative training on the second perturbation sub-network until the obtained loss is less than or equal to a preset loss threshold value, and obtaining the optimal meta-knowledge.
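The perturb-and-continue procedure described in the two paragraphs above can be sketched as follows. This is only an illustrative approximation: the loss threshold, the optimizer, and the use of random parameter zeroing as a stand-in for the structured Meta-dropout are assumptions.

```python
# Illustrative sketch: train for one pass, compare the obtained sub-loss with a
# preset threshold, and if it is still too high, apply a meta-random perturbation
# (here: randomly zeroing a fraction of the parameters) before continuing.
import torch

def train_with_meta_perturbation(network, samples, loss_fn,
                                 loss_threshold=0.1, lr=1e-2,
                                 drop_prob=0.1, max_rounds=100):
    opt = torch.optim.SGD(network.parameters(), lr=lr)
    for _ in range(max_rounds):
        total = 0.0
        for x, y in samples:                       # one training pass
            opt.zero_grad()
            loss = loss_fn(network(x), y)
            loss.backward()
            opt.step()
            total += loss.item()
        if total <= loss_threshold:                # sub-loss small enough: stop
            break
        # Sub-loss still above the threshold: "compress" the intermediate
        # sub-network by randomly zeroing a fraction of its parameters
        # (a simple stand-in for the structured Meta-dropout), then continue.
        with torch.no_grad():
            for p in network.parameters():
                p.mul_((torch.rand_like(p) > drop_prob).float())
    return network
```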
It should be noted that Meta-dropout removes part of the network structure, increases the dynamics of the network, and is applied in the meta-training stage. The training device of the model may perform meta-random perturbation on the network generated midway through the task knowledge training process, i.e., the first sub-network, and/or on the network generated midway through the meta-knowledge training process, i.e., the second sub-network, and then continue the iterative training.
In the embodiment of the present disclosure, the parameter compression processing manner includes at least one of the following:
discarding the network neurons corresponding to each task;
discarding the area structure of the network corresponding to each task;
and discarding the minimum structure of the network corresponding to each task.
In this embodiment of the present disclosure, Meta-dropout may discard structures in the network corresponding to each task: a normal dropout may be used to discard single neurons in the network corresponding to each task, a spatial dropout may be used to discard the neurons of a certain dimension, a dropblock may be used to discard an area structure of the convolutional neural network corresponding to each task, or the minimum structural area of the convolutional neural network corresponding to each task may be discarded. In general, implementing Meta-dropout with dropblock is more effective than with normal dropout, which in turn is more effective than spatial dropout.
It should be noted that, for network neurons, each discarding operation may decide with a certain probability whether a given neuron is discarded; for example, a probability of 0.5 may determine whether each network neuron is discarded or not. For the area structure of the network, an entire area of a convolutional layer may be removed, or another area structure may be discarded or removed, which is not limited in the embodiments of the present disclosure.
The embodiments of the present disclosure do not limit which discarding manner is adopted to implement Meta-dropout, nor at which network layer or network structure it is applied; details are given in the scenario descriptions of the following embodiments.
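For illustration, the sketch below contrasts the discarding manners mentioned above using plain PyTorch: normal dropout on single neurons, spatial dropout on whole channels, and a simplified DropBlock on contiguous regions of a feature map. The SimpleDropBlock2d class and its hyper-parameter handling are simplified assumptions and only approximate the DropBlock method.

```python
# Normal dropout, spatial (channel) dropout, and a simplified DropBlock.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleDropBlock2d(nn.Module):
    def __init__(self, drop_prob=0.1, block_size=7):
        super().__init__()
        self.drop_prob, self.block_size = drop_prob, block_size

    def forward(self, x):                       # x: (N, C, H, W)
        if not self.training or self.drop_prob == 0.0:
            return x
        n, c, h, w = x.shape
        # Seed probability chosen so that roughly drop_prob of the activations
        # end up inside dropped blocks (border effects ignored).
        gamma = self.drop_prob / (self.block_size ** 2)
        seeds = (torch.rand(n, c, h, w, device=x.device) < gamma).float()
        # Expand each seed into a block_size x block_size region.
        block_mask = F.max_pool2d(seeds, kernel_size=self.block_size,
                                  stride=1, padding=self.block_size // 2)
        keep_mask = 1.0 - block_mask.clamp(max=1.0)
        # Re-normalise so the expected activation magnitude is preserved.
        return x * keep_mask * keep_mask.numel() / keep_mask.sum().clamp(min=1.0)

feat = torch.randn(2, 64, 28, 28)
print(nn.Dropout(p=0.5)(feat).shape)            # normal dropout: single neurons
print(nn.Dropout2d(p=0.5)(feat).shape)          # spatial dropout: whole channels
print(SimpleDropBlock2d(0.1, 7)(feat).shape)    # DropBlock: contiguous regions
```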
S203, based on the optimal meta-knowledge, the task knowledge of each task and the task loss function, performing iterative training on the network corresponding to the optimal meta-knowledge by adopting the sample of each task in the second data set to obtain the optimal task knowledge of each task and a target network corresponding to the optimal task knowledge.
After the optimal meta-knowledge is obtained, the training device of the neural network may iteratively train the network corresponding to the optimal meta-knowledge on the samples of each task in the second data set, based on the optimal meta-knowledge, the task knowledge of each task and the task loss function, to obtain the optimal task knowledge of each task, and thereby the target network corresponding to the optimal task knowledge. This is consistent with the implementation of S103 and is not repeated here.
It can be understood that, by adding Meta-dropout in the meta-learning process, the training device of the neural network increases the dynamics of the network during meta-training, relieves the overfitting of the network or model to the large number of base categories in the meta-training stage, and improves the generalization of the model.
It should be noted that, in the embodiment of the present disclosure, when the meta-training phase is completed, the obtained optimal meta knowledge may be understood as a default network model or a default network structure applied in different scenarios, and the optimal task knowledge obtained in the meta-test phase may be understood as a parameter of an optimal network model obtained by training in a current scenario, so as to obtain a target network, and a specific task, such as a few-sample classification or a few-sample detection, may be implemented in the current scenario by using the trained target network, which is not limited in the embodiment of the present disclosure.
In the following, the implementation of a few-sample detection task and a few-sample classification task is described in the few-sample detection scenario and the few-sample classification scenario, respectively.
In the few-sample detection scenario, evaluation was performed using the VOC 2007 test data set, and training was performed using the VOC 2007 support data set and the VOC 2012 query data set. Five of its 20 object classes are selected as new classes (for meta-testing), while the remaining 15 are retained as base classes (for meta-training). Two heterogeneous test splits of the VOC 2007 test data set (split 1 and split 2) are used for inference evaluation, for example, split 1 ("bird", "bus", "cow", "mbike", "sofa" / rest) and split 2 ("aero", "bottle", "cow", "horse", "sofa" / rest).
In the Episode-based training process of few samples, Meta R-CNN is used, wherein ResNet-101 is used as a backbone network of Faster R-CNN, and the optimization strategy follows Meta R-CNN.
In the Pre-train finetune (pre-training-fine-tuning) based few-sample training procedure, TFA is used, which uses Faster R-CNN as the detector and ResNet-101 as the backbone network.
It should be noted that in the few-sample detection scenario, both dropblock and normal dropout can be used to implement Meta-dropout, but higher accuracy is achieved by using dropblock.
Illustratively, for the large model represented by ResNet-101, Meta-dropout can be applied to layer 4 of ResNet-101 in Meta R-CNN. In TFA, Meta-dropout can be applied at layer 3 or layer 4 of ResNet-101. Here, keep_prob and block_size are two important hyper-parameters of drop-block and are set to 0.9 and 7, respectively, although the embodiments of the present disclosure are not limited to these values. When Meta-dropout is used at layer 3 or layer 4 of the backbone network, it is applied to the last convolutional layer of each bottleneck block.
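Purely as an illustration of where such a drop-block could be attached, the sketch below adds a DropBlock with keep_prob 0.9 (i.e., drop probability 0.1) and block_size 7 after the last convolution of each bottleneck block in layer 4 of a torchvision ResNet-101. It assumes a recent torchvision that provides ops.DropBlock2d and does not reproduce the Meta R-CNN or TFA training code.

```python
# Attach DropBlock to the last convolution of each bottleneck in layer 4.
import torch
import torchvision
from torchvision.ops import DropBlock2d

backbone = torchvision.models.resnet101(weights=None)
for bottleneck in backbone.layer4:
    # Insert DropBlock right after conv3, the last convolution of the block
    # (between conv3 and bn3 in this sketch).
    bottleneck.conv3 = torch.nn.Sequential(
        bottleneck.conv3,
        DropBlock2d(p=0.1, block_size=7),   # keep_prob 0.9 -> drop probability 0.1
    )

x = torch.randn(1, 3, 224, 224)
backbone.train()                            # DropBlock only active in training mode
print(backbone(x).shape)                    # torch.Size([1, 1000])
```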
TABLE 1
(Table 1, provided as an image in the original, reports the few-sample detection results on VOC split 1 and split 2 under the 1-shot, 3-shot and 10-shot settings.)
As shown in Table 1, the Meta-dropout variant of Meta R-CNN with batch size 1 is obtained by applying Meta-dropout to Meta R-CNN, and the Meta-dropout variant of TFA is obtained by applying Meta-dropout to TFA. The method of implementing Episode-based few-sample training on the basis of Meta R-CNN in the embodiments of the present disclosure, and the method of implementing Pre-train finetune-based few-sample training on the basis of TFA, are denoted "Our Impl." (without added meta-random perturbation) in Table 1.
In the Episode-based few-sample training method, the few-sample model trained with the Meta-dropout variant of Meta R-CNN achieves a more significant improvement in accuracy, except that its mAP on the new categories under the 10-shot setting is comparable to the baseline. In addition, it achieves an improvement of about 10% on split 1 under the 1-shot setting.
In the pre-training-fine-tuning based few-sample training method, the Meta-dropout variant of TFA achieves higher accuracy in all settings. Generally, the trained model achieves the greatest improvement in the 1-shot setting, followed by the 3-shot setting, and the least improvement in the 10-shot setting.
Therefore, Meta-dropout improves the generalization capability of the meta-knowledge, and the fewer the samples in the detection setting, the greater the improvement of the model.
In the few-sample classification scenario, the Caltech-UCSD Birds-200-2011 (CUB) data set and the mini-ImageNet data set are used. In the embodiment of the present disclosure, the 200 classes of CUB are divided into 100 base classes, 50 validation classes, and 50 new classes. mini-ImageNet, which contains 100 classes, is a subset of ImageNet, with each class containing 600 images of size 84×84.
The 100 classes selected in the mini-ImageNet can be divided into 64 training classes (corresponding to the support data set), 16 verification classes (corresponding to the query data set) and 20 test classes (corresponding to the test data set).
In the disclosed embodiment, Baseline++ is selected as the network used in an effective few-sample classification method.
In the disclosed embodiment, Baseline++ is trained for 200 epochs on the CUB data set and 400 epochs on the mini-ImageNet data set, using a four-layer convolutional network (Conv-4) as the backbone network.
As shown in Table 2, the Meta-dropout variant of Baseline++ is constructed by applying Meta-dropout to Baseline++. When testing on the CUB data set, a dropblock with block size 7 is used, applied to the last convolutional layer of the four-layer convolutional network (Conv-4), with a batch size of 16.
When testing on the mini-ImageNet data set, Meta-dropout is implemented using a normal dropout, applied to the flatten layer (the one-dimensional feature) after the last convolutional layer, with a batch size of 32.
TABLE 2
(Table 2, provided as an image in the original, reports the few-sample classification results of Baseline++ and its Meta-dropout variant on the CUB and mini-ImageNet data sets.)
For a small network represented by Conv-4, Meta-dropout is applied to the last convolutional layer and to the final one-dimensional (flattened) feature.
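As an illustration of the mini-ImageNet configuration described above, the sketch below builds a Conv-4 style backbone and applies a normal dropout to the flattened one-dimensional feature after the last convolutional layer; the layer widths, dropout rate and class count are assumptions.

```python
# Conv-4 backbone with Meta-dropout realised as a normal dropout on the
# flattened feature after the last convolutional layer.
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

class Conv4WithMetaDropout(nn.Module):
    def __init__(self, n_classes, drop_prob=0.2):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(3, 64), conv_block(64, 64),
            conv_block(64, 64), conv_block(64, 64),
        )
        self.flatten = nn.Flatten()
        self.dropout = nn.Dropout(p=drop_prob)               # on the 1-D feature
        self.classifier = nn.Linear(64 * 5 * 5, n_classes)   # 84x84 input -> 5x5 maps

    def forward(self, x):
        return self.classifier(self.dropout(self.flatten(self.features(x))))

model = Conv4WithMetaDropout(n_classes=5)
out = model(torch.randn(32, 3, 84, 84))      # batch size 32, as in the text
print(out.shape)                             # torch.Size([32, 5])
```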
As can be seen from Table 2, Baseline++ (Our Impl.), obtained by training with the neural network training method provided in the embodiments of the present disclosure, outperforms Baseline++ as well as other state-of-the-art few-sample image classification algorithms (or models). The Meta-dropout variant of Baseline++, obtained by training with the neural network training method provided in the embodiments of the present disclosure, achieves higher accuracy than Baseline++ (Our Impl.).
In the disclosed embodiment, on the 5-shot setting of the CUB data set, the Meta-dropout variant of Baseline++ with batch size 16 is significantly improved over Baseline++ (Our Impl.).
In the disclosed embodiment, both Meta-dropout and ordinary dropout are applied to demonstrate the importance of improving the generalization capability of the meta-knowledge. When both are implemented using normal dropout, applying Meta-dropout achieves the best performance in all other settings.
When normal dropout is used to implement Meta-dropout and is applied on the last convolutional layer and on the last flatten layer, applying Meta-dropout on the final one-dimensional feature performs better than applying it on the last convolutional layer, owing to the influence of the specific location in the backbone network at which Meta-dropout is applied.
In the disclosed embodiment, Meta-dropout is implemented using dropblock with different block sizes and normal dropout at different locations. Since the flatten layer is one-dimensional, dropblock can only be used at the last convolutional layer, whereas normal dropout may be applied to the last convolutional layer or to the last flatten layer. The best performance on the CUB data set is obtained using dropblock with a block size of 7, while on the mini-ImageNet data set the best performance is obtained using normal dropout on the flatten layer.
In the disclosed embodiment, using a batch size of 32 optimizes the performance of Baseline++.
Embodiments of the present disclosure introduce a unified meta-learning framework that integrates two distinct few-sample learning streams, namely Episode-based and pre-training-fine-tuning-based few-sample learning. In addition, a simple, versatile and efficient Meta-dropout is proposed to improve the generalization ability of the meta-knowledge within this framework, and the significant impact of batch size on few-sample learning is explored. On the few-sample detection and few-sample image classification tasks, models trained with the framework provided by the embodiments of the present disclosure achieve higher accuracy.
Fig. 4 is a schematic diagram illustrating an alternative structural configuration of a training apparatus for a neural network according to an embodiment of the present disclosure, and as shown in fig. 4, the training apparatus 1 for a neural network includes:
the training unit 10 is configured to perform iterative training on the initial network by using a sample of each task in the first data set based on the initial meta-knowledge, the initial task knowledge of each task, and the task loss function, so as to obtain task knowledge of each task; the first dataset is a source domain dataset comprising an initial task category; based on the task knowledge, the initial meta knowledge and the meta loss function of each task, respectively performing iterative training on the initial network by adopting samples corresponding to each task in the first data set to obtain optimal meta knowledge; performing iterative training on a network corresponding to the optimal meta-knowledge by adopting a sample of each task in a second data set based on the optimal meta-knowledge, the task knowledge of each task and the task loss function to obtain the optimal task knowledge of each task and a target network corresponding to the optimal task knowledge; wherein the second data set is a target domain data set comprising target task categories, and the sample size of each task category in the initial task category is greater than the sample size of each task category in the target task category.
In some embodiments of the present disclosure, the apparatus 1 further comprises: a determination unit 11;
the training unit 10 is further configured to perform iterative training on the initial network by using a sample of each task in the first data set based on the initial meta-knowledge and the initial task knowledge of each task to obtain a first training result;
the determining unit 11 is configured to determine task knowledge of each task when determining that the sum of task losses corresponding to each sample is minimum based on the task loss function and the first training result.
In some embodiments of the present disclosure, the apparatus 1 further comprises: a determination unit 11;
the training unit 10 is further configured to perform iterative training on the initial network by using samples corresponding to various tasks in the first data set based on the task knowledge and the initial meta knowledge of each task, so as to obtain a second training result;
the determining unit 11 is configured to determine the optimal meta-knowledge when determining that a sum of the meta-losses corresponding to the various tasks is minimum based on the meta-loss function and the second training result.
In some embodiments of the present disclosure, in a training process based on a small number of samples of a segment, the first data set includes a query data set and a support data set corresponding to the segment; the support data set is used in the training process of task knowledge; the query data set is used in the training process of the meta-knowledge; wherein a segment characterizes a task;
the query data set comprises M tasks, and each task corresponds to N samples; the support data set comprises H tasks, and each task corresponds to K samples; the M tasks and the H tasks are different in types; m is a positive integer greater than or equal to 1, and N is a positive integer greater than or equal to 1; h is a positive integer greater than or equal to 1, and K is a positive integer greater than or equal to 1;
during testing of a small number of samples based on a segment, the second data set is a data set of a target task category different from an initial task category of the first data set; wherein the second data set comprises a total of Q tasks, and each task corresponds to a small number of samples; Q is a positive integer greater than or equal to 1.
In some embodiments of the present disclosure, in a pre-training process based on a pre-training-fine tuned small number of samples, the first data set comprises: m tasks are adopted, and each task corresponds to N samples; m is a positive integer greater than or equal to 1, and N is a positive integer greater than or equal to 1;
in a pre-training-fine-tuning-based fine tuning process of a small number of samples, the second data set is a data set of a target task category different from an initial task category of the first data set; wherein the second data set comprises a total of Q tasks, and each task corresponds to a small number of samples; Q is a positive integer greater than or equal to 1.
In some embodiments of the present disclosure, the training unit 10 is further configured to perform iterative training on the initial network by using a sample of each task in the first data set based on the initial meta-knowledge, the initial task knowledge of each task, and a task loss function in combination with parameter compression processing, so as to obtain the task knowledge of each task.
In some embodiments of the present disclosure, the apparatus 1 further comprises: an acquisition unit 12 and a parameter compression unit 13;
the training unit 10 is further configured to train the initial network by using a sample of each task in a first data set based on the initial meta-knowledge and the initial task knowledge of each task to obtain a first sub-training result;
the obtaining unit 12 is further configured to obtain a first sub-loss and a first sub-network based on the first sub-training result and the task loss function;
the parameter compression unit 13 is configured to perform meta-random perturbation processing on the first sub-network to obtain a first perturbation sub-network when the first sub-loss is greater than a preset loss threshold; the first perturbation sub-network is a network obtained by compressing parameters of an initial network;
the training unit 10 is further configured to continue to perform iterative training on the first perturbation sub-network by using a sample of each task in the first data set until the obtained loss is less than or equal to the preset loss threshold, so as to obtain task knowledge of each task.
In some embodiments of the present disclosure, the training unit 10 is further configured to perform iterative training on the initial network by using samples corresponding to various tasks in the first data set to obtain the optimal meta-knowledge, based on the task knowledge of each task, the initial meta-knowledge, and the meta-loss function, in combination with parameter compression processing.
In some embodiments of the present disclosure, the apparatus 1 further comprises: an acquisition unit 12 and a parameter compression unit 13;
the training unit 10 is further configured to train the initial network by using a sample of each task in the first data set based on the initial meta-knowledge and the task knowledge of each task to obtain a second sub-training result;
the obtaining unit 12 is configured to obtain a second sub-loss and a second sub-network based on the second sub-training result and the meta-loss function;
the parameter compression unit 13 is configured to perform meta-random perturbation processing on the second sub-network to obtain a second perturbation sub-network when the second sub-loss is greater than a preset loss threshold; the second perturbation sub-network is a network obtained by compressing parameters of an initial network;
the training unit 10 is further configured to continue to perform iterative training on the second perturbation sub-network by using the sample of each task in the first data set until the obtained loss is less than or equal to the preset loss threshold, so as to obtain the optimal meta-knowledge.
In some embodiments of the present disclosure, the manner of the parameter compression process includes at least one of:
discarding the network neurons corresponding to each task;
discarding the area structure of the network corresponding to each task;
and discarding the minimum structural area of the network corresponding to each task.
It can be understood that, in the training method of the neural network, the training can be realized under a unified small sample learning framework, the task knowledge is determined by training the sample number distribution based on each task, and then the samples of all the tasks in the first data set are trained based on the task knowledge and the task distribution to obtain the optimal meta knowledge, so that the subsequent training of the training data set of a small number of samples based on the optimal meta knowledge is realized, and further, the target network corresponding to the optimal task knowledge is obtained to realize the test and reasoning of the specific task. Therefore, no matter which method of the small sample school is adopted, the small sample training can be realized under the framework, and the generalization of the small sample training is improved.
Fig. 5 is a schematic structural composition diagram of a training apparatus of a neural network according to an embodiment of the present disclosure, and as shown in fig. 5, the training apparatus of the neural network includes:
a memory 14 for storing a computer program;
the processor 15 is configured to implement the above-mentioned training method for the neural network when executing the computer program stored in the memory 14.
The embodiment of the present disclosure provides a computer-readable storage medium, on which a computer program is stored, for implementing the above-mentioned training method of the neural network when being executed by a processor.
It will be appreciated that the training means of the neural network also comprise a bus system 16; the various components in the training apparatus of the neural network are coupled together by a bus system 16. It will be appreciated that the bus system 16 is used to enable communications among the components. The bus system 16 includes a power bus, a control bus, and a status signal bus in addition to a data bus.
The memory 14 is configured to store computer programs and applications to be executed by the processor 15, and may also buffer data (e.g., image data, audio data, voice communication data, and video communication data) to be processed or already processed by the processor 15 and by modules in the training apparatus of the neural network; it may be implemented by a FLASH memory (FLASH) or a Random Access Memory (RAM).
The processor 15, when executing the program, implements the steps of any of the above neural network training methods. The processor 15 generally controls the overall operation of the training apparatus of the neural network.
The Processor may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller, and a microprocessor. It is understood that the electronic device implementing the above processor function may be other, and the embodiments of the present disclosure are not limited.
The computer-readable storage medium/Memory may be a Read Only Memory (ROM), a Programmable Read Only Memory (PROM), an Erasable Programmable Read Only Memory (EPROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a magnetic Random Access Memory (FRAM), a Flash Memory (Flash Memory), a magnetic surface Memory, an optical Disc, or a Compact Disc Read-Only Memory (CD-ROM), and the like; but may also be various terminals such as mobile phones, computers, tablet devices, personal digital assistants, etc., that include one or any combination of the above-mentioned memories.
Embodiments of the present disclosure provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to execute the training method of the neural network according to the embodiment of the disclosure.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a hypertext Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
Here, it should be noted that: the above description of the storage medium and device embodiments is similar to the description of the method embodiments above, with similar advantageous effects as the method embodiments. For technical details not disclosed in the embodiments of the storage medium and apparatus of the present disclosure, reference is made to the description of the embodiments of the method of the present disclosure.
The above description is only an example of the present disclosure, and is not intended to limit the scope of the present disclosure. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present disclosure are included in the protection scope of the present disclosure.

Claims (13)

1. A method of training a neural network, the method comprising:
performing iterative training on an initial network by adopting a sample of each task in a first data set based on initial meta-knowledge, initial task knowledge of each task and a task loss function to obtain the task knowledge of each task; wherein the first data set is a source domain data set comprising an initial task category;
based on the task knowledge, the initial meta knowledge and the meta loss function of each task, respectively performing iterative training on the initial network by adopting samples corresponding to each task in the first data set to obtain optimal meta knowledge;
performing iterative training on a network corresponding to the optimal meta-knowledge by adopting a sample of each task in a second data set based on the optimal meta-knowledge, the task knowledge of each task and the task loss function to obtain the optimal task knowledge of each task and a target network corresponding to the optimal task knowledge; wherein the second data set is a target domain data set comprising target task categories, and the sample size of each task category in the initial task category is greater than the sample size of each task category in the target task category.
2. The method of claim 1, wherein iteratively training the initial network with samples of each task in the first data set based on the initial meta-knowledge, the initial task knowledge of each task, and a task loss function to obtain the task knowledge of each task comprises:
performing iterative training on the initial network by adopting a sample of each task in a first data set based on initial meta-knowledge and initial task knowledge of each task to obtain a first training result;
and determining the task knowledge of each task when the sum of the task losses corresponding to each sample is determined to be minimum based on the task loss function and the first training result.
3. The method of claim 1, wherein the iteratively training the initial network using samples corresponding to various tasks in the first data set based on the task knowledge of each task, the initial meta knowledge, and a meta-loss function to obtain an optimal meta knowledge comprises:
based on the task knowledge and the initial meta-knowledge of each task, respectively performing iterative training on the initial network by adopting samples corresponding to each task in the first data set to obtain a second training result;
and determining the optimal meta-knowledge when determining that the sum of the meta-losses corresponding to various tasks is minimum based on the meta-loss function and the second training result.
4. The method according to any one of claims 1 to 3,
in a training process based on a small number of samples of the fragments, the first data set comprises a query data set and a support data set corresponding to the fragments; the support data set is used in the training process of task knowledge; the query data set is used in the training process of the meta-knowledge; wherein a segment characterizes a task;
the query data set comprises M tasks, and each task corresponds to N samples; the support data set comprises H tasks, and each task corresponds to K samples; the M tasks and the H tasks are different in types; m is a positive integer greater than or equal to 1, and N is a positive integer greater than or equal to 1; h is a positive integer greater than or equal to 1, and K is a positive integer greater than or equal to 1;
during testing of a small number of samples based on a segment, the second data set is a data set of a target task category different from an initial task category of the first data set; wherein the second data set comprises a total of Q tasks, and each task corresponds to a small number of samples; Q is a positive integer greater than or equal to 1.
5. The method according to any one of claims 1 to 3,
in a pre-training process based on a pre-training-fine-tuned small number of samples, the first data set comprises: M tasks are adopted, and each task corresponds to N samples; M is a positive integer greater than or equal to 1, and N is a positive integer greater than or equal to 1;
in a pre-training-fine-tuning-based fine tuning process of a small number of samples, the second data set is a data set of a target task category different from an initial task category of the first data set; wherein the second data set comprises a total of Q tasks, and each task corresponds to a small number of samples; Q is a positive integer greater than or equal to 1.
6. The method of any one of claims 1 to 5, wherein iteratively training the initial network with samples of each task in the first data set based on the initial meta-knowledge, the initial task knowledge of each task, and a task loss function to obtain the task knowledge of each task comprises:
and performing iterative training on the initial network by adopting a sample of each task in the first data set based on the initial meta-knowledge, the initial task knowledge of each task and a task loss function in combination with parameter compression processing to obtain the task knowledge of each task.
7. The method of claim 6, wherein the iteratively training the initial network using the samples of each task in the first data set based on the initial meta-knowledge, the initial task knowledge of each task, and the task loss function in conjunction with a parameter compression process to obtain the task knowledge of each task comprises:
training the initial network by adopting a sample of each task in a first data set based on the initial meta-knowledge and the initial task knowledge of each task to obtain a first sub-training result;
obtaining a first sub-loss and a first sub-network based on the first sub-training result and the task loss function;
under the condition that the first sub-loss is larger than a preset loss threshold value, performing element-random disturbance processing on the first sub-network to obtain a first disturbance sub-network; the first perturbation sub-network is a network obtained by compressing parameters of an initial network;
and continuing to perform iterative training on the first perturbation sub-network by adopting the sample of each task in the first data set until the obtained loss is less than or equal to the preset loss threshold value, so as to obtain task knowledge of each task.
8. The method according to any one of claims 1 to 6, wherein the iteratively training the initial network using samples corresponding to various tasks in the first data set based on the task knowledge, the initial meta knowledge and the meta-loss function of each task to obtain an optimal meta knowledge comprises:
based on the task knowledge, the initial meta knowledge and the meta loss function of each task, combining with parameter compression processing, and adopting samples corresponding to each task in the first data set to respectively carry out iterative training on the initial network, so as to obtain the optimal meta knowledge.
9. The method of claim 8, wherein the iteratively training the initial network using samples corresponding to various tasks in the first data set to obtain the optimal meta-knowledge based on the task knowledge of each task, the initial meta-knowledge, and a meta-loss function in combination with a parameter compression process comprises:
training the initial network by adopting a sample of each task in the first data set based on the initial meta-knowledge and the task knowledge of each task to obtain a second sub-training result;
obtaining a second sub-loss and a second sub-network based on the second sub-training result and the meta-loss function;
under the condition that the second sub-loss is larger than a preset loss threshold value, performing element-random disturbance processing on the second sub-network to obtain a second disturbance sub-network; the second perturbation sub-network is a network obtained by compressing parameters of an initial network;
and continuing to perform iterative training on the second perturbation sub-network by adopting the sample of each task in the first data set until the obtained loss is less than or equal to the preset loss threshold value, so as to obtain the optimal meta-knowledge.
10. The method according to any one of claims 6 to 9, wherein the parameter compression process comprises at least one of:
discarding the network neurons corresponding to each task;
discarding the area structure of the network corresponding to each task;
and discarding the minimum structural area of the network corresponding to each task.
11. An apparatus for training a neural network, the apparatus comprising:
the training unit is used for carrying out iterative training on the initial network by adopting a sample of each task in the first data set based on the initial meta-knowledge, the initial task knowledge of each task and the task loss function to obtain the task knowledge of each task; the first dataset is a source domain dataset comprising an initial task category; based on the task knowledge, the initial meta knowledge and the meta loss function of each task, respectively performing iterative training on the initial network by adopting samples corresponding to each task in the first data set to obtain optimal meta knowledge; performing iterative training on a network corresponding to the optimal meta-knowledge by adopting a sample of each task in a second data set based on the optimal meta-knowledge, the task knowledge of each task and the task loss function to obtain the optimal task knowledge of each task and a target network corresponding to the optimal task knowledge; wherein the second data set is a target domain data set comprising target task categories, and the sample size of each task category in the initial task category is greater than the sample size of each task category in the target task category.
12. An apparatus for training a neural network, comprising:
a memory for storing a computer program;
a processor for implementing the method of any one of claims 1 to 10 when executing the computer program stored in the memory.
13. A computer-readable storage medium, having stored thereon a computer program for implementing the method of any one of claims 1 to 10 when executed by a processor.
CN202111083941.2A 2021-09-14 2021-09-14 Neural network training method and device, and computer readable storage medium Pending CN113837379A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111083941.2A CN113837379A (en) 2021-09-14 2021-09-14 Neural network training method and device, and computer readable storage medium
PCT/CN2022/071370 WO2023040147A1 (en) 2021-09-14 2022-01-11 Neural network training method and apparatus, and storage medium and computer program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111083941.2A CN113837379A (en) 2021-09-14 2021-09-14 Neural network training method and device, and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN113837379A true CN113837379A (en) 2021-12-24

Family

ID=78959362

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111083941.2A Pending CN113837379A (en) 2021-09-14 2021-09-14 Neural network training method and device, and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN113837379A (en)
WO (1) WO2023040147A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116503676B (en) * 2023-06-27 2023-09-22 南京大数据集团有限公司 Picture classification method and system based on knowledge distillation small sample increment learning
CN116821907B (en) * 2023-06-29 2024-02-02 哈尔滨工业大学 Drop-MAML-based small sample learning intrusion detection method
CN116821699B (en) * 2023-08-31 2024-01-19 山东海量信息技术研究院 Perception model training method and device, electronic equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112949750A (en) * 2021-03-25 2021-06-11 清华大学深圳国际研究生院 Image classification method and computer readable storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108898218A (en) * 2018-05-24 2018-11-27 阿里巴巴集团控股有限公司 A kind of training method of neural network model, device and computer equipment
CN110598842A (en) * 2019-07-17 2019-12-20 深圳大学 Deep neural network hyper-parameter optimization method, electronic device and storage medium
EP3796234A1 (en) * 2019-08-28 2021-03-24 Tata Consultancy Services Limited Method and system for training a neural network for time series data classification
US11640528B2 (en) * 2019-10-22 2023-05-02 Baidu Usa Llc Method, electronic device and computer readable medium for information processing for accelerating neural network training
CN111724083B (en) * 2020-07-21 2023-10-13 腾讯科技(深圳)有限公司 Training method and device for financial risk identification model, computer equipment and medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112949750A (en) * 2021-03-25 2021-06-11 清华大学深圳国际研究生院 Image classification method and computer readable storage medium

Also Published As

Publication number Publication date
WO2023040147A1 (en) 2023-03-23

Similar Documents

Publication Publication Date Title
US20210042580A1 (en) Model training method and apparatus for image recognition, network device, and storage medium
Zhang et al. A return-cost-based binary firefly algorithm for feature selection
CN113837379A (en) Neural network training method and device, and computer readable storage medium
Geng et al. Ensemble manifold regularization
EP3847556A1 (en) End-to-end structure-aware convolutional networks for knowledge base completion
WO2022042123A1 (en) Image recognition model generation method and apparatus, computer device and storage medium
CN111612134B (en) Neural network structure searching method and device, electronic equipment and storage medium
CN113593611A (en) Voice classification network training method and device, computing equipment and storage medium
Gao et al. Runtime performance prediction for deep learning models with graph neural network
Raza et al. A parallel rough set based dependency calculation method for efficient feature selection
CN114329029A (en) Object retrieval method, device, equipment and computer storage medium
CN114974397A (en) Training method of protein structure prediction model and protein structure prediction method
Peng et al. Hierarchical visual-textual knowledge distillation for life-long correlation learning
Sun et al. Optimization of classifier chains via conditional likelihood maximization
US20200167655A1 (en) Method and apparatus for re-configuring neural network
Zhou et al. On the opportunities of green computing: A survey
Zhang et al. Reinforcement learning with actor-critic for knowledge graph reasoning
CN117591547A (en) Database query method and device, terminal equipment and storage medium
Hatami et al. A multi-label feature selection based on mutual information and ant colony optimization
Xie et al. Scalenet: Searching for the model to scale
WO2023078009A1 (en) Model weight acquisition method and related system
Silva et al. A model to estimate the Self-Organizing Maps grid dimension for Prototype Generation
CN115346084A (en) Sample processing method, sample processing apparatus, electronic device, storage medium, and program product
CN114238798A (en) Search ranking method, system, device and storage medium based on neural network
Ying et al. Feature Selection as Deep Sequential Generative Learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40065934

Country of ref document: HK