WO2023040147A1 - Neural network training method and apparatus, storage medium, and computer program - Google Patents

Neural network training method and apparatus, storage medium, and computer program

Info

Publication number
WO2023040147A1
Authority
WO
WIPO (PCT)
Prior art keywords
task
knowledge
meta
data set
initial
Prior art date
Application number
PCT/CN2022/071370
Other languages
English (en)
French (fr)
Inventor
林少波
曾星宇
陈大鹏
赵瑞
Original Assignee
上海商汤智能科技有限公司
Priority date
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司 filed Critical 上海商汤智能科技有限公司
Publication of WO2023040147A1 publication Critical patent/WO2023040147A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Definitions

  • the present disclosure relates to computer vision technology, and in particular to a neural network training method and device, a storage medium and a computer program.
  • CNN-based object detectors have achieved great success due to the rapid development of Convolutional Neural Networks (CNNs).
  • the main few-sample learning methods include: a meta-learning method based on the Episode training method and a transfer learning method based on pre-training and fine-tuning (Pre-train finetune).
  • Embodiments of the present disclosure provide a neural network training method and device, a storage medium, and a computer program, which improve the generalization of training with a small number of samples.
  • An embodiment of the present disclosure provides a neural network training method, the method comprising:
  • based on the initial meta-knowledge, the initial task knowledge of each task and the task loss function, the initial network is iteratively trained using the samples of each task in the first data set, and the task knowledge of each task is obtained;
  • the first data set is a source domain data set including the initial task categories;
  • based on the task knowledge of each task, the initial meta-knowledge and a meta-loss function, the initial network is respectively iteratively trained using samples corresponding to the various tasks in the first data set to obtain optimal meta-knowledge;
  • based on the optimal meta-knowledge, the task knowledge of each task and the task loss function, the network corresponding to the optimal meta-knowledge is iteratively trained using the samples of each task in the second data set, and the optimal task knowledge of each task and the target network corresponding to the optimal task knowledge are obtained;
  • wherein the second data set is a target domain data set including the target task categories, and the sample size of each task category in the initial task categories is larger than the sample size of each task category in the target task categories.
  • An embodiment of the present disclosure provides a neural network training device, the device comprising:
  • the training part is configured to iteratively train the initial network using samples of each task in the first data set based on the initial meta-knowledge, the initial task knowledge of each task and the task loss function, to obtain the task knowledge of each task;
  • the first data set is a source domain data set containing the initial task categories; based on the task knowledge of each task, the initial meta-knowledge and the meta-loss function, the initial network is respectively iteratively trained using the samples corresponding to the various tasks in the first data set to obtain the best meta-knowledge;
  • based on the best meta-knowledge, the task knowledge of each task and the task loss function, the network corresponding to the best meta-knowledge is iteratively trained using the samples of each task in the second data set to obtain the best task knowledge of each task and the target network corresponding to the best task knowledge; wherein the second data set is a target domain data set containing the target task categories, and the sample size of each task category in the initial task categories is greater than the sample size of each task category in the target task categories.
  • An embodiment of the present disclosure provides a neural network training device, including: a memory configured to store a computer program; and
  • a processor configured to implement the above neural network training method when executing the computer program stored in the memory.
  • An embodiment of the present disclosure provides a computer-readable storage medium, on which a computer program is stored, and is used to implement the above neural network training method when executed by a processor.
  • An embodiment of the present disclosure provides a computer program, including computer readable codes.
  • a processor in the electronic device executes to implement the above neural network training method.
  • the neural network training method and device, storage medium and computer program provided by the embodiments of the present disclosure iteratively train the initial network using the samples of each task in the first data set based on the initial meta-knowledge, the initial task knowledge of each task and the task loss function,
  • to obtain the task knowledge of each task; the first data set is the source domain data set containing the initial task categories; based on the task knowledge of each task, the initial meta-knowledge and the meta-loss function, the initial network is respectively iteratively trained using the samples corresponding to the various tasks in the first data set to obtain the best meta-knowledge; based on the best meta-knowledge, the task knowledge of each task and the task loss function, the network corresponding to the best meta-knowledge is iteratively trained using the samples of each task in the second data set to obtain the best task knowledge of each task and the target network corresponding to the best task knowledge; wherein the second data set is a target domain data set containing the target task categories, and the sample size of each task category in the initial task categories is greater than the sample size of each task category in the target task categories.
  • In this way, the task knowledge is first determined by training based on the distribution of the number of samples under each task, and then, based on the task knowledge and the task distribution, the samples of the various tasks in the first data set are trained to obtain the best meta-knowledge, so that the subsequent training on a training data set with a small number of samples can be carried out based on the best meta-knowledge,
  • and the target network corresponding to the best task knowledge is then obtained to enable task-specific testing and inference. In this way, no matter which few-sample learning method is adopted, few-sample training can be realized under this framework, thereby improving the generalization of few-sample training.
  • FIG. 1 is a schematic diagram of an exemplary target task training method provided by an embodiment of the present disclosure
  • FIG. 2 is a flowchart of an optional neural network training method provided by an embodiment of the present disclosure
  • FIG. 3 is a flowchart of another optional neural network training method provided by an embodiment of the present disclosure.
  • FIG. 4 is a schematic structural diagram of an optional neural network training device provided by an embodiment of the present disclosure.
  • FIG. 5 is a schematic structural diagram of another optional neural network training device provided by an embodiment of the present disclosure.
  • "first/second/third" is only used to distinguish similar objects and does not represent a specific ordering of objects. Understandably, where permitted, the specific order or sequence of "first/second/third" may be interchanged, so that the embodiments of the disclosure described herein can be practiced in sequences other than those illustrated or described herein.
  • the few-sample learning method includes: a meta-learning model based on an episode training method and a transfer learning method based on pre-training + fine-tuning (Pre-train finetune).
  • the meta-learning method includes two stages of meta-training and meta-testing.
  • the meta-training stage trains the model on a large amount of basic category data.
  • the meta-testing stage introduces a small number of samples of new categories on the basis of the basic categories for further training.
  • the above two stages adopt the training method based on Episode.
  • An Episode refers to randomly constructing a small N-way, K-shot training set (N categories, K samples for each category) and randomly selecting a certain number of samples from the corresponding categories as a small test set, while the entire training process consists of many Episodes.
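  • To make the episode construction above concrete, the following is a minimal Python sketch of sampling one N-way, K-shot episode together with its small per-class test (query) set; the function and variable names are illustrative assumptions and do not come from the disclosure.

```python
import random
from collections import defaultdict

def sample_episode(dataset, n_way, k_shot, n_query):
    """Sample one N-way, K-shot episode: N categories, K support samples per
    category, plus a small test (query) set drawn from the same categories.
    `dataset` is assumed to be an iterable of (sample, label) pairs."""
    by_class = defaultdict(list)
    for sample, label in dataset:
        by_class[label].append(sample)

    classes = random.sample(list(by_class.keys()), n_way)
    support, query = [], []
    for cls in classes:
        picked = random.sample(by_class[cls], k_shot + n_query)
        support += [(x, cls) for x in picked[:k_shot]]   # K samples per category
        query += [(x, cls) for x in picked[k_shot:]]     # small test set on the same categories
    return support, query

# The entire training process then consists of many such episodes.
```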
  • the pre-train finetune method is a common domain transfer method that can be applied to the field of few-shot learning.
  • the pre-training phase refers to basic training on a large number of basic categories to provide a good initialization model
  • fine-tuning refers to fine-tuning parameters on few-sample tasks and transferring parameters to corresponding tasks.
  • different source tasks (1, 2, ..., N) produce task knowledge and transferable knowledge through generalization, which are used to obtain the target task.
  • the source task 1 is used to identify flowers
  • the source task 2 is used to identify dogs
  • the source task N is used to identify cups
  • the target task is used to identify whether it is a boat
  • the generalization ability learned from different source tasks is the key to few-sample learning, where the transferable knowledge is adapted to the target task.
  • the above two neural network training methods are implemented based on a meta-learning framework.
  • a unified mathematical framework is constructed to explain two mainstream few-sample learning methods based on Episode and Pre-train finetune.
  • the above framework consists of a newly refactored meta-training phase and a meta-testing phase.
  • this framework jointly considers the distribution of tasks and the distribution of data, highlighting the common elements of few-shot learning, including meta-knowledge, task knowledge, meta-loss, task loss, and the distribution of data sets, in order to introduce effective strategies to improve the overall generalization ability of the model.
  • the embodiments of the present disclosure also propose a simple and general meta-learning strategy (Meta-dropout) to alleviate over-fitting to the basic categories, thereby improving the generalization of the model.
  • Meta-dropout can improve model accuracy in few-sample object detection and image classification tasks, and has strong versatility.
  • the model shows great advantages over current few-shot learning methods on few-shot object detection and few-shot image classification tasks.
  • Embodiments of the present disclosure provide a neural network training method and device, a storage medium, and a computer program, which can improve the generalization of training with a small number of samples and prevent overfitting problems.
  • the following describes the exemplary application of the neural network training device provided by the embodiment of the present disclosure.
  • the neural network training device provided by the embodiment of the present disclosure can be an electronic device, such as a notebook computer, a tablet computer, a desktop computer, a set-top box, a mobile device, etc.
  • Various types of user terminals may also be implemented as servers, which are not limited in this embodiment of the present disclosure.
  • the server can be an independent physical server, a server cluster or a distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.
  • FIG. 2 is a schematic flowchart of an optional neural network training method provided by an embodiment of the present disclosure, which will be described in conjunction with the steps shown in FIG. 2 .
  • based on the initial meta-knowledge, the initial task knowledge of each task and the task loss function, the initial network is iteratively trained using the samples of each task in the first data set to obtain the task knowledge of each task; the first data set is a source domain data set containing the initial task categories.
  • based on the task knowledge of each task, the initial meta-knowledge and the meta-loss function, the initial network is respectively iteratively trained using samples corresponding to the various tasks in the first data set to obtain the optimal meta-knowledge.
  • a neural network training method is implemented based on a meta-learning framework.
  • a meta-learning framework is constructed to explain two mainstream few-sample learning methods based on Episode and Pre-train finetune.
  • the above framework consists of a new refactored meta-training phase (meta-training phase) and a meta-testing phase (meta-testing phase).
  • meta-learning aims to learn general learning algorithms that can generalize across tasks and enable each new task to learn better.
  • the overall goal of the few-shot learning framework in the meta-training process may be: for the distribution of all source tasks (corresponding to the initial task categories), assuming that the number of tasks is infinite and the amount of data in the i-th task is also infinite, it is expected that when the loss over tasks and over the data of the various tasks is minimized, a meta-knowledge can be learned that enables good performance on each task.
  • p(T) represents the distribution of the source task
  • T consists of an infinite number of basic tasks
  • p(Dt) is the distribution of the training data set Dt for task t
  • E represents the expectation
  • L is the loss function
  • w is the meta-knowledge
  • θ t is the task knowledge of task t
  • x is the sample or data.
  • w is used to represent meta-knowledge, which can be transferred between different tasks.
  • Meta-knowledge is a broad concept, which can be many components in the learning process, such as parameter initialization, black-box model and network structure, etc., which is not limited in the embodiments of the present disclosure.
  • θ or θ t is the task-specific task knowledge.
  • the optimization goal in the meta-training stage can be based on the initial task, and in the initial state of the initial meta-knowledge, the best meta-knowledge can be obtained through an iterative process.
  • ω* is the optimal meta-knowledge
  • ω is the initial meta-knowledge
  • T is the task.
  • ⁇ * needs to be learned based on tasks and data.
  • the task-related task knowledge should be learned first, and then the learned task knowledge is used to optimize the initial meta-knowledge, so that the best meta-knowledge is finally obtained.
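  • Since the original formula images are not reproduced in this text, the following is a hedged LaTeX reconstruction of the overall goal and of the meta-training optimization goal, assembled only from the symbol definitions above; the exact published forms may differ:

    \min_{\omega}\ \mathbb{E}_{t\sim p(T)}\ \mathbb{E}_{x\sim p(D_t)}\ L\big(x;\ \theta_t,\ \omega\big) \quad \text{(overall goal, reconstructed)}

    \omega^{*} = \arg\min_{\omega}\ \mathbb{E}_{T\sim p(T)}\ L\big(T;\ \omega\big) \quad \text{(meta-training optimization goal, reconstructed)}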
  • the neural network training device can use the samples of each task in the first data set to iteratively train the initial network based on the initial meta-knowledge and the initial task knowledge of each task, and then based on the task loss function, Determine the loss, based on the iterative update of the loss, to obtain the task knowledge of each task.
  • the first data set is a source domain data set containing initial task categories.
  • the neural network training device uses the samples of each task in the first data set to iteratively train the initial network based on the initial meta-knowledge and the initial task knowledge of each task, and obtains the first training result; based on the task loss function and the first training result, when the sum of the task losses corresponding to the samples is determined to be minimum, the task knowledge of each task is determined.
  • the initial task category may be a basic category, which is not limited in this embodiment of the present disclosure.
  • the initial network or model is trained on the data set D (corresponding to the first data set), where D represents the source domain data set, which contains a large number of basic categories with a sufficient number of samples.
  • the training device of the neural network can realize the training of the initial network for the samples of each task in the first data set, so as to obtain the task knowledge of each task.
  • the task knowledge model can be shown as formula (3):
  • θ*(i)(ω) is the task knowledge of the i-th task
  • D ij is the j-th sample corresponding to the i-th task in the data set D (corresponding to the first data set)
  • L task is the task loss function
  • θ i is the initial task knowledge of the i-th task
  • ω is the initial meta-knowledge.
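  • A hedged LaTeX reconstruction of the task knowledge model, using only the symbols defined above (the published formula (3) may differ in detail):

    \theta^{*(i)}(\omega) = \arg\min_{\theta_i}\ \sum_{j} L^{\mathrm{task}}\big(D_{ij};\ \theta_i,\ \omega\big)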
  • based on the principle of formula (3), in the case of the initial meta-knowledge of a certain task and the initial task knowledge of this task, the neural network training device trains the initial network on each sample of the task in the first data set and obtains the first training result corresponding to each sample.
  • from the first training results, the loss of each sample is determined, and the losses of all samples corresponding to the task are added to obtain the current task loss (that is, the sum of the losses of the samples corresponding to the task); the next round of task knowledge learning then continues until the task loss is determined to be the smallest, and the task knowledge determined at that point is taken as the task knowledge of the task.
  • the training device of the neural network can perform iterative training on the initial network respectively for the samples corresponding to the various tasks in the first data set according to the obtained task knowledge of each task, combined with the initial meta-knowledge, to obtain the optimal meta-knowledge.
  • the neural network training device can use the samples corresponding to the various tasks in the first data set to iteratively train the initial network based on the task knowledge and initial meta-knowledge of each task, and obtain the second training result; based on the meta-loss function and the second training result, when the sum of the meta-losses corresponding to the various tasks is determined to be the smallest, determine the best meta-knowledge.
  • the training device of the neural network can perform iterative training on the initial network for the samples corresponding to the various tasks in the first data set according to the obtained task knowledge of each task, combined with the initial meta-knowledge, determine the meta-loss of each task so as to obtain the sum of the meta-losses of all tasks, continue the iterative training to obtain the sum of the meta-losses of each round, and take the meta-knowledge determined when the sum of the meta-losses corresponding to the various tasks is minimal as the best meta-knowledge.
  • the training device of the neural network can realize the training of the initial network on the samples of the various tasks in the first data set according to the meta-knowledge model, based on the task knowledge of all tasks and the initial meta-knowledge, so as to obtain the best meta-knowledge.
  • meta-knowledge model can be shown as formula (4):
  • ω* is the optimal meta-knowledge
  • ω is the initial meta-knowledge
  • p(T) represents the distribution of source tasks
  • T consists of an infinite number of basic tasks
  • p(Di) is the distribution of training dataset Di for task i
  • E represents the expectation
  • L meta is the meta-loss function
  • θ*(i)(ω) is the task knowledge of the i-th task
  • i is a task in p(T).
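  • A hedged LaTeX reconstruction of the meta-knowledge model, using only the symbols defined above (the published formula (4) may differ in detail):

    \omega^{*} = \arg\min_{\omega}\ \mathbb{E}_{i\sim p(T)}\ \mathbb{E}_{D_i\sim p(D_i)}\ L^{\mathrm{meta}}\big(D_i;\ \theta^{*(i)}(\omega),\ \omega\big)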
  • the task knowledge model and the meta-knowledge model may also be expressed in the form of integrals, which is not limited in the embodiments of the present disclosure.
  • ω is the initial meta-knowledge and ω* is the best meta-knowledge learned during the task iterations.
  • θ i represents the task knowledge
  • θ*(i) is the learned task knowledge, which is used to optimize ω to get the best ω*.
  • L task is used to optimize task-specific information; the task loss is an internal objective, indicating which model or network is suitable for a specific task;
  • L meta is used to generate the best meta-knowledge; the meta-loss is the external objective, indicating which ω is better.
  • the meta-training phase has two optimization phases, including task optimization and meta-optimization, and the meta-optimization is performed depending on the result of task optimization.
  • iterative training can be performed task by task, and each iteration obtains a task knowledge, until the task knowledge with the minimum sum of task losses is obtained; that is, the training goes through the process θ0 → θ1 → θ2 → ... → θ*.
  • the neural network training device can implement the episode-based few-shot training and the pre-train finetune (pre-training-fine-tuning) few-shot training process as described above.
  • the data set used by the neural network training device includes a support data set and a query data set, and the support data set and the query data set are used as the first data set to realize the meta-training phase of the Episode-based few-shot training method;
  • the meta-testing phase may be consistent with the meta-testing phase described later, which is not limited in this embodiment of the present disclosure.
  • the following mainly introduces the meta-training phase of the Episode-based few-shot training method.
  • both the supporting data set and the querying data set are constructed based on the data set containing the basic category (ie, the initial task category).
  • the first data set in the training process of a small number of samples based on fragments, includes a query data set and a support data set corresponding to the fragment; the support data set is used in the training process of task knowledge; the query data The set is used in the training process of meta-knowledge.
  • a segment represents a task.
  • the query data set includes M types of tasks, and each type of task corresponds to N samples;
  • the support data set includes H types of tasks, and each type of task corresponds to K samples;
  • the types or types of tasks in the support dataset are different from those in the query dataset, and each task corresponds to multiple samples.
  • the numbers of the M types of tasks and the H types of tasks may be the same or different, which is not limited in this embodiment of the present disclosure.
  • the numbers of N samples and K samples may be the same or different, which is not limited in this embodiment of the present disclosure.
  • M and H represent the number of limited tasks
  • N and K represent the number of samples under their respective tasks.
  • some M types of samples can be randomly selected, and H types of samples different from the supporting data set are selected from the remaining data to construct a query data set.
  • the support data set and the query data set both contain M kinds of tasks (the respective tasks corresponding to the support data set and the query data set are different), and each task corresponds to N samples
  • the support data set and query data set can be shown by formula (7):
  • D source is a new data set
  • the remaining symbols in formula (7) denote the support data set and the query data set of each task, respectively
  • M is the respective number of tasks in the support dataset and the query dataset.
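  • A hedged LaTeX sketch of how the new data set can be assembled from per-task support and query sets; the symbols S^{(i)} and Q^{(i)} are illustrative, since the original support and query set notation is not legible here:

    D_{\mathrm{source}} = \big\{\,\big(S^{(i)},\ Q^{(i)}\big)\,\big\}_{i=1}^{M}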
  • the training device of the neural network uses the supporting data set in the training process of task knowledge; the query data set is used in the training process of meta-knowledge, and implements an Episode-based few-sample training method.
  • θ*(i)(ω) is the task knowledge learned by the i-th task among the M tasks (the number of tasks in the support data set), L task is the task loss function, the sample term denotes the j-th sample corresponding to the i-th task in the support data set, θ i is the initial task knowledge, and ω is the initial meta-knowledge.
  • the neural network training device can use the samples of each task in the support data set to iteratively train the initial network based on the initial meta-knowledge and the initial task knowledge of each task, then determine the loss based on the task loss function, iteratively update based on the loss, and obtain the task knowledge of each task.
  • the supporting data set is a limited data set containing samples of basic categories.
  • the training device of the neural network, based on the initial meta-knowledge and the initial task knowledge of each task, uses the samples of each task in the support data set to iteratively train the initial network to obtain the first training result; based on the task loss function and the first training result, when the sum of the task losses corresponding to the samples is determined to be minimum, the task knowledge of each task is determined.
  • the training device of the neural network can iteratively train the initial network on the samples corresponding to all types of tasks in the query data set according to the obtained task knowledge of each task, combined with the initial meta-knowledge, to obtain the optimal meta-knowledge.
  • the neural network training device can use the samples corresponding to various tasks (M types of tasks) in the query data set to iterate the initial network based on the task knowledge and initial meta-knowledge of each task. training to obtain a second training result; based on the meta-loss function and the second training result, when the sum of meta-losses corresponding to various tasks is determined to be minimum, determine the best meta-knowledge.
  • the training device of the neural network can perform iterative training on the initial network for the samples corresponding to the various tasks in the query data set according to the obtained task knowledge of each task, combined with the initial meta-knowledge, and determine the meta-loss of each task
  • so as to obtain the sum of the meta-losses of all tasks, continue the iterative training to obtain the sum of the meta-losses of each round, and determine the meta-knowledge when the sum of the meta-losses corresponding to the various tasks is minimal as the best meta-knowledge.
  • ω* is the optimal meta-knowledge
  • ω is the initial meta-knowledge
  • θ*(i) is the task knowledge learned by the i-th task among the M tasks (the number of tasks in the query data set)
  • L meta is the meta-loss function
  • the sample term denotes the j-th sample corresponding to the i-th task in the query data set
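  • Under the same illustrative notation, the episodic counterparts of the task knowledge and meta-knowledge models sketched earlier can be written as follows (hedged reconstruction; the published formulas may differ):

    \theta^{*(i)}(\omega) = \arg\min_{\theta_i}\ \sum_{j} L^{\mathrm{task}}\big(S^{(i)}_{j};\ \theta_i,\ \omega\big)

    \omega^{*} = \arg\min_{\omega}\ \sum_{i=1}^{M}\ \sum_{j} L^{\mathrm{meta}}\big(Q^{(i)}_{j};\ \theta^{*(i)}(\omega),\ \omega\big)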
  • In this way, under a unified few-sample learning framework, the training device of the neural network first uses the support data set to train based on the distribution of the number of samples under each task and thereby determine the task knowledge, and then, based on the task knowledge and the task distribution, trains on the samples of all tasks in the query data set to obtain the best meta-knowledge, so that the subsequent training of a training data set with a small number of samples can be carried out based on the trained best meta-knowledge. Under this framework, an Episode-based few-shot training method is thus implemented.
  • the data set used is a data set of limited initial task categories.
  • the limited data set includes: M kinds of tasks, and each task corresponds to N samples.
  • the first data set includes: M kinds of tasks, and each task corresponds to N samples; M is a positive integer greater than or equal to 1 , N is a positive integer greater than or equal to 1.
  • the limited data set may be represented by D source .
  • Assuming that the data set D source is infinitely large, it can provide a good initial model for the meta-test phase; therefore, increasing M and N is very important for improving the accuracy of the model in the embodiments of the present disclosure.
  • the Pre-train finetune method includes a pre-training stage and a fine-tuning stage.
  • the pre-training phase is represented by the meta-training phase
  • the fine-tuning phase is represented by the meta-testing phase. The following mainly introduces the implementation of the pre-training phase.
  • the pre-training optimization process can be derived from the general frameworks (5) and (6): by using M and N to specify a limited data set size and using the complete data set D source to replace the infinite data set D, the optimization framework of Pre-train finetune can be obtained, that is, the best meta-knowledge finally obtained by the Pre-train finetune method.
  • the training device of the neural network, based on the initial meta-knowledge and the initial task knowledge of each task, iteratively trains the initial network using the samples of each of the M tasks in D source to obtain the first training result; based on the task loss function and the first training result, when the sum of the task losses corresponding to the samples of each task is determined to be minimum, the task knowledge of each task is determined.
  • the training device of the neural network can, based on the obtained task knowledge of each of the M tasks, combined with the initial meta-knowledge and the meta-loss function, iteratively train the initial network on the samples corresponding to the M tasks in D source to obtain the best meta-knowledge.
  • θ*(i)(ω) is the task knowledge learned by the i-th task in the M tasks
  • L task is the task loss function
  • D source(ij) is the j-th sample corresponding to the i-th task in the limited data set
  • θ i is the initial task knowledge
  • ω is the initial meta-knowledge.
  • ω* is the optimal meta-knowledge
  • ω is the initial meta-knowledge
  • θ*(i) is the task knowledge learned by the i-th task among the M tasks
  • L meta is the meta-loss function
  • D source(ij) is the j-th sample corresponding to the i-th task in the limited data set.
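  • A hedged LaTeX sketch of the Pre-train finetune optimization described above, obtained by replacing the infinite distributions with finite sums over the M tasks and N samples of D source (the published formulas may differ in detail):

    \theta^{*(i)}(\omega) = \arg\min_{\theta_i}\ \sum_{j=1}^{N} L^{\mathrm{task}}\big(D_{\mathrm{source}(ij)};\ \theta_i,\ \omega\big)

    \omega^{*} = \arg\min_{\omega}\ \sum_{i=1}^{M}\ \sum_{j=1}^{N} L^{\mathrm{meta}}\big(D_{\mathrm{source}(ij)};\ \theta^{*(i)}(\omega),\ \omega\big)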
  • In this way, under a unified few-sample learning framework, the training device of the neural network first trains with the limited data set based on the distribution of the number of samples under each of the M tasks to determine the task knowledge,
  • and then, based on the task knowledge and the task distribution, trains on the samples of all tasks to obtain the best meta-knowledge, so that the subsequent training of a training data set with a small number of samples can be carried out based on the best meta-knowledge.
  • a few-sample training method based on Pre-train finetune is implemented.
  • the meta-testing phase can be performed.
  • In the meta-testing phase, there are two stages: the few-sample training stage and the data inference stage.
  • the data inference stage is the stage of using the test data set for testing, or the stage of using actual data for forward inference. The following mainly introduces the few-sample training stage.
  • the training device of the neural network uses the second data set to perform the few-sample training process, wherein the second data set is a training data set with a small number of samples; it is the target domain data set containing the target task categories,
  • and the sample size of each task category in the initial task categories of the first data set is greater than the sample size of each task category in the target task categories.
  • the second data set can be constructed from a new data set with small-scale species and samples.
  • a test dataset can also be constructed from this new dataset.
  • the few-shot training data is constructed from the new data set to learn the parameters θ**(i) most suitable for the task.
  • the category in the new data set may be data of a new category different from the basic category (initial task category).
  • the initial task categories corresponding to the first data set and the target task categories in the second data set may be different or partially different categories, or may be the same categories, which is not limited in this embodiment of the disclosure.
  • the second data set is a data set of a target task category different from the initial task category of the first data set;
  • there are Q tasks in total in the second data set, and each task corresponds to a small number of samples (that is, the sample size of each task category in the initial task categories is greater than the sample size of each task category in the target task categories);
  • Q is a positive integer greater than or equal to 1.
  • D target is a new data set
  • the second data set, i.e. the training data set, and the test data set are constructed from it
  • the relationship between the three can be expressed by formula (12).
  • Q is the total number of tasks, and i is a task in Q.
  • the respective second data sets corresponding to the Q types of tasks can be constructed from new data sets, which is not limited in this embodiment of the present disclosure.
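  • One plausible reading of the relationship between the new data set, the second (training) data set and the test data set, written as a hedged LaTeX sketch; the train/test subscripts are illustrative:

    D_{\mathrm{target}} = \big\{\,\big(D^{(i)}_{\mathrm{train}},\ D^{(i)}_{\mathrm{test}}\big)\,\big\}_{i=1}^{Q}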
  • the training device of the neural network can use the data in the second data set, in the case of the best meta-knowledge and the parameters of the task knowledge of the specific task, combined with the task loss function, to iteratively train the network corresponding to the best meta-knowledge,
  • so as to obtain the best task knowledge of the specific task and the target network corresponding to the best task knowledge.
  • the neural network training device can obtain the optimal task knowledge based on the task model and by combining the optimal meta-knowledge and the task knowledge of the specific task.
  • the task model can represent the training or optimization process of obtaining the best task knowledge of a task, as shown in formula (13).
  • θ**(i) is the best task knowledge of the i-th task, and the remaining term is the second data set corresponding to the i-th task.
  • the data in the second data set is data with annotations.
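  • A hedged LaTeX reconstruction of the task model of formula (13), with the sample notation of the second data set written illustratively as D_{target(ij)}:

    \theta^{**(i)} = \arg\min_{\theta}\ \sum_{j} L^{\mathrm{task}}\big(D_{\mathrm{target}(ij)};\ \theta,\ \omega^{*}\big)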
  • the task knowledge is determined by first training based on the distribution of samples under each task, and then, based on the task knowledge and the task distribution, the samples of all tasks in the first data set are trained to obtain the best meta-knowledge, so as to realize the subsequent training of a training data set with a small number of samples based on the best meta-knowledge, and then obtain the target network corresponding to the best task knowledge for task-specific testing and inference.
  • In this way, no matter which few-sample learning method is adopted, few-sample training can be realized under this framework, thereby improving the generalization of few-sample training.
  • FIG. 3 is an optional flowchart of a neural network training method provided in an embodiment of the present disclosure, which will be described in conjunction with the steps shown in FIG. 3 .
  • the initial network is iteratively trained using samples of each task in the first data set to obtain the task knowledge of each task.
  • based on the task knowledge of each task, the initial meta-knowledge and the meta-loss function, combined with parameter compression processing, the initial network is respectively iteratively trained using samples corresponding to the various tasks in the first data set to obtain the optimal meta-knowledge.
  • the training device of the neural network can randomly perturb (parameter compression) the intermediate network obtained during each training round of the meta-training stage of meta-learning, continue training with part of the obtained intermediate network structure removed, and, with meta-random perturbation (Meta-dropout) added, obtain the task knowledge and the optimal meta-knowledge through training.
  • the parameter compression in the embodiments of the present disclosure can be realized by meta-random perturbation.
  • the neural network training device can perform meta-random perturbation during the training process of task knowledge, or during the training process of meta-knowledge, or add meta-random perturbation during the training processes of both task knowledge and meta-knowledge at the same time, which is not limited in the embodiments of the present disclosure.
  • the overall goal of the few-sample learning framework in the meta-training process can be: for the distribution of all source tasks, assuming that the number of tasks is infinite and the amount of data in the i-th task is also infinite, it is expected that when the loss over tasks and over the data of the various tasks is minimized, a meta-knowledge is learned with meta-random perturbation added, which enables good performance on each task.
  • p(T) represents the distribution of the source task
  • T consists of an infinite number of basic tasks
  • p(Dt) is the distribution of the training data set Dt for task t
  • E represents the expectation
  • L is the loss function
  • w is the meta-knowledge
  • θ t is the task knowledge of task t
  • x is the sample or data
  • O(·) represents the meta-random perturbation.
  • the optimization objective in the meta-training stage can be task-based.
  • the optimal meta-knowledge can be obtained through an iterative process.
  • ω* is the optimal meta-knowledge
  • ω is the initial meta-knowledge
  • T is the task
  • O(·) is the meta-random perturbation
  • the initial network or model is trained on the data set D (corresponding to the first data set), where D represents the basic category data set, which contains a large number of basic categories with a sufficient number of samples.
  • the training device of the neural network can realize the training of the initial network for the samples of each task in the first data set combined with meta-random perturbation, so as to obtain the task knowledge of each task.
  • the task knowledge model can add the Meta-dropout perturbation on the basis of formula (3), as shown in formula (16):
  • θ*(i)(ω) is the task knowledge of the i-th task
  • D ij is the j-th sample corresponding to the i-th task in the data set D (corresponding to the first data set)
  • L task is the task loss function
  • θ i is the initial task knowledge of the i-th task
  • ω is the initial meta-knowledge
  • O(·) represents the meta-random perturbation.
  • the neural network training device can, based on the meta-knowledge model, and based on the task knowledge of all tasks and the initial meta-knowledge, combined with meta-random perturbation, train the initial network on the samples of the various tasks in the first data set to obtain the optimal meta-knowledge.
  • the meta-knowledge model can add the Meta-dropout perturbation on the basis of formula (4), as shown in formula (17):
  • ω* is the optimal meta-knowledge
  • ω is the initial meta-knowledge
  • p(T) represents the distribution of source tasks
  • T consists of an infinite number of basic tasks
  • p(Di) is the distribution of training dataset Di for task i
  • E represents the expectation
  • L meta is the meta-loss function
  • θ*(i)(ω) is the task knowledge of the i-th task
  • i is a task in p(T)
  • O(·) represents the meta-random perturbation.
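  • A hedged LaTeX reconstruction of formula (17); the placement of O(·) on the meta-knowledge is likewise an assumption:

    \omega^{*} = \arg\min_{\omega}\ \mathbb{E}_{i\sim p(T)}\ \mathbb{E}_{D_i\sim p(D_i)}\ L^{\mathrm{meta}}\big(D_i;\ \theta^{*(i)}(\omega),\ O(\omega)\big)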
  • In the training process of task knowledge, the neural network training device can use the samples of each task in the first data set to train the initial network to obtain the first sub-training result; based on the first sub-training result and the task loss function, the first sub-loss and the first sub-network are obtained; when the first sub-loss is greater than the preset loss threshold, the first sub-network is subjected to meta-random perturbation processing to obtain the first perturbed sub-network, which is the network after parameter compression of the initial network; the samples of each task in the first data set then continue to be used to iteratively train the first perturbed sub-network until the obtained loss is less than or equal to the preset loss threshold, and the task knowledge of each task is obtained.
  • the training device of the neural network can use the samples of each task in the first data set to train the initial network based on the initial meta-knowledge and the task knowledge of each task, and obtain the second sub-training result; based on the second sub-training result and the meta-loss function, the second sub-loss and the second sub-network are obtained; when the second sub-loss is greater than the preset loss threshold, the second sub-network is subjected to meta-random perturbation processing to obtain the second perturbed sub-network, which is the network after parameter compression of the initial network; the samples of each task in the first data set then continue to be used to iteratively train the second perturbed sub-network until the obtained loss is less than or equal to the preset loss threshold, and the optimal meta-knowledge is obtained.
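  • The loss-threshold-driven perturbation loop described in the two paragraphs above can be sketched in Python as follows; this is a minimal illustration under assumed names (`perturb`, `loss_threshold`, etc.), not the disclosure's implementation:

```python
import copy

def train_with_meta_random_perturbation(network, task_loader, loss_fn,
                                         perturb, loss_threshold,
                                         optimizer, max_rounds=100):
    """Train on one task's samples; whenever the accumulated (sub-)loss is
    still above `loss_threshold`, apply the meta-random perturbation
    `perturb` to the intermediate sub-network and continue training."""
    for _ in range(max_rounds):
        total_loss = 0.0
        for x, y in task_loader:
            optimizer.zero_grad()
            loss = loss_fn(network(x), y)   # sub-training result and sub-loss
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        if total_loss <= loss_threshold:    # loss small enough: stop iterating
            break
        perturb(network)                    # meta-random perturbation of the sub-network
    return copy.deepcopy(network.state_dict())  # learned (task or meta) knowledge
```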
  • Meta-dropout removes the network structure to a certain extent, increases the dynamics of the network, and is applied in the meta-training stage.
  • the training device of the model can perform meta-random perturbation on the intermediate network generated in the task knowledge training process, that is, the first sub-network, and/or on the intermediate network generated in the meta-knowledge training process, that is, the second sub-network, and then continue the iterative training.
  • the method of parameter compression processing includes at least one of the following:
  • the minimum structure of the network corresponding to each task is discarded.
  • Meta-dropout can discard structures in the network corresponding to each task, wherein ordinary dropout (normal dropout) can be used to discard single neurons in the network corresponding to each task, spatial-dropout can be used to discard the neurons in a certain dimension, dropblock can be used to discard a regional structure of the convolutional neural network corresponding to each task, or the minimum structural area of the convolutional
  • neural network corresponding to each task can be discarded; the embodiment of the present disclosure does not limit the discarding method, which can be applied to any network structure corresponding to each task (see the sketch after this list).
  • the effect of using dropblock is better than that of using normal dropout
  • the effect of using normal dropout is better than that of using spatial-dropout.
  • each discarding may determine whether a certain network neuron needs to be discarded with a certain probability, for example, whether each network neuron needs to be discarded is determined with a probability of 0.5.
  • For the area structure of the network, it may be removed in units of an entire area of the convolutional layer, or discarded or removed in units of other area structures, which is not limited in the embodiments of the present disclosure.
  • the discarding manner adopted to implement Meta-dropout, and the network layer or network structure it is applied to, are not limited in the embodiment of the present disclosure; details are given in the scene descriptions of the subsequent embodiments.
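  • The discard granularities listed above can be illustrated with the following hedged Python/PyTorch sketch; the dropblock implementation is a simplified illustration, not the disclosure's code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Discarding at three granularities for a feature map of shape (B, C, H, W).
normal_dropout = nn.Dropout(p=0.5)     # drops individual neurons (activations)
spatial_dropout = nn.Dropout2d(p=0.5)  # drops entire channels (neurons of one dimension)

def dropblock(x, keep_prob=0.9, block_size=7, training=True):
    """Drop contiguous block_size x block_size regions of a convolutional feature map."""
    if not training or keep_prob >= 1.0:
        return x
    gamma = (1.0 - keep_prob) / (block_size ** 2)       # simplified seed probability
    seeds = (torch.rand_like(x) < gamma).float()
    # Expand every seed point into a square block and invert to get the keep-mask.
    block_mask = 1.0 - F.max_pool2d(seeds, kernel_size=block_size,
                                    stride=1, padding=block_size // 2)
    # Rescale so the expected activation magnitude is preserved.
    return x * block_mask * block_mask.numel() / block_mask.sum().clamp(min=1.0)
```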
  • the training device of the neural network can iteratively train the network corresponding to the best meta-knowledge using the samples of each task in the second data set
  • to obtain the best task knowledge of each task, and then, based on the best task knowledge, obtain the target network corresponding to the best task knowledge. This is consistent with the implementation of S103 and will not be repeated here.
  • by adding Meta-dropout in the process of meta-learning, the training device of the neural network increases the dynamics of the network during the meta-training process, alleviates the over-fitting of the network or model to the large number of basic categories in the meta-training stage, and improves the generalization of the model.
  • the best meta-knowledge obtained can be understood as the default network model or network structure applied in different scenarios, and the best task knowledge obtained in the meta-test stage
  • can be understood as the parameters of the best network model trained in the current scene, so as to obtain the target network; the trained target network is then used to perform specific tasks in the current scene, such as few-shot classification or few-shot detection, which is not limited by the embodiments of the present disclosure.
  • the implementation of the few-sample detection task and the few-sample classification task in the scenario of few-sample detection and few-sample classification is illustrated as an example.
  • the VOC 2007 test data set is used for evaluation, and the VOC 2007 support data set and the VOC 2012 query data set are used for training. Assume that 5 of its 20 object categories are selected as new classes (for meta-testing), while the remaining 15 are kept as base classes (for meta-training). Two different splits of the VOC 2007 test data set (split 1 and split 2) are used for inference evaluation, for example, split 1 ("bird", "bus", "cow", "mbike", "sofa" / rest) and split 2 ("aero", "bottle", "cow", "horse", "sofa" / rest).
  • In the Episode-based few-sample training process, Meta R-CNN is used, in which ResNet-101 is used as the backbone network of Faster R-CNN, and the optimization strategy follows Meta R-CNN.
  • TFA is used, which uses Faster R-CNN as the detector, and ResNet-101 is the backbone network.
  • dropblock and normal dropout can be used to implement Meta-dropout, but higher accuracy can be achieved by using dropblock.
  • Meta-dropout can be applied to the fourth layer of ResNet-101.
  • Meta-dropout can be applied to layer 3 or layer 4 of Resnet-101.
  • keep prob and block size are two important hyperparameters in drop-block, which are set to 0.9 and 7 respectively, which are not limited in the embodiments of the present disclosure.
  • When the 3rd or 4th layer of the backbone network uses Meta-dropout, it is applied to the last convolutional layer of each bottleneck block.
  • Meta-dropout is applied to Meta R-CNN with a batch size of 1 to obtain Meta R-CNN*, and Meta-dropout is applied to TFA to obtain TFA*.
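  • As an illustration of the configuration just described, the following hedged sketch attaches the dropblock function from the previous sketch to layer 4 of a torchvision ResNet-101 backbone via a forward hook, with keep prob = 0.9 and block size = 7; the hook and flag names are assumptions, and torchvision's ResNet-101 is used only as a stand-in for the detector backbones mentioned above:

```python
import torchvision

backbone = torchvision.models.resnet101(weights=None)  # backbone only, no detection head
meta_training = True  # set to False to disable the perturbation at meta-test / inference time

def meta_dropout_hook(module, inputs, output):
    # Perturb the feature map produced by the 4th stage (layer4) of the backbone.
    return dropblock(output, keep_prob=0.9, block_size=7, training=meta_training)

hook_handle = backbone.layer4.register_forward_hook(meta_dropout_hook)
# hook_handle.remove()  # detach the perturbation when it is no longer needed
```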
  • Meta R-CNN is used to realize the Episode-based few-sample training method of the embodiment of the present disclosure, and TFA is used to realize the Pre-train finetune few-sample training method of the embodiment of the present disclosure; these re-implementations are denoted by Our Impl., and the versions with meta-random perturbation are denoted by *.
  • In the Episode-based few-shot training method, except that the mAP on the new categories in the 10-shot setting is comparable to the baseline, the model trained by the few-shot training implemented with Meta R-CNN* in this disclosure shows a more obvious improvement in accuracy, and Meta R-CNN* obtains a 10% improvement on split 1 with the 1-shot setting.
  • TFA* is able to achieve higher accuracy in all settings.
  • models trained in the 1-shot setting achieve the greatest improvement, followed by 3-shot, while models trained in the 10-shot setting achieve the smallest improvement.
  • using Meta-dropout to improve the generalization ability of meta-knowledge can help the model achieve a higher improvement in the few-sample detection setting.
  • Caltech-UCSD Birds-200-2011 (CUB) is used for fine-grained classification, which contains a total of 200 classes and 11788 images.
  • 200 classes are divided into 100 basic classes, 50 verification classes and 50 new classes.
  • the mini-ImageNet data set containing 100 categories is a subset of ImageNet, and each category contains 600 images of size 84×84.
  • the 100 classes selected in mini-ImageNet can be divided into 64 training classes (corresponding to the support dataset), 16 validation classes (corresponding to the query dataset) and 20 test classes (corresponding to the test dataset).
  • Baseline++ is chosen as the network used in an efficient few-shot classification method.
  • Baseline++ trained 200 epochs based on the CUB dataset, trained 400 epochs based on the mini-ImageNet dataset, and used a four-layer convolutional network (Conv-4) as the backbone network.
  • Baseline++* is constructed by applying Meta-dropout on Baseline++.
  • a dropblock with a block size of 7 is applied to the last convolutional layer of a four-layer convolutional network (Conv-4) with a batch size of 16.
  • Meta-dropout is implemented using normal dropout, applied to the flat layer (one-dimensional feature) after the last convolutional layer, and the batch size is 32.
  • Meta-dropout is applied to the convolutional layer of the last layer and the one-dimensional feature of the last layer.
  • the Baseline++ (Our Impl.) obtained by the training method of the neural network provided by the embodiment of the present disclosure is better than Baseline++ in test accuracy, and better than other state-of-the-art small-sample image classification algorithms (or models) .
  • the Baseline++* trained by the neural network training method provided by the embodiment of the present disclosure is better than Baseline++ (Our Impl.) in accuracy.
  • Baseline++ (Our Impl.) based on a batch size of 16 has a significant performance improvement in the 5-shot of the CUB dataset than the Meta-dropout Baseline++*.
  • Meta-dropout and dropout are applied to show the importance of improving the generalization ability of meta-knowledge.
  • applying Meta-dropout achieves the best performance among all other settings.
  • Meta-dropout is implemented by using dropblocks with different block sizes and normal dropouts with different positions. Since the flatten layer is 1-dimensional, dropblock can only be used for the last convolutional layer. And normal dropout can be applied to the last convolution or the last flattening layer. Among them, using a dropblock with a block size of 7 is suitable for the CUB dataset and can achieve the best performance. Based on the mini-ImageNet dataset, the best performance can be obtained by using normal dropout on the flatten layer.
  • the embodiment of the present disclosure introduces a unified meta-learning framework, which integrates two distinct small-sample learning flows, namely, Episode-based and pre-training-fine-tuning-based small-sample learning.
  • Second, a simple, general and effective Meta-dropout is proposed to improve the generalization ability of meta-knowledge in this framework, and explore the significant impact of batch size on few-shot learning.
  • the models trained by the framework provided by the embodiments of the present disclosure have the performance advantage of higher accuracy.
  • FIG. 4 is a schematic diagram of an optional structural composition of a neural network training device provided in an embodiment of the present disclosure. As shown in FIG. 4 , the neural network training device 1 includes:
  • the training part 10 is configured to use the samples of each task in the first data set to iteratively train the initial network based on the initial meta-knowledge, the initial task knowledge of each task and the task loss function, and obtain the task of each task Knowledge;
  • the first data set is a source domain data set containing the initial task categories; based on the task knowledge of each task, the initial meta-knowledge and the meta-loss function, the initial network is respectively iteratively trained using the samples
  • corresponding to the various tasks in the first data set to obtain the optimal meta-knowledge; based on the optimal meta-knowledge, the task knowledge of each task and the task loss function, the network corresponding to the optimal meta-knowledge is iteratively trained using the samples of each task in the second data set to obtain the optimal task knowledge of each task and the target network corresponding to the optimal task knowledge;
  • the second data set is a target domain data set including a target task category, and the sample size of each task category in the initial task category is greater than the sample size of each task category in the target task category.
  • the device 1 further includes: a determining unit 11;
  • the training part 10 is further configured to iteratively train the initial network using samples of each task in the first data set based on the initial meta-knowledge and the initial task knowledge of each task to obtain a first training result;
  • the determining part 11 is configured to determine the task knowledge of each task when determining the minimum sum of task losses corresponding to each sample based on the task loss function and the first training result.
  • the device 1 further includes: a determining unit 11;
  • the training part 10 is further configured to iteratively train the initial network using samples corresponding to various tasks in the first data set based on the task knowledge of each task and the initial meta-knowledge, obtaining a second training result;
  • the determining part 11 is configured to determine the optimal meta-knowledge when the sum of meta-losses corresponding to various tasks is determined to be minimum based on the meta-loss function and the second training result.
  • the first data set in the training process of a small number of samples based on fragments, includes a query data set and a support data set corresponding to the fragment; the support data set is used for the training process of task knowledge In; the query data set is used in the training process of meta-knowledge; wherein, a segment represents a task;
  • the query data set includes M types of tasks, and each task corresponds to N samples; the support data set includes H types of tasks, and each task corresponds to K samples; the M types of tasks and the H types The types of tasks are different; M is a positive integer greater than or equal to 1, N is a positive integer greater than or equal to 1; H is a positive integer greater than or equal to 1, and K is a positive integer greater than or equal to 1;
  • the second data set is a data set of a target task category different from the initial task category of the first data set; wherein, the total number of Q tasks in the second data set , and each task corresponds to a small number of samples; Q is a positive integer greater than or equal to 1.
  • the first data set in the pre-training process based on pre-training-fine-tuning with a small number of samples, includes: M types of tasks, and each task corresponds to N samples; M is greater than or equal to A positive integer of 1, N is a positive integer greater than or equal to 1;
  • the second data set is a data set of target task categories different from the initial task categories of the first data set; wherein, there are Q tasks in total in the second data set, and each task corresponds to a small number of samples; Q is a positive integer greater than or equal to 1.
  • the training part 10 is further configured to, based on the initial meta-knowledge, the initial task knowledge of each task and the task loss function, combined with parameter compression processing, iteratively train the initial network using the samples of each task in the first data set to obtain the task knowledge of each task.
  • the device 1 further includes: an acquisition part 12 and a parameter compression part 13;
  • the training part 10 is further configured to use the samples of each task in the first data set to train the initial network based on the initial meta-knowledge and the initial task knowledge of each task, and obtain the first A sub-training result;
  • the obtaining part 12 is further configured to obtain a first sub-loss and a first sub-network based on the first sub-training result and the task loss function;
  • the parameter compression part 13 is configured to perform meta-random perturbation processing on the first sub-network to obtain a first perturbed sub-network when the first sub-loss is greater than a preset loss threshold;
  • a perturbation sub-network is a network after parameter compression of the initial network;
  • the training part 10 is further configured to continue to iteratively train the first perturbation sub-network using samples of each task in the first data set until the obtained loss is less than or equal to the preset loss threshold , to obtain the task knowledge of each task.
  • the training part 10 is also configured to respectively perform iterative training on the initial network using the samples
  • corresponding to the various tasks in the first data set, to obtain the optimal meta-knowledge.
  • the device 1 further includes: an acquisition part 12 and a parameter compression part 13;
  • the training part 10 is further configured to train the initial network using the samples of each task in the first data set, based on the initial meta-knowledge and the task knowledge of each task, to obtain a second sub-training result;
  • the obtaining part 12 is configured to obtain a second sub-loss and a second sub-network based on the second sub-training result and the meta-loss function;
  • the parameter compression part 13 is configured to perform meta-random perturbation processing on the second sub-network to obtain a second perturbed sub-network when the second sub-loss is greater than a preset loss threshold;
  • the second perturbation sub-network is the network after parameter compression of the initial network;
  • the training part 10 is further configured to continue to iteratively train the second perturbation sub-network using the samples of each task in the first data set until the obtained loss is less than or equal to the preset loss threshold, to obtain the optimal meta-knowledge.
  • the manner of the parameter compression processing includes at least one of the following: discarding network neurons of the network corresponding to each task; discarding a regional structure of the network corresponding to each task; discarding a minimal structural region of the network corresponding to each task (a block-wise example is sketched below).
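The three options listed above roughly correspond to element-wise dropout, dropping a regional structure, and dropping a minimal structural region of a convolutional feature map. Below is a hedged, minimal DropBlock-style sketch; the later description reports keep_prob = 0.9 and block_size = 7 as the values used there, but nothing in this sketch beyond those two hyper-parameter names is taken from the disclosure.

```python
import torch
import torch.nn.functional as F

def drop_block(x, keep_prob=0.9, block_size=7, training=True):
    """Minimal DropBlock-style regional drop on a (B, C, H, W) feature map:
    sample block centres, zero out a block_size x block_size region around each
    centre, then rescale so the expected activation magnitude is preserved.
    An odd block_size is assumed so the padding keeps the spatial size."""
    if not training or keep_prob >= 1.0:
        return x
    _, _, h, w = x.shape
    # Probability of a centre being dropped, chosen so that the expected fraction
    # of dropped activations is roughly (1 - keep_prob).
    gamma = ((1.0 - keep_prob) / block_size ** 2) * (h * w) / max(
        (h - block_size + 1) * (w - block_size + 1), 1)
    centres = (torch.rand_like(x) < gamma).float()
    block_mask = F.max_pool2d(centres, kernel_size=block_size,
                              stride=1, padding=block_size // 2)
    block_mask = 1.0 - block_mask.clamp(max=1.0)
    return x * block_mask * block_mask.numel() / block_mask.sum().clamp(min=1.0)
```

Element-wise and channel-wise variants can be obtained directly from `torch.nn.Dropout` and `torch.nn.Dropout2d`, respectively.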
  • a "part" may be a part of a circuit, a part of a processor, a part of a program or software, and so on; it may of course also be a unit, and it may be a module or be non-modular.
  • the task knowledge is determined by first training on the distribution of samples under each task; then, based on the task knowledge and the task distribution, the samples of all tasks in the first data set are trained to obtain the optimal meta-knowledge, so that subsequent training on a training data set with only a small number of samples can be performed based on the optimal meta-knowledge, and the target network corresponding to the optimal task knowledge can then be obtained for task-specific testing and inference.
  • in this way, whichever few-sample learning method is adopted, few-sample training can be realized under this framework, thereby improving the generalization of few-sample training.
  • Fig. 5 is a schematic diagram of the structural composition of the training device of the neural network according to the embodiment of the present disclosure. As shown in Fig. 5, the training device of the neural network includes:
  • the memory 14 is configured to store a computer program;
  • the processor 15 is configured to implement the above neural network training method when executing the computer program stored in the memory 14 .
  • An embodiment of the present disclosure provides a computer-readable storage medium, on which a computer program is stored, and is used to implement the above neural network training method when executed by a processor.
  • the neural network training device also includes a bus system 16 ; various components in the neural network training device are coupled together through the bus system 16 . It can be understood that the bus system 16 is used to realize connection and communication between these components. In addition to the data bus, the bus system 16 also includes a power bus, a control bus and a status signal bus.
  • the memory 14 is configured to store computer programs and applications to be executed by the processor 15, and may also cache data to be processed or already processed by the processor 15 and by each module in the target detection device (for example, image data, audio data, voice communication data and video communication data); it may be implemented by a flash memory (FLASH) or a random access memory (Random Access Memory, RAM).
  • when the processor 15 executes the program, the steps of any one of the above neural network training methods are implemented.
  • the processor 15 generally controls the overall operation of the training device of the neural network.
  • the above-mentioned processor may be at least one of an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a digital signal processor (Digital Signal Processor, DSP), a digital signal processing device (Digital Signal Processing Device, DSPD), a programmable logic device (Programmable Logic Device, PLD), a field programmable gate array (Field Programmable Gate Array, FPGA), a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, or a microprocessor. It can be understood that the electronic device implementing the above processor function may also be another device, which is not limited in the embodiments of the present disclosure.
  • the above-mentioned computer-readable storage medium/memory may be a read-only memory (Read Only Memory, ROM), a programmable read-only memory (Programmable Read-Only Memory, PROM), an erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), an electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), a ferromagnetic random access memory (Ferromagnetic Random Access Memory, FRAM), a flash memory (Flash Memory), a magnetic surface memory, an optical disc, or a compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM); it may also be any of various terminals including one of, or any combination of, the above memories, such as a mobile phone, a computer, a tablet device, a personal digital assistant, and the like.
  • An embodiment of the present disclosure provides a computer program product or computer program, where the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the neural network training method described above in the embodiments of the present disclosure.
  • executable instructions may take the form of programs, software, software modules, scripts or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine or other unit suitable for use in a computing environment.
  • executable instructions may, but need not, correspond to files in a file system; they may be stored as part of a file that holds other programs or data, for example in one or more scripts in a Hyper Text Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple cooperating files (for example, files that store one or more modules, subroutines or sections of code).
  • executable instructions may be deployed to be executed on one computing device, on multiple computing devices located at one site, or on multiple computing devices distributed across multiple sites and interconnected by a communication network.
  • the task knowledge is determined by first training on the distribution of samples under each task; then, based on the task knowledge and the task distribution, the samples of all the various tasks in the first data set are trained to obtain the optimal meta-knowledge, so that subsequent training on a training data set with only a small number of samples can be performed based on the optimal meta-knowledge, and the target network corresponding to the optimal task knowledge can then be obtained for task-specific testing and inference.
  • in this way, whichever few-sample learning method is adopted, few-sample training can be realized under this framework, thereby improving the generalization of few-sample training.
  • based on the target business analysis type, the terminal performs person recognition on the video frame sequence to be processed to obtain a recognition result of the person state corresponding to the target business analysis type, where the person state represents at least one of the human body state and the expression state corresponding to the target business analysis type; according to the recognition result, financial business scenario data representing changes in the person state is then recorded; in this way, the terminal can analyze the target business state according to the financial business scenario data, thereby improving the efficiency and accuracy of the business state analysis.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

本公开实施例提供了一种神经网络的训练方法及装置、存储介质和计算机程序,方法包括:基于初始元知识、每种任务的初始任务知识和任务损失函数,采用第一数据集中的每种任务的样本对初始网络进行迭代训练,得到每种任务的任务知识;第一数据集为包含初始任务类别的源域数据集;基于每种任务的任务知识、初始元知识和元损失函数,采用第一数据集中的各种任务对应的样本分别对初始网络进行迭代训练,得到最佳元知识;基于最佳元知识、每种任务的任务知识和任务损失函数,对第二数据集中的每种任务的样本对最佳元知识对应的网络进行迭代训练,得到每种任务的最佳任务知识,以及与最佳任务知识对应的目标网络。本公开提高了进行少量样本训练的泛化性。

Description

神经网络的训练方法及装置、存储介质和计算机程序
相关申请的交叉引用
本公开基于申请号为202111083941.2、申请日为2021年9月14日的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此引入本公开作为参考。
技术领域
本公开涉及计算机视觉技术,尤其涉及一种神经网络的训练方法及装置、存储介质和计算机程序。
背景技术
少量样本学习是计算机视觉重要的新兴研究领域之一。目前,由于卷积神经网络(CNNs)的迅速发展,许多基于CNN的目标检测器取得了巨大的成功。其中,少量样本学习旨在发展深度学习模型在少量样本场景下的能力。主要的少量样本学习方法包括:一种基于片段(Episode)训练方法的元学习(meta-learning)模型和一种基于预训练-微调(Pre-train finetune)的迁移学习方法。
然而,目前的两类少量样本学习方法在优化过程上十分相似,但却各自有对应的实现方式,并不能相互兼容,具有局限性。
发明内容
本公开实施例提供一种神经网络的训练方法及装置、存储介质和计算机程序,提高了进行少量样本训练的泛化性。
本公开的技术方案是这样实现的:
本公开实施例提供一种神经网络的训练方法,所述方法包括:
基于初始元知识、每种任务的初始任务知识和任务损失函数,采用第一数据集中的每种任务的样本对初始网络进行迭代训练,得到所述每种任务的任务知识;其中,所述第一数据集为包含初始任务类别的源域数据集;
基于所述每种任务的任务知识、所述初始元知识和元损失函数,采用所述第一数据集中的各种任务对应的样本分别对所述初始网络进行迭代训练,得到最佳元知识;
基于所述最佳元知识、所述每种任务的任务知识和所述任务损失函数,采用第二数据集中的每种任务的样本对最佳元知识对应的网络进行迭代训练,得到所述每种任务的最佳任务知识,以及与所述最佳任务知识对应的目标网络;其中,所述第二数据集为包含目标任务类别的目标域数据集,且所述初始任务类别中每种任务类别的样本量大于所述目标任务类别中每种任务类别的样本量。
本公开实施例提供一种神经网络的训练装置,所述装置包括:
训练部分,被配置为基于初始元知识、每种任务的初始任务知识和任务损失函数,采用第一数据集中的每种任务的样本对初始网络进行迭代训练,得到所述每种任务的任务知识;所述第一数据集为包含初始任务类别的源域数据集;基于所述每种任务的任务知识、所述初始元知识和元损失函数,采用所述第一数据集中的各种任务对应的样本分别对所述初始网络进行迭代训练,得到最佳元知识;基于所述最佳元知识、所述每种任务的任务知识和所述任务损失函数,采用第二数据集中的每种任务的样本对最佳元知识对应的网络进 行迭代训练,得到所述每种任务的最佳任务知识,以及与所述最佳任务知识对应的目标网络;其中,所述第二数据集为包含目标任务类别的目标域数据集,且所述初始任务类别中每种任务类别的样本量大于所述目标任务类别中每种任务类别的样本量。
本公开实施例提供一种神经网络的训练装置,包括:
存储器,用于存储计算机程序;
处理器,用于执行所述存储器中存储的计算机程序时,实现上述神经网络的训练方法。
本公开实施例提供一种计算机可读存储介质,其上存储有计算机程序,用于被处理器执行时,实现上述神经网络的训练方法。
本公开实施例提供一种计算机程序,包括计算机可读代码,当所述计算机可读代码在电子设备中运行时,所述电子设备中的处理器执行时实现上述神经网络的训练方法。
本公开实施例提供的神经网络的训练方法及装置、存储介质和计算机程序,通过基于初始元知识、每种任务的初始任务知识和任务损失函数,采用第一数据集中的每种任务的样本对初始网络进行迭代训练,得到每种任务的任务知识;第一数据集为包含初始任务类别的源域数据集;基于每种任务的任务知识、初始元知识和元损失函数,采用第一数据集中的各种任务对应的样本分别对初始网络进行迭代训练,得到最佳元知识;基于最佳元知识、每种任务的任务知识和任务损失函数,采用第二数据集中的每种任务的样本对最佳元知识对应的网络进行迭代训练,得到每种任务的最佳任务知识,以及与最佳任务知识对应的目标网络;其中,第二数据集为包含目标任务类别的目标域数据集,且所述初始任务类别中每种任务类别的样本量大于所述目标任务类别中每种任务类别的样本量。采用上述方案,由于在神经网络的训练方法中,可以在统一的少量样本学习框架下实现,通过先对基于每种任务下的样本数量分布的训练,进行任务知识的确定,再基于任务知识和任务分布,对所有种类的各种任务在第一数据集中的样本进行训练,得到最佳元知识,以便实现后续基于最佳元知识进行少量样本的训练数据集的训练的,进而得到与最佳任务知识对应的目标网络,以实现特定任务的测试和推理。这样,无论采用哪种少量样本学校方法,均可在此框架下实现少量样本训练,从而提高了进行少量样本训练的泛化性。
附图说明
图1为本公开实施例提供的一个示例性的目标任务的训练方法示意图;
图2为本公开实施例提供的一个可选的神经网络的训练方法流程图;
图3为本公开实施例提供的另一个可选的神经网络的训练方法流程图;
图4为本公开实施例提供的一个可选的神经网络的训练装置的结构示意图;
图5为本公开实施例提供的另一个可选的神经网络的训练装置的结构示意图。
具体实施方式
为了使本公开的目的、技术方案和优点更加清楚,下面将结合附图对本公开作进一步地详细描述,所描述的实施例不应视为对本公开的限制,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其它实施例,都属于本公开保护的范围。
在以下的描述中,涉及到“一些实施例”,其描述了所有可能实施例的子集,但是可以理解,“一些实施例”可以是所有可能实施例的相同子集或不同子集,并且可以在不冲突的情况下相互结合。
在以下的描述中,所涉及的术语“第一\第二\第三”仅仅是是区别类似的对象,不代表针对对象的特定排序,可以理解地,“第一\第二\第三”在允许的情况下可以互换特定的顺序或先后次序,以使这里描述的本公开实施例能够以除了在这里图示或描述的以外的顺序实施。
除非另有定义,本文所使用的所有的技术和科学术语与属于本公开的技术领域的技术人员通常理解的含义相同。本文中所使用的术语只是为了描述本公开实施例的目的,不是旨在限制本公开。
目前,深度神经网络(DNN)模型取得了巨大的成就在许多计算机视觉任务中取得了进展。然而,这些深度神经网络模型的性能很大程度上依赖于大量的数据以及昂贵的人工注释。当注释数据为稀缺,DNN就不能很好地适用于测试数据,尤其是当测试数据属于不同类别时训练数据的情况。因此,可以学会识别或仅使用少量数据标记来快速检测新对象。由于某些对象类别是很少的样本或其注释极难获得,因此,常规神经网络的泛化能力满足不了,从而通过少量样本学习获得更好的泛化能力的方式应运而生。
在本公开实施例中,少量样本学习方法包括:一种基于片段(Episode)训练方法的元学习(meta-learning)模型和一种基于预训练+微调(Pre-train finetune)的迁移学习方法。其中,meta-learning方法包括meta-training和meta-testing两个阶段,meta-training阶段是将模型在大量的基础类别数据上进行训练,meta-testing是负责在基础类别的基础上引入少量类别的样本进行进一步训练。上述两个阶段均采用基于Episode的训练方法。一个Episode指的是随机获取一个N-way,K-shot的小型训练集(N种类别,每种类别K个样本),同时在对应类别上随机抽取一定数量的样本作为小型测试集,而整个训练过程由很多个Episode组成。Pre-train finetune方法是一种常见的域迁移方法,可以应用到少量样本学习领域。其中,预训练阶段指的是在大量基础类别上进行基础训练,提供一个良好的初始化模型,微调指的是在少样本任务上进行参数微调,将参数迁移到对应任务上。
如图1所示,不同的源任务(1、2、……、N)进行任务知识和可迁移知识通过泛化,得到目标任务。示例性的,源任务1用于识别花,源任务2用于识别狗,源任务N用于识别杯子,而目标任务用于识别是否为船;不同的源的泛化能力是小样本学习的关键,其中可迁移知识适应目标任务。
在本公开实施例中,基于元学习(meta-learning)的框架实现上述两种的神经网络的训练方法。基于meta-learning,构建一种统一的数学框架,来解释基于Episode和基于Pre-train finetune的两种主流的少量样本学习方法。上述框架由一个新的重构元训练阶段和一个元测试阶段组成。在元训练阶段,此框架联合考虑了任务的分布和数据的分布,突出了少量样本学习的共同要素,包括元知识、任务知识、元损失、任务损失和数据集的分布,以便于引入有效的策略来提高整体泛化能力的模型。
同时,由于卷积神经网络(CNNs)的迅速发展,当前的CNN模型对大量数据的依赖十分明显,如果只有少量的训练数据,模型往往会出现过拟合的现象。
也就是说,上述两种方法在第一阶段(meta-training)的基础训练时,对基础类别样本存在明显的过拟合现象,导致第二阶段模型在少量类别(新类别)上无法达到最优效果。针对在meta-training阶段存在的过拟合问题,本公开实施例还提出了一类简单且通用的元学习策略(Meta-dropout)来缓解对基础类别的过拟合,从而提升模型的泛化性。其中,Meta-dropout在少量样本目标检测和图像分类任务上均可以提升模型精度,具有很强的通用性。也就是说,Meta-dropout(元-随机扰动)应用于深度神经网络模型的可转移知识。通过利用元-随机扰动,模型在少样本对象检测和少样本图像分类任务上表现出相对于当前少样本学习方法的巨大优势。
本公开实施例提供了一种神经网络的训练方法及装置、存储介质和计算机程序,能够提高少量样本训练的泛化性和防止过拟合的问题。下面说明本公开实施例提供的神经网络的训练装置的示例性应用,本公开实施例提供的神经网络的训练装置可以是电子设备,如笔记本电脑,平板电脑,台式计算机,机顶盒,移动设备等各种类型的用户终端,也可以实施为服务器,本公开实施例不作限制。
在一些实施例中,服务器可以是独立的物理服务器,也可以是多个物理服务器构成的 服务器集群或者分布式系统,还可以是提供云服务、云数据库、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、CDN、以及大数据和人工智能平台等基础云计算服务的云服务器。
参见图2,图2是本公开实施例提供的神经网络的训练方法的一个可选的流程示意图,将结合图2示出的步骤进行说明。
S101、基于初始元知识、每种任务的初始任务知识和任务损失函数,采用第一数据集中的每种任务的样本对初始网络进行迭代训练,得到每种任务的任务知识;第一数据集为包含初始任务类别的源域数据集。
S102、基于每种任务的任务知识、初始元知识和元损失函数,采用第一数据集中的各种任务对应的样本分别对初始网络进行迭代训练,得到最佳元知识。
在本公开实施例中,基于元学习(meta-learning)的框架实现神经网络的训练方法。基于meta-learning,构建一种统一的数学框架,来解释基于Episode和基于Pre-train finetune的两种主流的少量样本学习方法。上述框架由一个新的重构元训练阶段(meta-training阶段)和一个元测试阶段(meta-testing阶段)组成。
需要说明的是,元学习旨在学习通用的学习算法,是可以跨任务进行泛化并使每个新任务都能更好地学习。
在本公开实施例中,少量样本学习框架在meta-training过程中的整体目标可以是:针对所有源任务(对应初始任务类别)的分布,假设任务数量为无限大,每种任务采用第i中任务来表示,第i种任务中的数据的数量也是无限大时,期望基于任务和各种任务数据的损失都最小时,学习到一个元知识,其使得在每一种任务上都可以有很好的表现。
示例性的,meta-training的整体优化过程可以如公式(1)所示:
$$\min_{\omega}\ \mathbb{E}_{\mathcal{T}\sim p(\mathcal{T})}\ \mathbb{E}_{D_{t}\sim p(D_{t})}\ \mathbb{E}_{x\sim D_{t}}\ \mathcal{L}\left(x;\theta_{t},\omega\right)\tag{1}$$
其中,p(T)表示源任务的分布,其中,T由无限个基本任务组成,p(Dt)是任务t的训练数据集Dt的分布,E表示期望,L是损失函数,w为元知识,θ t为任务t的任务知识,x为样本或数据。
需要说明的是,学习元知识的过程中的训练数据的数量是无限的。w用于表示元知识,可以在不同的任务之间转移。元知识是一个广义的概念,它可以是学习过程中的许多组成部分,如参数初始化、黑盒模型和网络结构等,本公开实施例不作限定。θ或θ t是特定于任务的任务知识。
需要说明的是,基于上述整体优化目标,可以确定在元训练阶段的优化目标可以是基于初始任务,在初始元知识的初始状态下,经过迭代过程,得到最佳元知识。
示例性的,meta-training的元训练阶段的优化过程可以如公式(2)所示:
$$\omega^{*}=\arg\min_{\omega}\ \mathbb{E}_{\mathcal{T}\sim p(\mathcal{T})}\ \mathcal{L}\left(\mathcal{T};\omega\right)\tag{2}$$
其中,ω*为最佳元知识,ω为初始元知识,T为任务。
需要说明的是,ω*是需要基于任务和数据的基础上学习得到的,先要基于初始元知识和初始任务知识(与T相关的信息),先进行任务相关的任务知识的学习,再基于学习到的任务知识,再去优化初始元知识,最终得到了最佳元知识。
在本公开实施例中,神经网络的训练装置可以基于初始元知识和每种任务的初始任务知识,采用第一数据集中的每种任务的样本对初始网络进行迭代训练,再基于任务损失函数,确定损失,基于损失的迭代更新,得到每种任务的任务知识。其中,第一数据集为包含初始任务类别的源域数据集。
在本公开的一些实施例中,神经网络的训练装置基于初始元知识和每种任务的初始任务知识,采用第一数据集中的每种任务的样本对初始网络进行迭代训练,得到第一训练结果;基于任务损失函数和第一训练结果,在确定每个样本对应的任务损失之和最小时,确 定每种任务的任务知识。
需要说明的是,初始任务类别可以为基础类别,本公开实施例不作限制。
在本公开实施例中,在meta-training阶段,对初始网络或模型在数据集D(对应第一数据集)上进行训练,D表示源域数据集,其中包含大量的具有充足数量的基础类别样本。
需要说明的是,神经网络的训练装置可以根据任务知识模型,针对第一数据集中的每种任务的样本,实现对初始网络的训练,从而得到每种任务的任务知识。
示例性的,任务知识模型可以为公式(3)所示:
$$\theta^{*(i)}(\omega)=\arg\min_{\theta_{i}}\sum_{j}L^{task}\left(D_{ij};\theta_{i},\omega\right)\tag{3}$$
其中,θ *(i)(ω)为第i种任务的任务知识,D ij为第i种任务的对应的在数据集D(对应第一数据集)中的第j个样本,L task为任务损失函数,θ i为第i种任务的初始任务知识,ω为初始元知识。
需要说明的是,神经网络的训练装置基于公式(3)的原则,依据某种任务的初始元知识和该种任务的初始任务知识任务的情况下,采用第一数据集中的每种任务的每个样本对初始网络分别进行训练,得到每个样本对应的第一训练结果,基于样本的真实值、各自的第一训练结果以及任务损失函数,确定每个样本的损失,将每种任务对应的所有样本的损失之和加起来,得到当前次的任务损失(即每种任务对应的样本的损失之和),继续进行下一轮的任务知识的学习,直至确定出任务损失最小时为止,得到该任务的任务知识。其中,每种任务的任务知识的训练过程的原理是一致的。
在本公开实施例中,神经网络的训练装置可以根据得到的每种任务的任务知识,结合初始元知识,针对第一数据集中的各种任务对应的样本分别对初始网络进行迭代训练,得到最佳元知识。
在本公开的一些实施例中,神经网络的训练装置是可以基于每种任务的任务知识和初始元知识,采用第一数据集中的各种任务对应的样本分别对初始网络进行迭代训练,得到第二训练结果;基于元损失函数和第二训练结果,在确定各种任务对应的元损失之和最小时,确定最佳元知识。
在本公开实施例中,神经网络的训练装置可以根据得到的每种任务的任务知识,结合初始元知识,针对第一数据集中的各种任务对应的样本分别对初始网络进行迭代训练,各种任务的元损失,从而得到所有任务的元损失之和,继续进行迭代训练,得到每次训练的元损失之和,将各种任务对应的元损失之和最小时确定的元知识作为最佳元知识。
需要说明的是,神经网络的训练装置可以根据元知识模型,依据所有任务的任务知识和初始元知识任务的情况下,针对第一数据集中的各种任务的样本,实现对初始网络的训练,从而得到最佳元知识。
示例性的,元知识模型可以为公式(4)所示:
$$\omega^{*}=\arg\min_{\omega}\ \mathbb{E}_{i\sim p(\mathcal{T})}\ \mathbb{E}_{D_{i}\sim p(D_{i})}\ L^{meta}\left(\theta^{*(i)}(\omega),\omega;D_{i}\right)\tag{4}$$
其中,ω*为最佳元知识,ω为初始元知识,p(T)表示源任务的分布,其中,T由无限个基本任务组成,p(Di)是任务i的训练数据集Di的分布,E表示期望,L meta是元损失函数,θ *(i)(ω)为第i种任务的任务知识,i为p(T)中的一种任务。
在本公开的一些实施例中,任务知识模型和元知识模型还可以通过积分的形式表示出来,本公开实施例不作限制。
示例性的,通过积分形式提出通用的任务知识模型(公式(5))和元知识模型(公式(6)),如下所示:
$$\theta^{*(i)}(\omega)=\arg\min_{\theta_{i}}\int L^{task}\left(x;\theta_{i},\omega\right)p\left(x\mid i\right)\,\mathrm{d}x\tag{5}$$
$$\omega^{*}=\arg\min_{\omega}\iint L^{meta}\left(\theta^{*(i)}(\omega),\omega;x\right)p\left(x\mid i\right)p(i)\,\mathrm{d}x\,\mathrm{d}i\tag{6}$$
需要说明的是,ω是初始元知识,ω*是在任务迭代期间学到的最佳元知识。θ i代表任务知识,θ *(i)是学习得到的任务知识,用于优化ω得到最佳ω*。L task是用于优化任务特定信息的,称为task-loss是内部目标,表示哪个模型或者网络适合特定任务;L meta是用于生成最佳元知识的,L meta是meta-loss,是外部目标,表示哪个w好。
在本公开实施例中,元训练阶段有两个优化阶段,包括任务优化和元优化,元优化依赖于任务优化的结果来进行。其中,公式(3)和(5)中,基于L task,对第i种任务,可以进行一次一次的迭代训练,每次迭代都会得到一次任务知识,直至得到任务损失之和最小时的任务知识为止,即会经历θ 0→θ 1→θ 2→...→θ*的过程。同样的,公式(4)和(6)中,基于L meta,对所有任务的所有样本,可以进行一次一次的迭代训练,每次迭代都会得到一次元知识,直至得到元损失之和最小时的最佳元知识为止,即会经历ω 0→ω 1→ω 2→...→ω*的过程。
在本公开的一些实施例中,神经网络的训练装置可以通过上述实现基于Episode方式的少样本训练以及基于Pre-train finetune(预训练-微调)的少样本训练过程。
在本公开实施例中,神经网络的训练装置使用的数据集包括支持数据集和查询数据集,采用支持数据集和查询数据集作为第一数据集,来实现基于Episode的少样本训练方法的元训练阶段,而元测试阶段与后续描述的元测试阶段可以一致,本公开实施例不作限制。下面主要介绍基于Episode的少样本训练方法的元训练阶段。其中,该支持数据集和查询数据集均是基于包含基础类别(即初始任务类别)的数据集构建的。
在本公开的一些实施例中,在基于片段的少量样本的训练过程中,第一数据集包括片段对应的查询数据集和支持数据集;支持数据集用于任务知识的训练过程中;查询数据集用于元知识的训练过程中。其中,一个片段表征一种任务。
在本公开实施例中,查询数据集中包括M种任务,且每种任务下对应N个样本;支持数据集中包括H种任务,且每种任务下对应K个样本;M种任务与H种任务的种类不同;M为大于等于1的正整数,N为大于等于1的正整数;H为大于等于1的正整数,K为大于等于1的正整数。
需要说明的是,支持数据集中的任务的种类跟查询数据集中的任务的类型或种类是不同的,且每种任务都对应多个样本。其中,M种任务和H种任务的数量可以一样,也可以不一样,本公开实施例不作限制。N个样本和K个样本的数量可以一样,也可以不一样,本公开实施例不作限制。M和H表示有限的任务的数量,N和K分别表示各自任务下样本的数量。
在使用新的数据集构建支持数据集的情况下,可以随机选择一些M种类别的样本,而从剩余的数据中选择H种与支持数据集的类别不同的样本来构建查询数据集。
示例性的,假设M=H,即支持数据集和查询数据集中均包含M种任务(支持数据集和查询数据集各自对应的任务不相同),且各自的每种任务对应N个样本的情况下,支持数据集和查询数据集的可以通过公式(7)所示:
$$D_{source}=\bigcup_{i=1}^{M}\left(D_{source}^{s(i)}\cup D_{source}^{q(i)}\right)\tag{7}$$
其中,$D_{source}$为新的数据集,$D_{source}^{s(i)}$为支持数据集,$D_{source}^{q(i)}$为查询数据集,M为支持数据集和查询数据集中各自的任务数量。
在本公开实施例中,神经网络的训练装置将支持数据集用于任务知识的训练过程中;查询数据集用于元知识的训练过程中,实现基于Episode的少样本训练方法。
示例性的,假设M=H,N=K的情况下,采用支持数据集代入公式(5)求解,得到基于Episode的少样本训练中的任务知识,如公式(8)所示。
$$\theta^{*(i)}(\omega)=\arg\min_{\theta_{i}}\sum_{j=1}^{N}L^{task}\left(D_{source}^{s(ij)};\theta_{i},\omega\right)\tag{8}$$
其中,$\theta^{*(i)}(\omega)$为M种任务(支持数据集中的任务数量)中的第i种任务学习到的任务知识,$L^{task}$为任务损失函数,$D_{source}^{s(ij)}$为支持数据集中的第i种任务对应的第j个样本,$\theta_{i}$为初始任务知识,$\omega$为初始元知识。
也就是说,在本公开实施例中,神经网络的训练装置可以基于初始元知识和每种任务的初始任务知识任务,采用支持数据集中的每种任务的样本对初始网络进行迭代训练,再基于任务损失函数,确定损失,基于损失的迭代更新,得到每种任务的任务知识。其中,支持数据集为有限的包含基础类别样本的数据集。
在本公开的一些实施例中,神经网络的训练装置基于初始元知识和每种任务的初始任务知识任务,采用支持数据集中的每种任务的样本对初始网络进行迭代训练,得到第一训练结果;基于任务损失函数和第一训练结果,在确定每个样本对应的任务损失之和最小时,确定每种任务的任务知识。
在本公开实施例中,神经网络的训练装置可以根据得到的每种任务的任务知识,结合初始元知识,针对查询数据集中的所有种类的任务对应的样本分别对初始网络进行迭代训练,得到最佳元知识。
在本公开的一些实施例中,神经网络的训练装置是可以基于每种任务的任务知识和初始元知识,采用查询数据集中的各种任务(M种任务)对应的样本分别对初始网络进行迭代训练,得到第二训练结果;基于元损失函数和第二训练结果,在确定各种任务对应的元损失之和最小时,确定最佳元知识。
在本公开实施例中,神经网络的训练装置可以根据得到的每种任务的任务知识,结合初始元知识,针对查询数据集中的各种任务对应的样本分别对初始网络进行迭代训练,各种任务的元损失,从而得到所有任务的元损失之和,继续进行迭代训练,得到每次训练的元损失之和,将各种任务对应的元损失之和最小时确定的元知识作为最佳元知识。
示例性的,假设M=H,N=K的情况下,采用查询数据集和公式(8)得到的任务知识,代入公式(6)求解,得到基于Episode的少样本训练中的最佳元知识,如公式(9)所示。
$$\omega^{*}=\arg\min_{\omega}\sum_{i=1}^{M}\sum_{j=1}^{N}L^{meta}\left(\theta^{*(i)}(\omega),\omega;D_{source}^{q(ij)}\right)\tag{9}$$
其中,$\omega^{*}$为最佳元知识,$\omega$为初始元知识,$\theta^{*(i)}$是学习得到的M种任务(查询数据集中的任务数量)中第i种任务学习到的任务知识,$L^{meta}$为元损失函数,$D_{source}^{q(ij)}$为查询数据集中的第i种任务对应的第j个样本。
可以理解的是,神经网络的训练装置可以通过使用支持数据集和查询数据集就可以在统一的少量样本学习框架下实现下,先对基于支持数据集中的每种任务下的样本数量分布的训练,进行任务知识的确定,再基于任务知识和任务分布,对在查询数据集中的所有任务的样本进行训练,得到最佳元知识,从而实现后续基于最佳元知识进行少量样本的训练数据集的训练的。在此框架下实现基于Episode的少样本训练方法。
在本公开实施例中,基于Pre-train finetune的少样本训练过程中,使用的数据集为有限的初始任务类别的数据集,该有限的数据集中包括:M种任务,且每种任务下对应N个样本。
在本公开的一些实施例中,在基于预训练-微调的少量样本训练过程中,第一数据集包括:M种任务,且每种任务下对应N个样本;M为大于等于1的正整数,N为大于等于1的正整数。
需要说明的是,在本公开实施例中,有限的数据集可以用D source表示。假设数据集D source 无限大,就可以为元测试阶段提供良好的初始的模型。所以本公开实施例中提升M和N的数量对于提升模型精度十分重要。
在本公开实施例中,Pre-train finetune方法包括预训练阶段和微调阶段。在本公开的元学习框架中,预训练阶段由元训练阶段呈现,微调阶段是由元测试阶段呈现。下面主要介绍预训练阶段的实现。
在本公开实施例中,预训练的优化过程可以基于通用框架(5)和(6)得出,通过用M和N指定有限的数据集规模以及使用D source完整的数据集去替代无穷大数据集D,就可以得到Pre-train finetune的优化框架,即得到Pre-train finetune方法最终得到的最佳元知识。
在本公开的一些实施例中,神经网络的训练装置基于初始元知识和每种任务的初始任务知识任务,采用D source中的M种任务中每种任务的样本对初始网络进行迭代训练,得到第一训练结果;基于任务损失函数和第一训练结果,在确定每种任务对应的每个样本对应的任务损失之和最小时,确定每种任务的任务知识。
在本公开实施例中,神经网络的训练装置可以根据得到的M种任务中的每种任务的任务知识,结合初始元知识和元损失函数,针对D source中的M种任务对应的样本分别对初始网络进行迭代训练,得到最佳元知识。
示例性的,通过求解(5)和(6),结合D source得出任务知识的过程如公式(10)所示,得到最佳元知识的过程如公式(11)所示。
$$\theta^{*(i)}(\omega)=\arg\min_{\theta_{i}}\sum_{j=1}^{N}L^{task}\left(D_{source(ij)};\theta_{i},\omega\right)\tag{10}$$
其中,θ *(i)(ω)为M种任务中的第i种任务学习到的任务知识,L task为任务损失函数,D source(ij)为有限数据集中的第i种任务对应的第j个样本,θ i为初始任务知识,ω为初始元知识。
$$\omega^{*}=\arg\min_{\omega}\sum_{i=1}^{M}\sum_{j=1}^{N}L^{meta}\left(\theta^{*(i)}(\omega),\omega;D_{source(ij)}\right)\tag{11}$$
其中,ω*为最佳元知识,ω为初始元知识,θ *(i)是学习得到的M种任务中第i种任务学习到的任务知识,L meta为元损失函数,D source(ij)为有限数据集中的第i种任务对应的第j个样本。
可以理解的是,神经网络的训练装置可以通过使用有限数据集就可以在统一的少量样本学习框架下实现下,先对基于M种任务中的每种任务下的样本数量分布的训练,进行任务知识的确定,再基于任务知识和任务分布,对所有任务的样本进行训练,得到最佳元知识,从而实现后续基于最佳元知识进行少量样本的训练数据集的训练的。在此框架下实现基于Pre-train finetune的少样本训练方法。
S103、基于最佳元知识、每种任务的任务知识和任务损失函数,采用第二数据集中的每种任务的样本对最佳元知识对应的网络进行迭代训练,得到每种任务的最佳任务知识,以及与最佳任务知识对应的目标网络;其中,第二数据集为包含目标任务类别的目标域数据集,且初始任务类别中每种任务类别的样本量大于目标任务类别中每种任务类别的样本量。
在本公开实施例中,神经网络的训练装置完成了元训练阶段之后,就可以进行元测试阶段。在元测试阶段中,包括两个阶段:进行少量样本的训练阶段,以及数据推理阶段。其中,数据推理阶段就是采用测试数据集进行测试的阶段,或者采用实际的数据进行前向推理的阶段。下面主要介绍进行少量样本的训练阶段。
在本公开实施例中,神经网络的训练装置采用第二数据集进行少量样本训练过程,其中,第二数据集为少量样本的训练数据集,其包含目标任务类别的目标域数据集,且与第 一数据集的初始任务类别中每种任务类别的样本量大于目标任务类别中每种任务类别的样本量。
需要说明的是,第二数据集是可以从具有小规模的种类和样本的新的数据集中构建得到的。测试数据集也可以从该新的数据集中构建得到。新的数据集是基于少量样本的训练数据构建,用于学习最适用于该任务的参数θ **(i)。其中,新的数据集中的类别可以为与基础类别(初始任务类别)不同的新类别的数据。
在本公开实施例中,第一数据集对应的初始任务类比与第二数据集中的目标任务类别可以是不相同或部分不相同的类别,也可以是相同的类别,本公开实施例不作限制。
在基于片段的少量样本的测试过程中和在基于预训练-微调的少量样本的微调过程中,所述第二数据集为与第一数据集的初始任务类别不同的目标任务类别的数据集;其中,所述第二数据集Q种任务总数,且每种任务下对应少量样本(即初始任务类别中每种任务类别的样本量大于目标任务类别中每种任务类别的样本量);Q为大于等于1的正整数。
示例性的,假设$D_{target}$为新的数据集,$D_{target}^{train(i)}$为第二数据集(即训练数据集),$D_{target}^{test(i)}$为测试数据集,那么可以通过公式(12)来表达这三者之间的关系:
$$D_{target}=\bigcup_{i=1}^{Q}\left(D_{target}^{train(i)}\cup D_{target}^{test(i)}\right)\tag{12}$$
其中,Q为任务总数,i为Q中的一种任务。
需要说明的是,Q种任务所对应的各自的第二数据集均可以从新的数据集中构建得到,本公开实施例不作限制。
针对一个特定的任务,神经网络的训练装置可以采用第二数据集中的数据,在最佳元知识、该特定任务的任务知识的参数情况下,结合任务损失函数,来实现对最佳元知识对应的网络进行迭代训练,得到该特定任务的最佳任务知识,以及与最佳任务知识对应的目标网络。
示例性的,神经网络的训练装置可以基于任务模型,以及结合最佳元知识、该特定任务的任务知识,得到最佳任务知识。其中,任务模型可以表征得到一种任务的最佳任务知识的训练或优化过程,如公式(13)所示。
$$\theta^{**(i)}=\arg\min_{\theta}\sum_{j}L^{task}\left(D_{target}^{train(ij)};\theta,\omega^{*}\right)\tag{13}$$
其中,$\theta^{**(i)}$为第i种任务的最佳任务知识,$D_{target}^{train(i)}$为第i种任务对应的第二数据集。
需要说明的是,第二数据集中的数据为带注释的数据。
可以理解的是,由于在神经网络的训练方法中,可以在统一的少量样本学习框架下实现,通过先对基于每种任务下的样本数量分布的训练,进行任务知识的确定,再基于任务知识和任务分布,对所有任务在第一数据集中的样本进行训练,得到最佳元知识,以便实现后续基于最佳元知识进行少量样本的训练数据集的训练的,进而得到与最佳任务知识对应的目标网络,以实现特定任务的测试和推理。这样,无论采用哪种少量样本学校方法,均可在此框架下实现少量样本训练,从而提高了进行少量样本训练的泛化性。
在本公开的一些实施例中,参见图3,图3是本公开实施例提供的神经网络的训练方法的一个可选的流程示意图,将结合图3示出的步骤进行说明。
S201、基于初始元知识、每种任务的初始任务知识和任务损失函数,结合参数压缩处理,采用第一数据集中的每种任务的样本对初始网络进行迭代训练,得到每种任务的任务知识。
S202、基于每种任务的任务知识、初始元知识和元损失函数,结合参数压缩处理,采用第一数据集中的各种任务对应的样本分别对初始网络进行迭代训练,得到最佳元知识。
在本公开实施例中,神经网络的训练装置可以在元学习的元训练阶段(meta-training阶段)的每次训练过程中可以通过对中间得到的网络进行随机的扰动(参数压缩),去除 中间得到的网络的结构的情况下,继续进行训练,在增加了元-随机扰动(Meta-dropout)的情况下,训练得到任务知识和最佳元知识。
需要说明的是,本公开实施例中的参数压缩可以通过元-随机扰动来实现,基于上述实施例提出的神经网络的训练方法的实现的基础上,神经网络的训练装置可以在任务知识的训练过程中进行元-随机扰动,也可以在元知识的训练过程中进行元-随机扰动,还可以同时在任务知识和元知识的训练过程中,都增加元-随机扰动,本公开实施例不做限定。
基于此,少量样本学习框架在meta-training过程中的整体目标可以是:针对所有源任务的分布,假设任务数量为无限大,第i种任务中的数据的数量也是无限大时,期望基于任务和各种任务数据的损失都最小时,在增加元-随机扰动的情况下学习到一个元知识,其使得在每一种任务上都可以有很好的表现。
示例性的,在公式(1)示例的目标优化或训练目标的基础上,增加了Meta-dropout的可能,如公式(14)所示。
$$\min_{\omega}\ \mathbb{E}_{\mathcal{T}\sim p(\mathcal{T})}\ \mathbb{E}_{D_{t}\sim p(D_{t})}\ \mathbb{E}_{x\sim D_{t}}\ \mathcal{L}\left(x;\theta_{t},O(\omega)\right)\tag{14}$$
其中,p(T)表示源任务的分布,其中,T由无限个基本任务组成,p(Dt)是任务t的训练数据集Dt的分布,E表示期望,L是损失函数,w为元知识,θ t为任务t的任务知识,x为样本或数据,O(ω)表示元-随机扰动。
需要说明的是,基于上述整体优化目标,可以确定在元训练阶段的优化目标可以是基于任务,在初始元知识的初始状态下,结合元-随机扰动,经过迭代过程,得到最佳元知识。
示例性的,在meta-training的元训练阶段的优化过程即公式(2)的基础上,增加了Meta-dropout的可能,如公式(15)所示:
$$\omega^{*}=\arg\min_{\omega}\ \mathbb{E}_{\mathcal{T}\sim p(\mathcal{T})}\ \mathcal{L}\left(\mathcal{T};O(\omega)\right)\tag{15}$$
其中,ω*为最佳元知识,ω为初始元知识,T为任务,O(ω)表示元-随机扰动。
在本公开实施例中,在meta-training阶段,对初始网络或模型在数据集D(对应第一数据集)上进行训练,D表示基础类别数据集,其中包含大量的具有充足数量的基础类别样本。
需要说明的是,神经网络的训练装置可以根据任务知识模型,针对第一数据集中的每种任务的样本,结合元-随机扰动实现对初始网络的训练,从而得到每种任务的任务知识。
示例性的,任务知识模型可以为在公式(3)的实现基础上,增加Meta-dropout的可能,如公式(16)所示:
$$\theta^{*(i)}(\omega)=\arg\min_{\theta_{i}}\sum_{j}L^{task}\left(D_{ij};\theta_{i},O(\omega)\right)\tag{16}$$
其中,θ *(i)(ω)为第i种任务的任务知识,D ij为第i种任务的对应的在数据集D(对应第一数据集)中的第j个样本,L task为任务损失函数,θ i为第i种任务的初始任务知识,ω为初始元知识,O(ω)表示元-随机扰动。
需要说明的是,神经网络的训练装置可以根据元知识模型,依据所有任务的任务知识和初始元知识任务的情况下,结合元-随机扰动,针对第一数据集中的各种任务的样本,实现对初始网络的训练,从而得到最佳元知识。
示例性的,元知识模型可以为公式(4)的基础上,增加Meta-dropout的可能,如公式(17)所示:
$$\omega^{*}=\arg\min_{\omega}\ \mathbb{E}_{i\sim p(\mathcal{T})}\ \mathbb{E}_{D_{i}\sim p(D_{i})}\ L^{meta}\left(\theta^{*(i)}(\omega),O(\omega);D_{i}\right)\tag{17}$$
其中,ω*为最佳元知识,ω为初始元知识,p(T)表示源任务的分布,其中,T由无限个基本任务组成,p(Di)是任务i的训练数据集Di的分布,E表示期望,L meta是元损失函 数,θ *(i)(ω)为第i种任务的任务知识,i为p(T)中的一种任务,O(ω)表示元-随机扰动。
基于此,在实现基于Episode的少样本训练以及基于Pre-train finetune(预训练-微调)的少样本训练过程中时,均在实现的基础上增加了Meta-dropout的可能,采用O(ω)来代替ω来实现任务知识和元知识的训练过程,此处不再详述。
在本公开的一些实施例中,在基于任务知识的训练过程中,神经网络的训练装置可以基于初始元知识和每种任务的初始任务知识,采用第一数据集中的每种任务的样本对初始网络进行训练,得到第一子训练结果;基于第一子训练结果和任务损失函数,得到第一子损失和第一子网络;在第一子损失大于预设损失阈值的情况下,对第一子网络进行元-随机扰动处理,得到第一扰动子网络;第一扰动子网络为初始网络经过参数压缩后的网络;继续采用第一数据集中的每种任务的样本对第一扰动子网络进行迭代训练,直至得到的损失小于等于预设损失阈值的情况下为止,得到每种任务的任务知识。
在基于元知识的训练过程中,神经网络的训练装置可以基于初始元知识和每种任务的任务知识,采用第一数据集中的每种任务的样本对初始网络进行训练,得到第二子训练结果;基于第二子训练结果和元损失函数,得到第二子损失和第二子网络;在第二子损失大于预设损失阈值的情况下,对第二子网络进行元-随机扰动处理,得到第二扰动子网络;第二扰动子网络为初始网络经过参数压缩后的网络;继续采用第一数据集中的每种任务的样本对第二扰动子网络进行迭代训练,直至得到的损失小于等于预设损失阈值的情况下为止,得到最佳元知识。
需要说明的是,Meta-dropout是对网络结构进行一定的去除,增加了网络的动态性,应用在meta-training阶段。模型的训练装置可以针对任务知识训练过程中的中间生成的网络即第一子网络,和/或针对元知识训练过程中的中间生成的网络即第二子网络,进行元随机扰动后,再继续进行迭代训练。
在本公开实施例中,参数压缩处理的方式包括以下至少一种:
对每种任务对应的网络神经元进行丢弃;
对每种任务对应的网络的区域结构进行丢弃;
对每种任务对应的网络的最小结构进行丢弃。
在本公开实施例中,Meta-dropout可以是针对每种任务对应的网络中的结构进行丢弃的,其中,可以采用普通dropout(normal dropout)对每种任务对应的网络中的单个神经元进行丢弃,还可以采用spatial-dropout在某个维度上对进行神经元进行丢弃,也可以是采用dropblock针对每种任务对应的卷积神经网络的区域结构进行丢弃,或者是对每种任务对应的卷积神经网络的最小结构区域进行丢弃,本公开实施例不作限制丢弃方式,可以作用在每种任务对应的任何网络结构上。其中,一般实现Meta-dropout时,使用dropblock效果优于使用normal dropout,使用normal dropout优于使用spatial-dropout的效果的。
需要说明的是,针对网络神经元,每次丢弃可以以一定的概率来决定某个网络神经元需不需要丢弃,例如,以0.5的概率来决定每个网络神经元要不要丢弃。针对网络的区域结构,可以以卷积层的整个区域为单位进行去除,也可以以其他的区域结构为单位进行丢弃或去除,本公开实施例不作限制。
在本公开实施例中,采用什么丢弃方式实现Meta-dropout,以及在哪个网络层或者哪个网络结构本公开实施例也不作限制。详细的将在后续实施例的场景描述中进行说明。
S203、基于最佳元知识、每种任务的任务知识和任务损失函数,采用第二数据集中的每种任务的样本对最佳元知识对应的网络进行迭代训练,得到每种任务的最佳任务知识,以及与最佳任务知识对应的目标网络。
在实现了最佳元知识的获取后,神经网络的训练装置可以基于最佳元知识、每种任务的任务知识和任务损失函数,对第二数据集中的每种任务的样本对最佳元知识对应的网络进行迭代训练,得到每种任务的最佳任务知识,从而基于最假任务知识,得到与最佳任务 知识对应的目标网络。这与S103的实现一致,此处不再赘述。
可以理解的是,神经网络的训练装置在进行元学习的过程中通过增加Meta-dropout,在元训练过程中增加了网络的动态性,缓解网络或者模型在meta-training阶段中对大量的基础类别的过度拟合,提升模型的泛化性。
需要说明的是,在本公开实施例中,在meta-training阶段完成时,得到的最佳元知识可以理解为应用于不同场景中的默认的网络模型或网络结构,meta-test阶段得到的最佳任务知识则可以理解为当前场景下训练得到的最佳网络模型的参数,从而得到了目标网络,使用该训练好的目标网络就可以在当前场景下实现特定的任务了,例如少样本分类或者少样本检测等,本公开实施例不作限制。
下面以少样本检测和少样本分类的场景下,实现少样本检测任务和少样本分类任务为例进行说明。
在少样本检测场景中,采用使用VOC 2007测试数据集进行评估,使用VOC 2007支持数据集和VOC 2012查询数据集进行训练。假设从其20个对象类别中选择5个作为新类(用于meta-testing),同时保留其余15个作为基类(用于meta-training)。在使用VOC 2007测试数据集中的两组种类不同的测试数据集(split 1和split 2)来进行推理评估,例如,split 1(“bird”,“bus”,“cow”,“mbike”,“sofa”/rest)。split 2(“aero”,“bottle”,“cow”,“horse”,“sofa”/rest)。
基于Episode的少样本的训练过程中,使用Meta R-CNN,其中,使用ResNet-101作为Faster R-CNN的主干网络,优化策略遵循Meta R-CNN。
基于Pre-train finetune(预训练-微调)的少样本的预训练过程,使用TFA,其采用Faster R-CNN用作检测器,ResNet-101是主干网络。
需要说明的是,在少样本检测场景中,可以采用dropblock和normal dropout来实现Meta-dropout,但使用dropblock可以达到更高的准确性。
示例性的,基于ResNet101为代表的大模型的前提下,在Meta R-CNN中,Meta-dropout可以应用于Reset-101的第4层。在TFA中,Meta-dropout可以应用于Resnet-101的第3层或第4层。其中,keep prob和block size是drop-block中两个重要的超参数,分别设置为0.9和7,本公开实施例不作限制。其中,骨干网络的第3层或第4层的使用Meta-dropout的情况下,其应用于每个bottleneck block的最后一个卷积层。
表1
如表1所示可知,基于Meta R-CNN,应用Meta-dropout得到批大小为1的Meta R-CNN*。在TFA上应用Meta-dropout得到TFA*。基于Meta R-CNN实现本公开实施例中的Episode的少样本训练方法和基于TFA实现本公开实施例中的Pre-train finetune的少样本训练方法,由表1中的Our Impl.(不加元-随机扰动)表示。
在基于Episode的少样本训练方法中,除了10-shot设置的新类别中的mAP与基线相当之外,本公开采用Meta R-CNN*实现的少样本训练出来的模型具有更明显的精度的 改进。并且Meta R-CNN*在split 1with 1-shot的设置中可以获得10%的改进。
在基于预训练-微调的少样本训练方法中,TFA*能够在所有设置中获得更高的准确度。一般来说,训练得到的模型在1-shot设置中获得了最大的改进,其次是3-shot,而训练得到的模型在10-shot设置中获得的改进最小。
由此可知,使用Meta-dropout来提高元知识的泛化能力,可以帮助模型在少样本检测的设置下实现更高的提升。
在少样本分类场景中,Caltech-UCSD Birds-200-2011(CUB)被用于细粒度分类中,其共包含200个类和11788张图像。本公开实施例中将200个类分别分为100个基础类、50个验证类和50个新类。其中,包含100个类别的mini-ImageNet是ImageNet的一个子集,其每个类别包含600张大小为84×84的图像。
在mini-ImageNet中选取的100个类可以分为64个训练类(对应支持数据集)、16个验证类(对应查询数据集)和20个测试类(对应测试数据集)。
在本公开实施例中,选择Baseline++作为一种有效的少样本分类方法中使用的网络。
在本公开实施例中,Baseline++基于CUB数据集训练了200次epoch,基于mini-ImageNet数据集训练400次epoch,以及采用四层卷积网络(Conv-4)用作主干网络。
如表2所示可知,通过在Baseline++上应用Meta-dropout构建出Baseline++*。在CUB数据集上测试时,使用块大小为7的dropblock,应用于四层卷积网络(Conv-4)的最后一个卷积层,批大小为16。
在mini-ImageNet数据集上进行测试时,使用normal dropout实现Meta-dropout,应用于最后一个卷积层之后的平坦层(一维特征),并且批大小为32。
表2
需要说明的是,基于以Conv4为代表的小网络的前提下,Meta-dropout应用到最后一层的卷积层和最后一层的一维度的特征上。
由表2可知,采用本公开实施例提供的神经网络的训练方法训练得到的Baseline++(Our Impl.)在测试精度上优于Baseline++,并优于其他最先进的小样本图像分类算法(或模型)。采用本公开实施例提供的神经网络的训练方法训练得到的Baseline++*在精度上则更优于Baseline++(Our Impl.)。
在本公开实施例中,基于批大小为16的Baseline++(Our Impl.)在CUB数据集的5-shot比Meta-dropout的Baseline++*有显著的性能提高。
在本公开实施例中,应用Meta-dropout和dropout来表明提高元知识泛化能力的重要性。在使用normal dropout实现Meta-dropout和dropout的情况下,应用Meta-dropout可以实现在所有其他设置中的最佳性能。
在主干网络中应用Meta-dropout的具体位置的影响,使用normal dropout作为实现 Meta-dropout,并将其应用在最后一个卷积层和最后一个flatten层上的情况下,在最后一个一维特征上应用Meta-dropout比在最后一个卷积层上的性能更好。
在本公开实施例中,通过使用具有不同块大小的dropblock和具有不同位置的normal dropout来实现Meta-dropout的情况下。由于flatten层是1维的,dropblock只能用于最后一个卷积层。而normal dropout可以应用于最后卷积或最后一个平坦层。其中,使用块大小为7的dropblock适用于CUB数据集,可以获得最佳的性能。基于mini-ImageNet数据集的情况下,在flatten层上使用normal dropout可以获得最佳的性能。
在本公开实施例中,使用32作为批大小可以使Baseline++的性能最佳。
本公开实施例介绍了一个统一的元学习框架,集成了两种截然不同的小样本学习流,即基于Episode的和基于预训练-微调的小样本学习。其次,提出了一个简单、通用且有效的Meta-dropout来改进元知识在此框架中的泛化能力,并探索批量大小对小样本学习的显着影响。在少样本检测和少样本图像分类的任务中,采用本公开实施例提供的框架训练的模型对具有更高精度的性能优势。
图4为本公开实施例的提供的神经网络的训练装置的可选的结构组成示意图,如图4所示,所述神经网络的训练装置1包括:
训练部分10,被配置为基于初始元知识、每种任务的初始任务知识和任务损失函数,采用第一数据集中的每种任务的样本对初始网络进行迭代训练,得到所述每种任务的任务知识;所述第一数据集为包含初始任务类别的源域数据集;基于所述每种任务的任务知识、所述初始元知识和元损失函数,采用所述第一数据集中的各种任务对应的样本分别对所述初始网络进行迭代训练,得到最佳元知识;基于所述最佳元知识、所述每种任务的任务知识和所述任务损失函数,采用第二数据集中的每种任务的样本对所述最佳元知识对应的网络进行迭代训练,得到所述每种任务的最佳任务知识,以及与所述最佳任务知识对应的目标网络;其中,所述第二数据集为包含目标任务类别的目标域数据集,且所述初始任务类别中每种任务类别的样本量大于所述目标任务类别中每种任务类别的样本量。
在本公开的一些实施例中,所述装置1还包括:确定单元11;
所述训练部分10,还被配置为基于初始元知识和每种任务的初始任务知识,采用第一数据集中的每种任务的样本对所述初始网络进行迭代训练,得到第一训练结果;
所述确定部分11,被配置为基于所述任务损失函数和所述第一训练结果,在确定每个样本对应的任务损失之和最小时,确定所述每种任务的任务知识。
在本公开的一些实施例中,所述装置1还包括:确定单元11;
所述训练部分10,还被配置为基于所述每种任务的任务知识和所述初始元知识,采用所述第一数据集中的各种任务对应的样本分别对所述初始网络进行迭代训练,得到第二训练结果;
所述确定部分11,被配置为基于所述元损失函数和所述第二训练结果,在确定各种任务对应的元损失之和最小时,确定所述最佳元知识。
在本公开的一些实施例中,在基于片段的少量样本的训练过程中,所述第一数据集包括片段对应的查询数据集和支持数据集;所述支持数据集用于任务知识的训练过程中;所述查询数据集用于元知识的训练过程中;其中,一个片段表征一种任务;
所述查询数据集中包括M种任务,且每种任务下对应N个样本;所述支持数据集中包括H种任务,且每种任务下对应K个样本;所述M种任务与所述H种任务的种类不同;M为大于等于1的正整数,N为大于等于1的正整数;H为大于等于1的正整数,K为大于等于1的正整数;
在基于片段的少量样本的测试过程中,所述第二数据集为与所述第一数据集的初始任务类别不同的目标任务类别的数据集;其中,所述第二数据集Q种任务总数,且每种任务下对应少量样本;Q为大于等于1的正整数。
在本公开的一些实施例中,在基于预训练-微调的少量样本的预训练过程中,所述第一数据集包括:M种任务,且每种任务下对应N个样本;M为大于等于1的正整数,N为大于等于1的正整数;
在基于预训练-微调的少量样本的微调过程中,所述第二数据集为与所述第一数据集的初始任务类别不同的目标任务类别的数据集;其中,所述第二数据集Q种任务总数,且每种任务下对应少量样本;Q为大于等于1的正整数。
在本公开的一些实施例中,所述训练部分10,还被配置为所述基于初始元知识、每种任务的初始任务知识和任务损失函数,结合参数压缩处理,采用所述第一数据集中的每种任务的样本对所述初始网络进行迭代训练,得到所述每种任务的任务知识。
在本公开的一些实施例中,所述装置1还包括:获取部分12和参数压缩部分13;
所述训练部分10,还被配置为所述基于所述初始元知识和所述每种任务的初始任务知识,采用第一数据集中的每种任务的样本对所述初始网络进行训练,得到第一子训练结果;
所述获取部分12,还被配置为基于所述第一子训练结果和所述任务损失函数,得到第一子损失和第一子网络;
所述参数压缩部分13,被配置为在所述第一子损失大于预设损失阈值的情况下,对所述第一子网络进行元-随机扰动处理,得到第一扰动子网络;所述第一扰动子网络为初始网络经过参数压缩后的网络;
所述训练部分10,还被配置为继续采用第一数据集中的每种任务的样本对所述第一扰动子网络进行迭代训练,直至得到的损失小于等于所述预设损失阈值的情况下为止,得到所述每种任务的任务知识。
在本公开的一些实施例中,所述训练部分10,还被配置为基于所述每种任务的任务知识、所述初始元知识和元损失函数,结合参数压缩处理,采用所述第一数据集中的各种任务对应的样本分别对所述初始网络进行迭代训练,得到所述最佳元知识。
在本公开的一些实施例中,所述装置1还包括:获取部分12和参数压缩部分13;
所述训练部分10,还被配置为所述基于所述初始元知识和所述每种任务的任务知识,采用所述第一数据集中的每种任务的样本对所述初始网络进行训练,得到第二子训练结果;
所述获取部分12,被配置为基于所述第二子训练结果和所述元损失函数,得到第二子损失和第二子网络;
所述参数压缩部分13,被配置为在所述第二子损失大于预设损失阈值的情况下,对所述第二子网络进行元-随机扰动处理,得到第二扰动子网络;所述第二扰动子网络为初始网络经过参数压缩后的网络;
所述训练部分10,还被配置为继续采用第一数据集中的每种任务的样本对所述第二扰动子网络进行迭代训练,直至得到的损失小于等于所述预设损失阈值的情况下为止,得到所述最佳元知识。
在本公开的一些实施例中,所述参数压缩处理的方式包括以下至少一种:
对每种任务对应的网络神经元进行丢弃;
对每种任务对应的网络的区域结构进行丢弃;
对每种任务对应的网络的最小结构区域进行丢弃。
在本公开实施例以及其他的实施例中,“部分”可以是部分电路、部分处理器、部分程序或软件等等,当然也可以是单元,还可以是模块也可以是非模块化的。
可以理解的是,由于在神经网络的训练方法中,可以在统一的少量样本学习框架下实现,通过先对基于每种任务下的样本数量分布的训练,进行任务知识的确定,再基于任务知识和任务分布,对所有任务在第一数据集中的样本进行训练,得到最佳元知识,以便实现后续基于最佳元知识进行少量样本的训练数据集的训练的,进而得到与最佳任务知识对应的目标网络,以实现特定任务的测试和推理。这样,无论采用哪种少量样本学校方法, 均可在此框架下实现少量样本训练,从而提高了进行少量样本训练的泛化性。
图5为本公开实施例的神经网络的训练装置的结构组成示意图,如图5所示,神经网络的训练装置,包括:
存储器14,用于存储计算机程序;
处理器15,用于执行所述存储器14中存储的计算机程序时,实现上述神经网络的训练方法。
本公开实施例提供一种计算机可读存储介质,其上存储有计算机程序,用于被处理器执行时,实现上述神经网络的训练方法。
可以理解,神经网络的训练装置还包括总线系统16;神经网络的训练装置中的各个组件通过总线系统16耦合在一起。可理解,总线系统16用于实现这些组件之间的连接通信。总线系统16除包括数据总线之外,还包括电源总线、控制总线和状态信号总线。
存储器14配置为存储由处理器15计算机程序和应用,还可以缓存待处理器以及目标检测设备中各模块待处理或已经处理的数据(例如,图像数据、音频数据、语音通信数据和视频通信数据),可以通过闪存(FLASH)或随机访问存储器(Random Access Memory,RAM)实现。
处理器15执行程序时实现上述任一项神经网络的训练方法的步骤。处理器15通常控制神经网络的训练装置的总体操作。
上述处理器可以为特定用途集成电路(Application Specific Integrated Circuit,ASIC)、数字信号处理器(Digital Signal Processor,DSP)、数字信号处理装置(Digital Signal Processing Device,DSPD)、可编程逻辑装置(Programmable Logic Device,PLD)、现场可编程门阵列(Field Programmable Gate Array,FPGA)、中央处理器(Central Processing Unit,CPU)、控制器、微控制器、微处理器中的至少一种。可以理解地,实现上述处理器功能的电子器件还可以为其它,本公开实施例不作限制。
上述计算机可读存储介质/存储器可以是只读存储器(Read Only Memory,ROM)、可编程只读存储器(Programmable Read-Only Memory,PROM)、可擦除可编程只读存储器(Erasable Programmable Read-Only Memory,EPROM)、电可擦除可编程只读存储器(Electrically Erasable Programmable Read-Only Memory,EEPROM)、磁性随机存取存储器(Ferromagnetic Random Access Memory,FRAM)、快闪存储器(Flash Memory)、磁表面存储器、光盘、或只读光盘(Compact Disc Read-Only Memory,CD-ROM)等存储器;也可以是包括上述存储器之一或任意组合的各种终端,如移动电话、计算机、平板设备、个人数字助理等。
本公开实施例提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该计算机设备执行本公开实施例上述的神经网络的训练方法。
在一些实施例中,可执行指令可以采用程序、软件、软件模块、脚本或代码的形式,按任意形式的编程语言(包括编译或解释语言,或者声明性或过程性语言)来编写,并且其可按任意形式部署,包括被部署为独立的程序或者被部署为模块、组件、子例程或者适合在计算环境中使用的其它单元。
作为示例,可执行指令可以但不一定对应于文件系统中的文件,可以可被存储在保存其它程序或数据的文件的一部分,例如,存储在超文本标记语言(HTML,Hyper Text Markup Language)文档中的一个或多个脚本中,存储在专用于所讨论的程序的单个文件中,或者,存储在多个协同文件(例如,存储一个或多个模块、子程序或代码部分的文件)中。
作为示例,可执行指令可被部署为在一个计算设备上执行,或者在位于一个地点的多个计算设备上执行,又或者,在分布在多个地点且通过通信网络互连的多个计算设备上执 行。
这里需要指出的是:以上存储介质和设备实施例的描述,与上述方法实施例的描述是类似的,具有同方法实施例相似的有益效果。对于本公开存储介质和设备实施例中未披露的技术细节,请参照本公开方法实施例的描述而理解。
以上所述,仅为本公开的实施例而已,并非用于限定本公开的保护范围。凡在本公开的精神和范围之内所作的任何修改、等同替换和改进等,均包含在本公开的保护范围之内。
工业实用性
本公开实施例中,由于在神经网络的训练方法中,可以在统一的少量样本学习框架下实现,通过先对基于每种任务下的样本数量分布的训练,进行任务知识的确定,再基于任务知识和任务分布,对所有种类的各种任务在第一数据集中的样本进行训练,得到最佳元知识,以便实现后续基于最佳元知识进行少量样本的训练数据集的训练的,进而得到与最佳任务知识对应的目标网络,以实现特定任务的测试和推理。这样,无论采用哪种少量样本学校方法,均可在此框架下实现少量样本训练,从而提高了进行少量样本训练的泛化性。基于目标业务分析类型,对待处理视频帧序列进行人物识别,得到与目标业务分析类型对应的人物状态的识别结果,人物状态表征与目标业务分析类型对应的人体状态和表情状态中的至少一个;再根据识别结果,记录表征人物状态的变化的金融业务场景数据;如此,终端可以根据金融业务场景数据进行目标业务状态分析;从而提高了业务状态分析的效率和准确性。

Claims (23)

  1. 一种神经网络的训练方法,所述方法包括:
    基于初始元知识、每种任务的初始任务知识和任务损失函数,采用第一数据集中的每种任务的样本对初始网络进行迭代训练,得到所述每种任务的任务知识;其中,所述第一数据集为包含初始任务类别的源域数据集;
    基于所述每种任务的任务知识、所述初始元知识和元损失函数,采用所述第一数据集中的各种任务对应的样本分别对所述初始网络进行迭代训练,得到最佳元知识;
    基于所述最佳元知识、所述每种任务的任务知识和所述任务损失函数,采用第二数据集中的每种任务的样本对最佳元知识对应的网络进行迭代训练,得到所述每种任务的最佳任务知识,以及与所述最佳任务知识对应的目标网络;其中,所述第二数据集为包含目标任务类别的目标域数据集,且所述初始任务类别中每种任务类别的样本量大于所述目标任务类别中每种任务类别的样本量。
  2. 根据权利要求1所述的方法,其中,所述基于初始元知识、每种任务的初始任务知识和任务损失函数,采用第一数据集中的每种任务的样本对初始网络进行迭代训练,得到所述每种任务的任务知识,包括:
    基于初始元知识和每种任务的初始任务知识,采用第一数据集中的每种任务的样本对所述初始网络进行迭代训练,得到第一训练结果;
    基于所述任务损失函数和所述第一训练结果,在确定每个样本对应的任务损失之和最小时,确定所述每种任务的任务知识。
  3. 根据权利要求1所述的方法,其中,所述基于所述每种任务的任务知识、所述初始元知识和元损失函数,采用所述第一数据集中的各种任务对应的样本分别对所述初始网络进行迭代训练,得到最佳元知识,包括:
    基于所述每种任务的任务知识和所述初始元知识,采用所述第一数据集中的各种任务对应的样本分别对所述初始网络进行迭代训练,得到第二训练结果;
    基于所述元损失函数和所述第二训练结果,在确定各种任务对应的元损失之和最小时,确定所述最佳元知识。
  4. 根据权利要求1至3任一项所述的方法,其中,
    在基于片段的少量样本的训练过程中,所述第一数据集包括片段对应的查询数据集和支持数据集;所述支持数据集用于任务知识的训练过程中;所述查询数据集用于元知识的训练过程中;其中,一个片段表征一种任务;
    所述查询数据集中包括M种任务,且每种任务下对应N个样本;所述支持数据集中包括H种任务,且每种任务下对应K个样本;所述M种任务与所述H种任务的种类不同;M为大于等于1的正整数,N为大于等于1的正整数;H为大于等于1的正整数,K为大于等于1的正整数;
    在基于片段的少量样本的测试过程中,所述第二数据集为与所述第一数据集的初始任务类别不同的目标任务类别的数据集;其中,所述第二数据集种任务总数,且每种任务下对应少量样本;为大于等于1的正整数。
  5. 根据权利要求1至3任一项所述的方法,其中,
    在基于预训练-微调的少量样本的预训练过程中,所述第一数据集包括:M种任务,且每种任务下对应N个样本;M为大于等于1的正整数,N为大于等于1的正整数;
    在基于预训练-微调的少量样本的微调过程中,所述第二数据集为与所述第一数据集的初始任务类别不同的目标任务类别的数据集;其中,所述第二数据集种任务总数,且每 种任务下对应少量样本;为大于等于1的正整数。
  6. 根据权利要求1至5任一项所述的方法,其中,所述基于初始元知识、每种任务的初始任务知识和任务损失函数,采用第一数据集中的每种任务的样本对初始网络进行迭代训练,得到所述每种任务的任务知识,包括:
    所述基于初始元知识、每种任务的初始任务知识和任务损失函数,结合参数压缩处理,采用所述第一数据集中的每种任务的样本对所述初始网络进行迭代训练,得到所述每种任务的任务知识。
  7. 根据权利要求6所述的方法,其中,所述基于初始元知识、每种任务的初始任务知识和任务损失函数,结合参数压缩处理,采用所述第一数据集中的每种任务的样本对所述初始网络进行迭代训练,得到所述每种任务的任务知识,包括:
    所述基于所述初始元知识和所述每种任务的初始任务知识,采用第一数据集中的每种任务的样本对所述初始网络进行训练,得到第一子训练结果;
    基于所述第一子训练结果和所述任务损失函数,得到第一子损失和第一子网络;
    在所述第一子损失大于预设损失阈值的情况下,对所述第一子网络进行元-随机扰动处理,得到第一扰动子网络;所述第一扰动子网络为初始网络经过参数压缩后的网络;
    继续采用第一数据集中的每种任务的样本对所述第一扰动子网络进行迭代训练,直至得到的损失小于等于所述预设损失阈值的情况下为止,得到所述每种任务的任务知识。
  8. 根据权利要求1至6任一项所述的方法,其中,所述基于所述每种任务的任务知识、所述初始元知识和元损失函数,采用所述第一数据集中的各种任务对应的样本分别对所述初始网络进行迭代训练,得到最佳元知识,包括:
    基于所述每种任务的任务知识、所述初始元知识和元损失函数,结合参数压缩处理,采用所述第一数据集中的各种任务对应的样本分别对所述初始网络进行迭代训练,得到所述最佳元知识。
  9. 根据权利要求8所述的方法,其中,所述基于所述每种任务的任务知识、所述初始元知识和元损失函数,结合参数压缩处理,采用所述第一数据集中的各种任务对应的样本分别对所述初始网络进行迭代训练,得到所述最佳元知识,包括:
    所述基于所述初始元知识和所述每种任务的任务知识,采用所述第一数据集中的每种任务的样本对所述初始网络进行训练,得到第二子训练结果;
    基于所述第二子训练结果和所述元损失函数,得到第二子损失和第二子网络;
    在所述第二子损失大于预设损失阈值的情况下,对所述第二子网络进行元-随机扰动处理,得到第二扰动子网络;所述第二扰动子网络为初始网络经过参数压缩后的网络;
    继续采用第一数据集中的每种任务的样本对所述第二扰动子网络进行迭代训练,直至得到的损失小于等于所述预设损失阈值的情况下为止,得到所述最佳元知识。
  10. 根据权利要求6至9任一项所述的方法,其中,所述参数压缩处理的方式包括以下至少一种:
    对每种任务对应的网络神经元进行丢弃;
    对每种任务对应的网络的区域结构进行丢弃;
    对每种任务对应的网络的最小结构区域进行丢弃。
  11. 一种神经网络的训练装置,所述装置包括:
    训练部分,被配置为基于初始元知识、每种任务的初始任务知识和任务损失函数,采用第一数据集中的每种任务的样本对初始网络进行迭代训练,得到所述每种任务的任务知识;所述第一数据集为包含初始任务类别的源域数据集;基于所述每种任务的任务知识、所述初始元知识和元损失函数,采用所述第一数据集中的各种任务对应的样本分别对所述初始网络进行迭代训练,得到最佳元知识;基于所述最佳元知识、所述每种任务的任务知 识和所述任务损失函数,采用第二数据集中的每种任务的样本对最佳元知识对应的网络进行迭代训练,得到所述每种任务的最佳任务知识,以及与所述最佳任务知识对应的目标网络;其中,所述第二数据集为包含目标任务类别的目标域数据集,且所述初始任务类别中每种任务类别的样本量大于所述目标任务类别中每种任务类别的样本量。
  12. 根据权利要求11所述的装置,其中,所述装置还包括:确定部分;
    所述训练部分,还被配置为基于初始元知识和每种任务的初始任务知识任务,采用第一数据集中的每种任务的样本对所述初始网络进行迭代训练,得到第一训练结果;
    所述确定部分,被配置为基于所述任务损失函数和所述第一训练结果,在确定每个样本对应的任务损失之和最小时,确定所述每种任务的任务知识。
  13. 根据权利要求11所述的装置,其中,所述装置还包括:确定部分;
    所述训练部分,还被配置为基于所述每种任务的任务知识和所述初始元知识,采用所述第一数据集中的各种任务对应的样本分别对所述初始网络进行迭代训练,得到第二训练结果;
    所述确定部分,被配置为基于所述元损失函数和所述第二训练结果,在确定各种任务对应的元损失之和最小时,确定所述最佳元知识。
  14. 根据权利要求11至13任一项所述的装置,其中,在基于片段的少量样本的训练过程中,所述第一数据集包括片段对应的查询数据集和支持数据集;所述支持数据集用于任务知识的训练过程中;所述查询数据集用于元知识的训练过程中;其中,一个片段表征一种任务;
    所述查询数据集中包括M种任务,且每种任务下对应N个样本;所述支持数据集中包括H种任务,且每种任务下对应K个样本;所述M种任务与所述H种任务的种类不同;M为大于等于1的正整数,N为大于等于1的正整数;H为大于等于1的正整数,K为大于等于1的正整数;
    在基于片段的少量样本的测试过程中,所述第二数据集为与所述第一数据集的初始任务类别不同的目标任务类别的数据集;其中,所述第二数据集种任务总数,且每种任务下对应少量样本;为大于等于1的正整数。
  15. 根据权利要求11至13任一项所述的装置,其中,在基于预训练-微调的少量样本的预训练过程中,所述第一数据集包括:M种任务,且每种任务下对应N个样本;M为大于等于1的正整数,N为大于等于1的正整数;
    在基于预训练-微调的少量样本的微调过程中,所述第二数据集为与所述第一数据集的初始任务类别不同的目标任务类别的数据集;其中,所述第二数据集种任务总数,且每种任务下对应少量样本;为大于等于1的正整数。
  16. 根据权利要求11-15任一项所述的装置,其中,所述训练部分,还被配置为所述基于初始元知识、每种任务的初始任务知识和任务损失函数,结合参数压缩处理,采用所述第一数据集中的每种任务的样本对所述初始网络进行迭代训练,得到所述每种任务的任务知识。
  17. 根据权利要求16所述的装置,其中,所述装置还包括:获取部分和参数压缩部分;
    所述训练部分,还被配置为所述基于所述初始元知识和所述每种任务的初始任务知识,采用第一数据集中的每种任务的样本对所述初始网络进行训练,得到第一子训练结果;
    所述获取部分,还被配置为基于所述第一子训练结果和所述任务损失函数,得到第一子损失和第一子网络;
    所述参数压缩部分,被配置为在所述第一子损失大于预设损失阈值的情况下,对所述第一子网络进行元-随机扰动处理,得到第一扰动子网络;所述第一扰动子网络为初始网络 经过参数压缩后的网络;
    所述训练部分,还被配置为继续采用第一数据集中的每种任务的样本对所述第一扰动子网络进行迭代训练,直至得到的损失小于等于所述预设损失阈值的情况下为止,得到所述每种任务的任务知识。
  18. 根据权利要求11-16任一项所述的装置,其中,所述训练部分,还被配置为基于所述每种任务的任务知识、所述初始元知识和元损失函数,结合参数压缩处理,采用所述第一数据集中的各种任务对应的样本分别对所述初始网络进行迭代训练,得到所述最佳元知识。
  19. 根据权利要求18所述的装置,其中,所述装置还包括:获取部分和参数压缩部分;
    所述训练部分,还被配置为所述基于所述初始元知识和所述每种任务的任务知识,采用所述第一数据集中的每种任务的样本对所述初始网络进行训练,得到第二子训练结果;
    所述获取部分,被配置为基于所述第二子训练结果和所述元损失函数,得到第二子损失和第二子网络;
    所述参数压缩部分,被配置为在所述第二子损失大于预设损失阈值的情况下,对所述第二子网络进行元-随机扰动处理,得到第二扰动子网络;所述第二扰动子网络为初始网络经过参数压缩后的网络;
    所述训练部分,还被配置为继续采用第一数据集中的每种任务的样本对所述第二扰动子网络进行迭代训练,直至得到的损失小于等于所述预设损失阈值的情况下为止,得到所述最佳元知识。
  20. 根据权利要求16-19任一项所述的装置,其中,所述参数压缩处理的方式包括以下至少一种:
    对每种任务对应的网络神经元进行丢弃;
    对每种任务对应的网络的区域结构进行丢弃;
    对每种任务对应的网络的最小结构进行丢弃。
  21. 一种神经网络的训练装置,包括:
    存储器,用于存储计算机程序;
    处理器,用于执行所述存储器中存储的计算机程序时,实现权利要求1至10任一项所述的方法。
  22. 一种计算机可读存储介质,其上存储有计算机程序,用于被处理器执行时,实现权利要求1至10任一项所述的方法。
  23. 一种计算机程序,包括计算机可读代码,当所述计算机可读代码在电子设备中运行时,所述电子设备中的处理器执行时实现权利要求1至10中任意一项所述的方法。
PCT/CN2022/071370 2021-09-14 2022-01-11 神经网络的训练方法及装置、存储介质和计算机程序 WO2023040147A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111083941.2A CN113837379A (zh) 2021-09-14 2021-09-14 神经网络的训练方法及装置、计算机可读存储介质
CN202111083941.2 2021-09-14

Publications (1)

Publication Number Publication Date
WO2023040147A1 true WO2023040147A1 (zh) 2023-03-23

Family

ID=78959362

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/071370 WO2023040147A1 (zh) 2021-09-14 2022-01-11 神经网络的训练方法及装置、存储介质和计算机程序

Country Status (2)

Country Link
CN (1) CN113837379A (zh)
WO (1) WO2023040147A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116503676A (zh) * 2023-06-27 2023-07-28 南京大数据集团有限公司 一种基于知识蒸馏小样本增量学习的图片分类方法及系统
CN116821699A (zh) * 2023-08-31 2023-09-29 山东海量信息技术研究院 一种感知模型训练方法、装置及电子设备和存储介质
CN116821907A (zh) * 2023-06-29 2023-09-29 哈尔滨工业大学 一种基于Drop-MAML的小样本学习入侵检测方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108898218A (zh) * 2018-05-24 2018-11-27 阿里巴巴集团控股有限公司 一种神经网络模型的训练方法、装置、及计算机设备
CN110598842A (zh) * 2019-07-17 2019-12-20 深圳大学 一种深度神经网络超参数优化方法、电子设备及存储介质
CN111724083A (zh) * 2020-07-21 2020-09-29 腾讯科技(深圳)有限公司 金融风险识别模型的训练方法、装置、计算机设备及介质
US20210103812A1 (en) * 2019-08-28 2021-04-08 Tata Consultancy Services Limited Method and system for training a neural network for time series data classification
US20210117776A1 (en) * 2019-10-22 2021-04-22 Baidu Usa Llc Method, electronic device and computer readable medium for information processing for accelerating neural network training

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112949750B (zh) * 2021-03-25 2022-09-23 清华大学深圳国际研究生院 一种图像分类方法及计算机可读存储介质

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108898218A (zh) * 2018-05-24 2018-11-27 阿里巴巴集团控股有限公司 一种神经网络模型的训练方法、装置、及计算机设备
CN110598842A (zh) * 2019-07-17 2019-12-20 深圳大学 一种深度神经网络超参数优化方法、电子设备及存储介质
US20210103812A1 (en) * 2019-08-28 2021-04-08 Tata Consultancy Services Limited Method and system for training a neural network for time series data classification
US20210117776A1 (en) * 2019-10-22 2021-04-22 Baidu Usa Llc Method, electronic device and computer readable medium for information processing for accelerating neural network training
CN111724083A (zh) * 2020-07-21 2020-09-29 腾讯科技(深圳)有限公司 金融风险识别模型的训练方法、装置、计算机设备及介质

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116503676A (zh) * 2023-06-27 2023-07-28 南京大数据集团有限公司 一种基于知识蒸馏小样本增量学习的图片分类方法及系统
CN116503676B (zh) * 2023-06-27 2023-09-22 南京大数据集团有限公司 一种基于知识蒸馏小样本增量学习的图片分类方法及系统
CN116821907A (zh) * 2023-06-29 2023-09-29 哈尔滨工业大学 一种基于Drop-MAML的小样本学习入侵检测方法
CN116821907B (zh) * 2023-06-29 2024-02-02 哈尔滨工业大学 一种基于Drop-MAML的小样本学习入侵检测方法
CN116821699A (zh) * 2023-08-31 2023-09-29 山东海量信息技术研究院 一种感知模型训练方法、装置及电子设备和存储介质
CN116821699B (zh) * 2023-08-31 2024-01-19 山东海量信息技术研究院 一种感知模型训练方法、装置及电子设备和存储介质

Also Published As

Publication number Publication date
CN113837379A (zh) 2021-12-24

Similar Documents

Publication Publication Date Title
WO2023040147A1 (zh) 神经网络的训练方法及装置、存储介质和计算机程序
WO2020244261A1 (zh) 高分辨率遥感图像的场景识别系统及模型生成方法
WO2022042002A1 (zh) 一种半监督学习模型的训练方法、图像处理方法及设备
Baldominos et al. Evolutionary convolutional neural networks: An application to handwriting recognition
Lu et al. Brain intelligence: go beyond artificial intelligence
WO2020238293A1 (zh) 图像分类方法、神经网络的训练方法及装置
DE112020003127T5 (de) Erweiterung von dynamischem Verarbeitungselement-Array
WO2021238281A1 (zh) 一种神经网络的训练方法、图像分类系统及相关设备
WO2022042713A1 (zh) 一种用于计算设备的深度学习训练方法和装置
WO2021159714A1 (zh) 一种数据处理方法及相关设备
WO2017124646A1 (zh) 一种用于稀疏连接的人工神经网络计算装置和方法
WO2022001805A1 (zh) 一种神经网络蒸馏方法及装置
WO2021244249A1 (zh) 一种分类器的训练方法、数据处理方法、系统以及设备
CN110532417A (zh) 基于深度哈希的图像检索方法、装置及终端设备
Gikunda et al. State-of-the-art convolutional neural networks for smart farms: A review
WO2021218470A1 (zh) 一种神经网络优化方法以及装置
KR102250728B1 (ko) 샘플 처리 방법, 장치, 기기 및 저장 매체
US20190188526A1 (en) Fusing sparse kernels to approximate a full kernel of a convolutional neural network
CN107463935A (zh) 应用分类方法和应用分类装置
WO2024067884A1 (zh) 一种数据处理方法及相关装置
Bezak Building recognition system based on deep learning
US20200167655A1 (en) Method and apparatus for re-configuring neural network
CN112132281B (zh) 一种基于人工智能的模型训练方法、装置、服务器及介质
Wu et al. Heterogeneous representation learning and matching for few-shot relation prediction
WO2023197857A1 (zh) 一种模型切分方法及其相关设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22868528

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE