WO2018153201A1 - Deep learning training method and apparatus - Google Patents

Deep learning training method and apparatus

Info

Publication number
WO2018153201A1
Authority
WO
WIPO (PCT)
Prior art keywords
training data
data instance
instance
training
learning
Prior art date
Application number
PCT/CN2018/073955
Other languages
English (en)
French (fr)
Inventor
高燕
吕达
罗圣美
李伟华
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 filed Critical 中兴通讯股份有限公司
Publication of WO2018153201A1 publication Critical patent/WO2018153201A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N99/00 Subject matter not provided for in other groups of this subclass

Definitions

  • the present disclosure relates to the field of intelligent learning, and in particular, to a deep learning training method and apparatus.
  • deep learning is the focus of research in the field of artificial intelligence, and a large number of scholars and researchers have devoted themselves to promoting its rapid development. Despite the great achievements of deep learning, it still faces many difficulties. Compared with traditional methods, more data and deeper network structure are the biggest features of deep learning and the key to its success. But it also means that deep learning often requires more training storage space and more time. Training a deep learning model usually takes days or even months, so in order to save time costs, how to accelerate the training process has become an important research direction.
  • the present disclosure provides a deep learning training method and apparatus capable of accelerating convergence of a deep learning model.
  • a deep learning training method includes: determining, during the forward propagation of each training iteration, the loss value of each training data instance in the batch training data instances; determining all difficult instances from the batch training data instances according to the loss value of each training data instance; and forgoing learning the features of the non-difficult instances while learning the features of all of the difficult instances.
  • a deep learning training apparatus includes: a loss determination module configured to determine, during the forward propagation of each training iteration, the loss value of each training data instance in the batch training data instances; an instance selection module configured to determine, in each training iteration, all difficult instances from the batch training data instances according to the loss value of each training data instance; and a learning module configured to forgo learning the features of the non-difficult instances while learning the features of all of the difficult instances.
  • FIG. 1 illustrates a main flowchart of a deep learning training method according to an exemplary embodiment of the present disclosure
  • FIG. 2 illustrates a detailed flowchart of a deep learning training method according to an exemplary embodiment of the present disclosure
  • FIG. 3 is a block diagram showing a structure of a deep learning training device according to an exemplary embodiment of the present disclosure.
  • the present disclosure provides a deep learning training method and apparatus.
  • the present disclosure will be further described in detail below in conjunction with the drawings and exemplary embodiments. It is understood that the exemplary embodiments described herein are merely illustrative of the disclosure and are not intended to be limiting.
  • according to one aspect of the present disclosure, a deep learning training method is provided. As shown in FIG. 1, the method may include: S101, determining, during the forward propagation of each training iteration, the loss value of each training data instance in the batch training data instances.
  • specifically, step S101 may include the following sub-steps:
  • Sub-step S1011: according to the task requirements, acquiring a sufficient number of training samples (i.e., training data instances or data instances), and filtering, processing, enhancing, balancing, and labeling the acquired training samples to construct a training sample set;
  • Sub-step S1012: selecting a deep network model structure, setting the corresponding training parameters, and initializing the deep network model;
  • Sub-step S1013: grouping a certain number of training samples into a BATCH (batch training data instances) and feeding it into the deep network for computation, obtaining the classification value Xc of each sample in the BATCH; and
  • Sub-step S1014: comparing against the true label XT of each sample and computing the loss value L of each sample as L = -log[softmax(a_k)], where a is the class probability and k is the true class of the instance (formula (1) in the description below).
  • the method may further include:
  • An exemplary embodiment of the present disclosure computes the loss of each training data instance in a training iteration (the gap between the instance's actual output and its ideal output) to obtain the data instances that matter most for that iteration (i.e., the difficult instances) and uses them to train the model; that is, training is concentrated on the difficult instances, which speeds up the convergence of the model.
  • because the learning process ignores the useless (non-difficult) data instances, the imbalance of training data found in practical problems is also effectively alleviated.
  • the exemplary embodiment of the present disclosure improves the existing training learning method by analyzing the model training data, and can be used in combination with various existing optimization solving methods, and can be integrated into the current deep learning frameworks.
  • step S102 may include the following sub-steps:
  • Sub-step S1021: for any training data instance, comparing the loss value of that training data instance with the preset threshold θ 1 ; if the loss value is not less than the preset threshold θ 1 , determining that the training data instance is a difficult instance; and
  • Sub-step S1022 traversing the batch training data instance to obtain all difficult instances.
  • the Loss value L of each training sample in the BATCH is compared with the threshold ⁇ 1 . If L exceeds the threshold ⁇ 1 , it is determined that the training sample is a difficult instance for the current learning. Otherwise, discard this training sample.
  • the exemplary embodiments of the present disclosure further accelerate the convergence of the deep learning model.
  • step S103 includes the following sub-steps:
  • Sub-step S1031 determining an average loss value of the batch training data instance
  • Sub-step S1032 comparing the average loss value with a preset threshold ⁇ 2 ;
  • Sub-step S1033: if the average loss value exceeds the preset threshold θ 2 , forgoing learning the features of the non-difficult instances while learning the features of all of the difficult instances, or, if the average loss value does not exceed the preset threshold θ 2 , forgoing learning the features of the batch training data instances.
  • the step of learning the features of all of the difficult instances includes: back-propagating the loss values of the difficult instances at the time of learning; and adjusting the network parameters for training based on the respective loss values.
  • the preset threshold ⁇ 2 is smaller than the preset threshold ⁇ 1 .
  • the average loss value Lavg is the sum of the Loss values of all samples in the BATCH divided by the number of samples N in the BATCH, i.e. Lavg = (1/N) Σ L_i (formula (2) in the description below).
  • the preset thresholds θ 1 and θ 2 are obtained as follows: for any training data instance, the preset threshold θ 1 is determined according to the class probability of that training data instance; and the preset threshold θ 2 is determined from the preset threshold θ 1 of any training data instance.
  • the preset thresholds θ 1 and θ 2 are determined by the selected Loss formula and the size of the BATCH, where θ 1 is the evaluation threshold for a single sample, θ 2 is the evaluation threshold for the whole BATCH, and N is the number of samples in a BATCH.
  • the exemplary embodiments of the present disclosure design a deep learning acceleration convergence method based on data analysis, and are applicable to various deep learning open source frameworks.
  • the method mainly includes data preprocessing and deep learning training.
  • in the data preprocessing part, data augmentation is performed with various image transformations, which greatly expands the data and increases its diversity.
  • in the deep learning training part, drawing on the idea of support vectors, convergence is accelerated by analyzing the loss of the data.
  • the exemplary embodiment of the present disclosure analyzes the data during training and, based on the loss of the data in each iteration, concentrates training on the difficult data instances (those with large loss), thereby speeding up convergence.
  • the exemplary embodiment of the present disclosure distinguishes data according to the magnitude of loss of training data, so that training is more targeted than existing learning methods that do not distinguish training data.
  • existing network training methods use all of the data for learning, which leads to the problem of unbalanced training data in practice: the trained model tends toward the classes with more data. Exemplary embodiments of the present disclosure curb this problem and improve the training effect to some extent.
  • the experiment uses the ImageNet dataset, which contains 1.2 million training images in 1000 classes, with 1200 samples per class.
  • the classification task for the ImageNet image recognition competition is performed using the deep learning training method according to the present disclosure, and is compared with the existing Caffe (Convolutional Neural Network Framework) open source framework training method.
  • the method is mainly divided into two major processes: data preprocessing, deep learning training. The specific content of each process will be described separately in conjunction with the experiment.
  • Data preprocessing is a necessary process for data analysis and learning tasks.
  • for this experiment, tasks such as classification and labeling have already been completed in the dataset, so the key requirement is data enhancement.
  • the samples are augmented, e.g. using random cropping, mirroring, and similar methods.
  • the image resolution is adjusted to 256×256, and the data is finally saved in the lmdb file format for Caffe to load.
  • the method is mainly directed to an improvement of the process, and iterative learning is performed by discriminating data according to the loss size of the training data instance.
  • This process mainly involves obtaining a depth model through deep network training.
  • the training process of training through a deep network includes the following steps:
  • SoftmaxLoss feeds the output of the Softmax function into the cross-entropy loss.
  • the Softmax probability of class k is computed as softmax(a_k) = exp(a_k) / Σ_j exp(a_j), summing over all classes.
  • Softmax calculates the probability that a data instance belongs to each category.
  • the loss of the data instance can be calculated according to the above formula (1).
  • ⁇ 1 is a loss determination threshold based on a single instance, the value of which is determined according to the above formula (3). In this experiment, a is set to 0.99, and the calculated value of ⁇ 1 is 0.01.
  • ⁇ 2 is used to determine the average loss of the batch data, and ⁇ 2 should be less than ⁇ 1 in consideration of preventing the loss of the individual instance from affecting the overall average loss. As the number of samples N increases, the influence gradually decreases, and ⁇ 2 also approaches ⁇ 1 , so ⁇ 2 is determined by the above formula (4), and the calculated value of ⁇ 2 is 9.9 ⁇ 10 -3 .
  • step (3) If the termination condition is not reached, return to step (3) to continue training. The termination condition is reached and the learning process is ended.
  • the deep learning training portion concentrates training learning on difficult instances by controlling a single data instance and a batch data instance.
  • for a single data instance, the loss value L calculated with formula (1) is compared with the threshold θ 1 ; if L is greater than the threshold θ 1 , the data instance is used for training and learning; otherwise the data instance is ignored in this iteration, i.e., its back-propagation gradient is zero.
  • for the whole batch, the mean loss Lavg calculated with formula (2) is compared with the threshold θ 2 ; if Lavg is greater than the threshold θ 2 , back-propagation is performed; otherwise it is cancelled and the batch data is not used for learning.
  • the deep learning training apparatus may include:
  • a loss determination module 310 configured to determine a loss value of each training data instance in the batch training data instance during a forward propagation process of each iterative training
  • An instance selection module 320 configured to determine all difficult instances from the batch training data instances based on a loss value for each training data instance
  • a learning module 330 is configured to abandon the learning of features of non-difficult instances while learning the features of all of the difficult instances.
  • An exemplary embodiment of the present disclosure obtains the data instances that matter most for an iteration by calculating the loss value of each training data instance in that iteration, and uses them to train the model, that is, it concentrates training on difficult instances, thereby speeding up the convergence of the model. At the same time, because the learning process ignores the useless data instances, it effectively alleviates the imbalance of training data in practical problems.
  • the exemplary embodiment of the present disclosure improves the existing training learning method by analyzing the model training data, and can be used in combination with various existing optimization solving methods, and can be integrated into the current deep learning frameworks.
  • the instance selection module 320 is configured to: compare, for any training data instance, a magnitude relationship between a loss value of the training data instance and a preset threshold ⁇ 1 ; if the loss value is not less than The preset threshold ⁇ 1 , then the training data instance is a difficult instance; and traversing the batch training data instance to obtain all difficult instances.
  • the apparatus further includes a judging module configured to: determine the average loss value of the batch training data instances; compare the average loss value with the preset threshold θ 2 ; and, if the average loss value exceeds the preset threshold θ 2 , trigger the learning module to forgo learning the features of the non-difficult instances while learning the features of all of the difficult instances, or, if the average loss value does not exceed the preset threshold θ 2 , trigger the learning module to forgo learning the features of the batch training data instances.
  • the preset threshold θ 2 is smaller than the preset threshold θ 1 .
  • the apparatus further includes a threshold setting module configured to: for any training data instance, determine the preset threshold θ 1 of that training data instance according to its class probability; and determine the preset threshold θ 2 according to the preset threshold θ 1 of any training data instance.
  • the apparatus further includes a parameter adjustment module configured to: back-propagate the loss value of each difficult instance during learning; and adjust the network parameters used for training according to those loss values.
  • the deep learning training method exemplarily described above may be implemented by hardware, a software module executed by a processor, or a combination of the two.
  • one or more of the functional blocks and/or one or more combinations of the functional blocks shown in the figures may correspond to software modules of a computer program flow or to respective hardware modules.
  • These software modules may correspond to the respective steps shown in the figures.
  • These hardware modules can be implemented, for example, by curing these software modules using a Field Programmable Gate Array (FPGA).
  • the software module can reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, removable hard drive, CD-ROM, or any other form of storage medium known in the art.
  • a storage medium can be coupled to the processor to enable the processor to read information from, and write information to, the storage medium; or the storage medium can be an integral part of the processor.
  • the processor and the storage medium may be located in an application specific integrated circuit.
  • the software module can be stored in the memory of the mobile terminal or in a memory card that can be inserted into the mobile terminal. For example, if the mobile terminal uses a larger capacity MEGA-SIM card or a large-capacity flash memory device, the software module can be stored in the MEGA-SIM card or a large-capacity flash memory device.
  • One or more of the functional blocks and/or one or more combinations of the functional blocks depicted in the figures may be implemented as a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or any suitable combination thereof for performing the functions described herein. One or more of the functional blocks and/or one or more combinations of the functional blocks depicted in the figures may also be implemented as a combination of computing devices, for example a combination of a DSP and a microprocessor, multiple microprocessors, one or more microprocessors in conjunction with a DSP, or any other such configuration.

Abstract

The present disclosure provides a deep learning training method and apparatus. The deep learning training method includes: during the forward propagation of each training iteration, determining the loss value of each training data instance in a batch of training data instances; determining all difficult instances from the batch of training data instances according to the loss value of each training data instance; and forgoing learning the features of the non-difficult instances while learning the features of all of the difficult instances. (FIG. 1)

Description

Deep learning training method and apparatus
Technical Field
The present disclosure relates to the field of intelligent learning, and in particular to a deep learning training method and apparatus.
Background
At present, deep learning is a research focus in the field of artificial intelligence, and a large number of scholars and researchers are devoted to it, driving its rapid development. Although deep learning has achieved great success, it still faces many difficulties. Compared with traditional methods, more data and deeper network structures are the most distinctive features of deep learning and the key to its success, but they also mean that deep learning often requires more training storage space and more time. Training a deep learning model usually takes days or even months, so in order to save time, how to accelerate the training process has become an important research direction.
To accelerate training, GPU acceleration and cluster computing are generally used on the hardware side, and data-parallel and model-parallel schemes are used on the algorithm side. Although existing schemes speed up the training iterations of deep networks, they still face the problem of slow model convergence.
Summary
The present disclosure provides a deep learning training method and apparatus capable of accelerating the convergence of a deep learning model.
According to one aspect of the present disclosure, a deep learning training method is provided. The method includes: during the forward propagation of each training iteration, determining the loss value of each training data instance in a batch of training data instances; determining all difficult instances from the batch of training data instances according to the loss value of each training data instance; and forgoing learning the features of the non-difficult instances while learning the features of all of the difficult instances.
According to another aspect of the present disclosure, a deep learning training apparatus is provided. The apparatus includes: a loss determination module configured to determine, during the forward propagation of each training iteration, the loss value of each training data instance in a batch of training data instances; an instance selection module configured to determine, in each training iteration, all difficult instances from the batch of training data instances according to the loss value of each training data instance; and a learning module configured to forgo learning the features of the non-difficult instances while learning the features of all of the difficult instances.
Brief Description of the Drawings
FIG. 1 is a main flowchart of a deep learning training method according to an exemplary embodiment of the present disclosure;
FIG. 2 is a detailed flowchart of a deep learning training method according to an exemplary embodiment of the present disclosure; and
FIG. 3 is a schematic structural diagram of a deep learning training apparatus according to an exemplary embodiment of the present disclosure.
Detailed Description
For deep learning network training, speeding up network convergence matters more than raw acceleration. Therefore, starting from the training data, and in order to at least solve the problem of slow convergence of deep learning models in the existing deep learning field, the present disclosure provides a deep learning training method and apparatus. The present disclosure is described in further detail below with reference to the drawings and exemplary embodiments. It should be understood that the exemplary embodiments described herein are merely illustrative of the present disclosure and do not limit it.
According to one aspect of the present disclosure, a deep learning training method is provided. As shown in FIG. 1, the method may include:
S101: during the forward propagation of each training iteration, determining the loss value of each training data instance in a batch of training data instances.
Specifically, step S101 may include the following sub-steps:
Sub-step S1011: according to the task requirements, acquiring a sufficient number of training samples (i.e., training data instances or data instances), and filtering, processing, enhancing, balancing, and labeling the acquired training samples to construct a training sample set;
Sub-step S1012: selecting a deep network model structure, setting the corresponding training parameters, and initializing the deep network model;
Sub-step S1013: grouping a certain number of training samples into a BATCH (batch of training data instances) and feeding it into the deep network for computation, obtaining the classification value Xc of each sample in the BATCH; and
Sub-step S1014: comparing against the true label XT of each sample and computing the Loss value L of each sample, where the Loss value L is computed as:
L = -log[softmax(a_k)], where k is the true class of the instance   (1)
where a is the class probability and softmax(a_k) is the softmax probability of the true class, which is fed into the cross-entropy loss.
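As an illustration of formula (1), the per-instance loss can be sketched in a few lines of NumPy. This is a minimal sketch rather than the patent's implementation; the function name, the array layout, and the max-subtraction used for numerical stability are assumptions.

```python
import numpy as np

def sample_loss(logits, true_class):
    """Formula (1): L = -log(softmax(a_k)), where k is the instance's true class."""
    shifted = logits - np.max(logits)                  # subtract max for numerical stability (assumed detail)
    probs = np.exp(shifted) / np.sum(np.exp(shifted))  # softmax: class probabilities
    return -np.log(probs[true_class])                  # cross-entropy loss of this instance

# e.g. sample_loss(np.array([2.0, 0.5, -1.0]), true_class=0) gives roughly 0.24
```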
As shown in FIG. 1, the method may further include:
S102: determining all difficult instances from the batch of training data instances according to the loss value of each training data instance; and
S103: forgoing learning the features of the non-difficult instances while learning the features of all of the difficult instances.
By computing the loss value of each training data instance in a training iteration (the gap between the instance's actual output and its ideal output), the exemplary embodiments of the present disclosure obtain the data instances that matter most for that iteration (i.e., the difficult instances) and use them to train the model; in other words, training is concentrated on the difficult instances, which speeds up model convergence. At the same time, because the learning process ignores the useless data instances (i.e., the non-difficult instances), the imbalance of training data in practical problems is effectively alleviated. By analyzing the model training data, the exemplary embodiments of the present disclosure improve existing training and learning methods; they can be used together with various existing optimization solvers and can be integrated into current deep learning frameworks.
On the basis of the above exemplary embodiment, variant embodiments are further proposed below. For brevity, only the differences from the above exemplary embodiment are described in each variant.
In an exemplary embodiment of the present disclosure, step S102 may include the following sub-steps:
Sub-step S1021: for any training data instance, comparing the loss value of that training data instance with a preset threshold θ1; if the loss value is not less than the preset threshold θ1, determining that the training data instance is a difficult instance; and
Sub-step S1022: traversing the batch of training data instances to obtain all difficult instances.
In detail, the Loss value L of each training sample in the BATCH is compared with the threshold θ1. If L exceeds the threshold θ1, the training sample is determined to be a difficult instance and is used for the current round of learning; otherwise, the training sample is discarded.
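Sub-steps S1021 and S1022 amount to a simple traversal. A minimal sketch, assuming the per-sample losses have already been computed and using illustrative names:

```python
def select_difficult_instances(losses, theta1):
    """Sub-steps S1021-S1022: traverse the BATCH and keep the difficult instances.

    An instance is difficult when its loss is not less than theta1; the remaining
    instances are discarded for the current round of learning.
    Returns the indices of the difficult instances.
    """
    return [i for i, loss in enumerate(losses) if loss >= theta1]
```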
The exemplary embodiments of the present disclosure further accelerate the convergence of the deep learning model.
In an exemplary embodiment of the present disclosure, step S103 includes the following sub-steps:
Sub-step S1031: determining the average loss value of the batch of training data instances;
Sub-step S1032: comparing the average loss value with a preset threshold θ2; and
Sub-step S1033: if the average loss value exceeds the preset threshold θ2, forgoing learning the features of the non-difficult instances while learning the features of all of the difficult instances, or, if the average loss value does not exceed the preset threshold θ2, forgoing learning the features of the batch of training data instances.
In an exemplary embodiment of the present disclosure, the step of learning the features of all of the difficult instances includes: back-propagating the loss value of each difficult instance during learning; and adjusting the network parameters used for training according to those loss values.
In an exemplary embodiment of the present disclosure, the preset threshold θ2 is smaller than the preset threshold θ1.
Specifically, the average loss value Lavg of a BATCH of training samples is compared with the threshold θ2. If Lavg exceeds the threshold, most of the training samples in the BATCH are considered difficult instances; the Loss values are back-propagated, the network parameters are fine-tuned, and the model is trained. If Lavg does not exceed the threshold, almost all of the training samples in the BATCH are considered non-difficult instances; the resulting Loss values are not back-propagated, the BATCH is discarded, and the model is prevented from learning the features of the training samples in that BATCH. Further acceleration is thereby achieved.
In an exemplary embodiment of the present disclosure, the average loss value Lavg is the sum of the Loss values of all samples in the BATCH divided by the number of samples N in the BATCH, and can be computed with the following formula:
Lavg = (1/N) Σ_{i=1..N} L_i = -(1/N) Σ_{i=1..N} log[softmax(a_{i_k})]   (2)
where a is the class probability and softmax(a_{i_k}) is the softmax probability of sample i's true class, which enters the cross-entropy loss.
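A minimal sketch of formula (2) and the batch-level decision of sub-steps S1031 to S1033, with illustrative names and plain Python lists standing in for whatever structures a framework would actually use:

```python
def batch_average_loss(losses):
    """Formula (2): Lavg is the sum of the per-sample losses divided by N."""
    return sum(losses) / len(losses)

def should_backpropagate(losses, theta2):
    """Sub-steps S1031-S1033: back-propagate only when Lavg exceeds theta2;
    otherwise the whole BATCH is discarded for this iteration."""
    return batch_average_loss(losses) > theta2
```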
In an exemplary embodiment of the present disclosure, the preset thresholds θ1 and θ2 are obtained as follows: for any training data instance, the preset threshold θ1 is determined according to the class probability of that training data instance; and the preset threshold θ2 is determined from the preset threshold θ1 of any training data instance.
Specifically, the preset thresholds θ1 and θ2 are determined by the chosen Loss formula and the size of the BATCH:
θ1 = -log(a), a ∈ (0.9, 1)   (3)
[Formula (4), which gives θ2 in terms of θ1 and the BATCH size N, appears in the source only as an image.]
where a is the class probability, θ1 is the evaluation threshold for a single sample, θ2 is the evaluation threshold for the whole BATCH, and N is the number of samples in a BATCH.
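The thresholds can be computed as sketched below. Formula (3) is taken from the text; the expression used here for formula (4) is an assumption (see the comment) and is not taken from the patent.

```python
import math

def theta1(a):
    """Formula (3): single-sample threshold, theta1 = -log(a), with a in (0.9, 1)."""
    return -math.log(a)

def theta2(a, n):
    """Formula (4) is given in the source only as an image. The form used here,
    theta2 = theta1 * (N - 1) / N, is an ASSUMED reading chosen because it satisfies
    the stated properties (theta2 < theta1, theta2 -> theta1 as N grows) and roughly
    reproduces the worked values theta1 = 0.01 and theta2 = 9.9e-3 for a = 0.99, N = 100."""
    return theta1(a) * (n - 1) / n
```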
The exemplary embodiments of the present disclosure design a data-analysis-based method for accelerating the convergence of deep learning, applicable to the various deep learning open-source frameworks. The method mainly includes data preprocessing and deep learning training. In the data preprocessing part, data augmentation is performed with various image transformations, which greatly expands the data and increases its diversity. In the deep learning training part, drawing on the idea of support vectors, convergence is accelerated by analyzing the loss of the data.
The exemplary embodiments of the present disclosure analyze the data during training and, according to the loss of the data in each iteration, concentrate training on the difficult data instances (those with large loss), thereby speeding up convergence. Compared with existing learning methods that treat all training data alike, the exemplary embodiments distinguish the data by the magnitude of its loss, making training more targeted. Moreover, existing network training methods use all of the data for learning, which leads to the problem of unbalanced training data in practice: the trained model tends toward the classes with more data. The exemplary embodiments of the present disclosure curb this problem and improve the training effect to some extent.
A specific application example is given below to describe the deep learning training method according to the present disclosure in detail.
The experiment uses the ImageNet dataset, which contains 1.2 million training images in 1000 classes, with 1200 samples per class. The classification task of the ImageNet image recognition competition is carried out with the deep learning training method according to the present disclosure and compared against the training method of the existing Caffe (convolutional neural network framework) open-source framework.
Specifically, as shown in FIG. 2, in an exemplary embodiment of the present disclosure the method is divided into two main processes: data preprocessing and deep learning training. The details of each process are described below in connection with the experiment.
Data preprocessing
Data preprocessing is a necessary step for data analysis and learning tasks. For this experiment, tasks such as classification and labeling have already been completed in the dataset, so the key requirement is data augmentation. The samples are augmented (for example, with random cropping and mirroring), the image resolution is adjusted to 256×256, and the data is finally saved in the lmdb file format for Caffe to load.
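A sketch of the augmentation described above (random crop, mirroring, resizing to 256×256), using Pillow and NumPy for illustration; the crop fraction is an assumed parameter, and the final export to the lmdb format for Caffe is omitted.

```python
import random
import numpy as np
from PIL import Image

def augment(path, out_size=256, crop_frac=0.9):
    """Random crop + horizontal mirror, then resize to out_size x out_size.

    crop_frac is an assumed parameter; the source names only the transform types.
    """
    img = Image.open(path).convert("RGB")
    w, h = img.size
    cw, ch = int(w * crop_frac), int(h * crop_frac)
    left = random.randint(0, w - cw)
    top = random.randint(0, h - ch)
    img = img.crop((left, top, left + cw, top + ch))      # random crop
    if random.random() < 0.5:
        img = img.transpose(Image.FLIP_LEFT_RIGHT)        # mirror
    img = img.resize((out_size, out_size))                # adjust resolution to 256x256
    return np.asarray(img)
```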
Deep learning training
In an exemplary embodiment of the present disclosure, the method improves mainly on this process: iterative learning is carried out by distinguishing the data according to the loss of each training data instance. This process mainly involves obtaining a deep model through deep network training.
Specifically, the training process through the deep network (which may be referred to simply as the network herein) includes the following steps:
(1) According to the task requirements, acquire a sufficient number of training samples, and filter, process, enhance, balance, and label the acquired training samples to construct a training sample set.
(2) Select a deep network model structure, set the corresponding training parameters, and initialize the deep network model.
(3) Group a certain number of training samples into a BATCH and feed it into the network for computation, obtaining the classification value Xc of each sample in the BATCH.
(4) Compare against the true label XT of each sample and compute the Loss value L. Compare the Loss value L of each training sample in the BATCH with the threshold θ1; if L exceeds the threshold, the training sample is considered a difficult instance and is used for the current round of learning; otherwise it is discarded.
There are many loss formulas; this experiment uses SoftmaxLoss, the most common choice for classification.
SoftmaxLoss feeds the output of the Softmax function into the cross-entropy loss and is computed as follows:
softmax(a_k) = exp(a_k) / Σ_j exp(a_j), summing over all classes
The result of the Softmax computation is the probability that a data instance belongs to each class.
The loss of the data instance can then be computed according to formula (1) above.
(5) Compute the average loss value Lavg of all sample data in the BATCH.
(6) Compare the average loss value Lavg computed in step (5) with the threshold θ2. If Lavg exceeds the threshold, most of the training samples in the BATCH are considered difficult instances; the Loss values are back-propagated, the network parameters are fine-tuned, and the model is trained. If Lavg does not exceed the threshold, almost all of the training samples in the BATCH are considered non-difficult instances; the Loss values are not back-propagated, the BATCH is discarded, and the model is prevented from learning the features of the training samples in this BATCH, achieving further acceleration.
θ1 is the loss decision threshold for a single instance, and its value is determined according to formula (3) above. In this experiment, a is set to 0.99, giving θ1 = 0.01.
θ2 is used to judge the average loss of the batch of data. Considering that individual instances with small loss values should be prevented from dragging down the overall average loss, θ2 should be smaller than θ1. As the number of samples N grows, this influence gradually diminishes and θ2 approaches θ1; θ2 is therefore determined by formula (4) above, giving θ2 = 9.9×10⁻³.
(7) If the termination condition has not been reached, return to step (3) and continue training; once the termination condition is reached, the learning process ends.
In summary, in the exemplary embodiments of the present disclosure, the deep learning training part concentrates training and learning on difficult instances by exercising control at the level of single data instances and at the level of the batch. For a single data instance, the loss value L computed with formula (1) is compared with the threshold θ1; if L is greater than θ1, the data instance is used for training and learning, and otherwise the data instance is ignored in this iteration, i.e., its back-propagated gradient is zero. For the batch of data instances, the mean loss Lavg computed with formula (2) over the whole batch is compared with the threshold θ2; if Lavg is greater than θ2, back-propagation is performed, and otherwise it is cancelled, i.e., the batch of data is not used for learning.
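Putting the instance-level and batch-level controls together, one training iteration can be sketched as follows. Here net.forward, net.backward, and net.update are placeholder names standing in for the framework's own routines; they are not Caffe APIs or functions defined by the patent.

```python
def train_iteration(net, batch, theta1, theta2):
    """One iteration of the two-level control described above.

    Per-instance gate: instances whose loss does not exceed theta1 get a zero
    back-propagation gradient. Batch gate: if the batch's mean loss does not
    exceed theta2, the whole BATCH is discarded and nothing is back-propagated.
    """
    losses = [net.forward(x, label) for x, label in batch]   # formula (1) per instance
    avg_loss = sum(losses) / len(losses)                     # formula (2)
    if avg_loss <= theta2:
        return False                                         # cancel: BATCH not used for learning
    masked = [l if l > theta1 else 0.0 for l in losses]      # ignore non-difficult instances
    net.backward(masked)                                     # back-propagate difficult losses only
    net.update()                                             # fine-tune the network parameters
    return True
```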
The experimental results show that with the original training method the loss begins to fall and gradually converge after 4367 iterations, whereas with the method according to the present disclosure the loss begins to fall after the 78th iteration. The accelerated convergence delivered by the deep learning training method provided by the present disclosure is thus very pronounced.
According to another aspect of the present disclosure, a deep learning training apparatus is provided. As shown in FIG. 3, the deep learning training apparatus may include:
a loss determination module 310, configured to determine, during the forward propagation of each training iteration, the loss value of each training data instance in a batch of training data instances;
an instance selection module 320, configured to determine all difficult instances from the batch of training data instances according to the loss value of each training data instance; and
a learning module 330, configured to forgo learning the features of the non-difficult instances while learning the features of all of the difficult instances.
By computing the loss value of each training data instance in a training iteration, the exemplary embodiments of the present disclosure obtain the data instances that matter most for that iteration and use them to train the model; in other words, training is concentrated on the difficult instances, which speeds up model convergence. At the same time, because the learning process ignores the useless data instances, the imbalance of training data in practical problems is effectively alleviated. By analyzing the model training data, the exemplary embodiments of the present disclosure improve existing training and learning methods; they can be used together with various existing optimization solvers and can be integrated into current deep learning frameworks.
In an exemplary embodiment of the present disclosure, the instance selection module 320 is configured to: for any training data instance, compare the loss value of that training data instance with a preset threshold θ1; if the loss value is not less than the preset threshold θ1, determine that the training data instance is a difficult instance; and traverse the batch of training data instances to obtain all difficult instances.
In an exemplary embodiment of the present disclosure, the apparatus further includes a judging module configured to:
determine the average loss value of the batch of training data instances;
compare the average loss value with a preset threshold θ2; and
if the average loss value exceeds the preset threshold θ2, trigger the learning module to forgo learning the features of the non-difficult instances while learning the features of all of the difficult instances, or, if the average loss value does not exceed the preset threshold θ2, trigger the learning module to forgo learning the features of the batch of training data instances.
In an exemplary embodiment of the present disclosure, the preset threshold θ2 is smaller than the preset threshold θ1.
In an exemplary embodiment of the present disclosure, the apparatus further includes a threshold setting module configured to:
for any training data instance, determine the preset threshold θ1 of that training data instance according to its class probability; and
determine the preset threshold θ2 according to the preset threshold θ1 of any training data instance.
In an exemplary embodiment of the present disclosure, the apparatus further includes a parameter adjustment module configured to:
back-propagate the loss value of each difficult instance during learning; and
adjust the network parameters used for training according to those loss values.
The deep learning training method exemplarily described above may be implemented by hardware, by software modules executed by a processor, or by a combination of the two. For example, one or more of the functional blocks and/or one or more combinations of the functional blocks shown in the drawings may correspond to software modules of a computer program flow or to hardware modules. These software modules may correspond to the steps shown in the drawings. The hardware modules may be implemented, for example, by solidifying the software modules with a field programmable gate array (FPGA).
A software module may reside in RAM, flash memory, ROM, EPROM, EEPROM, registers, a hard disk, a removable hard drive, a CD-ROM, or any other form of storage medium known in the art. A storage medium may be coupled to the processor so that the processor can read information from, and write information to, the storage medium, or the storage medium may be an integral part of the processor. The processor and the storage medium may reside in an application specific integrated circuit. The software module may be stored in the memory of a mobile terminal or in a memory card insertable into the mobile terminal. For example, if the mobile terminal uses a higher-capacity MEGA-SIM card or a large-capacity flash memory device, the software module may be stored in that MEGA-SIM card or large-capacity flash memory device.
One or more of the functional blocks and/or one or more combinations of the functional blocks described in the drawings may be implemented as a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or any suitable combination thereof for performing the functions described in this application. One or more of the functional blocks and/or one or more combinations of the functional blocks described in the drawings may also be implemented as a combination of computing devices, for example a combination of a DSP and a microprocessor, multiple microprocessors, one or more microprocessors in conjunction with a DSP, or any other such configuration.
Those skilled in the art may make various improvements to the present disclosure without departing from its substance, and such improvements shall be understood to remain within the protection scope of the present disclosure.

Claims (14)

  1. A deep learning training method, comprising the following steps:
    S101: during the forward propagation of each training iteration, determining the loss value of each training data instance in a batch of training data instances;
    S102: determining all difficult instances from the batch of training data instances according to the loss value of each training data instance; and
    S103: forgoing learning the features of the non-difficult instances while learning the features of all of the difficult instances.
  2. The method according to claim 1, wherein step S101 comprises the following sub-steps:
    S1011: according to the task requirements, acquiring a sufficient number of training data instances, and filtering, processing, enhancing, balancing, and labeling the acquired training data instances to construct a set of training data instances;
    S1012: selecting a deep network model structure, setting the corresponding training parameters, and initializing the deep network model;
    S1013: grouping a certain number of training data instances into the batch of training data instances and feeding it into the deep network for computation, obtaining the classification value of each training data instance in the batch of training data instances; and
    S1014: comparing against the true label of each training data instance and computing the loss value of each training data instance.
  3. The method according to claim 1, wherein step S102 comprises the following sub-steps:
    S1021: for any training data instance, comparing the loss value of that training data instance with a preset threshold θ1; if the loss value is not less than the preset threshold θ1, determining that the training data instance is a difficult instance; and
    S1022: traversing the batch of training data instances to obtain all difficult instances.
  4. The method according to claim 3, wherein step S103 comprises the following sub-steps:
    S1031: determining the average loss value of the batch of training data instances;
    S1032: comparing the average loss value with a preset threshold θ2; and
    S1033: if the average loss value exceeds the preset threshold θ2, forgoing learning the features of the non-difficult instances while learning the features of all of the difficult instances, or, if the average loss value does not exceed the preset threshold θ2, forgoing learning the features of the batch of training data instances.
  5. The method according to claim 4, wherein the preset threshold θ2 is smaller than the preset threshold θ1.
  6. The method according to claim 4, wherein, for any training data instance, the preset threshold θ1 of that training data instance is determined according to the class probability of that training data instance; and the preset threshold θ2 is determined according to the preset threshold θ1 of any training data instance.
  7. The method according to claim 4, wherein the step of learning the features of all of the difficult instances further comprises:
    back-propagating the loss value of each difficult instance during learning; and
    adjusting the network parameters used for training according to those loss values.
  8. A deep learning training apparatus, comprising:
    a loss determination module configured to determine, during the forward propagation of each training iteration, the loss value of each training data instance in a batch of training data instances;
    an instance selection module configured to determine, in each training iteration, all difficult instances from the batch of training data instances according to the loss value of each training data instance; and
    a learning module configured to forgo learning the features of the non-difficult instances while learning the features of all of the difficult instances.
  9. The apparatus according to claim 8, wherein the loss determination module is configured to:
    according to the task requirements, acquire a sufficient number of training data instances, and filter, process, enhance, balance, and label the acquired training data instances to construct a set of training data instances;
    select a deep network model structure, set the corresponding training parameters, and initialize the deep network model;
    group a certain number of training data instances into the batch of training data instances and feed it into the deep network for computation, obtaining the classification value of each training data instance in the batch of training data instances; and
    compare against the true label of each training data instance and compute the loss value of each training data instance.
  10. The apparatus according to claim 8, wherein the instance selection module is configured to:
    for any training data instance, compare the loss value of that training data instance with a preset threshold θ1; if the loss value is not less than the preset threshold θ1, determine that the training data instance is a difficult instance; and
    traverse the batch of training data instances to obtain all difficult instances.
  11. The apparatus according to claim 10, wherein the apparatus further comprises a judging module configured to:
    determine the average loss value of the batch of training data instances;
    compare the average loss value with a preset threshold θ2; and
    if the average loss value exceeds the preset threshold θ2, trigger the learning module to forgo learning the features of the non-difficult instances while learning the features of all of the difficult instances, or, if the average loss value does not exceed the preset threshold θ2, trigger the learning module to forgo learning the features of the batch of training data instances.
  12. The apparatus according to claim 11, wherein the preset threshold θ2 is smaller than the preset threshold θ1.
  13. The apparatus according to claim 11, wherein the apparatus further comprises a threshold setting module configured to:
    for any training data instance, determine the preset threshold θ1 of that training data instance according to the class probability of that training data instance; and
    determine the preset threshold θ2 according to the preset threshold θ1 of any training data instance.
  14. The apparatus according to claim 11, wherein the apparatus further comprises a parameter adjustment module configured to:
    back-propagate the loss value of each difficult instance during learning; and
    adjust the network parameters used for training according to those loss values.
PCT/CN2018/073955 2017-02-22 2018-01-24 深度学习训练方法及装置 WO2018153201A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710094563.5A CN108460464A (zh) 2017-02-22 2017-02-22 深度学习训练方法及装置
CN201710094563.5 2017-02-22

Publications (1)

Publication Number Publication Date
WO2018153201A1 true WO2018153201A1 (zh) 2018-08-30

Family

ID=63222016

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/073955 WO2018153201A1 (zh) 2017-02-22 2018-01-24 深度学习训练方法及装置

Country Status (2)

Country Link
CN (1) CN108460464A (zh)
WO (1) WO2018153201A1 (zh)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109858411A (zh) * 2019-01-18 2019-06-07 深圳壹账通智能科技有限公司 基于人工智能的案件审判方法、装置及计算机设备
CN112633459A (zh) * 2019-09-24 2021-04-09 华为技术有限公司 训练神经网络的方法、数据处理方法和相关装置


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103593474A (zh) * 2013-11-28 2014-02-19 中国科学院自动化研究所 基于深度学习的图像检索排序方法
US20150248764A1 (en) * 2014-02-28 2015-09-03 Microsoft Corporation Depth sensing using an infrared camera
CN104992223A (zh) * 2015-06-12 2015-10-21 安徽大学 基于深度学习的密集人数估计方法
CN105608450A (zh) * 2016-03-01 2016-05-25 天津中科智能识别产业技术研究院有限公司 基于深度卷积神经网络的异质人脸识别方法
CN106096538A (zh) * 2016-06-08 2016-11-09 中国科学院自动化研究所 基于定序神经网络模型的人脸识别方法及装置

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111105011A (zh) * 2018-10-26 2020-05-05 斯特拉德视觉公司 用于对有用学习数据进行取舍筛选的基于cnn的方法及装置
CN111105011B (zh) * 2018-10-26 2023-10-20 斯特拉德视觉公司 用于对有用学习数据进行取舍筛选的基于cnn的方法及装置
CN110659678A (zh) * 2019-09-09 2020-01-07 腾讯科技(深圳)有限公司 一种用户行为分类方法、系统及存储介质
CN110659678B (zh) * 2019-09-09 2023-11-17 腾讯科技(深圳)有限公司 一种用户行为分类方法、系统及存储介质
CN111400915A (zh) * 2020-03-17 2020-07-10 桂林理工大学 一种基于深度学习的砂土液化判别方法及装置
CN113538079A (zh) * 2020-04-17 2021-10-22 北京金山数字娱乐科技有限公司 一种推荐模型的训练方法及装置、一种推荐方法及装置
CN113420792A (zh) * 2021-06-03 2021-09-21 阿波罗智联(北京)科技有限公司 图像模型的训练方法、电子设备、路侧设备及云控平台
CN115100249A (zh) * 2022-06-24 2022-09-23 江西沃尔肯智能装备科技有限公司 一种基于目标跟踪算法的智慧工厂监控系统
CN116610960A (zh) * 2023-07-20 2023-08-18 北京万界数据科技有限责任公司 一种人工智能训练参数的监测管理系统
CN116610960B (zh) * 2023-07-20 2023-10-13 北京万界数据科技有限责任公司 一种人工智能训练参数的监测管理系统

Also Published As

Publication number Publication date
CN108460464A (zh) 2018-08-28

Similar Documents

Publication Publication Date Title
WO2018153201A1 (zh) 深度学习训练方法及装置
CN106897714B (zh) 一种基于卷积神经网络的视频动作检测方法
WO2020143321A1 (zh) 一种基于变分自编码器的训练样本数据扩充方法、存储介质及计算机设备
CN107346448B (zh) 基于深度神经网络的识别装置、训练装置及方法
US20190279088A1 (en) Training method, apparatus, chip, and system for neural network model
CN106897746B (zh) 数据分类模型训练方法和装置
WO2022042123A1 (zh) 图像识别模型生成方法、装置、计算机设备和存储介质
WO2020042658A1 (zh) 数据处理方法、装置、设备和系统
CN107392919B (zh) 基于自适应遗传算法的灰度阈值获取方法、图像分割方法
EP3620982B1 (en) Sample processing method and device
WO2021082780A1 (zh) 一种日志分类方法及装置
CN110008853B (zh) 行人检测网络及模型训练方法、检测方法、介质、设备
JP2022141931A (ja) 生体検出モデルのトレーニング方法及び装置、生体検出の方法及び装置、電子機器、記憶媒体、並びにコンピュータプログラム
WO2018036547A1 (zh) 一种数据处理的方法以及装置
WO2023206944A1 (zh) 一种语义分割方法、装置、计算机设备和存储介质
CN110929848A (zh) 基于多挑战感知学习模型的训练、跟踪方法
WO2022178775A1 (zh) 基于特征多样性学习的深度集成模型训练方法
WO2021238586A1 (zh) 一种训练方法、装置、设备以及计算机可读存储介质
CN113487610B (zh) 疱疹图像识别方法、装置、计算机设备和存储介质
CN108985151B (zh) 手写模型训练方法、手写字识别方法、装置、设备及介质
CN114003671A (zh) 一种地图图幅编码识别的方法及其识别系统
CN108364026A (zh) 一种簇心更新方法、装置及K-means聚类分析方法、装置
CN107274357B (zh) 一种参数最优的灰度图像增强处理系统
WO2023201932A1 (zh) 一种行人重识别方法、装置、设备及存储介质
WO2022194049A1 (zh) 一种对象处理方法及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18757152

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18757152

Country of ref document: EP

Kind code of ref document: A1