WO2020199743A1 - Method, apparatus, and computing device for training a learning model - Google Patents

Method, apparatus, and computing device for training a learning model

Info

Publication number
WO2020199743A1
Authority
WO
WIPO (PCT)
Prior art keywords
learning model
deep learning
current
sample data
training
Prior art date
Application number
PCT/CN2020/073834
Other languages
English (en)
French (fr)
Inventor
周俊
Original Assignee
创新先进技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 创新先进技术有限公司 filed Critical 创新先进技术有限公司
Priority to SG11202104298VA priority Critical patent/SG11202104298VA/en
Priority to EP20781845.1A priority patent/EP3852014A4/en
Publication of WO2020199743A1 publication Critical patent/WO2020199743A1/zh
Priority to US17/246,201 priority patent/US11514368B2/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • the embodiments of this specification relate to the field of machine learning, in particular, to methods, devices, and computing devices for training learning models.
  • Deep learning originated from research on artificial neural networks. It is a new field of machine learning research in recent years and has shown broad application prospects in various fields.
  • deep learning models (also known as deep neural networks) are usually trained by batch learning, which generally requires a training data set to be prepared before the learning task starts (that is, offline training)
  • the embodiments of this specification provide a method, an apparatus and a computing device for training a learning model.
  • the embodiment of this specification provides a method for training a learning model, including: receiving current streaming sample data; and training the current deep learning model based on the current streaming sample data, where the parameters of a shallow learning model are used as the initialization parameters of the current deep learning model, and the shallow learning model is trained based on historical sample data that is correlated with the current streaming sample data.
  • the embodiment of this specification provides an apparatus for training a learning model, including: a receiving unit, configured to receive current streaming sample data; and a training unit, configured to train the current deep learning model based on the current streaming sample data, where the parameters of a shallow learning model are used as the initialization parameters of the current deep learning model, and the shallow learning model is obtained by training based on historical sample data correlated with the current streaming sample data.
  • the embodiments of the present specification provide a computing device, including: at least one processor; and a memory in communication with the at least one processor, with executable instructions stored thereon which, when executed by the at least one processor, cause the at least one processor to implement the foregoing method.
  • when the current deep learning model is trained based on the current streaming sample data, using the parameters of the trained shallow learning model as the initialization parameters of the current deep learning model can accelerate the convergence of the deep learning model, so that the model training process can be completed efficiently, and it also helps improve the performance of the deep learning model.
  • Fig. 1 is a schematic flowchart of a method for training a learning model according to an embodiment.
  • Fig. 2A is an exemplary process for shallow learning model training according to one embodiment.
  • Figure 2B is an exemplary process for deep learning model training according to one embodiment.
  • Fig. 3 is a schematic block diagram of an apparatus for training a learning model according to an embodiment.
  • Fig. 4 is a hardware structure diagram of a computing device for training a learning model according to an embodiment.
  • deep learning models are usually trained by batch learning.
  • data may arrive sequentially in the form of streams, so the batch learning method may not be suitable for such applications.
  • the training data set prepared in advance may occupy a large storage space, so it is not suitable for some scenarios with limited storage space.
  • streaming sample data may generally include sample data continuously generated by data sample sources, such as log files generated by web applications, online shopping data, game player activity data, social networking site information data, and so on.
  • Streaming sample data can also be called real-time sample data, and its time span is usually between hundreds of milliseconds and several seconds.
  • Model training based on streaming sample data can generally be considered as online learning.
  • the current streaming sample data can be received. Then, the current deep learning model is trained based on the current streaming sample data.
  • the parameters of the shallow learning model can be used as the initialization parameters of the current deep learning model, and the shallow learning model can be trained based on historical sample data that is related to the current streaming sample data.
  • when the current deep learning model is trained based on the current streaming sample data, using the parameters of the trained shallow learning model as the initialization parameters of the current deep learning model can accelerate the convergence of the deep learning model, so that the model training process can be completed efficiently, and it also helps improve the performance of the deep learning model.
  • the deep learning model is trained based on streaming sample data, and there is no need to prepare a training data set in advance, which can effectively save storage space.
  • the data may exhibit concept drift, that is, the statistical properties of the data change in unforeseeable ways over time. Then, by training the deep learning model based on streaming sample data, the deep learning model can be adjusted in time as the data changes, so that the prediction performance of the deep learning model can be improved, and the approach also has good scalability.
  • the structure of the shallow learning model can be simpler than the structure of the deep learning model.
  • the shallow learning model may not have hidden layers, or the number of hidden layers it has is less than the number of hidden layers of the deep learning model.
  • the shallow learning model may be a Logistic Regression (LR) model
  • the deep learning model may have one or more hidden layers.
  • for ease of implementation, the deep learning model can be constructed with a single hidden layer when it is initially built.
  • Fig. 1 is a schematic flowchart of a method for training a learning model according to an embodiment.
  • step 102 current streaming sample data can be received.
  • the current deep learning model may be trained based on the current streaming sample data.
  • the parameters of the shallow learning model can be used as the initialization parameters of the current deep learning model, and the shallow learning model can be trained based on historical sample data that is related to the current streaming sample data.
  • the correlation between the historical sample data and the current streaming sample data may indicate that the historical sample data and the current streaming sample data have one or more of the same sample characteristics.
  • the aforementioned historical sample data may be historical streaming sample data before the current streaming sample data.
  • the historical streaming sample data may include one or more batches of streaming sample data before the current streaming sample data.
  • the shallow learning model can be obtained by online training based on historical streaming sample data.
  • the above-mentioned historical sample data may be offline sample data that is related to the current streaming sample data.
  • the offline sample data may be data that has one or more sample characteristics that are the same as the current streaming sample data.
  • the shallow learning model can be obtained by offline training based on offline sample data.
  • the shallow learning model can be trained online or offline, which can adapt to different application requirements.
  • when the parameters of the shallow learning model are used as the initialization parameters of the current deep learning model, the parameters of each layer of the shallow learning model can be used as the initialization parameters of the corresponding layers of the current deep learning model.
  • a one-to-one mapping can be used to map the parameters of each layer of the shallow learning model onto the corresponding layers of the current deep learning model. This not only helps speed up the convergence of the deep learning model and shorten the training time, but also helps improve the performance of the deep learning model.
  • the parameters of the remaining layers of the current deep learning model can be initialized in a randomized manner.
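The bullet points above describe a layer-to-layer parameter transfer. As a minimal sketch (not the patent's reference implementation), the following Python/PyTorch snippet copies each layer of a trained shallow network into the corresponding layer of a deeper network and leaves the new layer randomly initialized; it assumes the mapped layers have identical shapes, and all names and sizes are illustrative.

```python
import torch
import torch.nn as nn

n_in, n_hid = 128, 64

# Trained shallow model: input -> hidden -> output.
shallow = nn.Sequential(nn.Linear(n_in, n_hid), nn.ReLU(), nn.Linear(n_hid, 1))

# Current deep model: one extra hidden layer compared with the shallow model.
deep = nn.Sequential(
    nn.Linear(n_in, n_hid), nn.ReLU(),   # corresponds to shallow[0]
    nn.Linear(n_hid, n_hid), nn.ReLU(),  # remaining layer: keeps its random init
    nn.Linear(n_hid, 1),                 # corresponds to shallow[2]
)

def init_from_shallow(deep_model, shallow_model, mapping):
    """Copy parameters layer by layer according to a one-to-one mapping."""
    with torch.no_grad():
        for src, dst in mapping:
            deep_model[dst].weight.copy_(shallow_model[src].weight)
            deep_model[dst].bias.copy_(shallow_model[src].bias)

init_from_shallow(deep, shallow, mapping=[(0, 0), (2, 4)])
```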
  • step 104 various methods applicable in the art, such as Online Gradient Descent, may be used to train the current deep learning model.
  • the performance of the trained deep learning model may be compared with the performance of the current deep learning model.
  • the metrics used to evaluate the performance may include various applicable metrics such as Area Under Curve (AUC), accuracy, coverage, and F1 score. This specification does not limit this.
  • if the performance of the trained deep learning model is improved compared with the current deep learning model, for example, if the AUC of the trained deep learning model is higher than the AUC of the current deep learning model, the trained deep learning model can be used as the latest deep learning model.
  • if the performance of the trained deep learning model is not improved compared with the current deep learning model, for example, if the AUC of the trained deep learning model is lower than the AUC of the current deep learning model, or the two AUCs are essentially close, updating the current deep learning model can be considered.
  • the number of hidden layers of the current deep learning model can be increased to obtain a deep learning model with an increased number of layers.
  • the number of hidden layers of the current deep learning model can be increased by one or more layers, which can be determined according to actual needs.
  • the deep learning model with the increased number of layers can be trained based on the current streaming sample data to obtain a new deep learning model.
  • the parameters of the shallow learning model can be used as the initialization parameters of the deep learning model with the increased number of layers. This can help speed up the convergence of the deep learning model with the increased number of layers, so that the new deep learning model can be obtained efficiently, and it also helps improve the performance of the new deep learning model.
  • the latest deep learning model can be determined based on the performance comparison result of the new deep learning model and the current deep learning model.
  • if the performance of the new deep learning model is improved compared with the current deep learning model, the new deep learning model can be used as the latest deep learning model. If the performance of the new deep learning model is not improved compared with the current deep learning model, the current deep learning model can be used as the latest deep learning model.
  • the current optimal deep learning model can be effectively selected as the latest deep learning model.
  • the latest deep learning model can be directly used as the latest learning model to be applied.
  • the latest deep learning model and the shallow learning model may be weighted to obtain the latest learning model to be applied.
  • the latest deep learning model and the shallow learning model may each have corresponding weights.
  • Their weights can be pre-defined or defined by users according to actual needs.
  • the weight of the latest deep learning model can be 70%, and the weight of the shallow learning model can be 30%.
  • the weight of the latest deep learning model can be 100%, and the weight of the shallow learning model can be 0. Therefore, it can be applied to different application scenarios.
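As a brief illustration of the weighting just described, the sketch below combines the two models' outputs with fixed weights; it assumes both models return a probability-like score for the same input, and the 70%/30% split simply mirrors the example values above (setting the weights to 100%/0 degenerates to using the latest deep learning model alone).

```python
def combined_predict(x, deep_model, shallow_model, w_deep=0.7, w_shallow=0.3):
    # Weighted combination of the latest deep learning model and the
    # shallow learning model; the weights are user-defined, not fixed.
    return w_deep * deep_model(x) + w_shallow * shallow_model(x)
```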
  • Fig. 2A is an exemplary process for shallow learning model training according to one embodiment.
  • step 202A historical sample data can be obtained.
  • shallow learning model training may be performed based on historical sample data to obtain a shallow learning model.
  • the historical sample data may be one or more batches of streaming sample data before the current streaming sample data.
  • shallow learning model training can be performed every time a batch of streaming sample data is received. In this way, one or more batches of streaming sample data may be required to complete the shallow learning model training online to obtain the shallow learning model.
  • the historical sample data may be offline sample data correlated with the foregoing current streaming sample data.
  • the shallow learning model can be trained in an offline manner based on the offline sample data to obtain the shallow learning model.
  • the process shown in FIG. 2A may occur before the deep learning model training process (for example, the process shown in FIG. 2B), so that the parameters of the shallow learning model can be used as the initialization parameters of the deep learning model.
  • Figure 2B is an exemplary process for deep learning model training according to one embodiment.
  • step 202B current streaming sample data can be received.
  • the current deep learning model may be trained based on the current streaming sample data to obtain a trained deep learning model.
  • when training, the current deep learning model can be initialized first.
  • the parameters of each layer of the shallow learning model can be obtained, and then the parameters of each layer of the shallow learning model can be used as the initialization parameters of the corresponding layers of the current deep learning model.
  • the parameters of the remaining layers of the current deep learning model can be initialized in a randomized manner.
  • step 206B it can be determined whether the performance of the trained deep learning model is improved compared with the current deep learning model.
  • the trained deep learning model can be used as the latest deep learning model.
  • step 210B the number of hidden layers of the current deep learning model can be increased. For example, it can be increased by one or more layers.
  • step 212B the deep learning model after increasing the number of layers can be trained to obtain a new deep learning model.
  • the parameters of each layer of the shallow learning model can be used as the initialization parameters of the corresponding layers of the deep learning model after increasing the number of layers.
  • the parameters of the remaining layers of the deep learning model can be initialized by randomization.
  • step 214B it can be determined whether the performance of the new deep learning model is improved compared with the current deep learning model.
  • the new deep learning model can be used as the latest deep learning model.
  • step 218B the current deep learning model can be used as the latest deep learning model.
  • Fig. 3 is a schematic block diagram of an apparatus for training a learning model according to an embodiment.
  • the apparatus 300 may include a receiving unit 302 and a training unit 304.
  • the receiving unit 302 can receive the current streaming sample data.
  • the training unit 304 may train the current deep learning model based on the current streaming sample data.
  • the parameters of the shallow learning model can be used as the initialization parameters of the current deep learning model, and the shallow learning model can be trained based on historical sample data that is related to the current streaming sample data.
  • the parameters of the trained shallow learning model are used as the initialization parameters of the current deep learning model, so that when the current deep learning model is trained based on the current streaming sample data, the convergence of the deep learning model can be accelerated, the model training process can be completed efficiently, and the performance of the deep learning model can also be improved.
  • the historical sample data may be historical streaming sample data before the current streaming sample data.
  • the shallow learning model can be obtained by online training based on historical streaming sample data.
  • the historical sample data may be offline sample data.
  • the shallow learning model can be obtained by offline training based on offline sample data.
  • the device 300 may further include an evaluation unit 306.
  • the evaluation unit 306 can evaluate the performance of the trained deep learning model.
  • the evaluation unit 306 may use the trained deep learning model as the latest deep learning model.
  • the training unit 304 can increase the number of hidden layers of the current deep learning model to obtain a deep learning model with an increased number of layers.
  • the training unit 304 may train the deep learning model with the increased number of layers based on the current streaming sample data to obtain a new deep learning model.
  • the evaluation unit 306 may determine the latest deep learning model based on the performance comparison result of the new deep learning model and the current deep learning model.
  • the parameters of the shallow learning model can be used as the initialization parameters of the deep learning model after increasing the number of layers.
  • the evaluation unit 306 may use the new deep learning model as the latest deep learning model.
  • the evaluation unit 306 may use the current deep learning model as the latest deep learning model.
  • the device 300 may further include a weighting unit 308.
  • the weighting unit 308 may weight the latest deep learning model and the shallow learning model to obtain the latest learning model.
  • Each unit of the device 300 can execute the corresponding steps in the method embodiments of FIGS. 1 to 2B. Therefore, for brevity of description, the specific operations and functions of each unit of the device 300 will not be repeated here.
  • the foregoing apparatus 300 may be implemented by hardware, software, or a combination of software and hardware.
  • when the apparatus 300 is implemented by software, it can be formed by the processor of the device where it is located reading the corresponding executable instructions from storage (such as a non-volatile storage medium) into memory and running them.
  • Fig. 4 is a hardware structure diagram of a computing device for training a learning model according to an embodiment.
  • the computing device 400 may include at least one processor 402, a memory 404, an internal memory 406, and a communication interface 408, and the at least one processor 402, the memory 404, the internal memory 406, and the communication interface 408 are connected together via a bus 410.
  • the at least one processor 402 executes at least one executable instruction (that is, the above-mentioned elements implemented in the form of software) stored or encoded in the memory 404.
  • the executable instructions stored in the memory 404 when executed by the at least one processor 402, enable the computing device to implement the various processes described above in conjunction with FIGS. 1-2B.
  • the computing device 400 can be implemented in any suitable form in the art, for example, it includes but is not limited to desktop computers, laptop computers, smart phones, tablet computers, consumer electronic devices, wearable smart devices, and so on.
  • the embodiments of the present specification also provide a machine-readable storage medium.
  • the machine-readable storage medium may store executable instructions, and the executable instructions, when executed by the machine, cause the machine to implement the specific processes of the method embodiments described above with reference to FIGS. 1-2B.
  • a machine-readable storage medium may include, but is not limited to, random access memory (Random Access Memory, RAM), read-only memory (Read-Only Memory, ROM), electrically erasable programmable read-only memory (Electrically-Erasable Programmable Read-Only Memory, EEPROM), static random access memory (Static Random Access Memory, SRAM), hard disks, flash memory, etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Image Analysis (AREA)
  • Measurement And Recording Of Electrical Phenomena And Electrical Characteristics Of The Living Body (AREA)

Abstract

A method, apparatus, and computing device for training a learning model. The method may include: receiving current streaming sample data (102); and training a current deep learning model based on the current streaming sample data (104), where parameters of a shallow learning model are used as initialization parameters of the current deep learning model, and the shallow learning model is trained based on historical sample data correlated with the current streaming sample data.

Description

Method, apparatus, and computing device for training a learning model
Technical Field
The embodiments of this specification relate to the field of machine learning, and in particular to methods, apparatuses, and computing devices for training learning models.
Background
Deep learning originated from research on artificial neural networks. It is a new field of machine learning research in recent years and has shown broad application prospects in various fields. At present, in most deep learning methods, deep learning models (which may also be called deep neural networks) are usually trained by batch learning. In this approach, a training data set generally needs to be prepared before the learning task starts (that is, offline training (Offline Training)).
However, in many current applications, data may arrive successively in the form of streams, so batch learning may be impractical for such applications.
Summary
In view of the above problems in the prior art, the embodiments of this specification provide methods, apparatuses, and computing devices for training learning models.
In one aspect, the embodiments of this specification provide a method for training a learning model, including: receiving current streaming sample data; and training a current deep learning model based on the current streaming sample data, where parameters of a shallow learning model are used as initialization parameters of the current deep learning model, and the shallow learning model is trained based on historical sample data correlated with the current streaming sample data.
In another aspect, the embodiments of this specification provide an apparatus for training a learning model, including: a receiving unit, configured to receive current streaming sample data; and a training unit, configured to train a current deep learning model based on the current streaming sample data, where parameters of a shallow learning model are used as initialization parameters of the current deep learning model, and the shallow learning model is trained based on historical sample data correlated with the current streaming sample data.
In another aspect, the embodiments of this specification provide a computing device, including: at least one processor; and a memory in communication with the at least one processor, storing executable instructions which, when executed by the at least one processor, cause the at least one processor to implement the above method.
It can be seen that in this technical solution, when the current deep learning model is trained based on the current streaming sample data, using the parameters of the trained shallow learning model as the initialization parameters of the current deep learning model can speed up the convergence of the deep learning model, so that the model training process can be completed efficiently, and it also helps improve the performance of the deep learning model.
Brief Description of the Drawings
The above characteristics, technical features, and advantages of the present invention, as well as their implementations, will be further explained below in a clear and easily understandable way through the description of preferred embodiments with reference to the accompanying drawings, in which:
Fig. 1 is a schematic flowchart of a method for training a learning model according to an embodiment.
Fig. 2A is an exemplary process for shallow learning model training according to an embodiment.
Fig. 2B is an exemplary process for deep learning model training according to an embodiment.
Fig. 3 is a schematic block diagram of an apparatus for training a learning model according to an embodiment.
Fig. 4 is a hardware structure diagram of a computing device for training a learning model according to an embodiment.
Detailed Description
The subject matter described herein will now be discussed with reference to various embodiments. It should be understood that these embodiments are discussed only to enable those skilled in the art to better understand and implement the subject matter described herein, and not to limit the scope of protection, applicability, or examples set forth in the claims. The functions and arrangement of the elements discussed can be changed without departing from the scope of protection of the claims. Each embodiment may omit, substitute, or add various procedures or components as needed.
At present, deep learning models are usually trained by batch learning. In batch learning, a training data set generally needs to be prepared before the learning task starts, and the deep learning model is then trained with this training data set (that is, offline training). However, in some applications, data may arrive successively in the form of streams, so batch learning may be unsuitable for such applications. In addition, in batch learning, the training data set prepared in advance may occupy a large amount of storage space, which also makes it unsuitable for some scenarios where storage space is limited.
The embodiments of this specification propose a technical solution for training a learning model based on streaming sample data. In this specification, streaming sample data may generally include sample data continuously generated by data sample sources, such as log files generated by web applications, online shopping data, game player activity data, social networking site information data, and so on. Streaming sample data may also be called real-time sample data, and its time span is usually between hundreds of milliseconds and several seconds. Model training based on streaming sample data can generally be regarded as online learning.
Specifically, in the technical solution of this specification, current streaming sample data can be received. Then, the current deep learning model is trained based on the current streaming sample data, where the parameters of a shallow learning model can be used as the initialization parameters of the current deep learning model, and the shallow learning model can be trained based on historical sample data correlated with the current streaming sample data.
It can be seen that in this technical solution, when the current deep learning model is trained based on the current streaming sample data, using the parameters of the trained shallow learning model as the initialization parameters of the current deep learning model can speed up the convergence of the deep learning model, so that the model training process can be completed efficiently, and it also helps improve the performance of the deep learning model.
In addition, in this technical solution, the deep learning model is trained based on streaming sample data, so there is no need to prepare a training data set in advance, which can effectively save storage space.
In some cases, the data may exhibit concept drift, that is, the statistical properties of the data may change over time in unforeseeable ways. By training the deep learning model based on streaming sample data, the deep learning model can be adjusted in time as the data changes, which can improve the prediction performance of the deep learning model and also provides good scalability.
In this specification, the structure of the shallow learning model can be simpler than that of the deep learning model. For example, the shallow learning model may have no hidden layer, or may have fewer hidden layers than the deep learning model.
For example, the shallow learning model may be a logistic regression (Logistic Regression, LR) model, while the deep learning model may have one or more hidden layers. In some cases, for ease of implementation, the deep learning model can be built with a single hidden layer when it is initially constructed.
The above technical solution will be described in detail below with reference to specific embodiments.
Fig. 1 is a schematic flowchart of a method for training a learning model according to an embodiment.
As shown in Fig. 1, in step 102, current streaming sample data can be received.
In step 104, a current deep learning model can be trained based on the current streaming sample data.
The parameters of a shallow learning model can be used as the initialization parameters of the current deep learning model, and the shallow learning model can be trained based on historical sample data correlated with the current streaming sample data.
For example, the correlation between the historical sample data and the current streaming sample data may mean that the historical sample data and the current streaming sample data have one or more sample features in common.
In one embodiment, the above historical sample data may be historical streaming sample data preceding the current streaming sample data. For example, the historical streaming sample data may include one or more batches of streaming sample data before the current streaming sample data. In this case, the shallow learning model can be obtained by online training based on the historical streaming sample data.
In one embodiment, the above historical sample data may be offline sample data correlated with the current streaming sample data. For example, the offline sample data may be data that has one or more sample features in common with the current streaming sample data. In this case, the shallow learning model can be obtained by offline training based on the offline sample data.
It can be seen that in this specification, the shallow learning model can be trained either online or offline, which can suit different application requirements.
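As an illustration of the online variant, the following sketch (an assumption for illustration, not prescribed by this specification) updates a logistic-regression-style shallow model one batch of streaming samples at a time using scikit-learn's partial_fit; for the offline variant, a single fit() call on the prepared offline sample data would play the same role. Here stream_of_batches() stands in for a hypothetical source of (features, labels) batches.

```python
from sklearn.linear_model import SGDClassifier

# Logistic regression trained incrementally by stochastic gradient descent.
shallow = SGDClassifier(loss="log_loss")

for X_batch, y_batch in stream_of_batches():  # hypothetical data source
    shallow.partial_fit(X_batch, y_batch, classes=[0, 1])
```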
In one embodiment, when the parameters of the shallow learning model are used as the initialization parameters of the current deep learning model, the parameters of each layer of the shallow learning model can be used as the initialization parameters of the corresponding layers of the current deep learning model. For example, a one-to-one mapping can be used to map the parameters of each layer of the shallow learning model onto the corresponding layers of the current deep learning model. This not only helps speed up the convergence of the deep learning model and shorten the training time, but also helps improve the performance of the deep learning model. In addition, the parameters of the remaining layers of the current deep learning model can be initialized in a randomized manner.
In one embodiment, in step 104, various methods applicable in the art, such as online gradient descent (Online Gradient Descent), can be used to train the current deep learning model.
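As a minimal sketch of one such update, the snippet below performs a single online-gradient-descent step on a batch of streaming samples; the binary-classification loss and the optimizer are illustrative assumptions, and online gradient descent is only one of the applicable methods mentioned above.

```python
import torch
import torch.nn.functional as F

def online_step(model, optimizer, X_batch, y_batch):
    # One update per arriving batch of streaming sample data.
    optimizer.zero_grad()
    logits = model(X_batch).squeeze(-1)
    loss = F.binary_cross_entropy_with_logits(logits, y_batch.float())
    loss.backward()
    optimizer.step()
    return loss.item()

# e.g., optimizer = torch.optim.SGD(deep_model.parameters(), lr=0.01)
```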
In one embodiment, after the training of the current deep learning model is finished and a trained deep learning model is obtained, the performance of the trained deep learning model can be compared with the performance of the current deep learning model.
For example, the metrics used to evaluate performance may include various applicable metrics such as the area under the curve (Area Under Curve, AUC), accuracy, coverage, and F1 score. This specification does not limit this.
If the performance of the trained deep learning model is improved compared with the current deep learning model, for example, if the AUC of the trained deep learning model is higher than the AUC of the current deep learning model, the trained deep learning model can be used as the latest deep learning model.
If the performance of the trained deep learning model is not improved compared with the current deep learning model, for example, if the AUC of the trained deep learning model is lower than the AUC of the current deep learning model, or the two AUCs are essentially close, updating the current deep learning model can be considered.
For example, the number of hidden layers of the current deep learning model can be increased to obtain a deep learning model with an increased number of layers. For instance, the number of hidden layers of the current deep learning model can be increased by one or more layers, which can be decided according to actual needs.
Afterwards, the deep learning model with the increased number of layers can be trained based on the current streaming sample data to obtain a new deep learning model.
In one embodiment, the parameters of the shallow learning model can be used as the initialization parameters of the deep learning model with the increased number of layers. This can help speed up the convergence of the deep learning model with the increased number of layers, so that the new deep learning model can be obtained efficiently, and it also helps improve the performance of the new deep learning model.
Afterwards, the latest deep learning model can be determined based on the result of comparing the performance of the new deep learning model with that of the current deep learning model.
For example, if the performance of the new deep learning model is improved compared with the current deep learning model, the new deep learning model can be used as the latest deep learning model. If the performance of the new deep learning model is not improved compared with the current deep learning model, the current deep learning model can be used as the latest deep learning model.
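The selection logic above can be summarized in code. In the sketch below, evaluate_auc() scores a model on held-out evaluation data, while add_hidden_layer(), init_from_shallow_params(), and train_on_batch() are hypothetical helpers standing in for the steps of the text; none of these names are APIs defined by this specification.

```python
import torch
from sklearn.metrics import roc_auc_score

def evaluate_auc(model, X_eval, y_eval):
    with torch.no_grad():
        scores = torch.sigmoid(model(X_eval)).squeeze(-1).numpy()
    return roc_auc_score(y_eval, scores)

def select_latest(current, trained, shallow, X_batch, y_batch, X_eval, y_eval):
    base_auc = evaluate_auc(current, X_eval, y_eval)
    if evaluate_auc(trained, X_eval, y_eval) > base_auc:
        return trained                                   # performance improved
    grown = add_hidden_layer(current)                    # hypothetical helper
    init_from_shallow_params(grown, shallow)             # shallow params as init, rest random
    new_model = train_on_batch(grown, X_batch, y_batch)  # hypothetical helper
    if evaluate_auc(new_model, X_eval, y_eval) > base_auc:
        return new_model
    return current
```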
It can be seen that in the above manner, the currently optimal deep learning model can be effectively selected as the latest deep learning model.
In one embodiment, the latest deep learning model can be directly used as the latest learning model to be applied.
Alternatively, in another embodiment, the latest deep learning model and the shallow learning model can be weighted to obtain the latest learning model to be applied.
For example, the latest deep learning model and the shallow learning model may each have a corresponding weight. Their weights can be predefined, or can be defined by users according to actual needs. For instance, the weight of the latest deep learning model can be 70% and the weight of the shallow learning model can be 30%. Alternatively, the weight of the latest deep learning model can be 100% and the weight of the shallow learning model can be 0. This makes the solution applicable to different application scenarios.
The technical solution of this specification will be described below with reference to specific examples. It should be understood that these examples are only intended to help those skilled in the art better understand the embodiments, not to limit their scope.
Fig. 2A is an exemplary process for shallow learning model training according to an embodiment.
As shown in Fig. 2A, in step 202A, historical sample data can be obtained.
In step 204A, shallow learning model training can be performed based on the historical sample data to obtain a shallow learning model.
For example, the historical sample data may be one or more batches of streaming sample data preceding the above current streaming sample data. In this case, shallow learning model training can be performed each time a batch of streaming sample data is received. Thus, one or more batches of streaming sample data may be needed to complete the shallow learning model training online, thereby obtaining the shallow learning model.
As another example, the historical sample data may be offline sample data correlated with the above current streaming sample data. In this case, shallow learning model training can be performed offline based on the offline sample data, thereby obtaining the shallow learning model.
It can be understood that the process shown in Fig. 2A can occur before the deep learning model training process (for example, the process shown in Fig. 2B), so that the parameters of the shallow learning model can be used as the initialization parameters of the deep learning model.
Fig. 2B is an exemplary process for deep learning model training according to an embodiment.
As shown in Fig. 2B, in step 202B, current streaming sample data can be received.
In step 204B, the current deep learning model can be trained based on the current streaming sample data to obtain a trained deep learning model.
For example, during training, the current deep learning model can be initialized first. For instance, the parameters of each layer of the shallow learning model can be obtained, and then used as the initialization parameters of the corresponding layers of the current deep learning model. The parameters of the remaining layers of the current deep learning model can be initialized in a randomized manner.
In step 206B, it can be determined whether the performance of the trained deep learning model is improved compared with the current deep learning model.
If it is improved, then in step 208B, the trained deep learning model can be used as the latest deep learning model.
If it is not improved, then in step 210B, the number of hidden layers of the current deep learning model can be increased, for example, by one or more layers.
In step 212B, the deep learning model with the increased number of layers can be trained to obtain a new deep learning model.
For example, when training the deep learning model with the increased number of layers, the parameters of each layer of the shallow learning model can be used as the initialization parameters of the corresponding layers of the deep learning model with the increased number of layers, while the parameters of its remaining layers can be initialized in a randomized manner.
In step 214B, it can be determined whether the performance of the new deep learning model is improved compared with the current deep learning model.
If it is improved, then in step 216B, the new deep learning model can be used as the latest deep learning model.
If it is not improved, then in step 218B, the current deep learning model can be used as the latest deep learning model.
It should be understood that the order of the steps shown in Figs. 2A and 2B is only exemplary. The order of these steps can be changed accordingly depending on actual applications or different design logic.
Fig. 3 is a schematic block diagram of an apparatus for training a learning model according to an embodiment.
As shown in Fig. 3, the apparatus 300 may include a receiving unit 302 and a training unit 304.
The receiving unit 302 can receive current streaming sample data. The training unit 304 can train a current deep learning model based on the current streaming sample data, where the parameters of a shallow learning model can be used as the initialization parameters of the current deep learning model, and the shallow learning model can be trained based on historical sample data correlated with the current streaming sample data.
It can be seen that in this technical solution, the parameters of the trained shallow learning model are used as the initialization parameters of the current deep learning model, so that when the current deep learning model is trained based on the current streaming sample data, the convergence of the deep learning model can be accelerated, the model training process can be completed efficiently, and the performance of the deep learning model can also be improved.
In one embodiment, the historical sample data may be historical streaming sample data preceding the current streaming sample data. In that case, the shallow learning model can be obtained by online training based on the historical streaming sample data.
In one embodiment, the historical sample data may be offline sample data. In that case, the shallow learning model can be obtained by offline training based on the offline sample data.
In one embodiment, the apparatus 300 may further include an evaluation unit 306.
After the training unit 304 finishes training the current deep learning model and obtains a trained deep learning model, the evaluation unit 306 can evaluate the performance of the trained deep learning model.
If the performance of the trained deep learning model is improved compared with the current deep learning model, the evaluation unit 306 can use the trained deep learning model as the latest deep learning model.
If the performance of the trained deep learning model is not improved compared with the current deep learning model, the training unit 304 can increase the number of hidden layers of the current deep learning model to obtain a deep learning model with an increased number of layers, and can train the deep learning model with the increased number of layers based on the current streaming sample data to obtain a new deep learning model. Afterwards, the evaluation unit 306 can determine the latest deep learning model based on the result of comparing the performance of the new deep learning model with that of the current deep learning model.
In one embodiment, the parameters of the shallow learning model can be used as the initialization parameters of the deep learning model with the increased number of layers.
In one embodiment, if the performance of the new deep learning model is improved compared with the current deep learning model, the evaluation unit 306 can use the new deep learning model as the latest deep learning model.
If the performance of the new deep learning model is not improved compared with the current deep learning model, the evaluation unit 306 can use the current deep learning model as the latest deep learning model.
In one embodiment, the apparatus 300 may further include a weighting unit 308. The weighting unit 308 can weight the latest deep learning model and the shallow learning model to obtain the latest learning model.
The units of the apparatus 300 can perform the corresponding steps in the method embodiments of Figs. 1 to 2B; therefore, for brevity of description, the specific operations and functions of the units of the apparatus 300 are not repeated here.
The above apparatus 300 can be implemented in hardware, in software, or by a combination of hardware and software. For example, when implemented in software, the apparatus 300 can be formed by the processor of the device where it resides reading the corresponding executable instructions from storage (for example, a non-volatile storage medium) into memory and running them.
Fig. 4 is a hardware structure diagram of a computing device for training a learning model according to an embodiment. As shown in Fig. 4, the computing device 400 may include at least one processor 402, a memory 404, an internal memory 406, and a communication interface 408, which are connected together via a bus 410. The at least one processor 402 executes at least one executable instruction (that is, the above-mentioned elements implemented in the form of software) stored or encoded in the memory 404.
In one embodiment, the executable instructions stored in the memory 404, when executed by the at least one processor 402, cause the computing device to implement the various processes described above in conjunction with Figs. 1-2B.
The computing device 400 can be implemented in any form applicable in the art, including but not limited to a desktop computer, a laptop computer, a smartphone, a tablet computer, a consumer electronic device, a wearable smart device, and so on.
The embodiments of this specification also provide a machine-readable storage medium. The machine-readable storage medium can store executable instructions which, when executed by a machine, cause the machine to implement the specific processes of the method embodiments described above with reference to Figs. 1-2B.
For example, the machine-readable storage medium may include, but is not limited to, random access memory (Random Access Memory, RAM), read-only memory (Read-Only Memory, ROM), electrically erasable programmable read-only memory (Electrically-Erasable Programmable Read-Only Memory, EEPROM), static random access memory (Static Random Access Memory, SRAM), hard disks, flash memory, and so on.
It should be understood that the embodiments in this specification are described in a progressive manner; for identical or similar parts between the embodiments, reference may be made to each other, and each embodiment focuses on its differences from the other embodiments. For example, the embodiments concerning the apparatus, the computing device, and the machine-readable storage medium are described relatively simply because they are substantially similar to the method embodiments; for relevant parts, reference may be made to the description of the method embodiments.
Specific embodiments of this specification have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and can still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
It should be understood that various modifications to the embodiments in this specification will be apparent to those of ordinary skill in the art, and the general principles defined herein can be applied to other variations without departing from the scope of protection of the claims.

Claims (15)

  1. A method for training a learning model, comprising:
    receiving current streaming sample data; and
    training a current deep learning model based on the current streaming sample data, wherein parameters of a shallow learning model are used as initialization parameters of the current deep learning model, and the shallow learning model is trained based on historical sample data correlated with the current streaming sample data.
  2. The method according to claim 1, wherein the historical sample data is historical streaming sample data preceding the current streaming sample data, and the shallow learning model is obtained by online training based on the historical streaming sample data.
  3. The method according to claim 1, wherein the historical sample data is offline sample data, and the shallow learning model is obtained by offline training based on the offline sample data.
  4. The method according to any one of claims 1 to 3, further comprising:
    after the training of the current deep learning model is finished and a trained deep learning model is obtained:
    if the performance of the trained deep learning model is improved compared with the current deep learning model, using the trained deep learning model as a latest deep learning model;
    if the performance of the trained deep learning model is not improved compared with the current deep learning model, performing the following operations:
    increasing the number of hidden layers of the current deep learning model to obtain a deep learning model with an increased number of layers;
    training the deep learning model with the increased number of layers based on the current streaming sample data to obtain a new deep learning model; and
    determining the latest deep learning model based on a result of comparing the performance of the new deep learning model with that of the current deep learning model.
  5. The method according to claim 4, wherein the parameters of the shallow learning model are used as initialization parameters of the deep learning model with the increased number of layers.
  6. The method according to claim 4 or 5, wherein the determining the latest deep learning model based on the result of comparing the performance of the new deep learning model with that of the current deep learning model comprises:
    if the performance of the new deep learning model is improved compared with the current deep learning model, using the new deep learning model as the latest deep learning model; and
    if the performance of the new deep learning model is not improved compared with the current deep learning model, using the current deep learning model as the latest deep learning model.
  7. The method according to any one of claims 4 to 6, further comprising:
    weighting the latest deep learning model and the shallow learning model to obtain a latest learning model.
  8. An apparatus for training a learning model, comprising:
    a receiving unit, configured to receive current streaming sample data; and
    a training unit, configured to train a current deep learning model based on the current streaming sample data, wherein parameters of a shallow learning model are used as initialization parameters of the current deep learning model, and the shallow learning model is trained based on historical sample data correlated with the current streaming sample data.
  9. The apparatus according to claim 8, wherein the historical sample data is historical streaming sample data preceding the current streaming sample data, and the shallow learning model is obtained by online training based on the historical streaming sample data.
  10. The apparatus according to claim 8, wherein the historical sample data is offline sample data, and the shallow learning model is obtained by offline training based on the offline sample data.
  11. The apparatus according to any one of claims 8 to 10, further comprising an evaluation unit,
    wherein, after the training unit finishes training the current deep learning model and obtains a trained deep learning model:
    if the performance of the trained deep learning model is improved compared with the current deep learning model, the evaluation unit is configured to use the trained deep learning model as a latest deep learning model;
    if the performance of the trained deep learning model is not improved compared with the current deep learning model:
    the training unit is further configured to increase the number of hidden layers of the current deep learning model to obtain a deep learning model with an increased number of layers, and to train the deep learning model with the increased number of layers based on the current streaming sample data to obtain a new deep learning model; and
    the evaluation unit is configured to determine the latest deep learning model based on a result of comparing the performance of the new deep learning model with that of the current deep learning model.
  12. The apparatus according to claim 11, wherein the parameters of the shallow learning model are used as initialization parameters of the deep learning model with the increased number of layers.
  13. The apparatus according to claim 11 or 12, wherein the evaluation unit is specifically configured to:
    use the new deep learning model as the latest deep learning model if the performance of the new deep learning model is improved compared with the current deep learning model; and
    use the current deep learning model as the latest deep learning model if the performance of the new deep learning model is not improved compared with the current deep learning model.
  14. The apparatus according to any one of claims 11 to 13, further comprising:
    a weighting unit, configured to weight the latest deep learning model and the shallow learning model to obtain a latest learning model.
  15. A computing device, comprising:
    at least one processor; and
    a memory in communication with the at least one processor, storing executable instructions which, when executed by the at least one processor, cause the at least one processor to implement the method according to any one of claims 1 to 7.
PCT/CN2020/073834 2019-03-29 2020-01-22 Method, apparatus, and computing device for training a learning model WO2020199743A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
SG11202104298VA SG11202104298VA (en) 2019-03-29 2020-01-22 Methods, apparatuses, and computing devices for trainings of learning models
EP20781845.1A EP3852014A4 (en) 2019-03-29 2020-01-22 METHOD AND APPARATUS FOR TRAINING A LEARNING MODEL, AND COMPUTER DEVICE
US17/246,201 US11514368B2 (en) 2019-03-29 2021-04-30 Methods, apparatuses, and computing devices for trainings of learning models

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910250563.9 2019-03-29
CN201910250563.9A CN110059802A (zh) 2019-03-29 Method, apparatus, and computing device for training a learning model

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/246,201 Continuation US11514368B2 (en) 2019-03-29 2021-04-30 Methods, apparatuses, and computing devices for trainings of learning models

Publications (1)

Publication Number Publication Date
WO2020199743A1 true WO2020199743A1 (zh) 2020-10-08

Family

ID=67317880

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/073834 WO2020199743A1 (zh) 2019-03-29 2020-01-22 用于训练学习模型的方法、装置和计算设备

Country Status (5)

Country Link
US (1) US11514368B2 (zh)
EP (1) EP3852014A4 (zh)
CN (1) CN110059802A (zh)
SG (1) SG11202104298VA (zh)
WO (1) WO2020199743A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112978919A (zh) * 2021-01-29 2021-06-18 上海西派埃智能化系统有限公司 Carbon source dosing system and method for a sewage treatment plant

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110059802A (zh) 2019-03-29 2019-07-26 阿里巴巴集团控股有限公司 Method, apparatus, and computing device for training a learning model
CN110704135B (zh) * 2019-09-26 2020-12-08 北京智能工场科技有限公司 Competition data processing system and method based on a virtual environment
CN111599447B (zh) * 2020-05-18 2023-10-10 上海联影医疗科技股份有限公司 Data processing method and apparatus, electronic device, and storage medium
US11568176B1 (en) * 2021-06-25 2023-01-31 Amazon Technologies, Inc. Deep feature extraction and training tools and associated methods
CN116777629B (zh) * 2023-07-06 2024-05-03 创业树(厦门)数字科技有限公司 Online transaction management system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150206064A1 (en) * 2014-01-19 2015-07-23 Jacob Levman Method for supervised machine learning
CN105608219A (zh) * 2016-01-07 2016-05-25 上海通创信息技术有限公司 Clustering-based streaming recommendation engine, recommendation system, and recommendation method
CN107070940A (zh) * 2017-05-03 2017-08-18 微梦创科网络科技(中国)有限公司 Method and apparatus for identifying malicious login IP addresses from streaming login logs
CN107707541A (zh) * 2017-09-28 2018-02-16 小花互联网金融服务(深圳)有限公司 Streaming machine-learning-based method for real-time detection of attack behavior logs
CN109344959A (zh) * 2018-08-27 2019-02-15 联想(北京)有限公司 Neural network training method, neural network system, and computer system
CN110059802A (zh) * 2019-03-29 2019-07-26 阿里巴巴集团控股有限公司 Method, apparatus, and computing device for training a learning model

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109196527A (zh) * 2016-04-13 2019-01-11 谷歌有限责任公司 Wide and deep machine learning models
CN107452023A (zh) * 2017-07-21 2017-12-08 上海交通大学 Single-target tracking method and system based on online learning with a convolutional neural network
CN107506590A (zh) * 2017-08-26 2017-12-22 郑州大学 Cardiovascular disease prediction model based on an improved deep belief network
CN107944376A (zh) 2017-11-20 2018-04-20 北京奇虎科技有限公司 Real-time posture recognition method and apparatus for video data, and computing device
CN108010321B (zh) * 2017-12-25 2019-10-25 北京理工大学 Traffic flow prediction method
CN108053080B (zh) 2017-12-30 2021-05-11 中国移动通信集团江苏有限公司 Method, apparatus, device, and medium for predicting regional user count statistics
CN108520197A (zh) * 2018-02-28 2018-09-11 中国航空工业集团公司洛阳电光设备研究所 Remote sensing image target detection method and apparatus
CN108829684A (zh) * 2018-05-07 2018-11-16 内蒙古工业大学 Mongolian-Chinese neural machine translation method based on a transfer learning strategy
US10380997B1 (en) * 2018-07-27 2019-08-13 Deepgram, Inc. Deep learning internal state index-based search and classification
CN109102126B (zh) * 2018-08-30 2021-12-10 燕山大学 Theoretical line loss rate prediction model based on deep transfer learning
CN109359385B (zh) * 2018-10-17 2021-11-23 网宿科技股份有限公司 Method and apparatus for training a service quality evaluation model

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150206064A1 (en) * 2014-01-19 2015-07-23 Jacob Levman Method for supervised machine learning
CN105608219A (zh) * 2016-01-07 2016-05-25 上海通创信息技术有限公司 Clustering-based streaming recommendation engine, recommendation system, and recommendation method
CN107070940A (zh) * 2017-05-03 2017-08-18 微梦创科网络科技(中国)有限公司 Method and apparatus for identifying malicious login IP addresses from streaming login logs
CN107707541A (zh) * 2017-09-28 2018-02-16 小花互联网金融服务(深圳)有限公司 Streaming machine-learning-based method for real-time detection of attack behavior logs
CN109344959A (zh) * 2018-08-27 2019-02-15 联想(北京)有限公司 Neural network training method, neural network system, and computer system
CN110059802A (zh) * 2019-03-29 2019-07-26 阿里巴巴集团控股有限公司 Method, apparatus, and computing device for training a learning model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3852014A4 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112978919A (zh) * 2021-01-29 2021-06-18 上海西派埃智能化系统有限公司 Carbon source dosing system and method for a sewage treatment plant

Also Published As

Publication number Publication date
EP3852014A4 (en) 2022-06-08
EP3852014A1 (en) 2021-07-21
US11514368B2 (en) 2022-11-29
SG11202104298VA (en) 2021-05-28
US20210256423A1 (en) 2021-08-19
CN110059802A (zh) 2019-07-26

Similar Documents

Publication Publication Date Title
WO2020199743A1 (zh) Method, apparatus, and computing device for training a learning model
US20240046106A1 (en) Multi-task neural networks with task-specific paths
US20230252327A1 (en) Neural architecture search for convolutional neural networks
Du et al. Learning resource allocation and pricing for cloud profit maximization
WO2020253466A1 (zh) Method and apparatus for generating test cases for a user interface
US10474827B2 (en) Application recommendation method and application recommendation apparatus
JP7296978B2 (ja) Granting incentives to players for participation in competitive gameplay
US20210365782A1 (en) Method and apparatus for generating neural network model, and computer-readable storage medium
WO2015103964A1 (en) Method, apparatus, and device for determining target user
CN109690576A (zh) Training machine learning models on multiple machine learning tasks
CN108962238A (zh) Dialogue method, system, device, and storage medium based on a structured neural network
JP7412101B2 (ja) Method and system for performing negotiation tasks using reinforcement learning agents
CN109847366B (zh) Data processing method and apparatus for games
WO2019084560A1 (en) SEARCH FOR NEURONAL ARCHITECTURES
US11907821B2 (en) Population-based training of machine learning models
CN112292701A (zh) Strategy search in multi-party strategic interaction
CN108122168B (zh) Method and apparatus for screening seed nodes in a social activity network
CN108154197A (zh) Method and apparatus for image annotation verification in a virtual scene
US20200202430A1 (en) Recommending shared products
CN112292696A (zh) Determining an action selection policy of an execution device
CN112639841A (zh) Sampling schemes for strategy search in multi-party strategic interaction
US11468521B2 (en) Social media account filtering method and apparatus
CN114072809A (zh) Small and fast video processing networks via neural architecture search
CN112292699A (zh) Determining an action selection policy of an execution device
TWI785346B (zh) Dual machine learning pipelines for transforming data and optimizing data transformation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20781845

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020781845

Country of ref document: EP

Effective date: 20210412

NENP Non-entry into the national phase

Ref country code: DE