CN111931916A - Exploration method and device of deep learning model - Google Patents

Exploration method and device of deep learning model

Info

Publication number
CN111931916A
CN111931916A
Authority
CN
China
Prior art keywords
model
deep learning
variant
historical
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010814501.9A
Other languages
Chinese (zh)
Other versions
CN111931916B (en)
Inventor
赵仕嘉
林涛
董浩欣
杨鹤鸣
向雷
李晁铭
麦洪永
陈华荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Potevio Information Technology Co Ltd
Original Assignee
Potevio Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Potevio Information Technology Co Ltd
Priority to CN202010814501.9A
Publication of CN111931916A
Application granted
Publication of CN111931916B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5066 Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and device for exploring deep learning models. The method includes: determining one cloud computing resource as a master node and multiple other cloud computing resources as multiple slave nodes, where the master node is used to schedule the slave nodes to perform training operations on deep learning models; performing a training operation on the deep learning model at each slave node to obtain that slave node's training result, where each slave node's training result includes a target deep learning model and that model's score, the target deep learning model being the deep learning model after training; and determining the optimal deep learning model from all the target deep learning models according to the scores included in all the training results. Implementing the invention allows deep learning models to be trained in parallel, which improves training efficiency, reduces training time, and facilitates the application of deep-learning model exploration in business scenarios.

Description

Exploration Method and Device for a Deep Learning Model

Technical Field

The present invention relates to the technical field of deep learning, and in particular to a method and device for exploring deep learning models.

Background

In recent years, deep learning technology has been rapidly adopted in business scenarios across many industries, because it reduces both the complexity of use and the degree of technical understanding required of users. Since the business scenarios in which deep learning is applied vary widely, fully exploiting the potential of deep learning and achieving high accuracy in practical applications requires training, for each business scenario, a deep learning model suited to that scenario.

In practice, a deep learning model suited to a specific business scenario can be obtained through deep learning model exploration: repeatedly training and validating various deep learning network structures under various hyperparameters, and then selecting the optimal deep learning model from the training and validation results. However, because model exploration requires large amounts of computing resources, model generation and training are strongly interdependent, and the model generation process is highly serial, existing exploration methods are inefficient and time-consuming, which hinders the application of deep-learning model exploration in business scenarios.

Summary of the Invention

The technical problem to be solved by the present invention is to provide a method and device for exploring deep learning models that can allocate multiple cloud computing resources to train deep learning models in parallel, thereby improving training efficiency, reducing training time, and facilitating the application of deep-learning model exploration in business scenarios.

To solve the above technical problem, a first aspect of the present invention discloses a method for exploring a deep learning model, the method comprising:

determining one cloud computing resource as a master node and multiple other cloud computing resources as multiple slave nodes, wherein the master node is used to schedule the slave nodes to perform training operations on a deep learning model;

performing a training operation on the deep learning model at each slave node to obtain that slave node's training result, wherein the training result of each slave node includes a target deep learning model and the score of that target deep learning model, the target deep learning model being the deep learning model after training; and

determining an optimal deep learning model from all the target deep learning models according to the scores included in all the training results.
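The three steps above can be sketched as a master that fans exploration out to slave nodes and keeps the highest-scoring result. The sketch below is illustrative only: it uses threads in place of cloud computing resources, and `train_and_score` is a hypothetical stand-in for one slave node's generate/train/validate loop, not the patent's actual procedure.

```python
from concurrent.futures import ThreadPoolExecutor
import random

def train_and_score(seed):
    # Hypothetical stand-in for one slave node: in the patent, each slave
    # node generates candidate models, trains and validates them, and
    # returns its best (target) model together with that model's score.
    rng = random.Random(seed)
    model = {"layers": rng.randint(3, 10)}   # placeholder trained model
    score = rng.random()                     # placeholder validation score
    return model, score

def explore(num_slaves=4):
    # The master node schedules the slave nodes (step 1) so that their
    # training operations run in parallel (step 2) ...
    with ThreadPoolExecutor(max_workers=num_slaves) as pool:
        results = list(pool.map(train_and_score, range(num_slaves)))
    # ... then determines the optimal model by score (step 3).
    return max(results, key=lambda r: r[1])

best_model, best_score = explore()
```

Because each worker is seeded deterministically here, the master's choice is reproducible; a real deployment would replace the thread pool with scheduling across cloud nodes.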

As an optional implementation in the first aspect of the present invention, performing a training operation on the deep learning model at each slave node to obtain that slave node's training result includes:

creating a first process and a second process for each slave node, and generating, with each slave node's first process, multiple deep learning models for that slave node; and

performing, with each slave node's second process, the slave node's local cloud computing resources, and the determined hyperparameters, training and validation operations on each deep learning model of that slave node to obtain the slave node's training result.
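The two-process split on one slave node is essentially a producer/consumer pair: the first process generates models and the second trains and validates them. A minimal sketch, using threads and a queue in place of real processes, with placeholder model generation and scoring:

```python
import queue
import threading

def first_process(model_queue, num_models):
    # First process: generate candidate deep learning models and hand
    # them to the second process. "Generation" here is a placeholder.
    for i in range(num_models):
        model_queue.put({"id": i, "depth": 3 + i})
    model_queue.put(None)  # sentinel: generation finished

def second_process(model_queue, results):
    # Second process: train and validate each generated model on the
    # node's local resources; scoring is a placeholder formula.
    while True:
        model = model_queue.get()
        if model is None:
            break
        score = 1.0 / model["depth"]  # placeholder validation score
        results.append((model, score))

def run_slave_node(num_models=3):
    q, results = queue.Queue(), []
    gen = threading.Thread(target=first_process, args=(q, num_models))
    trainer = threading.Thread(target=second_process, args=(q, results))
    gen.start(); trainer.start()
    gen.join(); trainer.join()
    # The node's training result is its best model and that model's score.
    return max(results, key=lambda r: r[1])
```

The queue decouples generation from training, which is the point of the two-process design: the first process can keep proposing models while the second is still training earlier ones.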

As an optional implementation in the first aspect of the present invention, generating multiple deep learning models for a slave node with that node's first process includes:

selecting, with each slave node's first process, multiple historical models from a determined historical model pool as the slave node's base models, where the historical models are deep learning models already generated by all the slave nodes;

selecting, with a determined simulated annealing method, several of the slave node's base models as the slave node's target base models;

performing a model morphing operation on each of the slave node's target base models to obtain multiple variant models of that target base model, and screening out the target variant model of that target base model from those variant models; and

scoring each target variant model and, according to those scores, screening out the slave node's multiple deep learning models from all the slave node's target variant models;

where a model morphing operation randomly applies to a neural network model at least one of: a network-structure deepening operation, a network-structure widening operation, and a skip-layer addition operation.
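The three morphing operations can be sketched on a toy model representation, a list of layer widths plus a list of skip connections. Layer encoding and weight transfer are omitted, and all names below are illustrative assumptions rather than the patent's concrete operators.

```python
import random

def deepen(model, rng):
    # Network-structure deepening: insert a new layer at a random position.
    pos = rng.randrange(len(model["layers"]) + 1)
    model["layers"].insert(pos, 64)

def widen(model, rng):
    # Network-structure widening: double the width of a random layer.
    pos = rng.randrange(len(model["layers"]))
    model["layers"][pos] *= 2

def add_skip(model, rng):
    # Skip-layer addition: connect two distinct layers, earlier to later.
    start = rng.randrange(len(model["layers"]) - 1)
    end = rng.randrange(start + 1, len(model["layers"]))
    model["skips"].append((start, end))

def morph(model, rng):
    # Randomly apply at least one of the three operations, as in the patent.
    ops = rng.sample([deepen, widen, add_skip], k=rng.randint(1, 3))
    for op in ops:
        op(model, rng)
    return model

rng = random.Random(0)
variant = morph({"layers": [32, 32, 32], "skips": []}, rng)
```

Each call produces one variant model; calling `morph` repeatedly on the same target base model yields the pool of variants that the screening step then filters.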

As an optional implementation in the first aspect of the present invention, screening out the target variant model of a target base model from that target base model's variant models includes:

computing the model distance between each variant model of the target base model and each historical model in the historical model pool;

judging whether each variant model of the target base model has a matching historical model, a matching historical model of a variant model being a historical model whose model distance to that variant model is less than a preset threshold, and deleting the variant model from the target base model's variant models when it is judged to have a matching historical model; and

scoring each remaining variant model of the target base model, and selecting from them the variant model with the highest score as the target base model's target variant model.
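The screening step can be sketched as below. `model_distance` and `score` are hypothetical stand-ins for the patent's distance computation and scoring; the toy usage treats models as integers purely for illustration.

```python
def select_target_variant(variants, history, model_distance, score, threshold):
    # Drop any variant that "matches" a historical model, i.e. whose model
    # distance to some historical model is below the preset threshold ...
    survivors = [
        v for v in variants
        if all(model_distance(v, h) >= threshold for h in history)
    ]
    if not survivors:
        return None
    # ... then keep the highest-scoring remaining variant as the target.
    return max(survivors, key=score)

# Toy usage: models are integers, distance is absolute difference,
# and the score simply prefers larger values.
history = [10, 20]
variants = [11, 17, 25]   # 11 is within distance 2 of 10, so it is dropped
best = select_target_variant(variants, history, lambda a, b: abs(a - b),
                             lambda v: v, threshold=2)
```

The distance filter is what keeps the exploration from re-training near-duplicates of models the slave nodes have already generated.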

As an optional implementation in the first aspect of the present invention, computing the model distance between each variant model of the target base model and each historical model in the historical model pool includes:

computing the normal-layer distance between each variant model of the target base model and each historical model in the historical model pool;

computing the skip-layer distance between each variant model of the target base model and each historical model in the historical model pool; and

adding the normal-layer distance and the skip-layer distance of each variant model and historical model pair to obtain the model distance between that variant model and that historical model.

As an optional implementation in the first aspect of the present invention, computing the normal-layer distance between each variant model of the target base model and each historical model in the historical model pool includes:

encoding the information of the normal layers of each variant model of the target base model to obtain that variant model's normal-layer information list, and encoding the information of the normal layers of each historical model in the historical model pool to obtain that historical model's normal-layer information list;

constructing, from the normal-layer information list of each variant model and the normal-layer information list of each historical model, the normal-layer matrix of that variant model and historical model pair; and

assigning values to each element of the normal-layer matrix of each variant model and historical model pair in left-to-right, top-to-bottom order, where the formula for assigning each element of the normal-layer matrix is:

matrix_ij = min(matrix_ij + dist(M1_i, M2_j), matrix_i(j-1), matrix_(i-1)j)

where matrix_ij denotes the element in row i, column j of the normal-layer matrix, M1_i denotes the i-th normal layer of the variant model, and M2_j denotes the j-th normal layer of the historical model;

where the function dist computes the distance between the two layers M1_i and M2_j: when M1_i and M2_j are layers of different types, the value of dist(M1_i, M2_j) is 1, and when M1_i and M2_j are layers of the same type, the value of dist(M1_i, M2_j) is computed as follows:

dist(M1_i, M2_j) = (1/n) * sum_{k=1..n} |a_k - b_k| / max(a_k, b_k)

where a_k is the k-th parameter in the information encoding of the normal layer represented by M1_i, b_k is the k-th parameter in the information encoding of the normal layer represented by M2_j, and n is the number of parameters contained in the information encoding; and

taking, for each variant model and historical model pair, the element in the lower-right corner of their normal-layer matrix as the normal-layer distance between that variant model and that historical model.
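Putting the steps above together, the sketch below encodes each normal layer as a (type, parameters) pair and fills the matrix in left-to-right, top-to-bottom order. It interprets the recurrence as a standard edit distance with substitution cost dist and insertion/deletion cost 1; the border initialization is an assumption, since the patent only gives the formula for interior elements, and the same-type dist used here is a normalized average parameter difference over the n encoded parameters.

```python
def layer_dist(l1, l2):
    # dist(M1_i, M2_j): 1 for layers of different types; otherwise the
    # normalized average difference over the n encoded parameters
    # (assumed positive here, so max(a, b) never divides by zero).
    (t1, p1), (t2, p2) = l1, l2
    if t1 != t2:
        return 1.0
    n = len(p1)
    return sum(abs(a - b) / max(a, b) for a, b in zip(p1, p2)) / n

def normal_layer_distance(m1, m2):
    # m1, m2: normal-layer information lists of the variant and the
    # historical model, e.g. [("conv", (32, 3)), ("dense", (64,))].
    rows, cols = len(m1) + 1, len(m2) + 1
    matrix = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):        # assumed border: deleting i layers
        matrix[i][0] = float(i)
    for j in range(1, cols):        # assumed border: inserting j layers
        matrix[0][j] = float(j)
    for i in range(1, rows):        # left-to-right, top-to-bottom
        for j in range(1, cols):
            matrix[i][j] = min(
                matrix[i - 1][j - 1] + layer_dist(m1[i - 1], m2[j - 1]),
                matrix[i][j - 1] + 1.0,
                matrix[i - 1][j] + 1.0,
            )
    # The lower-right element is the normal-layer distance.
    return matrix[-1][-1]
```

Identical layer lists give distance 0, and the distance grows smoothly as layer parameters diverge, which is what makes the threshold-based duplicate filter meaningful.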

As an optional implementation in the first aspect of the present invention, computing the skip-layer distance between each variant model of the target base model and each historical model in the historical model pool includes:

encoding the information of the skip layers of each variant model of the target base model to obtain that variant model's skip-layer information list, and encoding the information of the skip layers of each historical model in the historical model pool to obtain that historical model's skip-layer information list;

constructing, from the skip-layer information list of each variant model and the skip-layer information list of each historical model, the skip-layer matrix of that variant model and historical model pair, where the number of rows of the skip-layer matrix is the length of the variant model's skip-layer information list and the number of columns is the length of the historical model's skip-layer information list; and

assigning a value to each element of the skip-layer matrix of each variant model and historical model pair according to the following formula:

skip_connection_matrix_pq = dist2(S1_p, S2_q)

where skip_connection_matrix denotes the skip-layer matrix, skip_connection_matrix_pq denotes the element in row p, column q of the skip-layer matrix, S1_p denotes the p-th skip layer of the variant model, and S2_q denotes the q-th skip layer of the historical model;

where the function dist2 computes the distance between the two layers S1_p and S2_q: when S1_p and S2_q are layers of different types, the value of dist2(S1_p, S2_q) is 1, and when S1_p and S2_q are layers of the same type, the value of dist2(S1_p, S2_q) is computed as follows:

dist2(S1_p, S2_q) = (|p_s - q_s| + |p_l - q_l|) / (max(p_s, q_s) + max(p_l, q_l))

where p_s is the layer-position index, in the variant model, of the starting layer of the variant model's p-th skip layer; q_s is the layer-position index, in the historical model, of the starting layer of the historical model's q-th skip layer; p_l is the depth of the p-th skip layer in the variant model; and q_l is the depth of the q-th skip layer in the historical model; and

computing the skip-layer distance between each variant model of the target base model and each historical model in the historical model pool according to the following formula:

dist_s = sum(skip_connection_matrix) + |s1 - s2|

where dist_s is the skip-layer distance between the variant model and the historical model, sum(skip_connection_matrix) is the sum of every element of the matrix skip_connection_matrix, s1 is the length of the variant model's skip-layer information list, and s2 is the length of the historical model's skip-layer information list.
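The skip-layer distance can be sketched as follows, encoding each skip layer as a (type, start_index, depth) triple. The matrix construction and the final sum-plus-length-difference follow the formulas above directly; the same-type dist2 used here, a normalized difference of start positions and depths, should be treated as an assumption about the lost formula, grounded only in the symbols p_s, q_s, p_l, q_l that the text defines.

```python
def dist2(s1, s2):
    # dist2(S1_p, S2_q): 1 for skip layers of different types; otherwise a
    # normalized difference of start positions and depths (assumed form;
    # indices and depths are assumed positive so the denominator is nonzero).
    (t1, ps, pl), (t2, qs, ql) = s1, s2
    if t1 != t2:
        return 1.0
    return (abs(ps - qs) + abs(pl - ql)) / (max(ps, qs) + max(pl, ql))

def skip_layer_distance(skips1, skips2):
    # skips1, skips2: skip-layer information lists of the variant model
    # and the historical model, e.g. [("add", 1, 2)].
    s1, s2 = len(skips1), len(skips2)
    # Skip-layer matrix: s1 rows, s2 columns, element pq = dist2(S1_p, S2_q).
    matrix = [[dist2(a, b) for b in skips2] for a in skips1]
    # dist_s = sum of all matrix elements + |s1 - s2|.
    return sum(sum(row) for row in matrix) + abs(s1 - s2)
```

The |s1 - s2| term penalizes models that differ in how many skip connections they have, even when the connections they share are similar.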

As an optional implementation in the first aspect of the present invention, performing, with each slave node's second process, the slave node's local cloud computing resources, and the determined hyperparameters, training and validation operations on each deep learning model of that slave node to obtain the slave node's training result includes:

drafting, with each slave node's second process and local cloud computing resources, for each deep learning model of that slave node, the hyperparameter space of that deep learning model, the hyperparameter space including at least the batch size and the learning rate;

setting the number of searches for each deep learning model of each slave node;

constructing, for each deep learning model of each slave node, a set used to save that model's scores on the validation set after training; and

setting the objective function of each deep learning model of each slave node according to the following formula:

F = max(SC)

where F is the objective function and SC is the score of the deep learning model on the validation set after training;

randomly selecting a starting point in the hyperparameter space of each deep learning model of each slave node, and then searching cyclically in that hyperparameter space through a Gaussian-process mapping to select multiple target hyperparameters for that deep learning model, where the Gaussian-process mapping can be expressed as:

T = G(C, R, F, J)

where T is a target hyperparameter of the deep learning model, each target hyperparameter being a hyperparameter value recommended by G as worth trying; C is the hyperparameter space of the deep learning model; R is the set corresponding to the deep learning model; F is the objective function of the deep learning model; and J is the number of searches for the deep learning model;

performing training and validation operations on each deep learning model of each slave node with each of that model's target hyperparameters, obtaining multiple intermediate deep learning models for that deep learning model and a score for each intermediate deep learning model; and

selecting, from all the intermediate deep learning models of all the deep learning models of each slave node, the intermediate deep learning model with the highest score, together with its score, as that slave node's training result.
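The search loop above can be sketched as below. A real implementation would use a Gaussian-process surrogate for G; here G is a deliberately simple stand-in that perturbs the best point found so far, and `train_and_validate` is a hypothetical scoring function, so all names are illustrative assumptions rather than the patent's API.

```python
import random

def hyperparameter_search(space, train_and_validate, num_searches, seed=0):
    # space is the hyperparameter space C as {name: (low, high)} ranges,
    # covering at least batch size and learning rate; R is the set of
    # recorded scores, F = max(SC) is the objective, J = num_searches.
    rng = random.Random(seed)
    scores = []                       # the set R of validation scores
    sample = lambda: {k: rng.uniform(lo, hi) for k, (lo, hi) in space.items()}
    best_point, best_score = None, float("-inf")
    point = sample()                  # random starting point in C
    for _ in range(num_searches):
        sc = train_and_validate(point)
        scores.append(sc)
        if sc > best_score:           # objective F = max(SC)
            best_point, best_score = point, sc
        # Stand-in for the Gaussian-process recommendation T = G(C, R, F, J):
        # propose a small perturbation of the current best point, clamped to C.
        point = {k: min(max(v * rng.uniform(0.9, 1.1), space[k][0]),
                        space[k][1])
                 for k, v in best_point.items()}
    return best_point, best_score

space = {"batch_size": (8, 128), "learning_rate": (1e-4, 1e-1)}
# Toy objective peaking at batch_size 64 and learning_rate 0.01.
obj = lambda p: -abs(p["batch_size"] - 64) - abs(p["learning_rate"] - 0.01)
best, score = hyperparameter_search(space, obj, num_searches=50)
```

The loop structure (sample, evaluate, record in R, ask the recommender for the next point, repeat J times) is the part that carries over to a genuine Gaussian-process search.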

A second aspect of the present invention discloses a device for exploring a deep learning model, the device comprising:

a determination module, configured to determine one cloud computing resource as a master node and multiple other cloud computing resources as multiple slave nodes, wherein the master node is used to schedule the slave nodes to perform training operations on a deep learning model;

a training module, configured to perform a training operation on the deep learning model at each slave node to obtain that slave node's training result, wherein the training result of each slave node includes a target deep learning model and the score of that target deep learning model, the target deep learning model being the deep learning model after training;

the determination module being further configured to determine an optimal deep learning model from all the target deep learning models according to the scores included in all the training results.

As an optional implementation in the second aspect of the present invention, the training module includes:

a creation submodule, configured to create a first process and a second process for each slave node;

a generation submodule, configured to generate, with each slave node's first process, multiple deep learning models for that slave node; and

a training submodule, configured to perform, with each slave node's second process, the slave node's local cloud computing resources, and the determined hyperparameters, training and validation operations on each deep learning model of that slave node to obtain the slave node's training result.

As an optional implementation in the second aspect of the present invention, the generation submodule includes:

a selection unit, configured to select, with each slave node's first process, multiple historical models from a determined historical model pool as that slave node's base models, where the historical models are deep learning models already generated by all the slave nodes;

the selection unit being further configured to select, with a determined simulated annealing method, several of each slave node's base models as that slave node's target base models;

a morphing unit, configured to perform a model morphing operation on each of a slave node's target base models to obtain multiple variant models of that target base model;

a first screening unit, configured to screen out the target variant model of a target base model from that target base model's variant models; and

a second screening unit, configured to score each target variant model and, according to those scores, screen out a slave node's multiple deep learning models from all that slave node's target variant models;

where a model morphing operation randomly applies to a neural network model at least one of: a network-structure deepening operation, a network-structure widening operation, and a skip-layer addition operation.

作为一种可选的实施方式,在本发明第二方面中,所述第一筛选单元包括:As an optional embodiment, in the second aspect of the present invention, the first screening unit includes:

计算子单元,用于计算该目标基础模型对应的每个所述变种模型与所述历史模型池中每个所述历史模型的模型距离;A calculation subunit for calculating the model distance between each of the variant models corresponding to the target base model and each of the historical models in the historical model pool;

判断子单元,用于判断该目标基础模型对应的每个所述变种模型是否存在匹配历史模型,该变种模型对应的匹配历史模型是与该变种模型的模型距离小于预设阈值的所述历史模型,当判断出该变种模型存在匹配历史模型时,将该变种模型从该目标基础模型对应的多个所述变种模型中删除;A judging subunit for judging whether each of the variant models corresponding to the target base model has a matching history model, and the matching history model corresponding to the variant model is the history model whose model distance from the variant model is less than a preset threshold , when it is determined that there is a matching historical model in the variant model, delete the variant model from the plurality of variant models corresponding to the target base model;

选取子单元,用于对该目标基础模型对应的每个所述变种模型进行评分,并从该目标基础模型对应的多个所述变种模型中选取评分最高的所述变种模型作为该目标基础模型对应的目标变种模型。a selecting subunit, configured to score each of the variant models corresponding to the target basic model, and to select the variant model with the highest score from the plurality of variant models corresponding to the target basic model as the target variant model corresponding to the target basic model.

作为一种可选的实施方式,在本发明第二方面中,所述计算子单元包括:As an optional implementation manner, in the second aspect of the present invention, the calculation subunit includes:

计算二级子单元,用于计算该目标基础模型对应的每个所述变种模型与所述历史模型池中每个所述历史模型的普通层距离;a calculation secondary subunit, configured to calculate the normal-layer distance between each of the variant models corresponding to the target basic model and each of the historical models in the historical model pool;

所述计算二级子单元,还用于计算该目标基础模型对应的每个所述变种模型与所述历史模型池中每个所述历史模型的跳层距离;the calculation secondary subunit is further configured to calculate the skip-layer distance between each of the variant models corresponding to the target basic model and each of the historical models in the historical model pool;

相加二级子单元,用于将该目标基础模型对应的每个所述变种模型与所述历史模型池中每个所述历史模型的所述普通层距离和所述跳层距离相加以作为该变种模型和该历史模型的模型距离。an addition secondary subunit, configured to add the normal-layer distance and the skip-layer distance between each of the variant models corresponding to the target basic model and each of the historical models in the historical model pool, the sum serving as the model distance between that variant model and that historical model.

作为一种可选的实施方式,在本发明第二方面中,所述计算二级子单元计算该目标基础模型对应的每个所述变种模型与所述历史模型池中每个所述历史模型的普通层距离的具体方式为:As an optional implementation manner, in the second aspect of the present invention, the specific way in which the calculation secondary subunit calculates the normal-layer distance between each of the variant models corresponding to the target basic model and each of the historical models in the historical model pool is as follows:

对该目标基础模型对应的每个所述变种模型的普通层进行信息编码,得到该变种模型对应的普通层信息列表,以及对所述历史模型池中每个所述历史模型的普通层进行信息编码,得到该历史模型对应的普通层信息列表;performing information coding on the normal layers of each of the variant models corresponding to the target basic model to obtain a normal-layer information list corresponding to that variant model, and performing information coding on the normal layers of each of the historical models in the historical model pool to obtain a normal-layer information list corresponding to that historical model;

根据该目标基础模型对应的每个所述变种模型对应的普通层信息列表和所述历史模型池中每个所述历史模型对应的普通层信息列表构造该变种模型与该历史模型对应的普通层矩阵;constructing, according to the normal-layer information list corresponding to each of the variant models corresponding to the target basic model and the normal-layer information list corresponding to each of the historical models in the historical model pool, a normal-layer matrix for that variant model and that historical model;

以从左到右以及从上到下的顺序对该目标基础模型对应的每个所述变种模型和所述历史模型池中每个所述历史模型对应的所述普通层矩阵的每个元素进行赋值,其中,对该普通层矩阵的每个元素进行赋值的公式如下:assigning a value to each element of the normal-layer matrix corresponding to each of the variant models corresponding to the target basic model and each of the historical models in the historical model pool, in order from left to right and from top to bottom, where the formula for assigning each element of the normal-layer matrix is as follows:

matrix_ij = min(matrix_ij + dist(M1_i, M2_j), matrix_i(j-1), matrix_(i-1)j)

其中,matrixij表示该普通层矩阵中第i行第j列的元素,M1i表示该变种模型的第i个普通层,M2j表示该历史模型的第j个普通层;Here, matrix_ij denotes the element in row i and column j of the normal-layer matrix, M1_i denotes the i-th normal layer of the variant model, and M2_j denotes the j-th normal layer of the historical model;

其中,函数dist用于计算M1i和M2j两个层之间的距离,当M1i和M2j是不同类型的层时,dist(M1i,M2j)的值为1,当M1i和M2j是相同类型的层时,dist(M1i,M2j)的值通过以下方式计算:Here, the function dist computes the distance between layers M1_i and M2_j: when M1_i and M2_j are layers of different types, dist(M1_i, M2_j) is 1; when they are layers of the same type, dist(M1_i, M2_j) is calculated as follows:

Figure BDA0002632188850000071（同类型普通层间dist的具体公式见该附图）(the formula for dist between same-type normal layers is given in this figure)

其中,ak为M1i表示的普通层的信息编码中第k个参数信息,bk为M2j表示的普通层的信息编码中第k个参数信息,n为信息编码中包含的参数信息的个数;where a_k is the k-th parameter in the information coding of the normal layer denoted by M1_i, b_k is the k-th parameter in the information coding of the normal layer denoted by M2_j, and n is the number of parameters contained in the information coding;

取该目标基础模型对应的每个所述变种模型和所述历史模型池中每个所述历史模型对应的所述普通层矩阵右下角的元素作为该变种模型与该历史模型的普通层距离。taking the element in the lower-right corner of the normal-layer matrix corresponding to each of the variant models corresponding to the target basic model and each of the historical models in the historical model pool as the normal-layer distance between that variant model and that historical model.
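The normal-layer matrix procedure above can be sketched as follows. This is an illustrative reading only: the recurrence is interpreted as a standard edit distance (diagonal transition plus the layer distance, with unit cost for inserting or deleting a layer), and since the same-type layer distance formula appears only in the patent figure, a normalized mean parameter difference is substituted here as an assumption; the layer encoding `(type, params)` is likewise hypothetical.

```python
def layer_dist(layer1, layer2):
    # dist(M1_i, M2_j): layers are encoded as (type, params tuple) -- a
    # hypothetical encoding. Different layer types have distance 1; the
    # same-type formula is given only in the patent figure, so a normalized
    # mean parameter difference is assumed here for illustration.
    if layer1[0] != layer2[0]:
        return 1.0
    a, b = layer1[1], layer2[1]
    if not a:
        return 0.0
    return sum(abs(ak - bk) / max(ak, bk, 1) for ak, bk in zip(a, b)) / len(a)

def normal_layer_distance(model1, model2):
    # Fill the normal-layer matrix left-to-right, top-to-bottom and take the
    # lower-right element as the normal-layer distance. The recurrence is
    # read as an edit distance: diagonal transition plus dist(M1_i, M2_j),
    # with unit cost for inserting or deleting a layer (an interpretation).
    m, n = len(model1), len(model2)
    matrix = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        matrix[i][0] = float(i)
    for j in range(n + 1):
        matrix[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            matrix[i][j] = min(
                matrix[i - 1][j - 1] + layer_dist(model1[i - 1], model2[j - 1]),
                matrix[i][j - 1] + 1.0,  # insert a layer
                matrix[i - 1][j] + 1.0,  # delete a layer
            )
    return matrix[m][n]
```

With this reading, identical layer lists have distance 0 and appending one layer costs at most 1, consistent with taking the lower-right matrix element as the distance.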

作为一种可选的实施方式,在本发明第二方面中,所述计算二级子单元计算该目标基础模型对应的每个所述变种模型与所述历史模型池中每个所述历史模型的跳层距离的具体方式为:As an optional implementation manner, in the second aspect of the present invention, the specific way in which the calculation secondary subunit calculates the skip-layer distance between each of the variant models corresponding to the target basic model and each of the historical models in the historical model pool is as follows:

对该目标基础模型对应的每个所述变种模型的跳层进行信息编码,得到该变种模型对应的跳层信息列表,以及对所述历史模型池中每个所述历史模型的跳层进行信息编码,得到该历史模型对应的跳层信息列表;performing information coding on the skip layers of each of the variant models corresponding to the target basic model to obtain a skip-layer information list corresponding to that variant model, and performing information coding on the skip layers of each of the historical models in the historical model pool to obtain a skip-layer information list corresponding to that historical model;

根据该目标基础模型对应的每个所述变种模型对应的跳层信息列表和所述历史模型池中每个所述历史模型对应的跳层信息列表构造该变种模型和该历史模型对应的跳层矩阵,所述跳层矩阵的行数为该变种模型对应的跳层信息列表的长度,所述跳层矩阵的列数为该历史模型对应的跳层信息列表的长度;constructing, according to the skip-layer information list corresponding to each of the variant models corresponding to the target basic model and the skip-layer information list corresponding to each of the historical models in the historical model pool, a skip-layer matrix for that variant model and that historical model, where the number of rows of the skip-layer matrix is the length of the skip-layer information list corresponding to the variant model and the number of columns is the length of the skip-layer information list corresponding to the historical model;

根据以下公式对该目标基础模型对应的每个所述变种模型与所述历史模型池中每个所述历史模型对应的所述跳层矩阵的每个元素进行赋值:assigning a value to each element of the skip-layer matrix corresponding to each of the variant models corresponding to the target basic model and each of the historical models in the historical model pool according to the following formula:

skip_connection_matrix_pq = dist2(S1_p, S2_q)

其中,skip_connection_matrix表示该跳层矩阵,skip_connection_matrixpq表示该跳层矩阵中第p行第q列的元素,S1p表示该变种模型的第p个跳层,S2q表示该历史模型的第q个跳层;Here, skip_connection_matrix denotes the skip-layer matrix, skip_connection_matrix_pq denotes the element in row p and column q of that matrix, S1_p denotes the p-th skip layer of the variant model, and S2_q denotes the q-th skip layer of the historical model;

其中,函数dist2用于计算S1p和S2q两个层之间的距离,当S1p和S2q是不同类型的层时,dist2(S1p,S2q)的值为1,当S1p和S2q是相同类型的层时,dist2(S1p,S2q)的值通过以下方式计算:Here, the function dist2 computes the distance between skip layers S1_p and S2_q: when S1_p and S2_q are layers of different types, dist2(S1_p, S2_q) is 1; when they are layers of the same type, dist2(S1_p, S2_q) is calculated as follows:

Figure BDA0002632188850000072（同类型跳层间dist2的具体公式见该附图）(the formula for dist2 between same-type skip layers is given in this figure)

其中,ps表示该变种模型中第p层跳层中起始层在该模型中的层位置索引,qs表示该历史模型中第q层跳层中起始层在该模型中的层位置索引,pl表示该变种模型中第p层跳层的深度,ql表示该历史模型中第q层跳层的深度;where p_s denotes the position index, within the variant model, of the starting layer of its p-th skip layer, q_s denotes the position index, within the historical model, of the starting layer of its q-th skip layer, p_l denotes the depth of the p-th skip layer of the variant model, and q_l denotes the depth of the q-th skip layer of the historical model;

根据以下公式计算该目标基础模型对应的每个所述变种模型与所述历史模型池中每个所述历史模型之间的跳层距离:calculating the skip-layer distance between each of the variant models corresponding to the target basic model and each of the historical models in the historical model pool according to the following formula:

dist_s = sum(skip_connection_matrix) + |s1 - s2|

其中,dists表示该变种模型和该历史模型之间的跳层距离,sum(skip_connection_matrix)表示将跳层矩阵skip_connection_matrix中的每一个元素相加进行求和,s1表示该变种模型对应的跳层信息列表的长度,s2表示该历史模型对应的跳层信息列表的长度。where dist_s denotes the skip-layer distance between the variant model and the historical model, sum(skip_connection_matrix) denotes the sum of all elements of skip_connection_matrix, s1 denotes the length of the skip-layer information list corresponding to the variant model, and s2 denotes the length of the skip-layer information list corresponding to the historical model.
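A sketch of the skip-layer distance described above. The skip-layer encoding `(start index, depth, type)` is hypothetical, and since the same-type dist2 formula appears only in the patent figure, a normalized difference of start positions and depths is substituted as an assumption; the summation over the whole matrix and the `|s1 - s2|` term follow the formula above.

```python
def skip_dist2(s1, s2):
    # dist2(S1_p, S2_q): skip layers encoded as (start index p_s, depth p_l,
    # type) -- a hypothetical encoding. Different types have distance 1; the
    # same-type formula is given only in the patent figure, so a normalized
    # difference of start positions and depths is assumed for illustration.
    if s1[2] != s2[2]:
        return 1.0
    ps, pl, qs, ql = s1[0], s1[1], s2[0], s2[1]
    denom = max(ps, qs) + max(pl, ql)
    if denom == 0:
        return 0.0
    return (abs(ps - qs) + abs(pl - ql)) / denom

def skip_layer_distance(skips1, skips2):
    # dist_s = sum(skip_connection_matrix) + |s1 - s2|, where the matrix has
    # one row per skip layer of the variant model and one column per skip
    # layer of the historical model.
    total = sum(skip_dist2(a, b) for a in skips1 for b in skips2)
    return total + abs(len(skips1) - len(skips2))
```

Note that because every matrix element is summed, the `|s1 - s2|` term is what penalizes models with different numbers of skip connections.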

作为一种可选的实施方式,在本发明第二方面中,所述训练子模块基于每个所述从节点的第二进程、每个所述从节点的本地云计算资源以及确定出的超参数对该从节点对应的每个所述深度学习模型执行训练与验证操作,得到该从节点对应的训练结果的具体方式为:As an optional implementation manner, in the second aspect of the present invention, the specific way in which the training submodule, based on the second process of each of the slave nodes, the local cloud computing resources of each of the slave nodes, and the determined hyperparameters, performs training and verification operations on each of the deep learning models corresponding to that slave node to obtain the training result corresponding to that slave node is as follows:

基于每个所述从节点的第二进程以及每个所述从节点的本地云计算资源,针对该从节点对应的每个所述深度学习模型,拟定该深度学习模型对应的超参数空间,所述超参数空间至少包括批处理数量和学习率;formulating, based on the second process of each of the slave nodes and the local cloud computing resources of each of the slave nodes, for each of the deep learning models corresponding to that slave node, a hyperparameter space corresponding to that deep learning model, where the hyperparameter space includes at least the batch size and the learning rate;

设定每个所述从节点对应的每个所述深度学习模型对应的搜索次数;Set the number of searches corresponding to each of the deep learning models corresponding to each of the slave nodes;

构造每个所述从节点对应的每个所述深度学习模型对应的集合,该集合用于保存该深度学习模型进行训练后在验证集上的评分;Construct a set corresponding to each of the deep learning models corresponding to each of the slave nodes, and the set is used to save the score on the verification set after the deep learning model is trained;

根据以下公式设定每个所述从节点对应的每个所述深度学习模型对应的目标函数:The objective function corresponding to each of the deep learning models corresponding to each of the slave nodes is set according to the following formula:

F = max(SC)

其中,F为目标函数,SC为该深度学习模型进行训练后在验证集上的评分;Among them, F is the objective function, and SC is the score on the validation set after the deep learning model is trained;

在每个所述从节点对应的每个所述深度学习模型对应的所述超参数空间中随机选取一个起点,然后通过高斯过程映射在该超参数空间中循环进行搜索以选取出该深度学习模型对应的多个目标超参数,其中,高斯过程映射可以表示为:randomly selecting a starting point in the hyperparameter space corresponding to each of the deep learning models corresponding to each of the slave nodes, and then searching iteratively in that hyperparameter space through a Gaussian-process mapping to select a plurality of target hyperparameters corresponding to that deep learning model, where the Gaussian-process mapping can be expressed as:

T = G(C, R, F, J)

其中,T为该深度学习模型对应的目标超参数,每个所述目标超参数均是G推荐的值得尝试的超参数的值,C为该深度学习模型对应的超参数空间,R为该深度学习模型对应的集合,F为该深度学习模型对应的目标函数,J为该深度学习模型对应的搜索次数;where T is a target hyperparameter corresponding to the deep learning model, each target hyperparameter being a hyperparameter value recommended by G as worth trying; C is the hyperparameter space corresponding to the deep learning model; R is the set corresponding to the deep learning model; F is the objective function corresponding to the deep learning model; and J is the number of searches corresponding to the deep learning model;

基于每个所述从节点对应的每个所述深度学习模型对应的每个所述目标超参数对该深度学习模型执行训练与验证操作,得到该深度学习模型对应的多个中间深度学习模型以及每个所述中间深度学习模型对应的评分;Perform training and verification operations on the deep learning model based on each of the target hyperparameters corresponding to each of the deep learning models corresponding to each of the slave nodes, to obtain a plurality of intermediate deep learning models corresponding to the deep learning model and a score corresponding to each of the intermediate deep learning models;

从每个所述从节点对应的每个所述深度学习模型对应的所有所述中间深度学习模型中,选取所述评分最高的所述中间深度学习模型以及该中间深度学习模型对应的所述评分作为该从节点对应的训练结果。From all the intermediate deep learning models corresponding to each of the deep learning models corresponding to each of the slave nodes, select the intermediate deep learning model with the highest score and the score corresponding to the intermediate deep learning model as the training result corresponding to the slave node.
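The hyperparameter search loop above can be sketched as follows. For self-containment the Gaussian-process mapping G is replaced here by a random recommender over the space C (an assumption); a real implementation would fit a Gaussian-process surrogate to the score set R to recommend the next hyperparameters worth trying. All names (`space`, `train_and_score`, etc.) are hypothetical.

```python
import random

def explore_hyperparameters(train_and_score, space, J, seed=0):
    # space: hyperparameter space C, e.g. {"batch_size": [...], "lr": [...]}.
    # R collects the validation-set score SC of every trial; the objective
    # is F = max(SC). The Gaussian-process mapping T = G(C, R, F, J) is
    # replaced by uniform random sampling for this sketch (assumption).
    rng = random.Random(seed)
    R = []  # (hyperparameters T, score SC) pairs
    for _ in range(J):
        T = {name: rng.choice(values) for name, values in space.items()}
        SC = train_and_score(T)  # train and validate with these values
        R.append((T, SC))
    best_T, best_SC = max(R, key=lambda x: x[1])  # F = max(SC)
    return best_T, best_SC
```

The returned pair corresponds to the highest-scoring intermediate model and its score, i.e. the training result the slave node reports back.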

本发明第三方面公开了一种深度学习模型的探索装置,所述装置包括:A third aspect of the present invention discloses a device for exploring a deep learning model, the device comprising:

存储有可执行程序代码的存储器;a memory in which executable program code is stored;

与所述存储器耦合的处理器;a processor coupled to the memory;

所述处理器调用所述存储器中存储的所述可执行程序代码,执行本发明第一方面公开的深度学习模型的探索方法。The processor invokes the executable program code stored in the memory to execute the deep learning model exploration method disclosed in the first aspect of the present invention.

本发明第四方面公开了一种计算机存储介质,所述计算机存储介质存储有计算机指令,所述计算机指令被调用时,用于执行本发明第一方面公开的深度学习模型的探索方法。A fourth aspect of the present invention discloses a computer storage medium. The computer storage medium stores computer instructions, and when the computer instructions are invoked, they are used to execute the exploration method for the deep learning model disclosed in the first aspect of the present invention.

与现有技术相比,本发明具有以下有益效果:Compared with the prior art, the present invention has the following beneficial effects:

本发明实施例通过确定一个云计算资源作为主节点以及多个其他云计算资源作为多个从节点,然后基于每个从节点进行深度学习模型的训练操作得到该从节点的训练结果,最后根据训练结果的评分从所有的训练结果中确定出最优的深度学习模型,从而能够实现并行进行深度学习模型的训练操作,提高深度学习模型训练的效率,减少训练的时间,有利于深度学习模型探索技术在业务场景中的应用。In the embodiments of the present invention, one cloud computing resource is determined as the master node and a plurality of other cloud computing resources as slave nodes; a deep learning model training operation is then performed on each slave node to obtain that slave node's training result; finally, the optimal deep learning model is determined from all the training results according to their scores. The training operations of the deep learning models can thus be performed in parallel, which improves training efficiency, reduces training time, and facilitates the application of deep-learning-model exploration techniques in business scenarios.

附图说明Description of drawings

为了更清楚地说明本发明实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings used in the description of the embodiments. Obviously, the accompanying drawings in the following description are only some embodiments of the present invention. For those of ordinary skill in the art, other drawings can also be obtained from these drawings without creative effort.

图1是本发明实施例公开的一种深度学习模型的探索方法的流程示意图;1 is a schematic flowchart of a method for exploring a deep learning model disclosed in an embodiment of the present invention;

图2是本发明实施例公开的另一种深度学习模型的探索方法的流程示意图;2 is a schematic flowchart of another method for exploring a deep learning model disclosed in an embodiment of the present invention;

图3是本发明实施例公开的一种深度学习模型的探索装置的结构示意图;3 is a schematic structural diagram of a device for exploring a deep learning model disclosed in an embodiment of the present invention;

图4是本发明实施例公开的另一种深度学习模型的探索装置的结构示意图;4 is a schematic structural diagram of another apparatus for exploring a deep learning model disclosed in an embodiment of the present invention;

图5是本发明实施例公开的又一种深度学习模型的探索装置的结构示意图。FIG. 5 is a schematic structural diagram of another apparatus for exploring a deep learning model disclosed in an embodiment of the present invention.

具体实施方式Detailed ways

为了使本技术领域的人员更好地理解本发明方案,下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。In order to enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.

本发明的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别不同对象,而不是用于描述特定顺序。此外,术语“包括”和“具有”以及它们任何变形,意图在于覆盖不排他的包含。例如包含了一系列步骤或单元的过程、方法、装置、产品或设备没有限定于已列出的步骤或单元,而是可选地还包括没有列出的步骤或单元,或可选地还包括对于这些过程、方法、产品或设备固有的其他步骤或单元。The terms "first", "second", and the like in the description and claims of the present invention and the above drawings are used to distinguish different objects, not to describe a specific order. Furthermore, the terms "comprising" and "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, apparatus, product, or device comprising a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to such processes, methods, products, or devices.

在本文中提及“实施例”意味着,结合实施例描述的特定特征、结构或特性可以包含在本发明的至少一个实施例中。在说明书中的各个位置出现该短语并不一定均是指相同的实施例,也不是与其它实施例互斥的独立的或备选的实施例。本领域技术人员显式地和隐式地理解的是,本文所描述的实施例可以与其它实施例相结合。Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor a separate or alternative embodiment that is mutually exclusive of other embodiments. It is explicitly and implicitly understood by those skilled in the art that the embodiments described herein may be combined with other embodiments.

本发明公开了一种深度学习模型的探索方法及装置,通过确定一个云计算资源作为主节点以及多个其他云计算资源作为多个从节点,然后基于每个从节点进行深度学习模型的训练操作得到该从节点的训练结果,最后根据训练结果的评分从所有的训练结果中确定出最优的深度学习模型,从而能够实现并行进行深度学习模型的训练操作,提高深度学习模型训练的效率,减少训练的时间,有利于深度学习模型探索技术在业务场景中的应用。The present invention discloses a method and device for exploring a deep learning model: one cloud computing resource is determined as the master node and a plurality of other cloud computing resources as slave nodes; a deep learning model training operation is then performed on each slave node to obtain that slave node's training result; finally, the optimal deep learning model is determined from all the training results according to their scores. The training operations can thus be performed in parallel, which improves training efficiency, reduces training time, and facilitates the application of deep-learning-model exploration techniques in business scenarios.

实施例一Example 1

请参阅图1,图1是本发明实施例公开的一种深度学习模型的探索方法的流程示意图。如图1所示,该深度学习模型的探索方法可以包括以下操作:Please refer to FIG. 1. FIG. 1 is a schematic flowchart of a method for exploring a deep learning model disclosed in an embodiment of the present invention. As shown in Figure 1, the exploration method of the deep learning model can include the following operations:

101、确定一个云计算资源作为主节点以及多个其他云计算资源作为多个从节点,其中,主节点用于调度多个从节点对深度学习模型执行训练操作。101. Determine one cloud computing resource as the master node and multiple other cloud computing resources as multiple slave nodes, where the master node is used to schedule multiple slave nodes to perform training operations on the deep learning model.

在上述步骤101中,主节点主要负责和用户的交互、从节点的启动、从节点的交互和深度学习模型的训练结果的汇总。主节点的中枢为一个消息循环模块,该消息循环模块用于管理主节点自身各个模块的交互,主节点自身的模块包括调参器、评估器、训练服务模块等,其中,调参器用于生成深度学习模型的网络结构,调参器在初始化阶段需指定方式,例如Network Morphism。评估器用于评估网络结构的好坏。训练服务模块用于调度从节点进行训练。主节点自身各个模块的交互、主节点和从节点的交互、主节点和用户的交互均可以通过网络通讯进行。在并行训练的过程中,主节点将调参器发送给从节点,这样从节点就能够自主生成和训练深度学习模型。In the above step 101, the master node is mainly responsible for interaction with the user, starting the slave nodes, interacting with the slave nodes, and aggregating the training results of the deep learning models. The hub of the master node is a message loop module, which manages the interaction among the master node's own modules; these modules include a parameter tuner, an evaluator, a training service module, and so on. The parameter tuner is used to generate the network structure of the deep learning model and must be given a method at initialization, for example Network Morphism. The evaluator is used to evaluate the quality of a network structure. The training service module is used to schedule the slave nodes for training. The interaction among the master node's own modules, between the master node and the slave nodes, and between the master node and the user can all be carried out through network communication. During parallel training, the master node sends the parameter tuner to the slave nodes, so that each slave node can generate and train deep learning models autonomously.

具体地,云计算资源的启动及调度过程可以参见下面的描述,用户通过yml配置相关任务信息(例如图像分类相关),然后通过shell脚本启动主节点,主节点启动nodejs,在这个过程中,用户可以使用web查看云运行状态。主节点启动自身消息循环模块与zmq模块,然后开始基于typescript管理模块间(typescript为一种JavaScript语言,这里做网络通信)进行和从节点的交互。主节点解析用户的shell请求信息,据其初始化业务用例、调参器和历史信息表(用于保存所有从节点上传的训练结果),并在历史信息表中存入一张16层的初始结构(若干类型的深度学习层的组合,例如卷积、采样层),作为网络结构生成的基础,并据其获取的从节点信息通过训练服务模块基于shell脚本启动从节点,然后将业务用例(包括训练对应的配置参数和业务数据,例如,用例为图像分类时,配置参数为图像输入维度和图像输出维度,业务数据为图像分类训练数据集)、调参器和历史信息表通过网络传递给从节点。接下来,从节点开始维护本地历史信息表(用于保存该从节点的训练结果)、本地调参器和本地业务用例。Specifically, the startup and scheduling process of the cloud computing resources is as follows. The user configures the relevant task information (for example, image-classification-related information) through a yml file and then starts the master node through a shell script; the master node starts nodejs, and during this process the user can view the cloud running status on the web. The master node starts its own message loop module and zmq module, and then begins to interact with the slave nodes through typescript-based inter-module management (typescript, a JavaScript-based language, handles the network communication here). The master node parses the user's shell request, initializes the business use case, the parameter tuner, and the history information table (used to save all training results uploaded by the slave nodes) accordingly, and stores in the history information table a 16-layer initial structure (a combination of several types of deep learning layers, such as convolution and sampling layers) as the basis for network structure generation. According to the obtained slave node information, the master node starts the slave nodes through the training service module based on shell scripts, and then passes the business use case (including the configuration parameters and business data for training; for example, when the use case is image classification, the configuration parameters are the image input and output dimensions and the business data is the image classification training data set), the parameter tuner, and the history information table to the slave nodes over the network. Each slave node then begins to maintain its local history information table (used to save that slave node's training results), local parameter tuner, and local business use case.

102、基于每个从节点对深度学习模型执行训练操作,得到该从节点的训练结果。102. Perform a training operation on the deep learning model based on each slave node to obtain a training result of the slave node.

在上述步骤102中,每个从节点的训练结果包括目标深度学习模型以及该目标深度学习模型的评分,每个从节点的训练结果包括的目标深度学习模型为训练后的深度学习模型。这里,深度学习模型的评分可以是该深度学习模型在验证集上的精度评分。每个从节点在接收到由主节点通过网络传递过来的业务用例、调参器和历史信息表后,即可以据此开始维护本地历史信息表(用于保存该从节点的训练结果)、本地调参器和本地业务用例,开始独立进行深度学习模型的训练。具体地,利用本地调参器和本地业务用例进行深度学习模型的训练,然后将得到的训练结果保存至本地历史信息表。In the above step 102, the training result of each slave node includes a target deep learning model and the score of that target deep learning model; the target deep learning model included in a slave node's training result is the trained deep learning model. Here, the score of a deep learning model may be its accuracy score on the validation set. After each slave node receives the business use case, parameter tuner, and history information table transmitted by the master node over the network, it can begin to maintain its local history information table (used to save that slave node's training results), local parameter tuner, and local business use case, and start training deep learning models independently. Specifically, the slave node trains deep learning models using the local parameter tuner and local business use case, and then saves the resulting training results to the local history information table.

在一个可选的实施例中,基于每个从节点对深度学习模型执行训练操作,得到该从节点的训练结果,包括:In an optional embodiment, a training operation is performed on the deep learning model based on each slave node, and the training result of the slave node is obtained, including:

创建每个从节点对应的第一进程和第二进程,并基于每个从节点的第一进程生成该从节点对应的多个深度学习模型;Create a first process and a second process corresponding to each slave node, and generate multiple deep learning models corresponding to the slave node based on the first process of each slave node;

基于每个从节点的第二进程、每个从节点的本地云计算资源以及确定出的超参数对该从节点对应的每个深度学习模型执行训练与验证操作,得到该从节点对应的训练结果。Perform training and verification operations on each deep learning model corresponding to the slave node based on the second process of each slave node, the local cloud computing resources of each slave node, and the determined hyperparameters, and obtain the training result corresponding to the slave node .

在该可选的实施例中,在每个从节点中均创建该从节点的第一进程和第二进程,该从节点的第一进程用于不断地循环执行生成深度学习模型的任务以不断地生成新的深度学习模型,该从节点的第二进程用于不断地循环执行对第一进程所生成的深度学习模型的训练与验证的任务。In this optional embodiment, a first process and a second process are created in each slave node. The first process of a slave node continuously loops over the task of generating deep learning models so as to keep producing new models, and the second process continuously loops over the task of training and validating the deep learning models generated by the first process.

可见,实施该可选的实施例,通过在每个从节点中均生成该从节点对应的第一进程和第二进程,并通过第一进程不断生成新的深度学习模型,通过第二进程不断对新生成的深度学习模型进行训练与验证以得到训练结果,这样能够实现深度学习模型的生成与训练的并行化,使得深度学习模型的生成和训练无需相互等待,提高深度学习模型的训练效率。It can be seen that, by implementing this optional embodiment, a first process and a second process are generated in each slave node: the first process continuously generates new deep learning models, and the second process continuously trains and validates the newly generated models to obtain training results. This parallelizes the generation and training of deep learning models, so that generation and training need not wait for each other, which improves the training efficiency of the deep learning models.
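A minimal sketch of this two-pipeline design. Threads and a queue stand in for the two processes, and placeholder functions stand in for real model generation and training (both assumptions); the point is that the producer and consumer loops run concurrently and never wait for each other beyond the queue hand-off.

```python
import queue
import threading

def run_slave_node(num_models=3):
    # "First process": keeps generating model descriptions; "second process":
    # keeps training and validating them. A queue decouples the two loops so
    # that generation and training do not have to wait for each other.
    model_queue = queue.Queue()
    results = []

    def generator():
        for i in range(num_models):
            model_queue.put({"model_id": i, "depth": 16 + i})
        model_queue.put(None)  # sentinel: no more models

    def trainer():
        while True:
            model = model_queue.get()
            if model is None:
                break
            score = 1.0 / (1 + model["depth"])  # placeholder validation score
            results.append((model["model_id"], score))

    t1 = threading.Thread(target=generator)
    t2 = threading.Thread(target=trainer)
    t1.start(); t2.start()
    t1.join(); t2.join()
    return results
```

In a real slave node the two sides would be separate OS processes and `results` would be flushed into the local history information table.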

103、根据所有训练结果包括的评分从所有目标深度学习模型中确定最优深度学习模型。103. Determine an optimal deep learning model from all target deep learning models according to the scores included in all training results.

在上述步骤103中,由于历史信息表用于保存所有从节点上传的训练结果,所以根据历史信息表中的信息即能够确定出最优的深度学习模型。这里,可以选取评分最高的目标深度学习模型作为最优深度学习模型。In the above step 103, since the history information table is used to save all the training results uploaded from the nodes, the optimal deep learning model can be determined according to the information in the history information table. Here, the target deep learning model with the highest score can be selected as the optimal deep learning model.

可见,实施图1所描述的深度学习模型的探索方法,通过确定一个云计算资源作为主节点以及多个其他云计算资源作为多个从节点,然后基于每个从节点进行深度学习模型的训练操作得到该从节点的训练结果,最后根据训练结果的评分从所有的训练结果中确定出最优的深度学习模型,从而能够实现并行进行深度学习模型的训练操作,提高深度学习模型训练的效率,减少训练的时间,有利于深度学习模型探索技术在业务场景中的应用。另外,还通过在每个从节点中均生成该从节点对应的第一进程和第二进程,实现深度学习模型的生成与训练的并行化,使得深度学习模型的生成和训练无需相互等待,提高深度学习模型的训练效率。It can be seen that, by implementing the deep learning model exploration method described in FIG. 1, one cloud computing resource is determined as the master node and a plurality of other cloud computing resources as slave nodes; a deep learning model training operation is then performed on each slave node to obtain that slave node's training result; finally, the optimal deep learning model is determined from all the training results according to their scores. The training operations can thus be performed in parallel, which improves training efficiency, reduces training time, and facilitates the application of deep-learning-model exploration techniques in business scenarios. In addition, by generating a first process and a second process in each slave node, the generation and training of deep learning models are parallelized, so that generation and training need not wait for each other, further improving training efficiency.

实施例二Embodiment 2

请参阅图2,图2是本发明实施例公开的另一种深度学习模型的探索方法的流程示意图。如图2所示,该深度学习模型的探索方法可以包括以下操作:Please refer to FIG. 2. FIG. 2 is a schematic flowchart of another method for exploring a deep learning model disclosed in an embodiment of the present invention. As shown in Figure 2, the exploration method of the deep learning model can include the following operations:

201、确定一个云计算资源作为主节点以及多个其他云计算资源作为多个从节点,其中,主节点用于调度多个从节点对深度学习模型执行训练操作。201. Determine one cloud computing resource as the master node and multiple other cloud computing resources as multiple slave nodes, where the master node is used to schedule multiple slave nodes to perform training operations on the deep learning model.

202、创建每个从节点对应的第一进程和第二进程。202. Create a first process and a second process corresponding to each slave node.

203、基于每个从节点的第一进程,从确定出的历史模型池中选取多个历史模型作为该从节点对应的多个基础模型,历史模型为所有从节点已生成的深度学习模型。203. Based on the first process of each slave node, select multiple historical models from the determined historical model pool as multiple basic models corresponding to the slave node, where the historical models are deep learning models that have been generated by all slave nodes.

在上述步骤203中,历史模型池可以是根据历史信息表确定的,历史信息表中可以包括历史模型的模型编号和模型信息(网络结构、模型评分、模型权重等)。这里,可以选取模型评分前十的历史模型作为从节点对应的十个基础模型。In the above step 203, the historical model pool may be determined according to the history information table, which may include the model number and model information (network structure, model score, model weight, etc.) of each historical model. Here, the ten highest-scoring historical models may be selected as the ten base models corresponding to a slave node.

204、基于确定出的模拟退火方法从每个从节点对应的所有基础模型中选取多个基础模型作为该从节点对应的多个目标基础模型。204. Based on the determined simulated annealing method, select multiple base models from all base models corresponding to each slave node as multiple target base models corresponding to the slave node.

在上述步骤204中,模拟退火方法是一种寻找全局最优解的算法,能够有效避免陷入局部最优解。基于模拟退火方法选取出目标基础模型,能够更加高效地探索出评分更高的深度学习模型。In the above step 204, the simulated annealing method is an algorithm for finding a global optimal solution, which can effectively avoid falling into a local optimal solution. The target basic model is selected based on the simulated annealing method, which can more efficiently explore deep learning models with higher scores.
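One way step 204 could look in code. The acceptance rule and cooling schedule below are generic simulated annealing choices, not taken from the patent (assumptions): the best-scoring model is always accepted, a worse model is accepted with probability exp(Δ/T), and the temperature decays each step, which is what lets the search escape local optima.

```python
import math
import random

def select_target_base_models(base_models, k, temperature=1.0, decay=0.9, seed=0):
    # base_models: list of (model_id, score). Selects k target base models.
    # Generic simulated-annealing acceptance (an assumption): always accept
    # the best model, accept a worse one with probability exp(delta / T),
    # and decay the temperature so the search gradually converges.
    rng = random.Random(seed)
    pool = sorted(base_models, key=lambda m: m[1], reverse=True)
    best_score = pool[0][1]
    chosen, T = [], temperature
    for model in pool:
        delta = model[1] - best_score  # <= 0 by construction
        if delta == 0 or rng.random() < math.exp(delta / T):
            chosen.append(model)
        if len(chosen) == k:
            break
        T *= decay  # cool down
    for model in pool:  # top up by score if fewer than k were accepted
        if len(chosen) == k:
            break
        if model not in chosen:
            chosen.append(model)
    return chosen
```

With a high temperature, low-scoring base models still have a real chance of being kept, which is the property the text credits with avoiding local optima.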

205. Perform a model deformation operation on each target basic model corresponding to each slave node to obtain multiple variant models corresponding to that target basic model, and screen out the target variant model corresponding to that target basic model from those variant models.

In the above step 205, the model deformation operation refers to randomly performing at least one of a network-structure deepening operation, a network-structure widening operation, and a skip-layer-adding operation on the neural network model. Performing the model deformation operation yields multiple variant models corresponding to the target basic model, which enlarges the exploration space and makes the exploration results more accurate.

In an optional embodiment, screening out the target variant model corresponding to the target basic model from the multiple variant models corresponding to that target basic model includes:

calculating the model distance between each variant model corresponding to the target basic model and each historical model in the historical model pool;

determining whether each variant model corresponding to the target basic model has a matching historical model, where a matching historical model of a variant model is a historical model whose model distance to that variant model is less than a preset threshold, and when it is determined that the variant model has a matching historical model, deleting that variant model from the multiple variant models corresponding to the target basic model;

scoring each variant model corresponding to the target basic model, and selecting the highest-scoring variant model from the multiple variant models corresponding to the target basic model as the target variant model corresponding to that target basic model.

In this optional embodiment, because the model deformation operation is random, the structure of the resulting variant model is also random. If a variant model's structure differs little from existing models, subsequent training and verification of that variant model are of little value and waste computing resources. Therefore, the model distances between the variant models and the historical models are computed first, variant models whose model distance is too small are deleted, the remaining variant models are then scored, and the highest-scoring one is selected as the target variant model. Optionally, when the number of historical models in the historical model pool is greater than 40, a process pool may be created inside the first process, with each subprocess executing a model-distance computation task between a variant model and a historical model; this allows the model distances between every variant model and every historical model to be computed in parallel, improving the exploration efficiency of the deep learning model.
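A sketch of this filtering step is shown below. The `dist_fn` argument and the toy distance are illustrative, and a thread pool stands in for the patent's pool of subprocesses so the example stays self-contained; a `ProcessPoolExecutor` would be the direct analogue for a CPU-bound distance function.

```python
from concurrent.futures import ThreadPoolExecutor

def filter_variants(variants, history, dist_fn, threshold, max_workers=4):
    """Compute the distance between every (variant, historical) pair in
    parallel, then delete any variant whose distance to some historical
    model falls below the threshold (i.e. it has a matching history model)."""
    pairs = [(i, j) for i in range(len(variants)) for j in range(len(history))]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        dists = dict(zip(pairs, pool.map(
            lambda ij: dist_fn(variants[ij[0]], history[ij[1]]), pairs)))
    # Keep a variant only if it is at least `threshold` away from every
    # historical model in the pool.
    return [v for i, v in enumerate(variants)
            if all(dists[(i, j)] >= threshold for j in range(len(history)))]
```

For example, with numbers as toy "models" and absolute difference as the distance, `filter_variants([1, 10], [2, 8], lambda a, b: abs(a - b), threshold=2)` keeps only `10`, since `1` is within distance 2 of a historical model.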

Specifically, the scoring rule for a variant model is:

acc = (d_1 + d_2 + ... + d_n) / n

where n is the number of historical models, d_i is the distance between the variant model and the i-th historical model, and acc is the scoring result.

For example, if there are 2 historical models and the distances between the variant model and them are 2 and 4 respectively, the score is (2 + 4) / 2 = 3.
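The scoring rule is simply the mean distance to the historical models, so variants that differ more from already-explored models score higher. A one-line sketch:

```python
def score_variant(distances_to_history):
    """acc = (d_1 + ... + d_n) / n: the mean distance between a variant
    model and the n historical models in the pool."""
    return sum(distances_to_history) / len(distances_to_history)
```

With the distances 2 and 4 from the example above, `score_variant([2, 4])` returns 3.0.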

It can be seen that by implementing this optional embodiment, deleting the variant models whose model distance to the historical models is too small and selecting the highest-scoring remaining variant model as the target variant model screens the target variant model out of the multiple variant models, which reduces the amount of subsequent data processing and improves training efficiency.

In this optional embodiment, further optionally, calculating the model distance between each variant model corresponding to the target basic model and each historical model in the historical model pool includes:

calculating the common-layer distance between each variant model corresponding to the target basic model and each historical model in the historical model pool;

calculating the skip-layer distance between each variant model corresponding to the target basic model and each historical model in the historical model pool;

adding the common-layer distance and the skip-layer distance between each variant model corresponding to the target basic model and each historical model in the historical model pool to obtain the model distance between that variant model and that historical model.

In this further optional embodiment, the common layers of a deep learning model include single convolutional layers, batch normalization layers, pooling layers, and the like, and the skip layers of a deep learning model refer to residual connection layers. Since a deep learning model is usually composed of multiple common layers and multiple skip layers, computing the common-layer distance and the skip-layer distance between two deep learning models and adding them together as the model distance allows the two models to be compared effectively. Optionally, the number of skip layers of the variant model and the historical model to be compared may be counted first; when the number of skip layers is greater than 5, several subprocesses may be created inside the first process of the slave node to compute the skip-layer distance between the two models in parallel, improving training efficiency.

It can be seen that implementing this further optional embodiment, which obtains the model distance between two models by computing their common-layer distance and skip-layer distance, compares the two models effectively, so that the target variant model can be screened out accurately and efficiently.

In this further optional embodiment, still further optionally, calculating the common-layer distance between each variant model corresponding to the target basic model and each historical model in the historical model pool includes:

encoding the information of the common layers of each variant model corresponding to the target basic model to obtain the common-layer information list corresponding to that variant model, and encoding the information of the common layers of each historical model in the historical model pool to obtain the common-layer information list corresponding to that historical model;

constructing, for each variant model corresponding to the target basic model and each historical model in the historical model pool, the common-layer matrix corresponding to that variant model and that historical model according to their common-layer information lists;

assigning a value to each element of the common-layer matrix in left-to-right, top-to-bottom order, where the assignment formula for each element of the common-layer matrix is as follows:

matrix_ij = min(matrix_ij + dist(M1_i, M2_j), matrix_i(j-1), matrix_(i-1)j)

where matrix_ij represents the element in row i, column j of the common-layer matrix, M1_i represents the i-th common layer of the variant model, and M2_j represents the j-th common layer of the historical model;

where the function dist is used to compute the distance between the two layers M1_i and M2_j: when M1_i and M2_j are layers of different types, the value of dist(M1_i, M2_j) is 1, and when M1_i and M2_j are layers of the same type, the value of dist(M1_i, M2_j) is computed as follows:

dist(M1_i, M2_j) = Σ_k |a_k - b_k| / (2 * max(a_k, b_k)), for k = 1, ..., n

where a_k is the k-th parameter in the information encoding of the common layer represented by M1_i, b_k is the k-th parameter in the information encoding of the common layer represented by M2_j, and n is the number of parameters contained in the information encoding;

taking the element in the lower-right corner of the common-layer matrix corresponding to each variant model and each historical model as the common-layer distance between that variant model and that historical model.

In this still further optional embodiment, the process of encoding the information of a model's common layers to obtain that model's common-layer information list can be expressed as:

M = (Lm_1, Lm_2, ..., Lm_N)

where M represents the model's common-layer information list, N represents the number of common layers of the model, and Lm_i is the information encoding of the i-th common layer of the model. For example, the information of an activation layer is encoded as the string "ReLU".

Specifically, constructing the common-layer matrix corresponding to a variant model and a historical model according to their common-layer information lists may proceed as follows:

according to the length m1 of the variant model's common-layer information list and the length m2 of the historical model's common-layer information list, construct a common-layer matrix with m1 + 1 rows and m2 + 1 columns, then assign the values 0 to m2 to the first row and the values 0 to m1 to the first column, thereby initializing the common-layer matrix and completing its construction.

Specifically, the function dist(M1_i, M2_j) is explained as follows:

The layers of a deep learning model typically include convolutional layers, sampling layers, and so on. For example, if M1_i is a convolutional layer and M2_j is a sampling layer, the two layers are of different types, so the value of dist(M1_i, M2_j) is 1. As another example, suppose the information encoding of M1_i is "conv(32,8)" and the information encoding of M2_j is "conv(16,4)". When computing the value of dist(M1_i, M2_j), the first four letters of the information encodings are used to determine whether the two layers are of the same type; both begin with "conv", so the two layers are of the same type. Furthermore, the encoding "conv(32,8)" of M1_i indicates that the first parameter a_1 of layer M1_i (the number of convolution kernels) is 32 and the second parameter a_2 (the size of the convolution kernels) is 8, while the encoding "conv(16,4)" of M2_j indicates that the first parameter b_1 of layer M2_j is 16 and the second parameter b_2 is 4, so dist(M1_i, M2_j) = (32 - 16)/(2 * 32) + (8 - 4)/(2 * 8) = 0.25 + 0.25 = 0.5.
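A hedged Python sketch of this computation is given below. The string encodings and the tiny parser are illustrative assumptions; since the printed recurrence is ambiguous after extraction, the sketch uses the standard edit-distance recurrence that the matrix initialization and the worked examples suggest.

```python
def layer_dist(a, b):
    """Distance between two encoded common layers such as "conv(32,8)".
    Layers of different types have distance 1; for same-type layers the
    parameter lists are compared term by term."""
    ta, tb = a.split("(")[0], b.split("(")[0]
    if ta != tb:
        return 1.0
    if "(" not in a:  # parameter-free layers such as "ReLU"
        return 0.0
    pa = [int(x) for x in a[a.index("(") + 1:-1].split(",")]
    pb = [int(x) for x in b[b.index("(") + 1:-1].split(",")]
    return sum(abs(x - y) / (2 * max(x, y)) for x, y in zip(pa, pb))

def common_layer_distance(m1, m2):
    """Edit-distance-style dynamic program over the two common-layer
    information lists; the element in the lower-right corner of the
    matrix is the common-layer distance."""
    r, c = len(m1), len(m2)
    matrix = [[0.0] * (c + 1) for _ in range(r + 1)]
    for j in range(c + 1):
        matrix[0][j] = float(j)  # first row: 0 .. len(m2)
    for i in range(r + 1):
        matrix[i][0] = float(i)  # first column: 0 .. len(m1)
    for i in range(1, r + 1):
        for j in range(1, c + 1):
            matrix[i][j] = min(
                matrix[i - 1][j - 1] + layer_dist(m1[i - 1], m2[j - 1]),
                matrix[i][j - 1] + 1,
                matrix[i - 1][j] + 1)
    return matrix[r][c]
```

On the worked example, `layer_dist("conv(32,8)", "conv(16,4)")` returns 0.5, and identical layer lists get distance 0.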

It can be seen that implementing this still further optional embodiment, which encodes the common layers of a model to obtain a common-layer information list, constructs a common-layer matrix from that list, and assigns values to the matrix in the specific way described to obtain the common-layer distance between the models, allows the common-layer distance between two models to be computed.

In this further optional embodiment, still further optionally, calculating the skip-layer distance between each variant model corresponding to the target basic model and each historical model in the historical model pool includes:

encoding the information of the skip layers of each variant model corresponding to the target basic model to obtain the skip-layer information list corresponding to that variant model, and encoding the information of the skip layers of each historical model in the historical model pool to obtain the skip-layer information list corresponding to that historical model;

constructing, for each variant model and each historical model, the skip-layer matrix corresponding to that variant model and that historical model according to their skip-layer information lists, where the number of rows of the skip-layer matrix is the length of the variant model's skip-layer information list and the number of columns is the length of the historical model's skip-layer information list;

assigning a value to each element of the skip-layer matrix according to the following formula:

skip_connection_matrix_pq = dist2(S1_p, S2_q)

where skip_connection_matrix represents the skip-layer matrix, skip_connection_matrix_pq represents the element in row p, column q of the skip-layer matrix, S1_p represents the p-th skip layer of the variant model, and S2_q represents the q-th skip layer of the historical model;

where the function dist2 is used to compute the distance between the two layers S1_p and S2_q: when S1_p and S2_q are layers of different types, the value of dist2(S1_p, S2_q) is 1, and when S1_p and S2_q are layers of the same type, the value of dist2(S1_p, S2_q) is computed as follows:

dist2(S1_p, S2_q) = (|p_s - q_s| + |p_l - q_l|) / (p_l + q_l)

where p_s represents the layer position index in the variant model of the starting layer of its p-th skip layer, q_s represents the layer position index in the historical model of the starting layer of its q-th skip layer, p_l represents the depth of the p-th skip layer of the variant model, and q_l represents the depth of the q-th skip layer of the historical model;

calculating the skip-layer distance between each variant model corresponding to the target basic model and each historical model in the historical model pool according to the following formula:

dist_s = sum(skip_connection_matrix) + |s1 - s2|

where dist_s represents the skip-layer distance between the variant model and the historical model, sum(skip_connection_matrix) represents summing every element of the skip-layer matrix skip_connection_matrix, s1 represents the length of the variant model's skip-layer information list, and s2 represents the length of the historical model's skip-layer information list.

In this still further optional embodiment, the process of encoding the information of a model's skip layers to obtain that model's skip-layer information list can be expressed as:

S = (Ls_1, Ls_2, ..., Ls_N)

where S represents the model's skip-layer information list, N represents the number of skip layers of the model, and Ls_i is the information encoding of the i-th skip layer of the model.

Specifically, the function dist2(S1_p, S2_q) is explained as follows:

For example, if the p-th skip layer S1_p of the variant model starts at layer 3 and ends at layer 7, then for this skip layer p_s = 3 and p_l = 7 - 3 = 4; if the q-th skip layer S2_q of the historical model starts at layer 4 and ends at layer 9, then for this skip layer q_s = 4 and q_l = 9 - 4 = 5; therefore dist2(S1_p, S2_q) = ((4 - 3) + (5 - 4)) / (4 + 5) = 2/9.
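A small sketch of dist2 and the skip-layer distance follows, representing each skip layer by its (start, end) layer indices. This encoding is an illustrative assumption (the patent's exact skip-layer encoding is not shown, and all skip layers are treated here as the same type of residual connection).

```python
def skip_dist2(s1, s2):
    """dist2 between two skip layers given as (start, end) indices:
    (|p_s - q_s| + |p_l - q_l|) / (p_l + q_l), with depth = end - start."""
    p_s, p_l = s1[0], s1[1] - s1[0]
    q_s, q_l = s2[0], s2[1] - s2[0]
    return (abs(p_s - q_s) + abs(p_l - q_l)) / (p_l + q_l)

def skip_layer_distance(skips1, skips2):
    """dist_s = sum over the skip-layer matrix (dist2 of every skip layer
    of one model against every skip layer of the other) plus the
    difference |s1 - s2| in skip-layer counts."""
    matrix_sum = sum(skip_dist2(a, b) for a in skips1 for b in skips2)
    return matrix_sum + abs(len(skips1) - len(skips2))
```

On the worked example, `skip_dist2((3, 7), (4, 9))` returns 2/9.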

It can be seen that implementing this still further optional embodiment, which encodes the skip layers of a model to obtain a skip-layer information list, constructs a skip-layer matrix from that list, assigns values to the matrix in the specific way described, and finally computes the skip-layer distance from the skip-layer matrix, allows the skip-layer distance between two models to be computed.

206. Score each target variant model, and according to each target variant model's score, screen out the multiple deep learning models corresponding to each slave node from all target variant models corresponding to that slave node.

In the above step 206, the scoring rule for target variant models may be the same as the scoring rule for variant models described above, which is not repeated here. Here, all target variant models corresponding to each slave node may be grouped according to the target basic model each one was derived from, and the highest-scoring target variant model may then be selected from each group to form the multiple deep learning models corresponding to that slave node.
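The grouping described in step 206 amounts to an argmax per base model. A minimal sketch, where each variant is a dict with the illustrative keys "base_id" and "score":

```python
from collections import defaultdict

def pick_models_per_base(target_variants):
    """Group the target variant models by the target base model each was
    derived from and keep the highest-scoring variant of each group."""
    groups = defaultdict(list)
    for variant in target_variants:
        groups[variant["base_id"]].append(variant)
    return [max(group, key=lambda v: v["score"]) for group in groups.values()]
```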

207. Based on the second process of each slave node, the local cloud computing resources of each slave node, and the determined hyperparameters, perform training and verification operations on each deep learning model corresponding to that slave node to obtain the training results corresponding to that slave node.

In another optional embodiment, performing training and verification operations on each deep learning model corresponding to the slave node based on the second process of each slave node, the local cloud computing resources of each slave node, and the determined hyperparameters to obtain the training results corresponding to that slave node includes:

based on the second process of each slave node and the local cloud computing resources of each slave node, drafting, for each deep learning model corresponding to that slave node, the hyperparameter space corresponding to that deep learning model, where the hyperparameter space includes at least the batch size and the learning rate;

setting the search count corresponding to each deep learning model corresponding to each slave node;

constructing, for each deep learning model corresponding to each slave node, a set used to save the scores of that deep learning model on the validation set after training;

setting the objective function corresponding to each deep learning model according to the following formula:

F = max(SC)

where F is the objective function and SC is the score of the deep learning model on the validation set after training;

randomly selecting a starting point in the hyperparameter space corresponding to each deep learning model, and then searching the hyperparameter space cyclically through a Gaussian process mapping to select the multiple target hyperparameters corresponding to that deep learning model, where the Gaussian process mapping can be expressed as:

T = G(C, R, F, J)

where T is a target hyperparameter corresponding to the deep learning model, each target hyperparameter being a hyperparameter value recommended by G as worth trying, C is the hyperparameter space corresponding to the deep learning model, R is the set corresponding to the deep learning model, F is the objective function corresponding to the deep learning model, and J is the search count corresponding to the deep learning model;

performing training and verification operations on the deep learning model based on each of its target hyperparameters to obtain multiple intermediate deep learning models corresponding to that deep learning model and the score corresponding to each intermediate deep learning model;

from all the intermediate deep learning models corresponding to each deep learning model of each slave node, selecting the highest-scoring intermediate deep learning model and its score as the training result corresponding to that slave node.

In this other optional embodiment, the batch-size range of the drafted hyperparameter space may be [64, 512], and the learning-rate range may be [0.0002, 0.001]. The search count defaults to 20; when the search count is set to 20, twenty searches are performed in the hyperparameter space to produce twenty target hyperparameters. The Gaussian process mapping is a common prediction algorithm, used here to predict the value of the next hyperparameter. The score corresponding to each intermediate deep learning model may be the highest accuracy that model achieves on the validation set.
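The patent treats the Gaussian process mapping G as a black box. The NumPy sketch below shows one conventional way such a recommender can work: an RBF-kernel Gaussian-process surrogate fit to the scores collected so far, with an upper-confidence-bound acquisition over random candidates. Every detail here, from the kernel length scale to the candidate sampling, is an assumption for illustration, not the patent's implementation.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3):
    """Squared-exponential kernel between two sets of points."""
    sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * length_scale ** 2))

def gp_search(objective, bounds, n_iter=20, n_candidates=200, seed=0):
    """Search the hyperparameter space C for J = n_iter rounds: each
    round, fit a GP surrogate on the (hyperparameter, score) pairs
    collected so far in R, then try the random candidate with the
    highest upper confidence bound."""
    rng = np.random.default_rng(seed)
    lo = np.array([b[0] for b in bounds], dtype=float)
    hi = np.array([b[1] for b in bounds], dtype=float)
    scale = lambda u: lo + u * (hi - lo)  # unit cube -> real space
    X = [rng.uniform(0.0, 1.0, size=len(bounds))]  # random starting point
    y = [objective(scale(X[0]))]
    for _ in range(n_iter - 1):
        Xa, ya = np.array(X), np.array(y)
        K = rbf_kernel(Xa, Xa) + 1e-6 * np.eye(len(Xa))
        K_inv = np.linalg.inv(K)
        cand = rng.uniform(0.0, 1.0, size=(n_candidates, len(bounds)))
        Ks = rbf_kernel(cand, Xa)
        mu = Ks @ K_inv @ ya  # GP posterior mean at each candidate
        var = np.clip(1.0 - np.sum((Ks @ K_inv) * Ks, axis=1), 0.0, None)
        nxt = cand[np.argmax(mu + 1.96 * np.sqrt(var))]  # UCB acquisition
        X.append(nxt)
        y.append(objective(scale(nxt)))
    best = int(np.argmax(y))
    return scale(X[best]), y[best]
```

In practice the `objective` would train the deep learning model with the given batch size and learning rate and return its validation score; the sketch works with any callable over the bounded space.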

It can be seen that implementing this other optional embodiment, which selects multiple hyperparameters from the hyperparameter space and performs training and verification operations on the deep learning model with each of them to obtain the training result corresponding to each hyperparameter, enables the training and verification of deep learning models to be driven by hyperparameters.

208. Determine the optimal deep learning model from all target deep learning models according to the scores included in all training results.

For detailed descriptions of the above steps 201 and 208, refer to the detailed descriptions of the above steps 101 and 103, which are not repeated here.

It can be seen that implementing the deep learning model exploration method described in FIG. 2, which determines one cloud computing resource as the master node and multiple other cloud computing resources as multiple slave nodes, performs the deep learning model training operations based on each slave node to obtain that slave node's training results, and finally determines the optimal deep learning model from all training results according to their scores, allows deep learning models to be trained in parallel, improves the efficiency of deep learning model training, reduces training time, and facilitates the application of deep learning model exploration technology in business scenarios. In addition, by continuously generating variant models, screening the generated variant models based on model distance, and finally training and verifying the screened variant models with different hyperparameters, the exploration space of the models is enlarged while their training efficiency is maintained.

Embodiment 3

Please refer to FIG. 3. FIG. 3 is a schematic structural diagram of a deep learning model exploration apparatus disclosed in an embodiment of the present invention. As shown in FIG. 3, the deep learning model exploration apparatus may include:

a determination module 301, configured to determine one cloud computing resource as a master node and multiple other cloud computing resources as multiple slave nodes, where the master node is used to schedule the multiple slave nodes to perform training operations on the deep learning model;

a training module 302, configured to perform training operations on the deep learning model based on each slave node to obtain that slave node's training results, where each slave node's training results include a target deep learning model and the score of that target deep learning model, the target deep learning model being the trained deep learning model;

the determination module 301 being further configured to determine the optimal deep learning model from all target deep learning models according to the scores included in all training results.

It can be seen that implementing the deep learning model exploration apparatus described in FIG. 3, which determines one cloud computing resource as the master node and multiple other cloud computing resources as multiple slave nodes, performs the deep learning model training operations based on each slave node to obtain that slave node's training results, and finally determines the optimal deep learning model from all training results according to their scores, allows deep learning models to be trained in parallel, improves the efficiency of deep learning model training, reduces training time, and facilitates the application of deep learning model exploration technology in business scenarios.

In an optional embodiment, the training module 302 includes:

a creation submodule 3021, configured to create a first process and a second process corresponding to each slave node;

a generation submodule 3022, configured to generate, based on the first process of each slave node, the multiple deep learning models corresponding to that slave node;

a training submodule 3023, configured to perform, based on the second process of each slave node, the local cloud computing resources of each slave node, and the determined hyperparameters, training and verification operations on each deep learning model corresponding to that slave node to obtain the training results corresponding to that slave node.

It can be seen that the apparatus for exploring a deep learning model described in FIG. 4 generates, in each slave node, the first process and the second process corresponding to that slave node: the first process continuously generates new deep learning models, while the second process continuously trains and validates the newly generated models to obtain training results. Model generation and model training are thus parallelized, so that neither needs to wait for the other, which improves the training efficiency of the deep learning models.
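A minimal sketch of this producer/consumer split, assuming a toy dict-based model representation and a stubbed training step (all names here are illustrative, not the patent's implementation):

```python
import multiprocessing as mp

def generator_process(model_queue, n_models):
    # First process: continuously generate candidate models
    # (stubbed here as dicts of layer widths).
    for i in range(n_models):
        model_queue.put({"model_id": i, "layers": [8, 16, 8]})
    model_queue.put(None)  # sentinel: generation finished

def trainer_process(model_queue, result_queue):
    # Second process: consume models as they arrive and train/validate
    # them (training is stubbed as a deterministic score).
    while True:
        model = model_queue.get()
        if model is None:
            break
        score = sum(model["layers"]) / 100.0  # stand-in for a validation score
        result_queue.put((model["model_id"], score))

def run_slave_node(n_models=3):
    # One slave node: generation and training overlap instead of alternating.
    model_q, result_q = mp.Queue(), mp.Queue()
    gen = mp.Process(target=generator_process, args=(model_q, n_models))
    trn = mp.Process(target=trainer_process, args=(model_q, result_q))
    gen.start(); trn.start()
    gen.join(); trn.join()
    return [result_q.get() for _ in range(n_models)]

if __name__ == "__main__":
    print(run_slave_node())
```

The queue is what lets the two processes run without waiting for each other: the generator keeps producing while the trainer works through the backlog.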

In this optional embodiment, further optionally, the generating submodule 3022 includes:

The selecting unit 30221 is configured to select, based on the first process of each slave node, multiple historical models from the determined historical model pool as the multiple base models corresponding to that slave node, where a historical model is a deep learning model already generated by any of the slave nodes;

The selecting unit 30221 is further configured to select, based on the determined simulated annealing method, multiple base models from all base models corresponding to each slave node as the multiple target base models corresponding to that slave node;

The deformation unit 30222 is configured to perform a model deformation operation on each target base model corresponding to each slave node to obtain multiple variant models corresponding to that target base model;

The first screening unit 30223 is configured to screen out, from the multiple variant models corresponding to the target base model, the target variant model corresponding to that target base model;

The second screening unit 30224 is configured to score each target variant model and, according to the score of each target variant model, screen out, from all target variant models corresponding to each slave node, the multiple deep learning models corresponding to that slave node;

Here, the model deformation operation refers to randomly performing at least one of a network-structure deepening operation, a network-structure widening operation, and a skip-layer-structure adding operation on the neural network model.
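The three deformation operations can be illustrated on a toy model representation (a list of layer widths plus a list of `(start, length)` skip connections); the representation and function names are assumptions for illustration:

```python
import random

def deepen(model):
    # Network-structure deepening: insert a new layer that copies a
    # neighbour's width (an identity-like morphism).
    i = random.randrange(len(model["layers"]))
    model["layers"].insert(i, model["layers"][i])
    return model

def widen(model):
    # Network-structure widening: double the width of a random layer.
    i = random.randrange(len(model["layers"]))
    model["layers"][i] *= 2
    return model

def add_skip(model):
    # Skip-layer adding: record a skip connection spanning at least one
    # layer, if the model is deep enough.
    if len(model["layers"]) >= 2:
        start = random.randrange(len(model["layers"]) - 1)
        length = random.randrange(1, len(model["layers"]) - start)
        model["skips"].append((start, length))
    return model

def morph(model):
    # The model deformation operation: apply at least one randomly
    # chosen mutation from the three above.
    ops = random.sample([deepen, widen, add_skip], k=random.randint(1, 3))
    for op in ops:
        model = op(model)
    return model
```

In practice each operation would also transfer the trained weights so the variant starts close to its base model; this sketch only mutates the structure.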

It can be seen that the apparatus for exploring a deep learning model described in FIG. 4 continuously generates variant models, screens the generated variants based on model distance, and finally trains and validates the screened variants under different hyperparameters, thereby expanding the exploration space of the models while maintaining their training efficiency.

In this further optional embodiment, still further optionally, the first screening unit 30223 includes:

The calculating subunit 302231 is configured to calculate the model distance between each variant model corresponding to the target base model and each historical model in the historical model pool;

The judging subunit 302232 is configured to judge whether each variant model corresponding to the target base model has a matching historical model, where a matching historical model of a variant model is a historical model whose model distance to that variant model is smaller than a preset threshold; when it is judged that a variant model has a matching historical model, the variant model is deleted from the multiple variant models corresponding to the target base model;

The selecting subunit 302233 is configured to score each variant model corresponding to the target base model and to select, from the multiple variant models corresponding to the target base model, the variant model with the highest score as the target variant model corresponding to that target base model.

It can be seen that the apparatus for exploring a deep learning model described in FIG. 4 deletes the variant models whose model distance to a historical model is too small, and selects, from the remaining variant models, the one with the highest score as the target variant model. The target variant model is thus screened out from multiple variant models, which reduces the amount of subsequent data processing and improves training efficiency.
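A sketch of this screening step under assumed interfaces: `model_distance` and `score` are caller-supplied callables, and none of these names come from the patent.

```python
def screen_target_variant(variants, history, model_distance, score, threshold):
    """Drop variants whose distance to any historical model is below the
    preset threshold (i.e. that have a matching historical model), then
    return the highest-scoring survivor, or None if all were dropped."""
    survivors = [v for v in variants
                 if all(model_distance(v, h) >= threshold for h in history)]
    if not survivors:
        return None
    return max(survivors, key=score)
```

Discarding near-duplicates before training is what keeps the search from spending compute on architectures that are effectively already in the historical pool.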

In this still further optional embodiment, yet further optionally, the calculating subunit 302231 includes:

The second-level calculating subunit 3022311 is configured to calculate the normal-layer distance between each variant model corresponding to the target base model and each historical model in the historical model pool;

The second-level calculating subunit 3022311 is further configured to calculate the skip-layer distance between each variant model corresponding to the target base model and each historical model in the historical model pool;

The second-level adding subunit 3022312 is configured to add the normal-layer distance and the skip-layer distance between each variant model corresponding to the target base model and each historical model in the historical model pool, and to take the sum as the model distance between that variant model and that historical model.

It can be seen that the apparatus for exploring a deep learning model described in FIG. 4 obtains the model distance between two models by calculating their normal-layer distance and their skip-layer distance, which allows the two models to be compared effectively, so that the target variant model can be screened out accurately and effectively.

In this yet further optional embodiment, still further optionally, the specific manner in which the second-level calculating subunit 3022311 calculates the normal-layer distance between each variant model corresponding to the target base model and each historical model in the historical model pool is as follows:

performing information coding on the normal layers of each variant model corresponding to the target base model to obtain the normal-layer information list corresponding to that variant model, and performing information coding on the normal layers of each historical model in the historical model pool to obtain the normal-layer information list corresponding to that historical model;

constructing, from the normal-layer information list corresponding to each variant model of the target base model and the normal-layer information list corresponding to each historical model in the historical model pool, the normal-layer matrix corresponding to that variant model and that historical model;

assigning a value to each element of the normal-layer matrix corresponding to each variant model of the target base model and each historical model in the historical model pool, in left-to-right and top-to-bottom order, where each element of the normal-layer matrix is assigned according to the following formula:

matrix_{ij} = min(matrix_{ij} + dist(M1_i, M2_j), matrix_{i(j-1)}, matrix_{(i-1)j})

where matrix_{ij} denotes the element in row i and column j of the normal-layer matrix, M1_i denotes the i-th normal layer of the variant model, and M2_j denotes the j-th normal layer of the historical model;

where the function dist is used to calculate the distance between the two layers M1_i and M2_j: when M1_i and M2_j are layers of different types, the value of dist(M1_i, M2_j) is 1; when M1_i and M2_j are layers of the same type, the value of dist(M1_i, M2_j) is calculated as follows:

dist(M1_i, M2_j) = (1/n) * Σ_{k=1}^{n} |a_k - b_k| / max(a_k, b_k)   [reconstructed from the surrounding definitions; the original formula image BDA0002632188850000201 is not reproduced in the text]

where a_k is the k-th piece of parameter information in the information coding of the normal layer denoted by M1_i, b_k is the k-th piece of parameter information in the information coding of the normal layer denoted by M2_j, and n is the number of pieces of parameter information contained in the information coding;

taking the element in the lower-right corner of the normal-layer matrix corresponding to each variant model of the target base model and each historical model in the historical model pool as the normal-layer distance between that variant model and that historical model.

It can be seen that the apparatus for exploring a deep learning model described in FIG. 4 obtains the normal-layer information lists by coding the normal layers of the models, constructs the normal-layer matrix from those lists, and assigns values to the matrix in the specific manner above to obtain the normal-layer distance between the models, so that the normal-layer distance between two models can be calculated.
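A hedged sketch of the normal-layer distance computation. The per-layer formula below is a reconstruction (the patent's formula image is not reproduced in the text), and the matrix recurrence is written in the standard edit-distance form that the assignment formula appears to intend, with the first row and column initialized to insertion/deletion counts:

```python
def layer_dist(l1, l2):
    # dist(M1_i, M2_j): 1 for layers of different types; otherwise an
    # averaged normalized parameter difference (reconstructed formula).
    if l1[0] != l2[0]:
        return 1.0
    a, b = l1[1:], l2[1:]
    if not a:
        return 0.0
    return sum(abs(x - y) / max(abs(x), abs(y), 1e-12)
               for x, y in zip(a, b)) / len(a)

def normal_layer_distance(m1, m2):
    # Models are lists of layer encodings, e.g. ("conv", width, kernel).
    # Fill the matrix left-to-right, top-to-bottom; the element in the
    # lower-right corner is the normal-layer distance.
    rows, cols = len(m1) + 1, len(m2) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = float(i)  # cost of deleting i layers
    for j in range(cols):
        d[0][j] = float(j)  # cost of inserting j layers
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(d[i - 1][j - 1] + layer_dist(m1[i - 1], m2[j - 1]),
                          d[i][j - 1] + 1.0,
                          d[i - 1][j] + 1.0)
    return d[-1][-1]
```

Identical layer lists give a distance of 0, and every type mismatch or structural edit pushes the distance up, which is the property the screening threshold relies on.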

In this yet further optional embodiment, still further optionally, the specific manner in which the second-level calculating subunit 3022311 calculates the skip-layer distance between each variant model corresponding to the target base model and each historical model in the historical model pool is as follows:

performing information coding on the skip layers of each variant model corresponding to the target base model to obtain the skip-layer information list corresponding to that variant model, and performing information coding on the skip layers of each historical model in the historical model pool to obtain the skip-layer information list corresponding to that historical model;

constructing, from the skip-layer information list corresponding to each variant model of the target base model and the skip-layer information list corresponding to each historical model in the historical model pool, the skip-layer matrix corresponding to that variant model and that historical model, where the number of rows of the skip-layer matrix is the length of the skip-layer information list corresponding to the variant model and the number of columns is the length of the skip-layer information list corresponding to the historical model;

assigning a value to each element of the skip-layer matrix corresponding to each variant model of the target base model and each historical model in the historical model pool according to the following formula:

skip_connection_matrix_{pq} = dist2(S1_p, S2_q)

where skip_connection_matrix denotes the skip-layer matrix, skip_connection_matrix_{pq} denotes the element in row p and column q of the skip-layer matrix, S1_p denotes the p-th skip layer of the variant model, and S2_q denotes the q-th skip layer of the historical model;

where the function dist2 is used to calculate the distance between the two layers S1_p and S2_q: when S1_p and S2_q are layers of different types, the value of dist2(S1_p, S2_q) is 1; when S1_p and S2_q are layers of the same type, the value of dist2(S1_p, S2_q) is calculated as follows:

dist2(S1_p, S2_q) = (|p_s - q_s| + |p_l - q_l|) / (max(p_s, q_s) + max(p_l, q_l))   [reconstructed from the surrounding definitions; the original formula image BDA0002632188850000211 is not reproduced in the text]

where p_s denotes the layer-position index, in the variant model, of the starting layer of the p-th skip layer of that model; q_s denotes the layer-position index, in the historical model, of the starting layer of the q-th skip layer of that model; p_l denotes the depth of the p-th skip layer of the variant model; and q_l denotes the depth of the q-th skip layer of the historical model;

calculating the skip-layer distance between each variant model corresponding to the target base model and each historical model in the historical model pool according to the following formula:

dist_s = sum(skip_connection_matrix) + |s1 - s2|

where dist_s denotes the skip-layer distance between the variant model and the historical model, sum(skip_connection_matrix) denotes the sum of all elements of the skip-layer matrix skip_connection_matrix, s1 denotes the length of the skip-layer information list corresponding to the variant model, and s2 denotes the length of the skip-layer information list corresponding to the historical model.

It can be seen that the apparatus for exploring a deep learning model described in FIG. 4 obtains the skip-layer information lists by coding the skip layers of the models, constructs the skip-layer matrix from those lists, assigns values to the matrix in the specific manner above, and finally calculates the skip-layer distance between the models from the skip-layer matrix, so that the skip-layer distance between two models can be calculated.
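A sketch of the skip-layer distance, representing each skip layer as a `(start_index, depth)` pair. The `dist2` formula for same-type layers is a reconstruction from the surrounding variable definitions (the formula image is not reproduced); the final summation follows dist_s = sum(skip_connection_matrix) + |s1 - s2|:

```python
def dist2(s1, s2):
    # Reconstructed same-type skip-layer distance: normalized difference
    # of start positions (p_s, q_s) and depths (p_l, q_l).
    (ps, pl), (qs, ql) = s1, s2
    denom = max(ps, qs) + max(pl, ql)
    return (abs(ps - qs) + abs(pl - ql)) / denom if denom else 0.0

def skip_layer_distance(skips1, skips2):
    # skip_connection_matrix[p][q] = dist2(S1_p, S2_q); the skip-layer
    # distance is the sum of all matrix elements plus the difference in
    # the lengths of the two skip-layer information lists.
    matrix = [[dist2(a, b) for b in skips2] for a in skips1]
    total = sum(sum(row) for row in matrix)
    return total + abs(len(skips1) - len(skips2))
```

The `|s1 - s2|` term penalizes models that simply have different numbers of skip connections, even when the connections they do share line up exactly.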

In another optional embodiment, the specific manner in which the training submodule 3023 performs training and validation operations on each deep learning model corresponding to each slave node, based on the second process of the slave node, the local cloud computing resources of the slave node, and the determined hyperparameters, to obtain the training result corresponding to that slave node is as follows:

based on the second process of each slave node and the local cloud computing resources of each slave node, drawing up, for each deep learning model corresponding to that slave node, the hyperparameter space corresponding to that model, where the hyperparameter space includes at least the batch size and the learning rate;

setting the number of searches corresponding to each deep learning model corresponding to each slave node;

constructing, for each deep learning model corresponding to each slave node, a set corresponding to that model, where the set is used to store the scores of the model on the validation set after training;

setting the objective function corresponding to each deep learning model corresponding to each slave node according to the following formula:

F = max(SC)

where F is the objective function and SC is the score of the deep learning model on the validation set after training;

randomly selecting a starting point in the hyperparameter space corresponding to each deep learning model of each slave node, and then searching cyclically in the hyperparameter space through a Gaussian-process mapping to select the multiple target hyperparameters corresponding to that deep learning model, where the Gaussian-process mapping can be expressed as:

T = G(C, R, F, J)

where T is a target hyperparameter corresponding to the deep learning model, each target hyperparameter being a hyperparameter value recommended by G as worth trying; C is the hyperparameter space corresponding to the deep learning model; R is the set corresponding to the deep learning model; F is the objective function corresponding to the deep learning model; and J is the number of searches corresponding to the deep learning model;

performing training and validation operations on each deep learning model of each slave node based on each target hyperparameter corresponding to that model, to obtain multiple intermediate deep learning models corresponding to that model and the score corresponding to each intermediate deep learning model;

selecting, from all intermediate deep learning models corresponding to each deep learning model of each slave node, the intermediate deep learning model with the highest score, together with its score, as the training result corresponding to that slave node.

It can be seen that the apparatus for exploring a deep learning model described in FIG. 4 can select multiple hyperparameters from the hyperparameter space and perform training and validation operations on the deep learning model with them to obtain the training result corresponding to each hyperparameter, so that training and validation of deep learning models can be performed on the basis of hyperparameters.
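The hyperparameter search loop can be sketched as follows. For simplicity the Gaussian-process mapping G is replaced by uniform random sampling over the hyperparameter space C (a real implementation would fit a surrogate model to the score set R instead); all names and the example space are illustrative assumptions:

```python
import random

def search_hyperparameters(space, n_searches, train_and_score, seed=None):
    # space: dict mapping hyperparameter name to a (low, high) range (C).
    # n_searches: the set number of searches (J).
    # train_and_score: trains the model with a candidate and returns its
    #   validation-set score SC; the objective is F = max(SC).
    rng = random.Random(seed)
    scores = []  # the set R of validation scores seen so far
    best = (None, float("-inf"))
    for _ in range(n_searches):
        # Stand-in for the mapping T = G(C, R, F, J): candidates are
        # drawn uniformly at random from the space in this sketch.
        candidate = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        score = train_and_score(candidate)
        scores.append(score)
        if score > best[1]:
            best = (candidate, score)
    return best

# Hypothetical space: batch size treated as continuous for simplicity.
space = {"batch_size": (16, 256), "learning_rate": (1e-4, 1e-1)}
```

Swapping the random draw for a surrogate-guided suggestion is the only change needed to turn this loop into the Gaussian-process search the text describes.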

For a detailed description of the apparatus for exploring a deep learning model, reference may be made to the detailed description of the method for exploring a deep learning model, which is not repeated here.

It can be seen that the apparatus for exploring a deep learning model described in FIG. 4 determines one cloud computing resource as the master node and multiple other cloud computing resources as slave nodes, performs the deep learning model training operation on each slave node to obtain that slave node's training result, and finally determines the optimal deep learning model from all training results according to their scores, so that the training operations can be performed in parallel, which improves training efficiency, reduces training time, and facilitates the application of deep learning model exploration technology in business scenarios. In addition, by generating in each slave node the first process and the second process corresponding to that slave node, the generation and training of deep learning models are parallelized, so that generation and training need not wait for each other, further improving training efficiency. Moreover, by continuously generating variant models, screening the generated variants based on model distance, and finally training and validating the screened variants under different hyperparameters, the exploration space of the models is expanded while their training efficiency is maintained.

Embodiment 3

Referring to FIG. 5, FIG. 5 is a schematic structural diagram of yet another apparatus for exploring a deep learning model disclosed in an embodiment of the present invention. As shown in FIG. 5, the apparatus may include:

a memory 501 storing executable program code; and

a processor 502 coupled to the memory 501;

where the processor 502 invokes the executable program code stored in the memory 501 to execute the method for exploring a deep learning model described in Embodiment 1 or Embodiment 2.

Embodiment 4

An embodiment of the present invention discloses a computer-readable storage medium storing a computer program for electronic data exchange, where the computer program causes a computer to execute the method for exploring a deep learning model described in Embodiment 1 or Embodiment 2.

Embodiment 5

An embodiment of the present invention discloses a computer program product, which includes a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to execute the method for exploring a deep learning model described in Embodiment 1 or Embodiment 2.

The apparatus embodiments described above are merely illustrative. The modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed over multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.

From the detailed description of the above embodiments, those skilled in the art can clearly understand that each implementation can be realized by software plus a necessary general-purpose hardware platform, or, of course, by hardware. Based on this understanding, the above technical solutions, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, where the storage medium includes a read-only memory (ROM), a random access memory (RAM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), a one-time programmable read-only memory (OTPROM), an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium that can be used to carry or store data.

Finally, it should be noted that the method and apparatus for exploring a deep learning model disclosed in the embodiments of the present invention are only preferred embodiments of the present invention and are used only to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of the technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for exploring a deep learning model, the method comprising:
determining one cloud computing resource as a master node and a plurality of other cloud computing resources as a plurality of slave nodes, wherein the master node is used for scheduling the slave nodes to execute training operation on the deep learning model;
executing a training operation on the deep learning model based on each slave node to obtain a training result of the slave node, wherein the training result of each slave node comprises a target deep learning model and a score of the target deep learning model, and the target deep learning model included in the training result of each slave node is a trained deep learning model;
and determining an optimal deep learning model from all the target deep learning models according to the scores included in all the training results.
2. The method for exploring a deep learning model according to claim 1, wherein said performing a training operation on the deep learning model based on each slave node to obtain a training result of the slave node comprises:
creating a first process and a second process corresponding to each slave node, and generating a plurality of deep learning models corresponding to each slave node based on the first process of each slave node;
and performing training and verification operation on each deep learning model corresponding to each slave node based on the second process of each slave node, the local cloud computing resource of each slave node and the determined hyper-parameter to obtain a training result corresponding to the slave node.
3. The method for exploring a deep learning model according to claim 2, wherein the generating a plurality of deep learning models corresponding to each slave node based on the first process of the slave node includes:
based on the first process of each slave node, selecting a plurality of historical models from the determined historical model pool as a plurality of base models corresponding to the slave node, wherein the historical models are deep learning models generated by all the slave nodes;
selecting a plurality of basic models from all the basic models corresponding to each slave node based on the determined simulated annealing method as a plurality of target basic models corresponding to the slave node;
executing model deformation operation on each target base model corresponding to each slave node to obtain a plurality of variant models corresponding to the target base model, and screening out a target variant model corresponding to the target base model from the plurality of variant models corresponding to the target base model;
scoring each target variant model, and screening a plurality of deep learning models corresponding to the slave nodes from all the target variant models corresponding to the slave nodes according to the score of each target variant model;
the model deformation operation refers to at least one of network structure deepening operation, network structure widening operation and jump layer structure adding operation which are randomly performed on the neural network model.
4. The method for exploring a deep learning model as claimed in claim 3, wherein said step of screening out a target variant model corresponding to the target base model from a plurality of said variant models corresponding to the target base model comprises:
calculating the model distance between each variant model corresponding to the target base model and each historical model in the historical model pool;
judging whether each variant model corresponding to the target base model has a matching historical model, wherein the matching historical model corresponding to the variant model is the historical model of which the model distance from the variant model is smaller than a preset threshold value, and deleting the variant model from a plurality of variant models corresponding to the target base model when the variant model is judged to have the matching historical model;
and scoring each variant model corresponding to the target base model, and selecting the variant model with the highest score from the plurality of variant models corresponding to the target base model as the target variant model corresponding to the target base model.
5. The method for exploring a deep learning model as claimed in claim 4, wherein said calculating a model distance between each said variant model corresponding to the target base model and each said historical model in said historical model pool comprises:
calculating the common layer distance between each variant model corresponding to the target base model and each historical model in the historical model pool;
calculating the layer jump distance between each variant model corresponding to the target base model and each historical model in the historical model pool;
and adding the common layer distance and the jump layer distance of each variant model corresponding to the target base model and each historical model in the historical model pool to serve as the model distance of the variant model and the historical model.
6. The method for exploring a deep learning model as claimed in claim 5, wherein said calculating a distance between a common layer of each said variant model corresponding to the target base model and each said historical model in said historical model pool comprises:
performing information coding on the common layers of each variant model corresponding to the target base model to obtain a common layer information list corresponding to the variant model, and performing information coding on the common layers of each historical model in the historical model pool to obtain a common layer information list corresponding to the historical model;
constructing a common layer matrix corresponding to the variant model and the historical model according to a common layer information list corresponding to each variant model corresponding to the target base model and a common layer information list corresponding to each historical model in the historical model pool;
assigning values, from left to right and from top to bottom, to each element of the common layer matrix corresponding to each variant model corresponding to the target base model and each historical model in the historical model pool, wherein the formula for assigning each element of the common layer matrix is as follows:

matrix_{ij} = min(matrix_{ij} + dist(M1_i, M2_j), matrix_{i(j-1)}, matrix_{(i-1)j})

wherein matrix_{ij} represents the element in the i-th row and j-th column of the common layer matrix, M1_i represents the i-th common layer of the variant model, and M2_j represents the j-th common layer of the historical model;

wherein the function dist is used to calculate the distance between the two layers M1_i and M2_j: when M1_i and M2_j are layers of different types, dist(M1_i, M2_j) = 1; when M1_i and M2_j are layers of the same type, the value of dist(M1_i, M2_j) is calculated by:

[formula published only as image FDA0002632188840000031]

wherein a_k is the k-th parameter information in the information coding of the common layer denoted M1_i, b_k is the k-th parameter information in the information coding of the common layer denoted M2_j, and n is the number of parameter information items contained in the information coding;
and taking the element of the lower right corner of the common layer matrix corresponding to each variant model corresponding to the target base model and each historical model in the historical model pool as the common layer distance between the variant model and the historical model.
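The matrix computation of claim 6 resembles an edit-distance dynamic program over the two models' common-layer lists: the matrix is filled left to right, top to bottom, and the bottom-right element is read off as the distance. The sketch below is an assumption-laden illustration, not the patented formula: it uses the classic edit-distance recurrence (the insertion/deletion cost of 1 and the boundary initialization are assumptions), and the same-type layer distance, whose exact form appears only as an image in the published text, is assumed here to be a mean relative parameter difference. Each layer is encoded as `(layer_type, [param_1, ..., param_n])`.

```python
def layer_dist(l1, l2):
    """dist(M1_i, M2_j): 1 for layers of different types; otherwise an
    assumed mean relative difference over the n encoded parameters."""
    t1, p1 = l1
    t2, p2 = l2
    if t1 != t2:
        return 1.0
    n = len(p1)  # both encodings are assumed to carry n parameter items
    return sum(abs(a - b) / max(a, b, 1) for a, b in zip(p1, p2)) / n

def common_layer_distance(layers1, layers2):
    """Edit-distance-style DP over two common-layer lists."""
    m, n = len(layers1), len(layers2)
    matrix = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):       # boundary: deleting i layers costs i
        matrix[i][0] = float(i)
    for j in range(1, n + 1):       # boundary: inserting j layers costs j
        matrix[0][j] = float(j)
    # Fill left to right, top to bottom, as in the claim.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            matrix[i][j] = min(
                matrix[i - 1][j - 1] + layer_dist(layers1[i - 1], layers2[j - 1]),
                matrix[i][j - 1] + 1.0,
                matrix[i - 1][j] + 1.0,
            )
    return matrix[m][n]  # bottom-right element, as in the claim
```

For identical layer lists the distance is 0; a single-layer type mismatch yields 1.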
7. The method for exploring a deep learning model according to claim 5 or 6, wherein said calculating a jump layer distance between each said variant model corresponding to the target base model and each said historical model in said historical model pool comprises:
carrying out information coding on the jump layer of each variant model corresponding to the target basic model to obtain a jump layer information list corresponding to the variant model, and carrying out information coding on the jump layer of each historical model in the historical model pool to obtain a jump layer information list corresponding to the historical model;
constructing a layer jump matrix corresponding to the variant model and the historical model according to a layer jump information list corresponding to each variant model corresponding to the target base model and a layer jump information list corresponding to each historical model in the historical model pool, wherein the row number of the layer jump matrix is the length of the layer jump information list corresponding to the variant model, and the column number of the layer jump matrix is the length of the layer jump information list corresponding to the historical model;
assigning a value to each element of the jump layer matrix corresponding to each variant model corresponding to the target base model and each historical model in the historical model pool according to the following formula:

skip_connection_matrix_{pq} = dist2(S1_p, S2_q)

wherein skip_connection_matrix represents the jump layer matrix, skip_connection_matrix_{pq} represents the element in the p-th row and q-th column of the jump layer matrix, S1_p represents the p-th jump layer of the variant model, and S2_q represents the q-th jump layer of the historical model;

wherein the function dist2 is used to calculate the distance between the two layers S1_p and S2_q: when S1_p and S2_q are layers of different types, dist2(S1_p, S2_q) = 1; when S1_p and S2_q are layers of the same type, the value of dist2(S1_p, S2_q) is calculated by:

[formula published only as image FDA0002632188840000032]

wherein p_s represents the layer position index, in the variant model, of the starting layer of the p-th jump layer; q_s represents the layer position index, in the historical model, of the starting layer of the q-th jump layer; p_l represents the depth of the p-th jump layer in the variant model; and q_l represents the depth of the q-th jump layer in the historical model;
calculating the jump layer distance between each variant model corresponding to the target base model and each historical model in the historical model pool according to the following formula:
dist_s = sum(skip_connection_matrix) + |s1 - s2|

wherein dist_s represents the jump layer distance between the variant model and the historical model, sum(skip_connection_matrix) represents the sum of all elements of the jump layer matrix skip_connection_matrix, s1 represents the length of the jump layer information list corresponding to the variant model, and s2 represents the length of the jump layer information list corresponding to the historical model.
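The jump layer distance of claim 7 can be sketched as below, under stated assumptions: each jump layer is encoded as `(type, start_index, depth)`, dist2 returns 1 for different types, and the same-type formula (published only as an image) is assumed here to be a normalized difference of starting positions and depths built from the symbols p_s, q_s, p_l, q_l that the claim defines.

```python
def dist2(s1, s2):
    """Assumed dist2(S1_p, S2_q): 1 for different types; otherwise a
    normalized gap between starting positions and depths."""
    t1, p_s, p_l = s1
    t2, q_s, q_l = s2
    if t1 != t2:
        return 1.0
    denom = max(p_s, q_s) + max(p_l, q_l)
    return (abs(p_s - q_s) + abs(p_l - q_l)) / denom if denom else 0.0

def skip_layer_distance(skips1, skips2):
    """dist_s = sum(skip_connection_matrix) + |s1 - s2|, per claim 7."""
    # skip_connection_matrix[p][q] = dist2(S1_p, S2_q); summing all elements
    # directly avoids materializing the matrix.
    total = sum(dist2(a, b) for a in skips1 for b in skips2)
    return total + abs(len(skips1) - len(skips2))
```

When one model has a skip connection the other lacks, the |s1 - s2| term alone contributes to the distance.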
8. The method for exploring a deep learning model according to any one of claims 2 to 7, wherein the performing a training and verification operation on each deep learning model corresponding to each slave node based on the second process of each slave node, the local cloud computing resource of each slave node and the determined hyper-parameter to obtain a training result corresponding to the slave node comprises:
formulating, based on the second process of each slave node and the local cloud computing resource of the slave node, a hyper-parameter space corresponding to each deep learning model corresponding to the slave node, wherein the hyper-parameter space at least comprises the batch size and the learning rate;
setting the number of searches corresponding to each deep learning model corresponding to each slave node;
constructing a set corresponding to each deep learning model corresponding to each slave node, wherein the set is used for storing scores of the deep learning models on a verification set after training;
setting an objective function corresponding to each deep learning model corresponding to each slave node according to the following formula:
F=max(SC)
wherein F is the objective function, and SC is the score of the trained deep learning model on the verification set;
randomly selecting a starting point in the hyper-parameter space corresponding to each deep learning model corresponding to each slave node, and then iteratively searching the hyper-parameter space through a Gaussian process mapping to select a plurality of target hyper-parameters corresponding to the deep learning model, wherein the Gaussian process mapping can be expressed as:
T = G(C, R, F, J)

wherein T is a target hyper-parameter corresponding to the deep learning model, each target hyper-parameter being a hyper-parameter value that G recommends as worth trying; C is the hyper-parameter space corresponding to the deep learning model; R is the set corresponding to the deep learning model; F is the objective function corresponding to the deep learning model; and J is the number of searches corresponding to the deep learning model;
training and verifying the deep learning model based on each target hyper-parameter corresponding to each deep learning model corresponding to each slave node to obtain a plurality of intermediate deep learning models corresponding to the deep learning model and a score corresponding to each intermediate deep learning model;
and selecting the intermediate deep learning model with the highest score and the score corresponding to the intermediate deep learning model from all the intermediate deep learning models corresponding to the slave nodes as training results corresponding to the slave nodes.
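The per-node search loop of claim 8 has the shape sketched below. This is a structural illustration only: the Gaussian process mapping G is replaced here by random sampling from the hyper-parameter space (a deliberate simplification), and `train_and_validate` is a hypothetical stand-in for training the model and scoring it on the verification set.

```python
import random

def explore_hyperparameters(space, search_count, train_and_validate, seed=0):
    """Search `space` for `search_count` iterations (J), tracking the set R
    of validation scores and the best result under the objective F = max(SC)."""
    rng = random.Random(seed)
    results = []                     # the set R of scores on the verification set
    best = (None, float("-inf"))
    for _ in range(search_count):    # J searches
        # T = G(C, R, F, J): here simply a random point from the space C.
        params = {name: rng.choice(values) for name, values in space.items()}
        score = train_and_validate(params)   # SC on the verification set
        results.append(score)
        if score > best[1]:          # keep the highest-scoring intermediate model
            best = (params, score)
    return best

# The hyper-parameter space at least comprises batch size and learning rate:
space = {"batch_size": [16, 32, 64], "learning_rate": [1e-3, 1e-2, 1e-1]}
```

A real implementation would replace the random draw with a Gaussian process surrogate (as in Bayesian optimization libraries) that conditions the recommendation on the accumulated scores in R.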
9. An apparatus for exploring a deep learning model, the apparatus comprising:
the determining module is used for determining one cloud computing resource as a master node and a plurality of other cloud computing resources as a plurality of slave nodes, wherein the master node is used for scheduling the slave nodes to execute training operation on the deep learning model;
the training module is used for executing training operation on the deep learning model based on each slave node to obtain a training result of the slave node, the training result of each slave node comprises a target deep learning model and the score of the target deep learning model, and the target deep learning model included in the training result of each slave node is a trained deep learning model;
the determining module is further configured to determine an optimal deep learning model from all the target deep learning models according to the scores included in all the training results.
10. An apparatus for exploring a deep learning model, the apparatus comprising:
a memory storing executable program code;
a processor coupled with the memory;
the processor calls the executable program code stored in the memory to execute the exploration method of the deep learning model according to any one of claims 1-8.
CN202010814501.9A 2020-08-13 2020-08-13 Deep learning model exploration method and device Active CN111931916B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010814501.9A CN111931916B (en) 2020-08-13 2020-08-13 Deep learning model exploration method and device

Publications (2)

Publication Number Publication Date
CN111931916A true CN111931916A (en) 2020-11-13
CN111931916B CN111931916B (en) 2024-08-02

Family

ID=73311279

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010814501.9A Active CN111931916B (en) 2020-08-13 2020-08-13 Deep learning model exploration method and device

Country Status (1)

Country Link
CN (1) CN111931916B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650925A (en) * 2016-11-29 2017-05-10 郑州云海信息技术有限公司 Deep learning framework Caffe system and algorithm based on MIC cluster
WO2017128961A1 (en) * 2016-01-30 2017-08-03 华为技术有限公司 Method and device for training model in distributed system
CN108229528A (en) * 2017-08-16 2018-06-29 北京市商汤科技开发有限公司 Clustering Model training method and device, electronic equipment, computer storage media
US20190235484A1 (en) * 2018-01-31 2019-08-01 Hitachi, Ltd. Deep learning architecture for maintenance predictions with multiple modes
WO2019182590A1 (en) * 2018-03-21 2019-09-26 Visa International Service Association Automated machine learning systems and methods


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113010312A (en) * 2021-03-11 2021-06-22 山东英信计算机技术有限公司 Hyper-parameter tuning method, device and storage medium
CN113010312B (en) * 2021-03-11 2024-01-23 山东英信计算机技术有限公司 Super-parameter tuning method, device and storage medium
CN113239635A (en) * 2021-06-16 2021-08-10 中国银行股份有限公司 Model evaluation method and device
CN114004358A (en) * 2021-12-29 2022-02-01 粤港澳大湾区数字经济研究院(福田) Deep learning model training method
CN114004358B (en) * 2021-12-29 2022-06-14 粤港澳大湾区数字经济研究院(福田) Deep learning model training method

Also Published As

Publication number Publication date
CN111931916B (en) 2024-08-02

Similar Documents

Publication Publication Date Title
US11651259B2 (en) Neural architecture search for convolutional neural networks
US11853893B2 (en) Execution of a genetic algorithm having variable epoch size with selective execution of a training algorithm
US20190278600A1 (en) Tiled compressed sparse matrix format
CN111931916A (en) Exploration method and device of deep learning model
CN109919183B Image recognition method, apparatus, device and storage medium based on small samples
JP6892424B2 (en) Hyperparameter tuning methods, devices and programs
CN111406264A (en) Neural architecture search
EP3803580B1 (en) Efficient incident management in large scale computer systems
WO2017039684A1 (en) Classifier
CN113822130B (en) Model training method, scene recognition method, computing device, and medium
CN110222824B (en) Intelligent algorithm model autonomous generation and evolution method, system and device
CN116402138A (en) Time sequence knowledge graph reasoning method and system for multi-granularity historical aggregation
KR20220032861A (en) Neural architecture search method and attaratus considering performance in hardware
CN118193587A (en) Database query processing method, device, equipment, storage medium and product based on deep learning
CN113343725B (en) Anti-collision method and system for multiple RFID readers
CN118838687A (en) Task scheduling method and AI cloud computing system
CN112508351B (en) Strong robustness item recommendation method, system, device and medium in attack environment
CN111353815B (en) Potential user prediction method and system
CN112417304A (en) Data analysis service recommendation method and system for constructing data analysis process
JP2020198135A (en) Hyper parameter tuning method, device and program
JP7462206B2 (en) Learning device, learning method, and learning program
JPWO2019026703A1 (en) Learned model integration method, device, program, IC chip, and system
CN112907004A (en) Learning planning method, device and computer storage medium
TWI871112B (en) Device and method for recommending pipelines for ensemble learning model
Kinnaird-Heether The Impact of Auction-Based Knowledge Distribution Mechanisms on Optimization Problem Solving in Complex Dynamic Environments with Cultural Algorithms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant