CN111931916A - Exploration method and device of deep learning model

Exploration method and device of deep learning model

Info

Publication number
CN111931916A
Authority
CN
China
Prior art keywords
model
deep learning
variant
historical
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010814501.9A
Other languages
Chinese (zh)
Other versions
CN111931916B (en)
Inventor
赵仕嘉
林涛
董浩欣
杨鹤鸣
向雷
李晁铭
麦洪永
陈华荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Planning and Designing Institute of Telecommunications Co Ltd
Original Assignee
Guangdong Planning and Designing Institute of Telecommunications Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Planning and Designing Institute of Telecommunications Co Ltd filed Critical Guangdong Planning and Designing Institute of Telecommunications Co Ltd
Priority to CN202010814501.9A priority Critical patent/CN111931916B/en
Publication of CN111931916A publication Critical patent/CN111931916A/en
Application granted granted Critical
Publication of CN111931916B publication Critical patent/CN111931916B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061 - Partitioning or combining of resources
    • G06F 9/5066 - Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an exploration method and device for a deep learning model, wherein the method comprises the following steps: determining one cloud computing resource as a master node and a plurality of other cloud computing resources as a plurality of slave nodes, wherein the master node is used for scheduling the slave nodes to perform training operations on deep learning models; performing a training operation on the deep learning model at each slave node to obtain the training result of that slave node, wherein each slave node's training result comprises a target deep learning model and the score of that target deep learning model, the target deep learning model being the trained deep learning model; and determining the optimal deep learning model from all the target deep learning models according to the scores included in all the training results. The invention thereby enables deep learning models to be trained in parallel, improves training efficiency, reduces training time, and facilitates the application of deep-learning-model exploration technology in business scenarios.

Description

Exploration method and device of deep learning model
Technical Field
The invention relates to the technical field of deep learning, in particular to a method and a device for exploring a deep learning model.
Background
In recent years, deep learning technology has reduced both the complexity of use and the technical barrier to understanding for users, and has been rapidly adopted in business scenarios across many industries. Moreover, because the business scenarios to which deep learning is applied vary, fully exploiting the potential of deep learning and improving its accuracy in practical applications requires training, for each business scenario, a deep learning model suited to that scenario.
In practical applications, a deep learning model suited to a specific business scenario can be obtained through deep learning model exploration: various deep learning network structures and hyper-parameter settings are continuously trained and verified, and the optimal deep learning model is then selected from the training and verification results. However, exploration of deep learning models requires large amounts of computing resources, the generation and training processes of deep learning models are highly interdependent, and the generation process is highly serialized; as a result, conventional exploration methods are inefficient and require long exploration times, which hinders the application of deep-learning-model exploration technology in business scenarios.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method and a device for exploring a deep learning model that determine a plurality of cloud computing resources to perform deep learning model training operations in parallel, thereby improving training efficiency, reducing training time, and facilitating the application of deep-learning-model exploration technology in business scenarios.
In order to solve the technical problem, a first aspect of the present invention discloses a method for exploring a deep learning model, the method comprising:
determining one cloud computing resource as a master node and a plurality of other cloud computing resources as a plurality of slave nodes, wherein the master node is used for scheduling the slave nodes to execute training operation on the deep learning model;
executing a training operation on the deep learning model based on each slave node to obtain a training result of the slave node, wherein the training result of each slave node comprises a target deep learning model and a score of the target deep learning model, and the target deep learning model included in the training result of each slave node is a trained deep learning model;
and determining an optimal deep learning model from all the target deep learning models according to the scores included in all the training results.
As an optional implementation manner, in the first aspect of the present invention, the performing a training operation on the deep learning model based on each slave node to obtain a training result of the slave node includes:
creating a first process and a second process corresponding to each slave node, and generating a plurality of deep learning models corresponding to each slave node based on the first process of each slave node;
and performing training and verification operation on each deep learning model corresponding to each slave node based on the second process of each slave node, the local cloud computing resource of each slave node and the determined hyper-parameter to obtain a training result corresponding to the slave node.
As an optional implementation manner, in the first aspect of the present invention, the generating, based on the first process of each slave node, a plurality of deep learning models corresponding to the slave node includes:
based on the first process of each slave node, selecting a plurality of historical models from the determined historical model pool as a plurality of base models corresponding to the slave node, wherein the historical models are deep learning models generated by all the slave nodes;
selecting a plurality of basic models from all the basic models corresponding to each slave node based on the determined simulated annealing method as a plurality of target basic models corresponding to the slave node;
executing model deformation operation on each target base model corresponding to each slave node to obtain a plurality of variant models corresponding to the target base model, and screening out a target variant model corresponding to the target base model from the plurality of variant models corresponding to the target base model;
scoring each target variant model, and screening a plurality of deep learning models corresponding to the slave nodes from all the target variant models corresponding to the slave nodes according to the score of each target variant model;
the model deformation operation refers to at least one of network structure deepening operation, network structure widening operation and jump layer structure adding operation which are randomly performed on the neural network model.
As an optional implementation manner, in the first aspect of the present invention, the screening out the target variant model corresponding to the target base model from the plurality of variant models corresponding to the target base model includes:
calculating the model distance between each variant model corresponding to the target base model and each historical model in the historical model pool;
judging whether each variant model corresponding to the target base model has a matching historical model, wherein the matching historical model corresponding to the variant model is the historical model of which the model distance from the variant model is smaller than a preset threshold value, and deleting the variant model from a plurality of variant models corresponding to the target base model when the variant model is judged to have the matching historical model;
and scoring each variant model corresponding to the target base model, and selecting the variant model with the highest score from the plurality of variant models corresponding to the target base model as the target variant model corresponding to the target base model.
As an optional implementation manner, in the first aspect of the present invention, the calculating a model distance between each variant model corresponding to the target base model and each historical model in the historical model pool includes:
calculating the common layer distance between each variant model corresponding to the target base model and each historical model in the historical model pool;
calculating the layer jump distance between each variant model corresponding to the target base model and each historical model in the historical model pool;
and adding the common layer distance and the jump layer distance of each variant model corresponding to the target base model and each historical model in the historical model pool to serve as the model distance of the variant model and the historical model.
As an optional implementation manner, in the first aspect of the present invention, the calculating a common layer distance between each variant model corresponding to the target base model and each historical model in the historical model pool includes:
performing information coding on the common layers of each variant model corresponding to the target base model to obtain a common layer information list corresponding to the variant model, and performing information coding on the common layers of each historical model in the historical model pool to obtain a common layer information list corresponding to the historical model;
constructing a common layer matrix corresponding to the variant model and the historical model according to a common layer information list corresponding to each variant model corresponding to the target base model and a common layer information list corresponding to each historical model in the historical model pool;
assigning values, in order from left to right and from top to bottom, to each element of the common layer matrix corresponding to each variant model of the target base model and each historical model in the historical model pool, wherein the formula for assigning each element of the common layer matrix is as follows:
matrix_{i,j} = min(matrix_{i,j} + dist(M1_i, M2_j), matrix_{i,j-1}, matrix_{i-1,j})
where matrix_{i,j} denotes the element in row i and column j of the common layer matrix, M1_i denotes the i-th common layer of the variant model, and M2_j denotes the j-th common layer of the historical model;
where the function dist computes the distance between the two layers M1_i and M2_j: when M1_i and M2_j are layers of different types, dist(M1_i, M2_j) = 1; when M1_i and M2_j are layers of the same type, the value of dist(M1_i, M2_j) is calculated by:
(equation given only as an image in the original, reference BDA0002632188850000031: the same-type layer distance computed from the parameter information a_k, b_k defined below)
where a_k is the k-th item of parameter information in the information coding of the common layer denoted M1_i, b_k is the k-th item of parameter information in the information coding of the common layer denoted M2_j, and n is the number of parameter information items contained in the information coding;
and taking the element of the lower right corner of the common layer matrix corresponding to each variant model corresponding to the target base model and each historical model in the historical model pool as the common layer distance between the variant model and the historical model.
As an optional implementation manner, in the first aspect of the present invention, the calculating a layer jump distance between each variant model corresponding to the target base model and each historical model in the historical model pool includes:
carrying out information coding on the jump layer of each variant model corresponding to the target basic model to obtain a jump layer information list corresponding to the variant model, and carrying out information coding on the jump layer of each historical model in the historical model pool to obtain a jump layer information list corresponding to the historical model;
constructing a layer jump matrix corresponding to the variant model and the historical model according to a layer jump information list corresponding to each variant model corresponding to the target base model and a layer jump information list corresponding to each historical model in the historical model pool, wherein the row number of the layer jump matrix is the length of the layer jump information list corresponding to the variant model, and the column number of the layer jump matrix is the length of the layer jump information list corresponding to the historical model;
assigning a value to each element of the layer jump matrix corresponding to each variant model of the target base model and each historical model in the historical model pool according to the following formula:
skip_connection_matrix_{p,q} = dist2(S1_p, S2_q)
where skip_connection_matrix denotes the layer jump matrix, skip_connection_matrix_{p,q} denotes the element in row p and column q of the layer jump matrix, S1_p denotes the p-th jump layer of the variant model, and S2_q denotes the q-th jump layer of the historical model;
where the function dist2 computes the distance between the two layers S1_p and S2_q: when S1_p and S2_q are layers of different types, dist2(S1_p, S2_q) = 1; when S1_p and S2_q are layers of the same type, the value of dist2(S1_p, S2_q) is calculated by:
(equation given only as an image in the original, reference BDA0002632188850000041: the same-type jump layer distance computed from p_s, q_s, p_l, and q_l defined below)
where p_s denotes the layer position index in the variant model of the starting layer of the p-th jump layer, q_s denotes the layer position index in the historical model of the starting layer of the q-th jump layer, p_l denotes the depth of the p-th jump layer in the variant model, and q_l denotes the depth of the q-th jump layer in the historical model;
calculating the jump layer distance between each variant model corresponding to the target base model and each historical model in the historical model pool according to the following formula:
dist_s = sum(skip_connection_matrix) + |s1 - s2|
where dist_s denotes the skip-layer distance between the variant model and the historical model, sum(skip_connection_matrix) denotes the sum of all elements of the layer jump matrix skip_connection_matrix, s1 denotes the length of the jump layer information list corresponding to the variant model, and s2 denotes the length of the jump layer information list corresponding to the historical model.
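For illustration, the following Python sketch implements the skip-layer distance as just described. The (kind, start, depth) encoding of a jump layer and the same-type branch of dist2 are assumptions made for the sketch, since the patent's same-type equation appears only as an image; the matrix sum and the |s1 - s2| term follow the dist_s formula above.

```python
# Sketch of the skip-layer distance above. The encoding of a jump layer as
# (kind, start, depth) and the same-type branch of dist2 are assumptions.
from dataclasses import dataclass

@dataclass
class SkipLayer:
    kind: str    # type of the jump layer
    start: int   # layer position index of the starting layer (p_s / q_s)
    depth: int   # depth of the jump (p_l / q_l)

def dist2(s1: SkipLayer, s2: SkipLayer) -> float:
    if s1.kind != s2.kind:
        return 1.0               # different types: distance 1, per the text
    # Assumed normalized-difference form over (start, depth):
    return (abs(s1.start - s2.start) + abs(s1.depth - s2.depth)) / (
        max(s1.start, s2.start) + max(s1.depth, s2.depth) + 1e-9)

def skip_layer_distance(v_skips: list, h_skips: list) -> float:
    # Matrix element (p, q) is dist2(S1_p, S2_q); the distance is the sum of
    # all elements plus the length difference |s1 - s2|, per dist_s above.
    matrix_sum = sum(dist2(a, b) for a in v_skips for b in h_skips)
    return matrix_sum + abs(len(v_skips) - len(h_skips))
```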
As an optional implementation manner, in the first aspect of the present invention, the performing, on the basis of the second process of each slave node, the local cloud computing resource of each slave node, and the determined hyper-parameter, a training and verification operation on each deep learning model corresponding to the slave node to obtain a training result corresponding to the slave node includes:
formulating, based on the second process of each slave node and the local cloud computing resource of each slave node, a hyper-parameter space for each deep learning model corresponding to the slave node, wherein the hyper-parameter space at least comprises batch size and learning rate;
setting the number of search iterations for each deep learning model corresponding to each slave node;
constructing a set corresponding to each deep learning model corresponding to each slave node, wherein the set is used for storing scores of the deep learning models on a verification set after training;
setting an objective function corresponding to each deep learning model corresponding to each slave node according to the following formula:
F = max(SC)
where F is the objective function and SC is the score of the trained deep learning model on the validation set;
randomly selecting a starting point in the hyper-parameter space of each deep learning model corresponding to each slave node, and then iteratively searching the hyper-parameter space through a Gaussian process mapping to select a plurality of target hyper-parameters for the deep learning model, wherein the Gaussian process mapping can be expressed as:
T = G(C, R, F, J)
where T is a target hyper-parameter for the deep learning model (each target hyper-parameter being a hyper-parameter value that G recommends as worth trying), C is the hyper-parameter space of the deep learning model, R is the set corresponding to the deep learning model, F is the objective function of the deep learning model, and J is the number of search iterations for the deep learning model;
training and verifying the deep learning model based on each target hyper-parameter corresponding to each deep learning model corresponding to each slave node to obtain a plurality of intermediate deep learning models corresponding to the deep learning model and a score corresponding to each intermediate deep learning model;
and selecting the intermediate deep learning model with the highest score and the score corresponding to the intermediate deep learning model from all the intermediate deep learning models corresponding to the slave nodes as training results corresponding to the slave nodes.
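As a rough illustration of this search loop, the sketch below uses scikit-learn's GaussianProcessRegressor as the surrogate G and an upper-confidence-bound rule over randomly sampled candidates as the acquisition step. Both choices, as well as the two-dimensional (batch size, learning rate) space and the train_and_score stand-in, are assumptions, since the patent only states that G recommends hyper-parameter values worth trying.

```python
# Hedged sketch of the search: a random starting point, then J rounds in which
# a Gaussian process surrogate recommends the next (batch size, learning rate).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def gp_search(train_and_score, J=20, seed=0):
    rng = np.random.default_rng(seed)
    sample = lambda: np.array([float(rng.integers(16, 257)),   # batch size
                               10.0 ** rng.uniform(-5, -1)])   # learning rate
    X = [sample()]                       # explored points in the space C
    R = [train_and_score(*X[0])]         # set R: validation scores after training
    for _ in range(J - 1):               # J: number of search iterations
        gp = GaussianProcessRegressor().fit(np.array(X), np.array(R))
        cand = np.array([sample() for _ in range(256)])
        mu, sigma = gp.predict(cand, return_std=True)
        X.append(cand[np.argmax(mu + 1.96 * sigma)])   # UCB stand-in for T = G(C, R, F, J)
        R.append(train_and_score(*X[-1]))
    best = int(np.argmax(R))             # objective F = max(SC)
    return X[best], R[best]

# Usage with a dummy scoring function:
# best_hp, best_score = gp_search(lambda bs, lr: -abs(lr - 0.01), J=10)
```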
The invention discloses a second aspect of the exploration device of the deep learning model, the device includes:
the determining module is used for determining one cloud computing resource as a master node and a plurality of other cloud computing resources as a plurality of slave nodes, wherein the master node is used for scheduling the slave nodes to execute training operation on the deep learning model;
the training module is used for executing training operation on the deep learning model based on each slave node to obtain a training result of the slave node, the training result of each slave node comprises a target deep learning model and the score of the target deep learning model, and the target deep learning model included in the training result of each slave node is a trained deep learning model;
the determining module is further configured to determine an optimal deep learning model from all the target deep learning models according to the scores included in all the training results.
As an alternative embodiment, in the second aspect of the present invention, the training module comprises:
the creating submodule is used for creating a first process and a second process corresponding to each slave node;
the generation submodule is used for generating a plurality of deep learning models corresponding to each slave node based on the first process of the slave node;
and the training sub-module is used for performing training and verification operations on each deep learning model corresponding to each slave node based on the second process of each slave node, the local cloud computing resource of each slave node and the determined hyper-parameter to obtain a training result corresponding to the slave node.
As an optional implementation manner, in the second aspect of the present invention, the generating sub-module includes:
the selection unit is used for selecting a plurality of historical models from the determined historical model pool as a plurality of basic models corresponding to the slave node based on the first process of each slave node, wherein the historical models are deep learning models generated by all the slave nodes;
the selecting unit is further configured to select a plurality of the base models from all the base models corresponding to each slave node based on the determined simulated annealing method as a plurality of target base models corresponding to the slave node;
the deformation unit is used for executing model deformation operation on each target base model corresponding to each slave node to obtain a plurality of variant models corresponding to the target base model;
the first screening unit is used for screening a target variant model corresponding to the target basic model from a plurality of variant models corresponding to the target basic model;
the second screening unit is used for scoring each target variant model and screening a plurality of deep learning models corresponding to the slave nodes from all the target variant models corresponding to the slave nodes according to the score of each target variant model;
the model deformation operation refers to at least one of network structure deepening operation, network structure widening operation and jump layer structure adding operation which are randomly performed on the neural network model.
As an alternative embodiment, in the second aspect of the present invention, the first screening unit includes:
the calculation subunit is used for calculating the model distance between each variant model corresponding to the target base model and each historical model in the historical model pool;
a judging subunit, configured to judge whether a matching history model exists in each of the variant models corresponding to the target base model, where the matching history model corresponding to the variant model is the history model whose model distance from the variant model is smaller than a preset threshold, and delete the variant model from the plurality of variant models corresponding to the target base model when it is judged that the matching history model exists in the variant model;
and the selecting subunit is used for scoring each variant model corresponding to the target base model, and selecting the variant model with the highest score from the plurality of variant models corresponding to the target base model as the target variant model corresponding to the target base model.
As an alternative embodiment, in the second aspect of the present invention, the calculation subunit includes:
a calculation secondary subunit, configured to calculate a common layer distance between each variant model corresponding to the target base model and each historical model in the historical model pool;
the calculation secondary subunit is further configured to calculate a layer jump distance between each variant model corresponding to the target base model and each historical model in the historical model pool;
and the addition secondary subunit is used for adding the common layer distance and the skip layer distance of each variant model corresponding to the target base model and each historical model in the historical model pool to serve as the model distance of the variant model and the historical model.
As an optional implementation manner, in the second aspect of the present invention, a specific manner of calculating a common layer distance between each variant model corresponding to the target base model and each historical model in the historical model pool by the calculation secondary subunit is as follows:
performing information coding on the common layers of each variant model corresponding to the target base model to obtain a common layer information list corresponding to the variant model, and performing information coding on the common layers of each historical model in the historical model pool to obtain a common layer information list corresponding to the historical model;
constructing a common layer matrix corresponding to the variant model and the historical model according to a common layer information list corresponding to each variant model corresponding to the target base model and a common layer information list corresponding to each historical model in the historical model pool;
assigning values, in order from left to right and from top to bottom, to each element of the common layer matrix corresponding to each variant model of the target base model and each historical model in the historical model pool, wherein the formula for assigning each element of the common layer matrix is as follows:
matrix_{i,j} = min(matrix_{i,j} + dist(M1_i, M2_j), matrix_{i,j-1}, matrix_{i-1,j})
where matrix_{i,j} denotes the element in row i and column j of the common layer matrix, M1_i denotes the i-th common layer of the variant model, and M2_j denotes the j-th common layer of the historical model;
where the function dist computes the distance between the two layers M1_i and M2_j: when M1_i and M2_j are layers of different types, dist(M1_i, M2_j) = 1; when M1_i and M2_j are layers of the same type, the value of dist(M1_i, M2_j) is calculated by:
(equation given only as an image in the original, reference BDA0002632188850000071: the same-type layer distance computed from the parameter information a_k, b_k defined below)
where a_k is the k-th item of parameter information in the information coding of the common layer denoted M1_i, b_k is the k-th item of parameter information in the information coding of the common layer denoted M2_j, and n is the number of parameter information items contained in the information coding;
and taking the element of the lower right corner of the common layer matrix corresponding to each variant model corresponding to the target base model and each historical model in the historical model pool as the common layer distance between the variant model and the historical model.
As an optional implementation manner, in the second aspect of the present invention, a specific manner of calculating the layer jump distance between each variant model corresponding to the target base model and each historical model in the historical model pool by the secondary calculation subunit is as follows:
carrying out information coding on the jump layer of each variant model corresponding to the target basic model to obtain a jump layer information list corresponding to the variant model, and carrying out information coding on the jump layer of each historical model in the historical model pool to obtain a jump layer information list corresponding to the historical model;
constructing a layer jump matrix corresponding to the variant model and the historical model according to a layer jump information list corresponding to each variant model corresponding to the target base model and a layer jump information list corresponding to each historical model in the historical model pool, wherein the row number of the layer jump matrix is the length of the layer jump information list corresponding to the variant model, and the column number of the layer jump matrix is the length of the layer jump information list corresponding to the historical model;
assigning a value to each element of the layer jump matrix corresponding to each variant model of the target base model and each historical model in the historical model pool according to the following formula:
skip_connection_matrix_{p,q} = dist2(S1_p, S2_q)
where skip_connection_matrix denotes the layer jump matrix, skip_connection_matrix_{p,q} denotes the element in row p and column q of the layer jump matrix, S1_p denotes the p-th jump layer of the variant model, and S2_q denotes the q-th jump layer of the historical model;
where the function dist2 computes the distance between the two layers S1_p and S2_q: when S1_p and S2_q are layers of different types, dist2(S1_p, S2_q) = 1; when S1_p and S2_q are layers of the same type, the value of dist2(S1_p, S2_q) is calculated by:
(equation given only as an image in the original, reference BDA0002632188850000072: the same-type jump layer distance computed from p_s, q_s, p_l, and q_l defined below)
where p_s denotes the layer position index in the variant model of the starting layer of the p-th jump layer, q_s denotes the layer position index in the historical model of the starting layer of the q-th jump layer, p_l denotes the depth of the p-th jump layer in the variant model, and q_l denotes the depth of the q-th jump layer in the historical model;
calculating the jump layer distance between each variant model corresponding to the target base model and each historical model in the historical model pool according to the following formula:
dist_s = sum(skip_connection_matrix) + |s1 - s2|
where dist_s denotes the skip-layer distance between the variant model and the historical model, sum(skip_connection_matrix) denotes the sum of all elements of the layer jump matrix skip_connection_matrix, s1 denotes the length of the jump layer information list corresponding to the variant model, and s2 denotes the length of the jump layer information list corresponding to the historical model.
As an optional implementation manner, in the second aspect of the present invention, the training sub-module performs training and verification operations on each deep learning model corresponding to each slave node based on the second process of each slave node, the local cloud computing resource of each slave node, and the determined hyper-parameter, and a specific manner of obtaining a training result corresponding to the slave node is as follows:
formulating, based on the second process of each slave node and the local cloud computing resource of each slave node, a hyper-parameter space for each deep learning model corresponding to the slave node, wherein the hyper-parameter space at least comprises batch size and learning rate;
setting the number of search iterations for each deep learning model corresponding to each slave node;
constructing a set corresponding to each deep learning model corresponding to each slave node, wherein the set is used for storing scores of the deep learning models on a verification set after training;
setting an objective function corresponding to each deep learning model corresponding to each slave node according to the following formula:
F = max(SC)
where F is the objective function and SC is the score of the trained deep learning model on the validation set;
randomly selecting a starting point in the hyper-parameter space of each deep learning model corresponding to each slave node, and then iteratively searching the hyper-parameter space through a Gaussian process mapping to select a plurality of target hyper-parameters for the deep learning model, wherein the Gaussian process mapping can be expressed as:
T = G(C, R, F, J)
where T is a target hyper-parameter for the deep learning model (each target hyper-parameter being a hyper-parameter value that G recommends as worth trying), C is the hyper-parameter space of the deep learning model, R is the set corresponding to the deep learning model, F is the objective function of the deep learning model, and J is the number of search iterations for the deep learning model;
training and verifying the deep learning model based on each target hyper-parameter corresponding to each deep learning model corresponding to each slave node to obtain a plurality of intermediate deep learning models corresponding to the deep learning model and a score corresponding to each intermediate deep learning model;
and selecting the intermediate deep learning model with the highest score and the score corresponding to the intermediate deep learning model from all the intermediate deep learning models corresponding to the slave nodes as training results corresponding to the slave nodes.
The third aspect of the invention discloses an exploration device of a deep learning model, which comprises:
a memory storing executable program code;
a processor coupled with the memory;
the processor calls the executable program codes stored in the memory to execute the exploration method of the deep learning model disclosed by the first aspect of the invention.
In a fourth aspect, the present invention discloses a computer storage medium storing computer instructions for executing the method for exploring a deep learning model disclosed in the first aspect of the present invention when the computer instructions are invoked.
Compared with the prior art, the invention has the following beneficial effects:
according to the embodiment of the invention, one cloud computing resource is determined as a master node, and other cloud computing resources are determined as a plurality of slave nodes, then the training operation of the deep learning model is carried out based on each slave node to obtain the training result of the slave node, and finally the optimal deep learning model is determined from all the training results according to the grade of the training result, so that the training operation of the deep learning model can be carried out in parallel, the training efficiency of the deep learning model is improved, the training time is reduced, and the application of the deep learning model exploration technology in a service scene is facilitated.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart illustrating a method for exploring a deep learning model according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating a method for exploring another deep learning model according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an exploration apparatus of a deep learning model according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an exploration device of another deep learning model disclosed in the embodiments of the present invention;
fig. 5 is a schematic structural diagram of a exploration apparatus for yet another deep learning model according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," and the like in the description, claims, and drawings of the present invention are used to distinguish different objects, not to describe a particular order. Furthermore, the terms "include" and "have," and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, product, or apparatus that comprises a list of steps or elements is not limited to those listed, but may include other steps or elements not listed or inherent to such process, method, product, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The invention discloses a method and a device for exploring a deep learning model, which are characterized in that a cloud computing resource is determined to serve as a main node, and other cloud computing resources serve as a plurality of slave nodes, then training operation of the deep learning model is carried out on the basis of each slave node to obtain a training result of the slave node, and finally an optimal deep learning model is determined from all training results according to scores of the training results, so that the deep learning model can be trained in parallel, the deep learning model training efficiency is improved, the training time is shortened, and the application of a deep learning model exploration technology in a service scene is facilitated.
Example one
Referring to fig. 1, fig. 1 is a flowchart illustrating a method for exploring a deep learning model according to an embodiment of the present invention. As shown in fig. 1, the exploration method of the deep learning model may include the following operations:
101. determining one cloud computing resource as a master node and a plurality of other cloud computing resources as a plurality of slave nodes, wherein the master node is used for scheduling the plurality of slave nodes to execute training operation on the deep learning model.
In the above step 101, the master node is mainly responsible for interaction with the user, starting the slave nodes, interacting with the slave nodes, and aggregating the training results of the deep learning models. The core of the master node is an information circulation module, which manages the interaction of the master node's modules. The master node's modules include a parameter adjuster, an evaluator, and a training service module: the parameter adjuster generates the network structure of the deep learning model and must be given a mode, such as Network Morphism, at the initialization stage; the evaluator evaluates network structures; and the training service module schedules the slave nodes for training. Interaction among the master node's modules, between the master node and the slave nodes, and between the master node and the user is carried out over network communication. During parallel training, the master node sends the parameter adjuster to the slave nodes, so that each slave node can generate and train deep learning models autonomously.
Specifically, the cloud computing resource startup and scheduling process can be described as follows. The user configures the relevant task information (for example, for image classification) through a yml configuration file and starts the master node through a shell script; the master node starts Node.js, and during this process the user can view the cloud running state through a web page. The master node starts the message circulation module and the zmq module, and then manages interaction among its modules and with the slave nodes based on TypeScript (a typed superset of JavaScript; network communication is performed here). The master node parses the user's shell request information and, according to it, initializes a service use case, a parameter adjuster, and a history information table (used to store the training results uploaded by all slave nodes); a 16-layer initial structure (a combination of several types of deep learning layers, such as convolution and sampling layers) is stored in the history information table as the basis for generating network structures. According to the obtained slave node information, the slave nodes are started by the training service module using shell scripts, and the service use case (comprising the configuration parameters and service data for training; for example, in an image classification use case the configuration parameters are the image input and output dimensions and the service data is an image classification training data set), the parameter adjuster, and the history information table are then transmitted to the slave nodes over the network. Each slave node then begins to maintain a local history information table (used to store that slave node's training results), a local parameter adjuster, and a local service use case.
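As a rough illustration of this hand-off (not the patent's actual wire protocol), the sketch below uses pyzmq, the zmq module named above, to send the parameter adjuster configuration, service use case, and history information table from the master to a slave; the message fields, port, and REQ/REP pattern are illustrative assumptions.

```python
# Illustrative master-side dispatch over ZeroMQ; fields and port are assumptions.
import zmq

def dispatch_to_slave(slave_addr, tuner_cfg, use_case, history):
    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.REQ)               # master requests, slave replies
    sock.connect(f"tcp://{slave_addr}:5555")
    sock.send_json({
        "tuner": tuner_cfg,                  # e.g. {"mode": "Network Morphism"}
        "use_case": use_case,                # configuration parameters + data reference
        "history": history,                  # includes the 16-layer initial structure
    })
    ack = sock.recv_json()                   # slave confirms its local loop started
    sock.close()
    return ack
```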
102. And performing a training operation on the deep learning model based on each slave node to obtain a training result of the slave node.
In the step 102, the training result of each slave node includes a target deep learning model and the score of that target deep learning model, the target deep learning model being the trained deep learning model. Here, the score of the deep learning model may be its accuracy score on the validation set. After each slave node receives the service use case, the parameter adjuster, and the history information table transmitted by the master node over the network, it can maintain a local history information table (used to store that slave node's training results), a local parameter adjuster, and a local service use case accordingly, and can train deep learning models autonomously. Specifically, a deep learning model is trained using the local parameter adjuster and the local service use case, and the resulting training result is stored in the local history information table.
In an optional embodiment, performing a training operation on the deep learning model based on each slave node to obtain a training result of the slave node, includes:
creating a first process and a second process corresponding to each slave node, and generating a plurality of deep learning models corresponding to each slave node based on the first process of the slave node;
and performing training and verification operation on each deep learning model corresponding to each slave node based on the second process of each slave node, the local cloud computing resource of each slave node and the determined hyper-parameter to obtain a training result corresponding to the slave node.
In this alternative embodiment, a first process and a second process of the slave node are created in each slave node, the first process of the slave node is used for continuously and circularly executing the task of generating the deep learning model to continuously generate a new deep learning model, and the second process of the slave node is used for continuously and circularly executing the task of training and verifying the deep learning model generated by the first process.
Therefore, by implementing the optional embodiment, the first process and the second process corresponding to the slave node are generated in each slave node, the new deep learning model is continuously generated through the first process, and the newly generated deep learning model is continuously trained and verified through the second process to obtain the training result, so that the generation and the training of the deep learning model can be parallelized, the generation and the training of the deep learning model do not need to wait for each other, and the training efficiency of the deep learning model is improved.
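A minimal sketch of this two-process pipeline on one slave node, assuming Python's multiprocessing and stub functions standing in for the real generation and training tasks described above:

```python
# Sketch: process 1 keeps generating models, process 2 keeps training them.
import random
import time
from multiprocessing import Process, Queue

def generate_model(history):
    return {"layers": random.randint(4, 32)}      # stub: produce a new model

def train_and_validate(model):
    time.sleep(0.1)                               # stands in for real training
    return model, random.random()                 # (model, validation score)

def generator_loop(q, history):
    while True:                                   # first process: generate forever
        q.put(generate_model(history))

def trainer_loop(q, results):
    while True:                                   # second process: train arrivals
        results.put(train_and_validate(q.get()))

def run_slave(history):
    q, results = Queue(maxsize=8), Queue()        # bounded queue keeps generation from racing ahead
    Process(target=generator_loop, args=(q, history), daemon=True).start()
    Process(target=trainer_loop, args=(q, results), daemon=True).start()
    return results                                # drain (model, score) pairs here
```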
103. And determining the optimal deep learning model from all the target deep learning models according to the scores included in all the training results.
In step 103, since the history information table is used to store all the training results uploaded from the nodes, an optimal deep learning model can be determined based on the information in the history information table. Here, the highest-scoring target deep learning model may be selected as the optimal deep learning model.
It can be seen that, by implementing the exploration method of the deep learning model described in fig. 1, a cloud computing resource is determined as a master node, and a plurality of other cloud computing resources are determined as a plurality of slave nodes, then the training operation of the deep learning model is performed based on each slave node to obtain the training result of the slave node, and finally the optimal deep learning model is determined from all the training results according to the scores of the training results, so that the training operation of the deep learning model can be performed in parallel, the efficiency of deep learning model training is improved, the training time is reduced, and the application of the deep learning model exploration technology in a service scene is facilitated. In addition, the first process and the second process corresponding to each slave node are generated in each slave node, so that the generation and the training of the deep learning model are parallelized, the generation and the training of the deep learning model do not need to wait for each other, and the training efficiency of the deep learning model is improved.
Example two
Referring to fig. 2, fig. 2 is a flowchart illustrating another deep learning model exploration method according to an embodiment of the present invention. As shown in fig. 2, the exploration method of the deep learning model may include the following operations:
201. determining one cloud computing resource as a master node and a plurality of other cloud computing resources as a plurality of slave nodes, wherein the master node is used for scheduling the plurality of slave nodes to execute training operation on the deep learning model.
202. A first process and a second process are created for each slave node.
203. And selecting a plurality of historical models from the determined historical model pool as a plurality of base models corresponding to the slave node based on the first process of each slave node, wherein the historical models are deep learning models generated by all the slave nodes.
In step 203, the historical model pool may be determined from the history information table, which may include each historical model's model number and model information (network structure, model score, model weights, etc.). Here, the ten historical models with the highest model scores may be selected as the ten base models corresponding to the slave node.
204. And selecting a plurality of basic models from all the basic models corresponding to each slave node based on the determined simulated annealing method as a plurality of target basic models corresponding to the slave node.
In step 204, the simulated annealing method is an algorithm for finding a global optimal solution, and can effectively avoid trapping in a local optimal solution. And selecting a target basic model based on the simulated annealing method, and efficiently exploring a deep learning model with higher score.
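One way to read this step, sketched below under assumptions: each base model carries a score, and the standard simulated-annealing acceptance rule (always accept a better-scoring model, accept a worse one with probability exp(delta/t)) decides which base models become target base models. The temperature schedule, constants, and starting point are illustrative, not the patent's.

```python
# Hedged sketch of simulated-annealing selection of target base models.
import math
import random

def select_target_bases(base_models, k=3, t0=1.0, cooling=0.95, max_steps=1000):
    """base_models: list of (model, score); returns up to k target base models."""
    current = max(base_models, key=lambda m: m[1])   # start from the best base
    targets, t = [current[0]], t0
    for _ in range(max_steps):
        if len(targets) == k:
            break
        cand = random.choice(base_models)
        delta = cand[1] - current[1]
        # Accept better models always; worse ones with probability exp(delta / t).
        if delta >= 0 or random.random() < math.exp(delta / t):
            current = cand
            targets.append(cand[0])
        t *= cooling                                 # cool down gradually
    return targets
```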
205. And executing model deformation operation on each target base model corresponding to each slave node to obtain a plurality of variant models corresponding to the target base model, and screening out the target variant model corresponding to the target base model from the plurality of variant models corresponding to the target base model.
In step 205, the model morphing operation refers to randomly performing at least one of a network structure deepening operation, a network structure widening operation, and a layer jump structure adding operation on the neural network model. A plurality of variant models corresponding to the target basic model can be obtained by executing the model deformation operation, so that the exploration space can be enlarged, and the exploration result is more accurate.
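The toy sketch below illustrates the three deformation operations on an assumed layer-list representation of a model; the representation and the exact edits are illustrative assumptions, not the patent's encoding.

```python
# Toy illustration of the three model-deformation operations.
import copy
import random

def morph(model):
    m = copy.deepcopy(model)
    op = random.choice(["deepen", "widen", "add_skip"])
    if op == "deepen":                       # network-structure deepening
        i = random.randrange(len(m["layers"]) + 1)
        m["layers"].insert(i, {"type": "conv", "width": 64})
    elif op == "widen":                      # network-structure widening
        random.choice(m["layers"])["width"] *= 2
    else:                                    # add a jump-layer (residual) structure
        i = random.randrange(len(m["layers"]) - 1)
        j = random.randrange(i + 1, len(m["layers"]))
        m["skips"].append((i, j))
    return m

base = {"layers": [{"type": "conv", "width": 32} for _ in range(4)], "skips": []}
variants = [morph(base) for _ in range(8)]   # several variant models per base
```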
In an optional embodiment, the screening out the target variant model corresponding to the target base model from the plurality of variant models corresponding to the target base model includes:
calculating the model distance between each variant model corresponding to the target base model and each historical model in the historical model pool;
judging whether each variant model corresponding to the target base model has a matching history model, wherein the matching history model corresponding to the variant model is a history model of which the model distance from the variant model is smaller than a preset threshold value, and deleting the variant model from a plurality of variant models corresponding to the target base model when the variant model is judged to have the matching history model;
and scoring each variant model corresponding to the target base model, and selecting the variant model with the highest score from the plurality of variant models corresponding to the target base model as the target variant model corresponding to the target base model.
In this alternative embodiment, since the model deformation operation is random, the structure of each resulting variant model is also random; if a variant model's structure differs only slightly from existing models, performing subsequent training and verification on it adds little value and wastes computing resources. Therefore, variant models whose model distance is too small are deleted by calculating the model distance between each variant model and the historical models; the remaining variant models are then scored, and the highest-scoring one is selected as the target variant model. Optionally, when the number of historical models in the historical model pool is greater than 40, a process pool can be created in the first process, with each sub-process executing the task of calculating model distances between variant models and historical models, so that the distances between each variant model and each historical model can be calculated in parallel, improving the exploration efficiency of the deep learning model.
Specifically, the scoring rules for the variant models are:
acc = (d_1 + d_2 + ... + d_n) / n
where n is the number of historical models, d_i is the model distance between the variant model and the i-th historical model, and acc is the scoring result.
For example, if there are 2 historical models and the variant model's distances to them are 2 and 4, respectively, the score is (2 + 4)/2 = 3.
Therefore, by implementing the optional embodiment, the variant models with too small distance from the historical model are deleted, and the variant model with the highest score is selected from the rest variant models to serve as the target variant model, so that the target variant model can be screened from the multiple variant models, the subsequent data processing amount is reduced, and the training efficiency is improved.
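Putting the screening rule together in a sketch (model_distance stands for the common-layer-plus-skip-layer distance described immediately below, and the threshold is illustrative): variants too close to any historical model are dropped, the rest are scored by their average distance to the historical models per the acc formula above, and the highest-scoring variant survives.

```python
# Sketch of the variant screening described above.
def screen_variants(variants, history, model_distance, threshold=1.0):
    survivors = []
    for v in variants:
        dists = [model_distance(v, h) for h in history]
        if min(dists) >= threshold:              # no matching historical model
            acc = sum(dists) / len(dists)        # acc = (d_1 + ... + d_n) / n
            survivors.append((v, acc))
    return max(survivors, key=lambda p: p[1])[0] if survivors else None
```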
In this optional embodiment, further optionally, calculating a model distance between each variant model corresponding to the target base model and each historical model in the historical model pool includes:
calculating the common layer distance between each variant model corresponding to the target base model and each historical model in the historical model pool;
calculating the layer jump distance between each variant model corresponding to the target base model and each historical model in the historical model pool;
and adding the common layer distance and the skip layer distance of each variant model corresponding to the target base model and each historical model in the historical model pool to serve as the model distances of the variant model and the historical model.
In this further alternative embodiment, the common layers of the deep learning model include single convolution layers, batch normalization layers, pooling layers, and the like, and a jump layer of the deep learning model refers to a residual connection layer. Since a deep learning model is generally composed of a plurality of common layers and a plurality of jump layers, two models can be effectively compared by calculating the common layer distance and the jump layer distance between them and then adding the two as the model distance between the models. Optionally, the number of jump layers of the variant model and historical model to be compared may be counted first, and when the number of jump layers is greater than 5, a plurality of sub-processes may be created in the first process of the slave node so that the jump layer distance between the two models is calculated in parallel, improving training efficiency.
Therefore, by implementing the further optional embodiment, the model distance between the two models is obtained by calculating the common layer distance and the skip layer distance between the two models, the two models can be effectively compared, and the target variant model can be accurately and effectively screened out.
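In code form, the model distance is then simply the sum of the two component distances; the function below assumes the common_layer_distance and skip_layer_distance sketches given elsewhere in this description, and an assumed dict representation of a model:

```python
# Model distance = common layer distance + skip layer distance (a sketch; the
# model representation and helper functions are assumptions from the other sketches).
def model_distance(variant, historical):
    return (common_layer_distance(variant["layers"], historical["layers"])
            + skip_layer_distance(variant["skips"], historical["skips"]))
```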
In this further optional embodiment, yet further optionally, calculating a common layer distance between each variant model corresponding to the target base model and each historical model in the historical model pool includes:
performing information coding on the common layers of each variant model corresponding to the target base model to obtain a common layer information list corresponding to the variant model, and performing information coding on the common layers of each historical model in the historical model pool to obtain a common layer information list corresponding to the historical model;
constructing a common layer matrix corresponding to the variant model and the historical model according to a common layer information list corresponding to each variant model corresponding to the target base model and a common layer information list corresponding to each historical model in the historical model pool;
assigning a value to each element of the common layer matrix corresponding to each variant model corresponding to the target base model and each historical model in the historical model pool, in order from left to right and from top to bottom, wherein the formula for assigning each element of the common layer matrix is as follows:
matrix_ij = min(matrix_(i-1)(j-1) + dist(M1_i, M2_j), matrix_i(j-1) + 1, matrix_(i-1)j + 1)

wherein matrix_ij represents the element in the i-th row and j-th column of the common layer matrix, M1_i represents the i-th common layer of the variant model, and M2_j represents the j-th common layer of the historical model;

wherein the function dist is used to calculate the distance between the two layers M1_i and M2_j; when M1_i and M2_j are layers of different types, the value of dist(M1_i, M2_j) is 1; when M1_i and M2_j are layers of the same type, the value of dist(M1_i, M2_j) is calculated by:

dist(M1_i, M2_j) = (|a_1 - b_1| / max(a_1, b_1) + ... + |a_n - b_n| / max(a_n, b_n)) / n

wherein a_k is the k-th parameter information in the information coding of the common layer indicated by M1_i, b_k is the k-th parameter information in the information coding of the common layer indicated by M2_j, and n is the number of parameter information items contained in the information coding;
and taking the element of the lower right corner of the common layer matrix corresponding to each variant model corresponding to the target base model and each historical model in the historical model pool as the common layer distance between the variant model and the historical model.
In this still further optional embodiment, the process of performing information coding on the common layers of the model to obtain the common layer information list corresponding to the model may be represented as:
M = (Lm_1, Lm_2, ..., Lm_N)

wherein M represents the common layer information list of the model, N represents the number of common layers of the model, and Lm_i represents the information coding of the i-th common layer of the model. For example, the information of an activation layer may be encoded as the character string "ReLU".
Specifically, the specific process of constructing the common layer matrix corresponding to the variant model and the history model according to the common layer information list corresponding to the variant model and the common layer information list corresponding to the history model may be as follows:
constructing a common layer matrix with m1+1 rows and m2+1 columns according to the length m1 of the common layer information list corresponding to the variant model and the length m2 of the common layer information list corresponding to the historical model, and then assigning the values 0 to m2 to the first row and 0 to m1 to the first column of the common layer matrix, thereby initializing the common layer matrix and completing its construction.
Specifically, the function dist(M1_i, M2_j) is described as follows:

The layers of a deep learning model are usually convolution layers, sampling layers, and the like. For example, if M1_i is a convolution layer and M2_j is a sampling layer, the two layers are of different types, so the value of dist(M1_i, M2_j) is 1. As another example, suppose the information coding corresponding to M1_i is "conv(32, 8)" and the information coding corresponding to M2_j is "conv(16, 4)". When calculating the value of the function dist(M1_i, M2_j), whether the two layers are of the same type is determined from the first 4 letters of the information codings; since the first 4 letters of both information codings are "conv", the two layers are of the same type. In addition, the information coding "conv(32, 8)" corresponding to M1_i indicates that the first parameter information a_1 (the number of convolution kernels) of the layer is 32 and the second parameter information a_2 (the size of the convolution kernels) is 8, while the information coding "conv(16, 4)" corresponding to M2_j indicates that the first parameter information b_1 (the number of convolution kernels) of the layer is 16 and the second parameter information b_2 (the size of the convolution kernels) is 4, so dist(M1_i, M2_j) = (32-16)/(2*32) + (8-4)/(2*8) = 0.5.
It can be seen that, in the implementation of this further optional embodiment, the common layer information list is obtained by encoding the common layer of the models, then the common layer matrix is constructed according to the common layer information list, and the common layer matrix is assigned in a specific manner to obtain the common layer distance between the models, so that the common layer distance between the two models can be calculated.
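As a minimal Python sketch of the assignment procedure above (an illustration, not the claimed embodiment), with an assumed (type, parameter list) tuple standing in for the information coding of a common layer:

def layer_dist(l1, l2):
    # Each layer is encoded as (type, [parameters]), e.g. ("conv", [32, 8]);
    # parameter lists of same-type layers are assumed to have equal length.
    kind1, params1 = l1
    kind2, params2 = l2
    if kind1 != kind2:
        return 1.0            # layers of different types are at distance 1
    if not params1:
        return 0.0            # same type, no numeric parameters to compare
    return sum(abs(a - b) / max(a, b)
               for a, b in zip(params1, params2)) / len(params1)

def common_layer_distance(layers1, layers2):
    m1, m2 = len(layers1), len(layers2)
    # (m1+1) x (m2+1) matrix; first row holds 0..m2, first column holds 0..m1.
    matrix = [[0.0] * (m2 + 1) for _ in range(m1 + 1)]
    for j in range(m2 + 1):
        matrix[0][j] = float(j)
    for i in range(m1 + 1):
        matrix[i][0] = float(i)
    for i in range(1, m1 + 1):
        for j in range(1, m2 + 1):
            matrix[i][j] = min(
                matrix[i - 1][j - 1] + layer_dist(layers1[i - 1], layers2[j - 1]),
                matrix[i][j - 1] + 1,
                matrix[i - 1][j] + 1,
            )
    return matrix[m1][m2]     # the element at the lower right corner

# The example from the text: dist("conv(32, 8)", "conv(16, 4)") = 0.5
assert layer_dist(("conv", [32, 8]), ("conv", [16, 4])) == 0.5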
In this further optional embodiment, yet further optionally, calculating the skip layer distance between each variant model corresponding to the target base model and each historical model in the historical model pool includes:

performing information coding on the skip layers of each variant model corresponding to the target base model to obtain a skip layer information list corresponding to the variant model, and performing information coding on the skip layers of each historical model in the historical model pool to obtain a skip layer information list corresponding to the historical model;

constructing a skip layer matrix corresponding to the variant model and the historical model according to the skip layer information list corresponding to each variant model corresponding to the target base model and the skip layer information list corresponding to each historical model in the historical model pool, wherein the number of rows of the skip layer matrix is the length of the skip layer information list corresponding to the variant model, and the number of columns of the skip layer matrix is the length of the skip layer information list corresponding to the historical model;
assigning a value to each element of the skip layer matrix corresponding to each variant model corresponding to the target base model and each historical model in the historical model pool according to the following formula:
skip_connection_matrix_pq = dist2(S1_p, S2_q)

wherein skip_connection_matrix represents the skip layer matrix, skip_connection_matrix_pq represents the element in the p-th row and q-th column of the skip layer matrix, S1_p represents the p-th skip layer of the variant model, and S2_q represents the q-th skip layer of the historical model;

wherein the function dist2 is used to calculate the distance between the two skip layers S1_p and S2_q; when S1_p and S2_q are skip layers of different types, the value of dist2(S1_p, S2_q) is 1; when S1_p and S2_q are skip layers of the same type, the value of dist2(S1_p, S2_q) is calculated by:

dist2(S1_p, S2_q) = (|p_s - q_s| + |p_l - q_l|) / (max(p_s, q_s) + max(p_l, q_l))

wherein p_s represents the layer position index in the model of the starting layer of the p-th skip layer in the variant model, q_s represents the layer position index in the model of the starting layer of the q-th skip layer in the historical model, p_l represents the depth of the p-th skip layer in the variant model, and q_l represents the depth of the q-th skip layer in the historical model;
calculating the skip layer distance between each variant model corresponding to the target base model and each historical model in the historical model pool according to the following formula:
dist_s = sum(skip_connection_matrix) + |s1 - s2|

wherein dist_s represents the skip layer distance between the variant model and the historical model, sum(skip_connection_matrix) represents the sum of all elements in the skip layer matrix skip_connection_matrix, s1 represents the length of the skip layer information list corresponding to the variant model, and s2 represents the length of the skip layer information list corresponding to the historical model.
In this still further optional embodiment, the process of performing information coding on the skip layers of the model to obtain the skip layer information list corresponding to the model may be represented as:
S = (Ls_1, Ls_2, ..., Ls_N)

wherein S represents the skip layer information list of the model, N represents the number of skip layers of the model, and Ls_i represents the information coding of the i-th skip layer of the model.
Specifically, the function dist2(S1_p, S2_q) is described as follows:

For example, if the starting position of the p-th skip layer S1_p of the variant model is the 3rd layer and its end position is the 7th layer, then the skip layer S1_p corresponds to p_s = 3 and p_l = 7 - 3 = 4; if the starting position of the q-th skip layer S2_q of the historical model is the 4th layer and its end position is the 9th layer, then the skip layer S2_q corresponds to q_s = 4 and q_l = 9 - 4 = 5, so dist2(S1_p, S2_q) = ((4 - 3) + (5 - 4)) / (4 + 5) = 2/9.
It can be seen that, in the embodiment of the further optional embodiment, the layer jump information list is obtained by encoding the layer jump of the model, then the layer jump matrix is constructed according to the layer jump information list, the layer jump matrix is assigned in a specific manner, and finally the layer jump distance between the models is obtained by calculation according to the layer jump matrix, so that the layer jump distance between the two models can be calculated.
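As a companion sketch (again an illustration, not the claimed embodiment), with each skip layer encoded as an assumed (start index, depth, type) tuple, and reusing common_layer_distance from the earlier sketch to form the full model distance:

def skip_dist(s1, s2):
    # Each skip layer is encoded as (start_index, depth, kind); "kind" is an
    # assumed field distinguishing skip layer types.
    p_s, p_l, kind1 = s1
    q_s, q_l, kind2 = s2
    if kind1 != kind2:
        return 1.0
    return (abs(p_s - q_s) + abs(p_l - q_l)) / (max(p_s, q_s) + max(p_l, q_l))

def skip_layer_distance(skips1, skips2):
    # Sum every element of the skip layer matrix, then add the difference in
    # list lengths, as in dist_s = sum(skip_connection_matrix) + |s1 - s2|.
    total = sum(skip_dist(a, b) for a in skips1 for b in skips2)
    return total + abs(len(skips1) - len(skips2))

def model_distance(model1, model2):
    # Model distance = common layer distance + skip layer distance; the
    # .layers and .skips attributes are assumed names.
    return (common_layer_distance(model1.layers, model2.layers)
            + skip_layer_distance(model1.skips, model2.skips))

# The example from the text: a skip layer spanning layers 3..7 vs one spanning 4..9
assert abs(skip_dist((3, 4, "residual"), (4, 5, "residual")) - 2 / 9) < 1e-9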
206. And scoring each target variant model, and screening a plurality of deep learning models corresponding to each slave node from all target variant models corresponding to the slave node according to the score of each target variant model.
In step 206, the scoring rule for the target variant models may be the same as the scoring rule for the variant models described above and is not repeated here. All the target variant models corresponding to each slave node may be classified according to the target base model from which each target variant model was derived, and the highest-scoring target variant model may then be selected from the classification corresponding to each target base model; the selected models form the plurality of deep learning models corresponding to the slave node.
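A short illustrative sketch of this per-base-model selection; the base_model_id attribute and the score callable are assumed names rather than part of the claims:

from collections import defaultdict

def pick_models_for_slave_node(target_variants, score):
    # Classify the target variant models by the base model they derive from,
    # then keep the highest-scoring model of each class.
    groups = defaultdict(list)
    for variant in target_variants:
        groups[variant.base_model_id].append(variant)
    return [max(group, key=score) for group in groups.values()]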
207. And performing training and verification operation on each deep learning model corresponding to each slave node based on the second process of each slave node, the local cloud computing resource of each slave node and the determined hyper-parameter to obtain a training result corresponding to the slave node.
In another optional embodiment, performing a training and verification operation on each deep learning model corresponding to each slave node based on the second process of each slave node, the local cloud computing resource of each slave node, and the determined hyper-parameter to obtain a training result corresponding to the slave node, includes:
drawing up a hyper-parameter space corresponding to each deep learning model aiming at each deep learning model corresponding to each slave node based on the second process of each slave node and the local cloud computing resource of each slave node, wherein the hyper-parameter space at least comprises batch processing quantity and learning rate;
setting the search times corresponding to each deep learning model corresponding to each slave node;
constructing a set corresponding to each deep learning model corresponding to each slave node, wherein the set is used for storing scores of the deep learning models on a verification set after training;
setting an objective function corresponding to each deep learning model corresponding to each slave node according to the following formula:
F=max(SC)
wherein F is the objective function, and SC is the score of the deep learning model on the verification set after training;
randomly selecting a starting point in a hyper-parameter space corresponding to each deep learning model corresponding to each slave node, and then circularly searching in the hyper-parameter space through Gaussian process mapping to select a plurality of target hyper-parameters corresponding to the deep learning model, wherein the Gaussian process mapping can be expressed as:
T = G(C, R, F, J)

wherein T is a target hyper-parameter corresponding to the deep learning model, each target hyper-parameter being a hyper-parameter value recommended by G as worth trying; C is the hyper-parameter space corresponding to the deep learning model, R is the set corresponding to the deep learning model, F is the objective function corresponding to the deep learning model, and J is the number of searches corresponding to the deep learning model;
training and verifying the deep learning model based on each target hyper-parameter corresponding to each deep learning model corresponding to each slave node to obtain a plurality of intermediate deep learning models corresponding to the deep learning model and a score corresponding to each intermediate deep learning model;
and selecting the middle deep learning model with the highest score and the score corresponding to the middle deep learning model from all the middle deep learning models corresponding to the slave nodes as the training result corresponding to the slave nodes.
In this alternative embodiment, the drawn-up hyper-parameter space may include a configuration range [64, 512] for the batch number and a configuration range [0.0002, 0.001] for the learning rate. The set number of searches defaults to 20; when the set number of searches is 20, 20 searches are performed in the hyper-parameter space to generate 20 target hyper-parameters. Gaussian process mapping is a common prediction algorithm, used here to predict the value of the next hyper-parameter to try. The score corresponding to each intermediate deep learning model may be the highest precision that the intermediate deep learning model can achieve on the verification set.
Therefore, by implementing this alternative embodiment, a plurality of hyper-parameters can be selected from the hyper-parameter space to perform training and verification operations on the deep learning model, obtaining a training result for each hyper-parameter, so that training and verification of the deep learning model based on the hyper-parameters is realized.
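By way of a non-limiting sketch, the search loop described above could be approximated with a Gaussian process regressor standing in for the mapping G; the upper-confidence-bound selection rule, the candidate count of 256, and the train_and_score function are assumptions for illustration (the embodiment only states that G recommends the next hyper-parameter value worth trying):

import random
from sklearn.gaussian_process import GaussianProcessRegressor

SPACE = [("batch_size", 64, 512), ("learning_rate", 0.0002, 0.001)]  # ranges from the text
N_SEARCHES = 20  # the default number of searches stated above

def sample_point():
    # A real run would round the batch size to an integer.
    return [random.uniform(low, high) for _name, low, high in SPACE]

def search_hyperparameters(train_and_score):
    tried = [sample_point()]              # random starting point in the space C
    scores = [train_and_score(tried[0])]  # R: scores on the verification set
    gp = GaussianProcessRegressor()
    for _ in range(N_SEARCHES - 1):
        gp.fit(tried, scores)             # model the score as a function of the space
        candidates = [sample_point() for _ in range(256)]
        mean, std = gp.predict(candidates, return_std=True)
        # Upper-confidence-bound rule: try the candidate the GP deems most promising.
        best = max(range(len(candidates)), key=lambda i: mean[i] + std[i])
        tried.append(candidates[best])
        scores.append(train_and_score(candidates[best]))
    top = max(range(len(scores)), key=scores.__getitem__)
    return tried[top], scores[top]        # F = max(SC): the best trial is returned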
208. And determining the optimal deep learning model from all the target deep learning models according to the scores included in all the training results.
For the specific description of the above steps 201 to 208, reference may be made to the specific description of the above steps 101 to 103, which is not repeated here.
It can be seen that, by implementing the exploration method of the deep learning model described in fig. 2, a cloud computing resource is determined as a master node, and a plurality of other cloud computing resources are determined as a plurality of slave nodes, then the training operation of the deep learning model is performed based on each slave node to obtain the training result of the slave node, and finally the optimal deep learning model is determined from all the training results according to the scores of the training results, so that the parallel deep learning model training operation can be realized, the deep learning model training efficiency is improved, the training time is reduced, and the application of the deep learning model exploration technology in a service scene is facilitated. In addition, the variant models are continuously generated, the generated variant models are screened based on model distances, and finally the screened variant models are trained and verified based on different hyper-parameters, so that the model training efficiency can be ensured while the model exploration space is enlarged.
EXAMPLE III
Referring to fig. 3, fig. 3 is a schematic structural diagram of an exploration apparatus of a deep learning model according to an embodiment of the present invention. As shown in fig. 3, the exploring apparatus of the deep learning model may include:
a determining module 301, configured to determine one cloud computing resource as a master node and multiple other cloud computing resources as multiple slave nodes, where the master node is configured to schedule the multiple slave nodes to perform a training operation on the deep learning model;
the training module 302 is configured to perform a training operation on the deep learning model based on each slave node to obtain a training result of the slave node, where the training result of each slave node includes a target deep learning model and a score of the target deep learning model, and the target deep learning model included in the training result of each slave node is a trained deep learning model;
the determining module 301 is further configured to determine an optimal deep learning model from all the target deep learning models according to the scores included in all the training results.
It can be seen that, in the exploration device implementing the deep learning model described in fig. 3, by determining one cloud computing resource as a master node and a plurality of other cloud computing resources as a plurality of slave nodes, then performing the training operation of the deep learning model based on each slave node to obtain the training result of the slave node, and finally determining the optimal deep learning model from all the training results according to the score of the training result, the training operation of the deep learning model can be performed in parallel, the efficiency of deep learning model training is improved, the training time is reduced, and the application of the deep learning model exploration technology in a service scene is facilitated.
In an alternative embodiment, training module 302 includes:
a creating submodule 3021 configured to create a first process and a second process corresponding to each slave node;
a generating submodule 3022, configured to generate, based on the first process of each slave node, a plurality of deep learning models corresponding to the slave node;
the training submodule 3023 is configured to perform training and verification operations on each deep learning model corresponding to each slave node based on the second process of each slave node, the local cloud computing resource of each slave node, and the determined hyper-parameter, so as to obtain a training result corresponding to the slave node.
Therefore, by implementing the exploration device of the deep learning model described in fig. 4, the first process and the second process corresponding to each slave node are generated in each slave node, the new deep learning model is continuously generated through the first process, and the newly generated deep learning model is continuously trained and verified through the second process to obtain the training result, so that the generation and the training of the deep learning model can be parallelized, the generation and the training of the deep learning model do not need to wait for each other, and the training efficiency of the deep learning model is improved.
In this optional embodiment, further optionally, the generating sub-module 3022 includes:
a selecting unit 30221, configured to select, based on a first process of each slave node, multiple historical models from the determined historical model pool as multiple base models corresponding to the slave node, where the historical models are deep learning models that have been generated by all the slave nodes;
the selecting unit 30221 is further configured to select, based on the determined simulated annealing method, a plurality of basic models from all basic models corresponding to each slave node as a plurality of target basic models corresponding to the slave node;
a deforming unit 30222, configured to perform a model deforming operation on each target base model corresponding to each slave node, to obtain a plurality of variant models corresponding to the target base model;
a first screening unit 30223, configured to screen a target variant model corresponding to the target base model from a plurality of variant models corresponding to the target base model;
a second screening unit 30224, configured to score each of the target variant models, and screen a plurality of deep learning models corresponding to each of the slave nodes from all of the target variant models corresponding to the slave node according to the score of each of the target variant models;
the model deformation operation refers to at least one of network structure deepening operation, network structure widening operation and jump layer structure adding operation which are randomly performed on the neural network model.
It can be seen that, by implementing the exploration device for the deep learning model described in fig. 4, through continuously generating the variant models, screening the generated variant models based on the model distance, and finally training and verifying the screened variant models based on different hyper-parameters, the exploration space of the models can be enlarged and the training efficiency of the models can be ensured.
In this further alternative embodiment, yet further alternatively, the first screening unit 30223 includes:
a computation subunit 302231, configured to calculate a model distance between each variant model corresponding to the target base model and each historical model in the historical model pool;
a determining subunit 302232, configured to determine whether there is a matching history model for each variant model corresponding to the target base model, where the matching history model corresponding to the variant model is a history model whose model distance from the variant model is smaller than a preset threshold, and delete the variant model from the multiple variant models corresponding to the target base model when it is determined that there is a matching history model for the variant model;
the selecting subunit 302233 is configured to score each variant model corresponding to the target base model, and select a variant model with the highest score from the multiple variant models corresponding to the target base model as the target variant model corresponding to the target base model.
It can be seen that, by implementing the deep learning model exploration device described in fig. 4, the variant models with too small distance from the historical model are deleted, and the variant model with the highest score is selected from the remaining variant models as the target variant model, so that the target variant model can be screened from the multiple variant models, the subsequent data processing amount is reduced, and the training efficiency is improved.
In this yet further alternative embodiment, yet further optionally, the calculating subunit 302231 includes:
a calculating secondary subunit 3022311, configured to calculate the common layer distance between each variant model corresponding to the target base model and each historical model in the historical model pool;

the calculating secondary subunit 3022311 is further configured to calculate the skip layer distance between each variant model corresponding to the target base model and each historical model in the historical model pool;

and an adding secondary subunit 3022312, configured to add the common layer distance and the skip layer distance between each variant model corresponding to the target base model and each historical model in the historical model pool to obtain the model distance between the variant model and the historical model.
Therefore, by implementing the exploration device of the deep learning model described in fig. 4, the model distance between the two models is obtained by calculating the common layer distance and the jump layer distance between the two models, and the two models can be effectively compared, so that the target variant model can be accurately and effectively screened out.
In this still further optional embodiment, still further optional, the specific way for the second-level subunit 3022311 to calculate the common layer distance between each variant model corresponding to the target base model and each historical model in the historical model pool is as follows:
performing information coding on the common layers of each variant model corresponding to the target base model to obtain a common layer information list corresponding to the variant model, and performing information coding on the common layers of each historical model in the historical model pool to obtain a common layer information list corresponding to the historical model;
constructing a common layer matrix corresponding to the variant model and the historical model according to a common layer information list corresponding to each variant model corresponding to the target base model and a common layer information list corresponding to each historical model in the historical model pool;
assigning a value to each element of the common layer matrix corresponding to each variant model corresponding to the target base model and each historical model in the historical model pool, in order from left to right and from top to bottom, wherein the formula for assigning each element of the common layer matrix is as follows:
matrix_ij = min(matrix_(i-1)(j-1) + dist(M1_i, M2_j), matrix_i(j-1) + 1, matrix_(i-1)j + 1)

wherein matrix_ij represents the element in the i-th row and j-th column of the common layer matrix, M1_i represents the i-th common layer of the variant model, and M2_j represents the j-th common layer of the historical model;

wherein the function dist is used to calculate the distance between the two layers M1_i and M2_j; when M1_i and M2_j are layers of different types, the value of dist(M1_i, M2_j) is 1; when M1_i and M2_j are layers of the same type, the value of dist(M1_i, M2_j) is calculated by:

dist(M1_i, M2_j) = (|a_1 - b_1| / max(a_1, b_1) + ... + |a_n - b_n| / max(a_n, b_n)) / n

wherein a_k is the k-th parameter information in the information coding of the common layer indicated by M1_i, b_k is the k-th parameter information in the information coding of the common layer indicated by M2_j, and n is the number of parameter information items contained in the information coding;
and taking the element of the lower right corner of the common layer matrix corresponding to each variant model corresponding to the target base model and each historical model in the historical model pool as the common layer distance between the variant model and the historical model.
It can be seen that, by implementing the exploration apparatus for the deep learning model described in fig. 4, a common layer information list is obtained by encoding a common layer of the model, then a common layer matrix is constructed according to the common layer information list, and the common layer matrix is assigned in a specific manner to obtain a common layer distance between the models, so that the common layer distance between the two models can be calculated.
In this still further optional embodiment, still further optional, the specific way for the second-level subunit 3022311 to calculate the skip-layer distance between each variant model corresponding to the target base model and each historical model in the historical model pool is as follows:
performing information coding on the skip layers of each variant model corresponding to the target base model to obtain a skip layer information list corresponding to the variant model, and performing information coding on the skip layers of each historical model in the historical model pool to obtain a skip layer information list corresponding to the historical model;

constructing a skip layer matrix corresponding to the variant model and the historical model according to the skip layer information list corresponding to each variant model corresponding to the target base model and the skip layer information list corresponding to each historical model in the historical model pool, wherein the number of rows of the skip layer matrix is the length of the skip layer information list corresponding to the variant model, and the number of columns of the skip layer matrix is the length of the skip layer information list corresponding to the historical model;
assigning a value to each element of the skip layer matrix corresponding to each variant model corresponding to the target base model and each historical model in the historical model pool according to the following formula:
skip_connection_matrix_pq = dist2(S1_p, S2_q)

wherein skip_connection_matrix represents the skip layer matrix, skip_connection_matrix_pq represents the element in the p-th row and q-th column of the skip layer matrix, S1_p represents the p-th skip layer of the variant model, and S2_q represents the q-th skip layer of the historical model;

wherein the function dist2 is used to calculate the distance between the two skip layers S1_p and S2_q; when S1_p and S2_q are skip layers of different types, the value of dist2(S1_p, S2_q) is 1; when S1_p and S2_q are skip layers of the same type, the value of dist2(S1_p, S2_q) is calculated by:

dist2(S1_p, S2_q) = (|p_s - q_s| + |p_l - q_l|) / (max(p_s, q_s) + max(p_l, q_l))

wherein p_s represents the layer position index in the model of the starting layer of the p-th skip layer in the variant model, q_s represents the layer position index in the model of the starting layer of the q-th skip layer in the historical model, p_l represents the depth of the p-th skip layer in the variant model, and q_l represents the depth of the q-th skip layer in the historical model;
calculating the jump layer distance between each variant model corresponding to the target base model and each historical model in the historical model pool according to the following formula:
dist_s = sum(skip_connection_matrix) + |s1 - s2|

wherein dist_s represents the skip layer distance between the variant model and the historical model, sum(skip_connection_matrix) represents the sum of all elements in the skip layer matrix skip_connection_matrix, s1 represents the length of the skip layer information list corresponding to the variant model, and s2 represents the length of the skip layer information list corresponding to the historical model.
It can be seen that, by implementing the exploration device of the deep learning model described in fig. 4, a layer jump information list is obtained by encoding the layer jump of the model, then a layer jump matrix is constructed according to the layer jump information list, the layer jump matrix is assigned in a specific manner, and finally the layer jump distance between the models is obtained by calculation according to the layer jump matrix, so that the layer jump distance between the two models can be calculated.
In another optional embodiment, the training sub-module 3023 performs training and verification operations on each deep learning model corresponding to each slave node based on the second process of each slave node, the local cloud computing resource of each slave node, and the determined hyper-parameter, and a specific manner of obtaining a training result corresponding to the slave node is as follows:
drawing up a hyper-parameter space corresponding to each deep learning model aiming at each deep learning model corresponding to each slave node based on the second process of each slave node and the local cloud computing resource of each slave node, wherein the hyper-parameter space at least comprises batch processing quantity and learning rate;
setting the search times corresponding to each deep learning model corresponding to each slave node;
constructing a set corresponding to each deep learning model corresponding to each slave node, wherein the set is used for storing scores of the deep learning models on a verification set after training;
setting an objective function corresponding to each deep learning model corresponding to each slave node according to the following formula:
F=max(SC)
wherein F is the objective function, and SC is the score of the deep learning model on the verification set after training;
randomly selecting a starting point in a hyper-parameter space corresponding to each deep learning model corresponding to each slave node, and then circularly searching in the hyper-parameter space through Gaussian process mapping to select a plurality of target hyper-parameters corresponding to the deep learning model, wherein the Gaussian process mapping can be expressed as:
T = G(C, R, F, J)

wherein T is a target hyper-parameter corresponding to the deep learning model, each target hyper-parameter being a hyper-parameter value recommended by G as worth trying; C is the hyper-parameter space corresponding to the deep learning model, R is the set corresponding to the deep learning model, F is the objective function corresponding to the deep learning model, and J is the number of searches corresponding to the deep learning model;
training and verifying the deep learning model based on each target hyper-parameter corresponding to each deep learning model corresponding to each slave node to obtain a plurality of intermediate deep learning models corresponding to the deep learning model and a score corresponding to each intermediate deep learning model;
and selecting the middle deep learning model with the highest score and the score corresponding to the middle deep learning model from all the middle deep learning models corresponding to the slave nodes as the training result corresponding to the slave nodes.
It can be seen that, by implementing the exploration device for the deep learning model described in fig. 4, a plurality of hyper-parameters can be selected from the hyper-parameter space to perform training and verification operations on the deep learning model, so as to obtain a training result corresponding to each hyper-parameter, and thus, the training and verification of the deep learning model performed based on the hyper-parameters can be realized.
For the specific description of the exploration device of the deep learning model, reference may be made to the specific description of the exploration method of the deep learning model, which is not described in detail herein.
It can be seen that, in the exploration device implementing the deep learning model described in fig. 4, by determining one cloud computing resource as a master node and a plurality of other cloud computing resources as a plurality of slave nodes, then performing the training operation of the deep learning model based on each slave node to obtain the training result of the slave node, and finally determining the optimal deep learning model from all the training results according to the score of the training result, the training operation of the deep learning model can be performed in parallel, the efficiency of deep learning model training is improved, the training time is reduced, and the application of the deep learning model exploration technology in a service scene is facilitated. In addition, the first process and the second process corresponding to each slave node are generated in each slave node, so that the generation and the training of the deep learning model are parallelized, the generation and the training of the deep learning model do not need to wait for each other, and the training efficiency of the deep learning model is improved. In addition, the variant models are continuously generated, the generated variant models are screened based on model distances, and finally the screened variant models are trained and verified based on different hyper-parameters, so that the model training efficiency can be ensured while the model exploration space is enlarged.
EXAMPLE IV
Referring to fig. 5, fig. 5 is a schematic structural diagram of an exploration apparatus of another deep learning model according to an embodiment of the present invention. As shown in fig. 5, the apparatus may include:
a memory 501 in which executable program code is stored;
a processor 502 coupled to a memory 501;
the processor 502 calls the executable program code stored in the memory 501 for executing the exploration method of the deep learning model described in the first embodiment or the second embodiment.
EXAMPLE V
The embodiment of the invention discloses a computer-readable storage medium which stores a computer program for electronic data exchange, wherein the computer program enables a computer to execute the exploration method of the deep learning model described in the first embodiment or the second embodiment.
EXAMPLE VI
An embodiment of the present invention discloses a computer program product, which includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute the method for exploring a deep learning model described in the first embodiment or the second embodiment.
The above-described embodiments of the apparatus are merely illustrative, and the modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above detailed description of the embodiments, those skilled in the art will clearly understand that the embodiments may be implemented by software plus a necessary general hardware platform, or by hardware. Based on such understanding, the above technical solutions may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, where the storage medium includes a Read-Only Memory (ROM), a Random Access Memory (RAM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), a One-time Programmable Read-Only Memory (OTPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical disk memory, a magnetic disk memory, a tape memory, or any other computer-readable medium that can be used to carry or store data.
Finally, it should be noted that: the exploration method and device of the deep learning model disclosed in the embodiments of the present invention are only preferred embodiments of the present invention and are used only to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for exploring a deep learning model, the method comprising:
determining one cloud computing resource as a master node and a plurality of other cloud computing resources as a plurality of slave nodes, wherein the master node is used for scheduling the slave nodes to execute training operation on the deep learning model;
executing a training operation on the deep learning model based on each slave node to obtain a training result of the slave node, wherein the training result of each slave node comprises a target deep learning model and a score of the target deep learning model, and the target deep learning model included in the training result of each slave node is a trained deep learning model;
and determining an optimal deep learning model from all the target deep learning models according to the scores included in all the training results.
2. The method for exploring a deep learning model according to claim 1, wherein said performing a training operation on the deep learning model based on each slave node to obtain a training result of the slave node comprises:
creating a first process and a second process corresponding to each slave node, and generating a plurality of deep learning models corresponding to each slave node based on the first process of each slave node;
and performing training and verification operation on each deep learning model corresponding to each slave node based on the second process of each slave node, the local cloud computing resource of each slave node and the determined hyper-parameter to obtain a training result corresponding to the slave node.
3. The method for exploring a deep learning model according to claim 2, wherein the generating a plurality of deep learning models corresponding to each slave node based on the first process of the slave node includes:
based on the first process of each slave node, selecting a plurality of historical models from the determined historical model pool as a plurality of base models corresponding to the slave node, wherein the historical models are deep learning models generated by all the slave nodes;
selecting a plurality of basic models from all the basic models corresponding to each slave node based on the determined simulated annealing method as a plurality of target basic models corresponding to the slave node;
executing model deformation operation on each target base model corresponding to each slave node to obtain a plurality of variant models corresponding to the target base model, and screening out a target variant model corresponding to the target base model from the plurality of variant models corresponding to the target base model;
scoring each target variant model, and screening a plurality of deep learning models corresponding to the slave nodes from all the target variant models corresponding to the slave nodes according to the score of each target variant model;
the model deformation operation refers to at least one of network structure deepening operation, network structure widening operation and jump layer structure adding operation which are randomly performed on the neural network model.
4. The method for exploring a deep learning model as claimed in claim 3, wherein said step of screening out a target variant model corresponding to the target base model from a plurality of said variant models corresponding to the target base model comprises:
calculating the model distance between each variant model corresponding to the target base model and each historical model in the historical model pool;
judging whether each variant model corresponding to the target base model has a matching historical model, wherein the matching historical model corresponding to the variant model is the historical model of which the model distance from the variant model is smaller than a preset threshold value, and deleting the variant model from a plurality of variant models corresponding to the target base model when the variant model is judged to have the matching historical model;
and scoring each variant model corresponding to the target base model, and selecting the variant model with the highest score from the plurality of variant models corresponding to the target base model as the target variant model corresponding to the target base model.
5. The method for exploring a deep learning model as claimed in claim 4, wherein said calculating a model distance between each said variant model corresponding to the target base model and each said historical model in said historical model pool comprises:
calculating the common layer distance between each variant model corresponding to the target base model and each historical model in the historical model pool;

calculating the skip layer distance between each variant model corresponding to the target base model and each historical model in the historical model pool;

and adding the common layer distance and the skip layer distance between each variant model corresponding to the target base model and each historical model in the historical model pool to serve as the model distance between the variant model and the historical model.
6. The method for exploring a deep learning model as claimed in claim 5, wherein said calculating the common layer distance between each said variant model corresponding to the target base model and each said historical model in said historical model pool comprises:

performing information coding on the common layers of each variant model corresponding to the target base model to obtain a common layer information list corresponding to the variant model, and performing information coding on the common layers of each historical model in the historical model pool to obtain a common layer information list corresponding to the historical model;
constructing a common layer matrix corresponding to the variant model and the historical model according to a common layer information list corresponding to each variant model corresponding to the target base model and a common layer information list corresponding to each historical model in the historical model pool;
assigning a value to each element of the common layer matrix corresponding to each variant model corresponding to the target base model and each historical model in the historical model pool, from left to right and from top to bottom, wherein the formula for assigning each element of the common layer matrix is as follows:
matrix_ij = min(matrix_(i-1)(j-1) + dist(M1_i, M2_j), matrix_i(j-1) + 1, matrix_(i-1)j + 1)

wherein matrix_ij represents the element in the i-th row and j-th column of the common layer matrix, M1_i represents the i-th common layer of the variant model, and M2_j represents the j-th common layer of the historical model;

wherein the function dist is used to calculate the distance between the two layers M1_i and M2_j; when M1_i and M2_j are layers of different types, the value of dist(M1_i, M2_j) is 1; when M1_i and M2_j are layers of the same type, the value of dist(M1_i, M2_j) is calculated by:

dist(M1_i, M2_j) = (|a_1 - b_1| / max(a_1, b_1) + ... + |a_n - b_n| / max(a_n, b_n)) / n

wherein a_k is the k-th parameter information in the information coding of the common layer indicated by M1_i, b_k is the k-th parameter information in the information coding of the common layer indicated by M2_j, and n is the number of parameter information items contained in the information coding;
and taking the element of the lower right corner of the common layer matrix corresponding to each variant model corresponding to the target base model and each historical model in the historical model pool as the common layer distance between the variant model and the historical model.
7. The method for exploring a deep learning model according to claim 5 or 6, wherein said calculating the skip layer distance between each said variant model corresponding to the target base model and each said historical model in said historical model pool comprises:

performing information coding on the skip layers of each variant model corresponding to the target base model to obtain a skip layer information list corresponding to the variant model, and performing information coding on the skip layers of each historical model in the historical model pool to obtain a skip layer information list corresponding to the historical model;

constructing a skip layer matrix corresponding to the variant model and the historical model according to the skip layer information list corresponding to each variant model corresponding to the target base model and the skip layer information list corresponding to each historical model in the historical model pool, wherein the number of rows of the skip layer matrix is the length of the skip layer information list corresponding to the variant model, and the number of columns of the skip layer matrix is the length of the skip layer information list corresponding to the historical model;
assigning a value to each element of the skip layer matrix corresponding to each of the variant models corresponding to the target base model and each of the historical models in the historical model pool according to the following formula:
skip_connection_matrix_pq = dist2(S1_p, S2_q)

wherein skip_connection_matrix represents the skip layer matrix, skip_connection_matrix_pq represents the element in the p-th row and q-th column of the skip layer matrix, S1_p represents the p-th skip layer of the variant model, and S2_q represents the q-th skip layer of the historical model;

wherein the function dist2 is used to calculate the distance between the two skip layers S1_p and S2_q; when S1_p and S2_q are skip layers of different types, the value of dist2(S1_p, S2_q) is 1; when S1_p and S2_q are skip layers of the same type, the value of dist2(S1_p, S2_q) is calculated by:

dist2(S1_p, S2_q) = (|p_s - q_s| + |p_l - q_l|) / (max(p_s, q_s) + max(p_l, q_l))

wherein p_s represents the layer position index in the model of the starting layer of the p-th skip layer in the variant model, q_s represents the layer position index in the model of the starting layer of the q-th skip layer in the historical model, p_l represents the depth of the p-th skip layer in the variant model, and q_l represents the depth of the q-th skip layer in the historical model;
calculating the skip layer distance between each variant model corresponding to the target base model and each historical model in the historical model pool according to the following formula:
dist_s = sum(skip_connection_matrix) + |s1 - s2|

wherein dist_s represents the skip layer distance between the variant model and the historical model, sum(skip_connection_matrix) represents the sum of all elements in the skip layer matrix skip_connection_matrix, s1 represents the length of the skip layer information list corresponding to the variant model, and s2 represents the length of the skip layer information list corresponding to the historical model.
8. The method for exploring a deep learning model according to any one of claims 2 to 7, wherein the performing a training and verification operation on each deep learning model corresponding to each slave node based on the second process of each slave node, the local cloud computing resource of each slave node and the determined hyper-parameter to obtain a training result corresponding to the slave node comprises:
drawing up a hyper-parameter space corresponding to each deep learning model aiming at each deep learning model corresponding to each slave node based on the second process of each slave node and the local cloud computing resource of each slave node, wherein the hyper-parameter space at least comprises batch processing quantity and learning rate;
setting the number of searching times corresponding to each deep learning model corresponding to each slave node;
constructing a set corresponding to each deep learning model corresponding to each slave node, wherein the set is used for storing scores of the deep learning models on a verification set after training;
setting an objective function corresponding to each deep learning model corresponding to each slave node according to the following formula:
F=max(SC)
wherein F is the objective function, and SC is the score of the deep learning model on the verification set after training;
randomly selecting a starting point in the hyper-parameter space corresponding to each deep learning model corresponding to each slave node, and then circularly searching in the hyper-parameter space through Gaussian process mapping to select a plurality of target hyper-parameters corresponding to the deep learning model, wherein the Gaussian process mapping can be expressed as:
T = G(C, R, F, J)

wherein T is a target hyper-parameter corresponding to the deep learning model, each target hyper-parameter being a hyper-parameter value recommended by G as worth trying; C is the hyper-parameter space corresponding to the deep learning model, R is the set corresponding to the deep learning model, F is the objective function corresponding to the deep learning model, and J is the number of searches corresponding to the deep learning model;
training and verifying the deep learning model based on each target hyper-parameter corresponding to each deep learning model corresponding to each slave node to obtain a plurality of intermediate deep learning models corresponding to the deep learning model and a score corresponding to each intermediate deep learning model;
and selecting the intermediate deep learning model with the highest score and the score corresponding to the intermediate deep learning model from all the intermediate deep learning models corresponding to the slave nodes as training results corresponding to the slave nodes.
9. An apparatus for exploring a deep learning model, the apparatus comprising:
the determining module is used for determining one cloud computing resource as a master node and a plurality of other cloud computing resources as a plurality of slave nodes, wherein the master node is used for scheduling the slave nodes to execute training operation on the deep learning model;
the training module is used for executing training operation on the deep learning model based on each slave node to obtain a training result of the slave node, the training result of each slave node comprises a target deep learning model and the score of the target deep learning model, and the target deep learning model included in the training result of each slave node is a trained deep learning model;
the determining module is further configured to determine an optimal deep learning model from all the target deep learning models according to the scores included in all the training results.
10. An apparatus for exploring a deep learning model, the apparatus comprising:
a memory storing executable program code;
a processor coupled with the memory;
the processor calls the executable program code stored in the memory to execute the exploration method of the deep learning model according to any one of claims 1-8.
CN202010814501.9A 2020-08-13 2020-08-13 Deep learning model exploration method and device Active CN111931916B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010814501.9A CN111931916B (en) 2020-08-13 2020-08-13 Deep learning model exploration method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010814501.9A CN111931916B (en) 2020-08-13 2020-08-13 Deep learning model exploration method and device

Publications (2)

Publication Number Publication Date
CN111931916A true CN111931916A (en) 2020-11-13
CN111931916B CN111931916B (en) 2024-08-02

Family

ID=73311279

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010814501.9A Active CN111931916B (en) 2020-08-13 2020-08-13 Deep learning model exploration method and device

Country Status (1)

Country Link
CN (1) CN111931916B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017128961A1 (en) * 2016-01-30 2017-08-03 华为技术有限公司 Method and device for training model in distributed system
CN106650925A (en) * 2016-11-29 2017-05-10 郑州云海信息技术有限公司 Deep learning framework Caffe system and algorithm based on MIC cluster
CN108229528A (en) * 2017-08-16 2018-06-29 北京市商汤科技开发有限公司 Clustering Model training method and device, electronic equipment, computer storage media
US20190235484A1 (en) * 2018-01-31 2019-08-01 Hitachi, Ltd. Deep learning architecture for maintenance predictions with multiple modes
WO2019182590A1 (en) * 2018-03-21 2019-09-26 Visa International Service Association Automated machine learning systems and methods

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113010312A (en) * 2021-03-11 2021-06-22 山东英信计算机技术有限公司 Hyper-parameter tuning method, device and storage medium
CN113010312B (en) * 2021-03-11 2024-01-23 山东英信计算机技术有限公司 Super-parameter tuning method, device and storage medium
CN113239635A (en) * 2021-06-16 2021-08-10 中国银行股份有限公司 Model evaluation method and device
CN114004358A (en) * 2021-12-29 2022-02-01 粤港澳大湾区数字经济研究院(福田) Deep learning model training method
CN114004358B (en) * 2021-12-29 2022-06-14 粤港澳大湾区数字经济研究院(福田) Deep learning model training method

Also Published As

Publication number Publication date
CN111931916B (en) 2024-08-02

Similar Documents

Publication Publication Date Title
US11651259B2 (en) Neural architecture search for convolutional neural networks
CN110366734B (en) Optimizing neural network architecture
CN108304440B (en) Game pushing method and device, computer equipment and storage medium
CN111931916B (en) Deep learning model exploration method and device
CN111382868B (en) Neural network structure searching method and device
CN110046706B (en) Model generation method and device and server
CN110728317A (en) Training method and system of decision tree model, storage medium and prediction method
CN111445008A (en) Knowledge distillation-based neural network searching method and system
CN115755954B (en) Routing inspection path planning method, system, computer equipment and storage medium
JP7542793B2 (en) Method and system for lightweight artificial intelligence inference models
CN111428854A (en) Structure searching method and structure searching device
CN112785005A (en) Multi-target task assistant decision-making method and device, computer equipment and medium
WO2021137910A2 (en) Computer architecture for resource allocation for course of action activities
CN110263136B (en) Method and device for pushing object to user based on reinforcement learning model
CN115858919A (en) Learning resource recommendation method and system based on project field knowledge and user comments
CN110222824B (en) Intelligent algorithm model autonomous generation and evolution method, system and device
US20230051955A1 (en) System and Method For Regularized Evolutionary Population-Based Training
KR20220032861A (en) Neural architecture search method and attaratus considering performance in hardware
CN113449176A (en) Recommendation method and device based on knowledge graph
CN110705889A (en) Enterprise screening method, device, equipment and storage medium
CN112417304B (en) Data analysis service recommendation method and system for constructing data analysis flow
US11478715B1 (en) User-controllable model-driven matchmaking
CN114219964A (en) Neural network architecture searching method and device, electronic equipment and storage medium
CN114443986A (en) Sorting method and device, sorting model training method and device, and electronic equipment
CN110879952B (en) Video frame sequence processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant