CN115545171A - Searching method, device, equipment and storage medium of neural network structure


Info

Publication number
CN115545171A
Authority
CN
China
Prior art keywords
neural network
network structure
neuron
channel
search space
Prior art date
Legal status
Pending
Application number
CN202211387018.2A
Other languages
Chinese (zh)
Inventor
曹俊豪
张立平
王希予
裴积全
Current Assignee
Jingdong Technology Information Technology Co Ltd
Original Assignee
Jingdong Technology Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Jingdong Technology Information Technology Co Ltd filed Critical Jingdong Technology Information Technology Co Ltd
Priority to CN202211387018.2A
Publication of CN115545171A


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Abstract

The invention discloses a searching method, apparatus and device, and a storage medium, for a neural network structure, wherein the method comprises the following steps: in response to obtaining training attribute parameters of a training data set, generating an initial neural network structure based on the training attribute parameters; executing a structure growing operation on the initial neural network structure to generate a target search space, wherein the target search space comprises at least two first neural network structures, and the first neural network structures differ in the number of neurons and/or the number of channels of the neurons; screening the first neural network structures based on verification performance parameters respectively corresponding to the first neural network structures in the target search space to obtain a second neural network structure; and determining a target neural network structure based on the second neural network structure and a preset search condition. The embodiment of the invention solves the problem that the traditional search space is excessively large and reduces the consumption of computing resources in the search process of the neural network.

Description

Searching method, device, equipment and storage medium of neural network structure
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a searching method, apparatus and device, and a storage medium, for a neural network structure.
Background
Neural Network Architecture Search (NAS) is a method for transforming the manual design process of a neural network architecture into an automatic optimization process. Specifically, a controller obtains a plurality of candidate neural network structures from a pre-constructed search space; an architecture optimization module ranks the candidate neural network structures according to the training results respectively corresponding to them, and feeds the ranking information back to the controller, so that the controller optimizes its selection strategy for candidate neural network structures. When a loop termination condition is reached, the candidate neural network structure with the best training result is taken as the target neural network structure.
In implementing the present invention, the inventors found that the prior art has at least the following technical problems:
the pre-constructed search space needs to contain all possible combinations of target neural network structures, so that the range of the traditional search space is often large, and a large amount of computing resources are consumed when the controller selects a candidate neural network structure from the search space each time. For example, when a traditional NAS method searches for a neural network structure based on a CIFAR10 dataset, 2000 GPUs are required to operate simultaneously for one day to obtain a target neural network structure. Therefore, the traditional neural network structure searching method has the disadvantages of high consumption of computing resources and low searching efficiency.
Disclosure of Invention
The embodiment of the invention provides a searching method, apparatus and device, and a storage medium, for a neural network structure, which are used for solving the problem that the traditional search space is excessively large, reducing the consumption of computing resources in the neural network search process, and improving the searching efficiency of the neural network structure.
According to an embodiment of the present invention, there is provided a method for searching a neural network structure, the method including:
in response to obtaining a training attribute parameter of a training data set, generating an initial neural network structure based on the training attribute parameter;
executing a structure growing operation on the initial neural network structure to generate a target search space; the target search space comprises at least two first neural network structures, and the number of neurons and/or the number of channels of the neurons in each first neural network structure are different;
screening each first neural network structure based on verification performance parameters respectively corresponding to each first neural network structure in the target search space to obtain a second neural network structure;
and determining a target neural network structure based on the second neural network structure and a preset search condition.
According to another embodiment of the present invention, there is provided a neural network structure search apparatus including:
the initial neural network structure generating module is used for responding to the training attribute parameters of the obtained training data set and generating an initial neural network structure based on the training attribute parameters;
the target search space generation module is used for executing structure growth operation on the initial neural network structure to generate a target search space; the target search space comprises at least two first neural network structures, and the number of neurons and/or the number of channels of the neurons in each first neural network structure are different;
a second neural network structure determining module, configured to screen each first neural network structure in the target search space based on a verification performance parameter corresponding to each first neural network structure, so as to obtain a second neural network structure;
and the target neural network structure determining module is used for determining the target neural network structure based on the second neural network structure and a preset search condition.
According to another embodiment of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform a method of searching for a neural network structure according to any of the embodiments of the present invention.
According to another embodiment of the present invention, a computer-readable storage medium is provided, which stores computer instructions for causing a processor to implement a searching method of a neural network structure according to any one of the embodiments of the present invention when the computer instructions are executed.
According to the technical scheme of the embodiment of the invention, a target search space is generated by executing a structure growing operation on an initial neural network structure generated based on training attribute parameters, wherein the target search space comprises at least two first neural network structures. Each first neural network structure is screened based on the verification performance parameter corresponding to it in the target search space to obtain a second neural network structure, and the target neural network structure is determined based on the second neural network structure and the preset search condition. Because the target search space is generated during the search process itself, the problem that the traditional search space needs to be artificially constructed in advance is solved; and because the target search space is generated based on the initial neural network structure, the range of the generated target search space is far smaller than that of the traditional search space, so that the computing resource consumption in the neural network search process is reduced and the searching efficiency of the neural network structure is improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present invention, nor do they necessarily limit the scope of the invention. Other features of the present invention will become apparent from the following description.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings required to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the description below are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart of a searching method of a neural network structure according to an embodiment of the present invention;
FIG. 2A is a diagram illustrating an initial neural network architecture, according to an embodiment of the present invention;
FIG. 2B is a flowchart illustrating an embodiment of a method for searching a neural network structure according to the present invention;
FIG. 2C is a diagram illustrating a target search space according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a third neural network architecture according to an embodiment of the present invention;
FIG. 4 is a flow chart of another method for searching a neural network structure provided by an embodiment of the present invention;
FIG. 5A is a schematic view of a channel splitting structure according to an embodiment of the present invention;
FIG. 5B is a schematic diagram of a structure of a neuron split according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a searching apparatus of a neural network structure according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Fig. 1 is a flowchart of a method for searching a neural network structure according to an embodiment of the present invention, where the embodiment is applicable to a case where a neural network structure is searched, and the method may be executed by a searching apparatus of the neural network structure, where the searching apparatus of the neural network structure may be implemented in a form of hardware and/or software, and the searching apparatus of the neural network structure may be configured in a terminal device. As shown in fig. 1, the method includes:
s110, responding to the obtained training attribute parameters of the training data set, and generating an initial neural network structure based on the training attribute parameters.
Specifically, the training data set is used to represent a data set that needs to be processed through a neural network structure, and for example, the category of the training data set may be an image data set, and the training task to which the image data set belongs may be image classification, target detection, or image segmentation, and the like.
Specifically, the training attribute parameters may be used to characterize input attribute parameters and output attribute parameters of the training data set. In one embodiment, when the class of the training data set is an image data set and the training task to which it belongs is image classification, the training attribute parameters include 3 and the number of classifications, where "3" represents the input attribute parameter of the image data set, i.e. the three channels R, G and B, and the "number of classifications" represents the output attribute parameter of the image data set. For example, when the image data set is the cifar10 dataset, the number of image classifications is 10, so the output attribute parameter is 10.
Unless otherwise mentioned, embodiments of the present invention are described with neurons and their concatenated BatchNorm and nonlinear functions taken as a whole, collectively referred to as neurons.
Specifically, the initial neural network structure generated based on the training attribute parameters includes a global pooling layer and a neuron (here not including BatchNorm and a nonlinear function). The initial neural network structure can be expressed as network_1, and the neuron can be expressed as neu_1(ri_1, ro_1, ci_1, co_1), wherein ri_1, ro_1, ci_1 and co_1 respectively represent the input resolution, the output resolution, the number of input channels and the number of output channels of the neuron. Illustratively, when the training dataset is the cifar10 dataset, the training attribute parameters of the training dataset include an input attribute parameter = 3 and an output attribute parameter = 10; accordingly, the neuron neu_1 has ci_1 = 3 and co_1 = 10, and ri_1 = ro_1 = 1×1 is set in advance, the convolution kernel is 1×1, and its initialization weight is 0. The global pooling layer is configured to make the resolution of the image data input into the initial neural network structure meet the resolution requirement of the initial neural network structure; illustratively, if the resolution of the image data is 32×32, then after the image data passes through the global pooling layer, the resolution of the data output by the global pooling layer is 1×1.
Fig. 2A is a schematic diagram of an initial neural network structure according to an embodiment of the present invention. Specifically, the initial neural network structure includes a global pooling layer and a neuron neu_1, wherein ri_1 = ro_1 = 1×1, ci_1 = 3 and co_1 = 10. The image data input to the initial neural network structure has an input attribute parameter of 3 and a resolution of 32×32; the data output by the global pooling layer has 3 channels at resolution 1×1; and the neuron connected to the output of the global pooling layer outputs data at resolution 1×1 with 10 output classes.
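Illustratively, the initial structure described above can be sketched in a few lines of PyTorch; the class and parameter names below are illustrative assumptions for exposition, not part of the claimed method:

```python
# Minimal sketch (PyTorch) of the initial structure described above.
import torch
import torch.nn as nn

class InitialNetwork(nn.Module):
    def __init__(self, in_channels: int = 3, num_classes: int = 10):
        super().__init__()
        # Global pooling reduces any input resolution (e.g. 32x32) to 1x1.
        self.global_pool = nn.AdaptiveAvgPool2d(1)
        # The single neuron: a 1x1 convolution with ci=3, co=10,
        # zero-initialized as stated above.
        self.neuron = nn.Conv2d(in_channels, num_classes, kernel_size=1)
        nn.init.zeros_(self.neuron.weight)
        nn.init.zeros_(self.neuron.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.global_pool(x)          # (N, 3, 32, 32) -> (N, 3, 1, 1)
        x = self.neuron(x)               # (N, 3, 1, 1)  -> (N, 10, 1, 1)
        return x.flatten(1)              # (N, 10) class scores

net = InitialNetwork()
print(net(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])
```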
And S120, performing structure growing operation on the initial neural network structure to generate a target search space.
In this embodiment, the target search space includes at least two first neural network structures, and the number of neurons and/or the number of channels of neurons in each first neural network structure are different.
In one embodiment, specifically, performing a structure growing operation on the initial neural network structure to generate the target search space includes: executing stage growth operation on the initial neural network structure to obtain a third neural network structure; wherein the stage growing operation comprises an operation of adding new neurons in the initial neural network structure; performing a deep growing operation on the third neural network structure to generate a target search space; wherein the deep growing operation comprises an operation of splitting the neurons and channels in the third neural network structure.
Specifically, in the stage growth operation, the initial resolution of the newly added neuron in the third neural network structure is a preset multiple of the resolution of the next neuron connected to the newly added neuron, wherein the preset multiple is 2 times or 3 times, for example. After the stage growing operation, the difference between the number of neurons in the third neural network structure and the number of neurons in the initial neural network structure is 1. It can be understood that, after a new neuron is added to the initial neural network structure, the resolutions of all neurons before the new neuron is added are correspondingly increased by a preset multiple, so as to obtain a third neural network structure. In this embodiment, the purpose of the stage growing operation is to increase the resolution of each neuron in the third neural network structure.
Specifically, a copy operation is performed on the third neural network structure to obtain at least two third neural network structures, and a deep growing operation is performed on each third neural network structure to obtain a target search space network'_sN = {network'_i1, network'_i2, …, network'_ij, …, network'_iN}, wherein network'_ij represents the first neural network structure corresponding to the j-th copy network_ij of the third neural network structure.
In the deep growing operation, for each first neural network structure in the target search space, the number of neurons in the first neural network structure is greater than the number of neurons in the third neural network structure, or the number of channels of a neuron in the first neural network structure is greater than the number of channels of the neuron in the third neural network structure. It can be understood that, when the number of channels of the input channel of the neuron a in the third neural network structure increases, the number of channels of the output channel of the previous neuron connected to the neuron a correspondingly increases, and when the number of channels of the output channel of the neuron a in the third neural network structure increases, the number of channels of the input channel of the next neuron connected to the neuron a correspondingly increases. In this embodiment, based on the third neural network structure, the growth of the neuron dimension or the channel dimension is performed to construct the target search space, wherein the network structure of each first neural network structure in the target search space becomes wider and deeper than the third neural network structure.
S130, screening the first neural network structures based on the verification performance parameters respectively corresponding to the first neural network structures in the target search space to obtain a second neural network structure.
Specifically, based on the verification data set, verification performance parameters corresponding to at least two first neural network structures in the target search space are determined, and for example, the verification performance parameters include but are not limited to at least one of accuracy, precision and recall. Specifically, the second neural network structure is a first neural network structure with the highest verification performance parameter in the target search space.
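Illustratively, this screening step can be sketched as follows, assuming a hypothetical evaluate_accuracy helper that returns the verification performance parameter of a candidate structure:

```python
# Illustrative sketch: choose the second neural network structure as the
# candidate with the highest verification performance parameter.
# `evaluate_accuracy` and `val_loader` are assumed helpers, not part of
# the claimed method.
def select_second_structure(search_space, val_loader, evaluate_accuracy):
    scored = [(evaluate_accuracy(net, val_loader), net) for net in search_space]
    best_acc, best_net = max(scored, key=lambda pair: pair[0])
    return best_net, best_acc
```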
And S140, determining a target neural network structure based on the second neural network structure and the preset search condition.
In one embodiment, specifically, the determining the target neural network structure based on the second neural network structure and the preset search condition includes: if the training performance parameters of the second neural network structure meet the preset search conditions, taking the second neural network structure as a target neural network structure; and if the training performance parameters of the second neural network structure do not meet the preset search conditions, taking the second neural network structure as the initial neural network structure, and returning to execute the structure growth operation on the initial neural network structure to generate a target search space.
Wherein, specifically, based on the training data set, training performance parameters of the second neural network structure are determined, illustratively, the training performance parameters include, but are not limited to, at least one of accuracy, precision, and recall. The preset search condition comprises that the training performance parameter is greater than a preset parameter threshold, for example, if the training performance parameter is correct, the preset parameter threshold is 98%, and if the correct rate is greater than 98%, the training performance parameter meets the preset search condition, the second neural network structure is taken as a target neural network structure, and the search process of the neural network structure is completed; and if the accuracy is less than or equal to 98%, the training performance parameters do not meet the preset search conditions, taking the second neural network structure as the initial neural network structure, and returning to execute the step of executing the structure growing operation on the initial neural network structure to generate the target search space.
In one embodiment, specifically, a controller (e.g., a GPU) obtains training attribute parameters of a training data set received by an input device, generates an initial neural network structure based on the training attribute parameters, and performs a structure growing operation on the initial neural network structure to generate a target search space. The controller sends the target search space to a Central Processing Unit (CPU), the CPU screens all first neural network structures based on verification performance parameters corresponding to all first neural network structures in the target search space to obtain a second neural network structure, the second neural network structure is sent to a condition judgment device, the condition judgment device judges whether the training performance parameters of the second neural network structure meet preset search conditions or not based on a training data set, if yes, the second neural network structure is used as the target neural network structure, and the target neural network structure is sent to an output device, so that the output device outputs the target neural network structure. If not, the second neural network structure is used as the initial neural network structure, the initial neural network structure is sent to the controller, so that the controller executes structure growth operation on the received initial neural network structure to generate a target search space, and the steps are executed iteratively until the training performance parameters of the second neural network structure received by the condition judgment device meet the preset search conditions.
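Illustratively, the overall control flow described above can be sketched as follows, where all growth and evaluation helpers are assumed callables rather than parts of the claimed method:

```python
# Illustrative orchestration sketch of S110-S140; a fuller version would
# also check the saturation condition (described below) to decide between
# stage growth and depth growth when looping.
def search_target_structure(initial_net, stage_grow, depth_grow,
                            train_perf, val_perf, threshold=0.98):
    net = initial_net                              # S110: initial structure
    while True:
        third = stage_grow(net)                    # S120: add a neuron
        space = depth_grow(third)                  # S120: split neurons/channels
        second = max(space, key=val_perf)          # S130: screen by validation
        if train_perf(second) > threshold:         # S140: preset search condition
            return second                          # target neural network structure
        net = second                               # otherwise grow from the winner
```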
In one embodiment, specifically, the step of returning to perform the structure growing operation on the initial neural network structure to generate the target search space includes: judging whether the saturation parameters of the initial neural network structure meet preset saturation conditions or not; if yes, returning to the step of executing the stage growth operation on the initial neural network structure to obtain a third neural network structure; if not, the initial neural network structure is used as a third neural network structure, and the step of executing the deep growing operation on the third neural network structure to generate the target search space is returned.
The preset saturation condition comprises that the current verification accuracy of the initial neural network structure is smaller than the last verification accuracy corresponding to the initial neural network structure obtained by last iteration, and the number of the stage neuron sets in the initial neural network structure is smaller than the sum of the logarithm value of the image size corresponding to the training data set and the first value.
Specifically, the current verification accuracy of the initial neural network structure is determined based on the verification data set. In the initial neural network structure, at least two neurons with the same input resolution form a stage neuron set. Illustratively, the first value is 1; correspondingly, the number Q of stage neuron sets in the preset saturation condition satisfies the formula Q < [log2(r)] + 1, where r represents the image size corresponding to the training data set.
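Illustratively, the preset saturation condition can be checked as follows (a sketch; the rounding of log2(r) is assumed here to be simple integer truncation):

```python
import math

# Sketch of the preset saturation condition: the current verification
# accuracy has dropped below the previous iteration's accuracy AND the
# number Q of stage neuron sets is still below [log2(r)] + 1. When it
# holds, the flow returns to the stage growing operation.
def is_saturated(curr_val_acc: float, prev_val_acc: float,
                 num_stage_sets: int, image_size: int) -> bool:
    q_limit = int(math.log2(image_size)) + 1       # Q < [log2(r)] + 1
    return curr_val_acc < prev_val_acc and num_stage_sets < q_limit
```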
Fig. 2B is a flowchart of an embodiment of a method for searching a neural network structure according to an embodiment of the present invention. Specifically, an initial neural network structure is generated based on training attribute parameters of a training data set, stage growth operation is performed on the initial neural network structure to obtain a third neural network structure, deep growth operation is performed on the third neural network structure to obtain a target search space, and each first neural network structure is screened based on verification performance parameters corresponding to each first neural network structure in the target search space to obtain a second neural network structure. Judging whether the training performance parameters of the second neural network structure meet preset search conditions, if not, taking the second neural network structure as the initial neural network structure, judging whether the training performance parameters of the initial neural network structure meet preset saturation conditions, if the training performance parameters of the initial neural network structure meet the preset saturation conditions, returning to the step of executing stage growing operation, if the training performance parameters of the initial neural network structure do not meet the preset saturation conditions, taking the initial neural network structure as a third neural network structure, returning to the step of executing deep growing operation until the training performance parameters of the second neural network structure meet the preset search conditions, and taking the second neural network structure as a target neural network structure.
Fig. 2C is a schematic diagram of a target search space according to an embodiment of the present invention. Specifically, three layers of target search spaces are shown in FIG. 2C. Where "1" represents an initial neural network structure generated based on the training attribute parameters, constituting a target search space 1. "11", "12" and "13" respectively denote 3 first neural network structures resulting from performing a structure growing operation on "1", constituting the target search space 2. Where "11" represents the first neural network structure in the target search space 2 with the highest verification performance parameter. "111" and "112" respectively denote 2 first neural network structures obtained by performing a structure growing operation on "11", which constitute the target search space 3.
Taking fig. 2C as an example, the search space pre-constructed in the prior art needs to include 9 neural network structures "1", "11", "12", "13", "111", "112", "121", "131", and "132", whereas the entire search space constructed based on the entire search procedure of the neural network structures provided in the above embodiment of the present invention only includes 6 neural network structures "1", "11", "12", "13", "111", and "112", which achieves the purpose of reducing the conventional search space range.
According to the technical scheme of this embodiment, a target search space is generated by performing a structure growing operation on an initial neural network structure generated based on training attribute parameters, wherein the target search space comprises at least two first neural network structures. Each first neural network structure is screened based on the verification performance parameter corresponding to it in the target search space to obtain a second neural network structure, and the target neural network structure is determined based on the second neural network structure and preset search conditions. Because the target search space is generated during the search process itself, the problem that the traditional search space needs to be artificially constructed in advance is solved; and because the target search space is generated based on the initial neural network structure, the range of the generated target search space is far smaller than that of the traditional search space, so that the computing resource consumption in the neural network search process is reduced and the searching efficiency of the neural network structure is improved.
On the basis of the foregoing embodiment, specifically, the step of performing a stage growing operation on the initial neural network structure to obtain a third neural network structure includes: generating a new neuron based on the resolution corresponding to the last neuron in the initial neural network structure; the initial resolution corresponding to the newly added neuron is a preset multiple of the resolution corresponding to the last neuron; and generating a third neural network structure based on the newly added neurons, the newly added pooling layer and the initial neural network structure.
In an exemplary embodiment, the preset multiple may be 2 times or 3 times. The following embodiments of the present invention are each exemplified with the preset multiple being 2 times.
Wherein the third neural network structure can be expressed as network_i, and the newly added neuron can be expressed as neu_i(ri_i, ro_i, ci_i, co_i), with ri_i = ro_i = 2·ri_m = 2·ro_m, wherein ri_m and ro_m respectively represent the input and output resolution corresponding to the last neuron. ci_i is the same as the number of output channels of the previous neuron connected to the newly added neuron neu_i, and co_i is the same as the number of input channels of the next neuron connected to neu_i.
Wherein, in particular, the function of the newly added pooling layer is to make the resolution of the data output by the newly added neuron neu_i satisfy the resolution requirement of the next neuron connected to neu_i.
In one embodiment, specifically, the size of the convolution kernel of the newly added neuron is a preset initialization size, and the initialization weight of the convolution kernel is an identity matrix. The preset initialization size is, for example, 3×3.
The advantage of this arrangement is that, according to the network morphism theory, setting the initialization weight of the convolution kernel to the identity matrix allows the newly generated neural network structure to retain the performance of the old neural network structure. This avoids retraining the newly generated neural network structure from scratch, improves its training speed, and thereby improves the searching efficiency of the neural network structure.
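Illustratively, an identity-initialized 3×3 neuron can be realized in PyTorch through Dirac initialization (one possible realization, assumed here, not necessarily the one used in the embodiment):

```python
import torch
import torch.nn as nn

# A newly added 3x3 neuron whose kernel is initialized to the identity
# mapping, so inserting it leaves the network's function unchanged.
def make_identity_neuron(channels: int) -> nn.Conv2d:
    conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
    nn.init.dirac_(conv.weight)      # centre tap = 1 on the matching channel
    return conv

neuron = make_identity_neuron(3)
x = torch.randn(1, 3, 8, 8)
assert torch.allclose(neuron(x), x, atol=1e-6)   # output equals input
```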
Fig. 3 is a schematic diagram of a third neural network structure according to an embodiment of the present invention. Specifically, the third neural network structure adds a neuron neu_2 and a pooling layer on the basis of the initial neural network structure network_1, wherein ri_2 = ro_2 = 2×2, ci_2 = 3 and co_2 = 3.
In one embodiment, specifically, before performing the deep growing operation on the third neural network structure to generate the target search space, the method further includes: and executing incomplete training operation on the third neural network structure based on the first preset training times to obtain the trained third neural network structure. For example, the first preset number of training times may be 3 or 5.
The advantage of such an arrangement is that the prior art generally adopts a complete training mode, i.e., training the neural network structure to converge and then evaluating the network performance, which is time-consuming and requires setting of corresponding training strategies, such as learning rate, etc., to ensure that the neural network structure can be trained to converge. By adopting the incomplete training mode, the time consumption of training operation is reduced while the effect of complete training is achieved or even exceeded, and the searching efficiency of the neural network structure is further improved.
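Illustratively, the incomplete training operation amounts to training each candidate for only the first preset number of epochs; the optimizer and loss below are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

# Sketch: train for a small fixed number of epochs (e.g. 3 or 5) instead
# of training to convergence, then hand the structure on for evaluation.
def incomplete_train(net, train_loader, epochs: int = 3, lr: float = 0.01):
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    net.train()
    for _ in range(epochs):                  # first preset training times
        for images, labels in train_loader:
            opt.zero_grad()
            loss = F.cross_entropy(net(images), labels)
            loss.backward()
            opt.step()
    return net
```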
In one embodiment, specifically, performing a deep growing operation on the third neural network structure to generate the target search space includes: generating an initial search space based on the number of stage neuron sets corresponding to the third neural network structure; the input resolutions corresponding to the neurons in all stages in the neuron set in the same stage are the same, the input resolutions corresponding to the neurons in all stages are different, and the initial search space comprises at least two third neural network structures; for each third neural network structure in the initial search space, determining a target stage neuron set and a splitting type corresponding to the third neural network structure based on the identification serial number of the third neural network structure in the initial search space; performing deep growth operation on the third neural network structure based on the target stage neuron set and the split type to obtain a first neural network structure; a target search space is generated based on the at least two first neural network structures.
In the third neural network structure, at least two neurons with the same input resolution constitute a stage neuron set, taking fig. 3 as an example, the number of stage neuron sets corresponding to the third neural network structure is 2, and the input resolutions corresponding to the two stage neuron sets are 2*2 and 1*1, respectively.
Specifically, the number of copies is determined based on the number of stage neuron sets, and the initial search space is determined based on the number of copies and the third neural network structure. The number of copies N satisfies the formula N = (ns - 1) × 2, where ns represents the number of stage neuron sets. Accordingly, the initial search space is network_sN = {network_i1, network_i2, …, network_iN}. Taking Fig. 3 as an example, the number of copies is 2, and the initial search space includes 2 third neural network structures.
In one embodiment, specifically, determining a target stage neuron set and a split type corresponding to the third neural network structure based on the identification number of the third neural network structure in the initial search space includes: determining a target input resolution ratio based on the identification serial number of the third neural network structure in the initial search space, and combining a stage neuron set corresponding to the target input resolution ratio into a target stage neuron set; if the identification serial number meets the preset serial number range, setting the splitting type corresponding to the third neural network structure as a channel splitting type; and if the identification sequence number does not meet the preset sequence number range, setting the split type corresponding to the third neural network structure as the neuron split type.
Illustratively, for the j-th third neural network structure network_ij in the initial search space, the target input resolution P satisfies the formula P = 2^[j/2], where [·] means rounding up. Specifically, the target input resolution of the target stage neuron set corresponding to the 1st and 2nd third neural network structures in the initial search space is 2×2, the target input resolution corresponding to the 3rd and 4th third neural network structures is 4×4, and so on.
Specifically, the preset sequence number range includes either the sequence numbers that are integer multiples of 2 or the sequence numbers that are not integer multiples of 2. Illustratively, the preset sequence number range is [2, 4, 6, 8, 10, 12, 14, 16, 18], or the preset sequence number range is [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]. In another embodiment, it is determined whether the identification serial number satisfies the formula mod(j, 2) = 0; if so, the splitting type corresponding to the third neural network structure is set as the channel splitting type, and if not, the splitting type corresponding to the third neural network structure is set as the neuron splitting type.
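Illustratively, the mapping from a copy's identification serial number j to its target input resolution and splitting type can be sketched as:

```python
import math

# P = 2^ceil(j/2); mod(j, 2) == 0 selects the channel splitting type,
# otherwise the neuron splitting type (per the embodiment above).
def split_plan(j: int):
    p = 2 ** math.ceil(j / 2)                    # target input resolution P
    kind = "channel" if j % 2 == 0 else "neuron"
    return p, kind

print(split_plan(1))  # (2, 'neuron')  -> target stage resolution 2x2
print(split_plan(2))  # (2, 'channel')
print(split_plan(3))  # (4, 'neuron')  -> target stage resolution 4x4
```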
Specifically, the splitting type is a channel splitting type or a neuron splitting type. And if the splitting type is the channel splitting type, in the third neural network structure, performing channel splitting operation on channels corresponding to the neurons in each stage in the target stage neuron set to obtain a first neural network structure. And if the splitting type is the neuron splitting type, in the third neural network structure, respectively executing neuron splitting operation on each stage of neurons in the target stage neuron set to obtain a first neural network structure.
Fig. 4 is a flowchart of another method for searching a neural network structure according to an embodiment of the present invention, and this embodiment further refines technical features of "performing a deep growing operation on a third neural network structure based on a target stage neuron set and a splitting type to obtain a first neural network structure" in the foregoing embodiment. As shown in fig. 4, the method includes:
s210, responding to the obtained training attribute parameters of the training data set, and generating an initial neural network structure based on the training attribute parameters.
S220, performing stage growth operation on the initial neural network structure to obtain a third neural network structure.
And S230, generating an initial search space based on the number of the stage neuron sets corresponding to the third neural network structure.
S240, aiming at each third neural network structure in the initial search space, determining a target stage neuron set and a splitting type corresponding to the third neural network structure based on the identification serial number of the third neural network structure in the initial search space.
And S250, judging whether the splitting type is a channel splitting type, if so, executing S260, and if not, executing S280.
S260, determining channel average gradients corresponding to all channels of all stages of neurons in the target stage neuron set based on the verification data set and the third neural network structure.
Illustratively, assume the verification data set S = {(x_1, y_1), (x_2, y_2), …, (x_i, y_i), …, (x_n, y_n)}, where x_i represents the i-th verification image data in the verification data set and y_i represents the classification label corresponding to the i-th verification image data. Using the chain rule, a channel gradient set corresponding to each channel of each stage neuron in the target stage neuron set in the third neural network structure is determined based on the verification data set, wherein the channel gradient set corresponding to the j-th channel is G_j = {g_1j, g_2j, …, g_ij, …, g_nj}, where g_ij represents the channel gradient of the j-th channel corresponding to the i-th verification image data. The channel average gradient of the j-th channel is the mean of its channel gradient set: ḡ_j = (1/n) Σ_{i=1}^{n} g_ij.
S270, determining a growth channel set based on the average gradient of each channel, and respectively executing splitting operation on each channel in the growth channel set to obtain a first neural network structure.
Specifically, the average gradients of the channels are sorted in a descending order, and the channels with the preset selected number and the highest order are added to the growth channel set, or the channels with the average gradients exceeding a first preset gradient threshold are added to the growth channel set.
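Illustratively, S260 and S270 can be sketched together as follows, with the per-sample channel gradients g_ij assumed to be given as a matrix:

```python
import numpy as np

# Average each channel's gradient over the n verification samples, then
# keep the top-k channels (descending order) as the growth channel set.
def growth_channel_set(channel_grads: np.ndarray, k: int) -> np.ndarray:
    """channel_grads: (n_samples, n_channels) matrix of g_ij values."""
    mean_grad = channel_grads.mean(axis=0)       # channel average gradients
    order = np.argsort(mean_grad)[::-1]          # descending sort
    return order[:k]                             # channel indices to split

grads = np.abs(np.random.randn(100, 8))          # toy g_ij values
print(growth_channel_set(grads, k=2))
```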
In one embodiment, specifically, the splitting operation is performed on each channel in the growth channel set to obtain the first neural network structure, including: for each channel in the growing channel set, executing parallel operation on a first split channel and a second split channel corresponding to the channel, and generating a channel split structure corresponding to the channel based on a summation function and the parallel first split channel and second split channel; based on the channel splitting structures, a first neural network structure is generated.
Fig. 5A is a schematic diagram of a channel splitting structure according to an embodiment of the present invention, and fig. 5A is an exemplary illustration of an input channel c1 of an ith neuron in a third neural network structure. Specifically, the input channel c1 performs splitting operation on the input channel c1 in the growth channel set, so as to obtain a first split channel c11 and a second split channel c12. The structure in the dotted box in the ith neuron in the first neural network structure in fig. 5A represents a channel splitting structure corresponding to the channel c1, specifically, the channel splitting structure includes a first splitting channel c11 and a second splitting channel c12 connected in parallel, and the outputs corresponding to the first splitting channel c11 and the second splitting channel c12 are respectively input to the neuron neu through a summation function i In (1).
It can be understood that after an input channel of the i-th neuron neu_i in the third neural network structure is split, the number of output channels of the previous neuron connected to neu_i is correspondingly updated; and after an output channel of neu_i is split, the number of input channels of the next neuron connected to neu_i is correspondingly updated.
In an embodiment, specifically, the first channel weight corresponding to the first split channel and the second channel weight corresponding to the second split channel may be preset or randomly generated.
In another embodiment, specifically, the first channel weight corresponding to the first split channel is a first preset ratio of the sum of the channel weight corresponding to the channel and the average gradient of the channel, and the second channel weight corresponding to the second split channel is a first preset ratio of the difference between the channel weight corresponding to the channel and the average gradient of the channel.
Wherein the first preset proportion is 1/2. Illustratively, the first channel weight of the first split channel corresponding to the original channel c1 is θ_c11 = (θ_c1 + g_c1)/2, and the second channel weight of the second split channel is θ_c12 = (θ_c1 - g_c1)/2, where θ_c1 represents the channel weight corresponding to the original channel c1 and g_c1 represents the channel average gradient corresponding to the original channel c1.
Specifically, the output of the original channel c1 is y = θ_c1 · x, where x represents the input data of the original channel c1. After the channel splitting operation, the output of the channel splitting structure corresponding to the original channel c1 is y' = θ_c11 · x + θ_c12 · x = y.
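Illustratively, the function-preserving property of the channel split can be checked numerically with toy scalar weights:

```python
import numpy as np

# With theta_c11 = (theta + g)/2 and theta_c12 = (theta - g)/2, the two
# split channels summed together reproduce the original channel's output.
theta_c1, g_c1 = 0.8, 0.3                  # toy channel weight and mean gradient
theta_c11 = (theta_c1 + g_c1) / 2
theta_c12 = (theta_c1 - g_c1) / 2

x = np.random.randn(5)                     # toy input to the channel
y = theta_c1 * x                           # original channel output
y_split = theta_c11 * x + theta_c12 * x    # summed split-channel outputs
assert np.allclose(y, y_split)             # y' == y: the function is preserved
```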
The advantage of this arrangement is that, by the network morphism theory, the output of the channel splitting structure obtained by splitting the original channel is the same as the output of the original channel, so that the generated first neural network structure retains the network performance of its corresponding third neural network structure. This avoids the subsequent need to retrain the first neural network structure from scratch, improves the training speed of the newly generated neural network structure, and thereby improves the searching efficiency of the neural network structure.
S280, determining the neuron average gradient corresponding to each stage of neurons in the target stage neuron set based on the verification data set and the third neural network structure.
Illustratively, assume the verification data set S = {(x_1, y_1), (x_2, y_2), …, (x_i, y_i), …, (x_n, y_n)}, where x_i represents the i-th verification image data in the verification data set and y_i represents the classification label corresponding to the i-th verification image data. Using the chain rule, a neuron gradient set corresponding to each stage neuron in the target stage neuron set in the third neural network structure is determined based on the verification data set, wherein the neuron gradient set corresponding to the j-th neuron is R_j = {r_1j, r_2j, …, r_ij, …, r_nj}, where r_ij represents the neuron gradient of the j-th neuron corresponding to the i-th verification image data. The neuron average gradient of the j-th neuron is the mean of its neuron gradient set: r̄_j = (1/n) Σ_{i=1}^{n} r_ij.
S290, determining a growing neuron set based on the average gradient of each neuron, and respectively executing splitting operation on at least one stage neuron in the growing neuron set to obtain a first neural network structure.
Specifically, descending order sorting is carried out on the average gradient of each neuron, the neuron with the preset selected number which is sorted in the front is added into a growing neuron set, or the neuron with the average gradient of the neuron exceeding a first preset gradient threshold is added into the growing neuron set.
In one embodiment, specifically, for each stage neuron in the growing neuron set, a neuron division structure corresponding to the stage neuron is generated based on a series function, a summation function, and a first split neuron and a second split neuron corresponding to the stage neuron; based on the neuron division structures, a first neural network structure is generated.
Fig. 5B is a schematic diagram of a neuron splitting structure according to an embodiment of the present invention. Specifically, the left side of Fig. 5B shows the i-th neuron neu_i in the third neural network structure, and the right side shows the neuron splitting structure obtained after performing the splitting operation on neu_i; specifically, the neuron splitting structure includes a first split neuron neu_i1, a second split neuron neu_i2, a series (concatenation) function and a summation function.
In one embodiment, the first neuron weight corresponding to the first split neuron is a first preset neuron number value, and the second neuron weight corresponding to the second split neuron is a second preset neuron number value. The user can set the first preset neuron number value and the second preset neuron number value in a user-defined mode according to actual requirements.
In another embodiment, specifically, the first neuron weight corresponding to the first split neuron is a first preset proportion of a difference between the neuron mean gradient and the neuron weight corresponding to the stage neuron, and the second neuron weight corresponding to the second split neuron is a series function value formed by a zero value and a first preset proportion of a sum of the neuron mean gradient and the neuron weight corresponding to the stage neuron.
Wherein the first preset proportion is 1/2. Illustratively, for the stage neuron n1, the first neuron weight of the first split neuron is θ_n11 = (r_n1 - θ_n1)/2, and the second neuron weight of the second split neuron is θ_n12 = concat(0, (r_n1 + θ_n1)/2), where θ_n1 represents the neuron weight corresponding to the stage neuron n1 and r_n1 represents the neuron average gradient corresponding to the stage neuron n1.
Specifically, the output of the stage neuron n1 is y = conv(θ_n1, x), where x represents the input data of the stage neuron n1. After the neuron splitting operation, the output corresponding to the first split neuron is y1 = conv(θ_n11, x), the output corresponding to the second split neuron is y2 = conv(θ_n12, concat(x, y1)), and the output of the neuron splitting structure corresponding to the stage neuron n1 is y' = y1 + y2 = y.
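Illustratively, the wiring of the neuron splitting structure of Fig. 5B can be sketched as follows, treating conv(θ, x) as simple elementwise multiplication purely for exposition; the weight formulas follow the text above:

```python
import numpy as np

# Schematic wiring of Fig. 5B: first split neuron, series (concat)
# function feeding the second split neuron, then a summation.
# `conv` is a toy scalar stand-in for the real convolution.
def conv(theta, x):
    return theta * x

theta_n1, r_n1 = 0.8, 0.3                   # toy neuron weight and mean gradient
theta_n11 = (r_n1 - theta_n1) / 2           # first split neuron's weight
theta_n12 = np.array([0.0, (r_n1 + theta_n1) / 2])   # concat(0, (r + theta)/2)

x = np.random.randn(4)
y1 = conv(theta_n11, x)                     # first split neuron
xy = np.stack([x, y1])                      # series function: concat(x, y1)
y2 = (theta_n12[:, None] * xy).sum(axis=0)  # second split neuron
y_prime = y1 + y2                           # summation function output y'
```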
The advantage of this arrangement is that, by the network morphism theory, the output of the neuron splitting structure obtained by splitting the stage neuron is the same as the output of the stage neuron, so that the generated first neural network structure retains the network performance of its corresponding third neural network structure. This avoids the subsequent need to retrain the first neural network structure from scratch, improves the training speed of the newly generated neural network structure, and thereby improves the searching efficiency of the neural network structure.
And S291, generating a target search space based on the at least two first neural network structures.
In one embodiment, specifically, the generating the target search space based on at least two first neural network structures includes: and respectively executing incomplete training operations on the at least two first neural network structures based on the second preset training times to obtain a target search space. For example, the second preset number of training times may be 3 or 5.
The advantage of such an arrangement is that the prior art generally adopts a complete training mode, i.e., training the neural network structure to converge and then evaluating the network performance, which is time-consuming and requires setting of corresponding training strategies, such as learning rate, etc., to ensure that the neural network structure can be trained to converge. By adopting the incomplete training mode, the time consumption of training operation is reduced while the effect of complete training is achieved or even exceeded, and the searching efficiency of the neural network structure is further improved.
S292, screening the first neural network structures based on the verification performance parameters respectively corresponding to the first neural network structures in the target search space to obtain a second neural network structure.
And S293, determining a target neural network structure based on the second neural network structure and a preset search condition.
According to the technical scheme of this embodiment, the splitting type corresponding to the current third neural network structure is judged. When the splitting type is the channel splitting type, a growth channel set is determined based on the channel average gradients corresponding to the channels of the stage neurons in the target stage neuron set, and a splitting operation is performed on each channel in the growth channel set to obtain a first neural network structure. When the splitting type is the neuron splitting type, a growing neuron set is determined based on the neuron average gradients corresponding to the stage neurons in the target stage neuron set, and a splitting operation is respectively performed on each stage neuron in the growing neuron set to obtain a first neural network structure. This solves the problem of how to split each third neural network structure in the initial search space, and using average gradient parameters improves the accuracy of the growth channel set or growing neuron set obtained by screening, thereby further improving the searching efficiency of the neural network structure.
In addition, in the prior art, the initialization parameters of each unit structure in the neural network structure are randomly generated, whereas the neuron weights of the newly added and split neurons and the channel weights of the split channels in the embodiments of the present invention adopt deterministic initialization parameters, so that the search result of the neural network structure in the embodiments of the present invention is reproducible. Moreover, in the prior art the target neural network structure needs to be fine-tuned after it is obtained, whereas the embodiments of the present invention adopt the network morphism technique, so that the target neural network structure retains the network performance of the preceding neural network structures and therefore no fine-tuning operation is required.
When the searching method of the neural network structure provided by the embodiment of the present invention is used to search on the cifar10 dataset, the search can be completed with only 1 GPU operating for 1 day, and the accuracy on the test dataset is 92.7%; both the network parameters and the accuracy compare favorably with the manually designed resnet and vgg networks.
Fig. 6 is a schematic structural diagram of a searching apparatus of a neural network structure according to an embodiment of the present invention. As shown in fig. 6, the apparatus includes: an initial neural network structure generation module 310, a target search space generation module 320, a second neural network structure determination module 330, and a target neural network structure determination module 340.
The initial neural network structure generating module 310 is configured to generate an initial neural network structure based on a training attribute parameter in response to acquiring the training attribute parameter of the training data set;
a target search space generation module 320, configured to perform a structure growth operation on the initial neural network structure to generate a target search space; the target search space comprises at least two first neural network structures, and the number of neurons in each first neural network structure is different and/or the number of channels of the neurons is different;
a second neural network structure determining module 330, configured to screen each first neural network structure based on a verification performance parameter corresponding to each first neural network structure in the target search space, to obtain a second neural network structure;
and a target neural network structure determining module 340, configured to determine a target neural network structure based on the second neural network structure and a preset search condition.
According to the technical scheme of this embodiment, a target search space is generated by performing a structure growth operation on an initial neural network structure generated from the training attribute parameters, where the target search space comprises at least two first neural network structures. Each first neural network structure is screened based on its corresponding verification performance parameter in the target search space to obtain a second neural network structure, and the target neural network structure is determined based on the second neural network structure and the preset search condition. Because the target search space is generated during the search itself, the traditional requirement to construct a search space manually in advance is removed; and because the target search space grows out of the initial neural network structure, its range is far smaller than that of a traditional search space, which reduces the computing resources consumed in the neural network search process and improves the search efficiency of the neural network structure.
On the basis of the above embodiment, the target neural network structure determining module 340 includes:
a target neural network structure determining unit, configured to take the second neural network structure as the target neural network structure if the training performance parameter of the second neural network structure satisfies the preset search condition;
and a return execution unit, configured to, if the training performance parameters of the second neural network structure do not meet the preset search conditions, take the second neural network structure as the initial neural network structure and return to the step of performing the structure growth operation on the initial neural network structure to generate a target search space.
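Viewed as code, the cooperation between the target neural network structure determining unit and the return execution unit is a simple grow-screen-check loop. The following is a minimal Python sketch, not the patented implementation; the grow, screen and satisfies_condition callables are hypothetical stand-ins for the structure growth operation, the verification-performance screening and the preset search condition.

```python
from typing import Callable, Iterable, TypeVar

Net = TypeVar("Net")  # any representation of a neural network structure

def search(initial: Net,
           grow: Callable[[Net], Iterable[Net]],
           screen: Callable[[Iterable[Net]], Net],
           satisfies_condition: Callable[[Net], bool]) -> Net:
    """Outer search loop: grow a target search space from the current
    structure, screen it down to a second structure, and stop once the
    preset search condition holds."""
    current = initial
    while True:
        candidates = grow(current)        # structure growth -> target search space
        best = screen(candidates)         # screen by verification performance
        if satisfies_condition(best):     # preset search condition met?
            return best                   # best becomes the target structure
        current = best                    # otherwise grow again from the winner
```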
On the basis of the above embodiment, the target search space generation module 320 includes:
a third neural network structure determining unit, configured to perform stage growth operation on the initial neural network structure to obtain a third neural network structure; wherein the stage growing operation comprises an operation of adding new neurons in the initial neural network structure;
the target search space generation unit is used for executing deep growth operation on the third neural network structure to generate a target search space; wherein the deep growing operation comprises an operation of splitting the neurons and channels in the third neural network structure.
On the basis of the above embodiment, the return execution unit is specifically configured to:
judging whether the saturation parameters of the initial neural network structure meet preset saturation conditions or not;
if yes, returning to the step of performing the stage growth operation on the initial neural network structure to obtain a third neural network structure;
if not, taking the initial neural network structure as a third neural network structure and returning to the step of performing the deep growth operation on the third neural network structure to generate a target search space;
the preset saturation condition comprises that the current verification accuracy of the initial neural network structure is smaller than the last verification accuracy corresponding to the initial neural network structure obtained by last iteration, and the number of the stage neuron sets in the initial neural network structure is smaller than the sum of the logarithm value of the image size corresponding to the training data set and the first numerical value.
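As a concrete reading of this saturation test, the following sketch assumes log base 2 for the image-size logarithm and an illustrative first numerical value of 1, neither of which is fixed by the text:

```python
import math

def is_saturated(current_acc: float, last_acc: float,
                 num_stage_sets: int, image_size: int,
                 first_value: int = 1) -> bool:
    """Preset saturation condition: verification accuracy stopped
    improving AND the number of stage neuron sets is still below
    log2(image_size) + first_value."""
    accuracy_dropped = current_acc < last_acc
    room_for_stage = num_stage_sets < math.log2(image_size) + first_value
    return accuracy_dropped and room_for_stage

# If saturated, control returns to the stage growth step (add a neuron);
# otherwise the current structure goes directly to the deep growth step.
```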
On the basis of the foregoing embodiment, the third neural network structure determining unit is specifically configured to:
generating a new neuron based on the input resolution corresponding to the last neuron in the initial neural network structure; the initial resolution corresponding to the newly added neuron is a preset multiple of the resolution corresponding to the last neuron;
and generating a third neural network structure based on the newly added neurons, the newly added pooling layer and the initial neural network structure.
On the basis of the above embodiment, the size of the convolution kernel of the newly added neuron is a preset initialization size, and the initialization weight of the convolution kernel is an identity matrix.
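The identity initialization is what lets the grown structure reproduce the previous structure's output exactly. A minimal PyTorch sketch, with kernel_size=3 standing in for the preset initialization size, which the text does not fix:

```python
import torch
import torch.nn as nn

def identity_conv(channels: int, kernel_size: int = 3) -> nn.Conv2d:
    """Convolution whose initial weight acts as the identity: a single
    1.0 at the centre tap of each channel's matching input channel."""
    conv = nn.Conv2d(channels, channels, kernel_size,
                     padding=kernel_size // 2, bias=False)
    with torch.no_grad():
        conv.weight.zero_()
        centre = kernel_size // 2
        for c in range(channels):
            conv.weight[c, c, centre, centre] = 1.0
    return conv

x = torch.randn(1, 8, 32, 32)
assert torch.allclose(identity_conv(8)(x), x)  # new neuron passes input through unchanged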
On the basis of the above embodiment, the apparatus further includes:
and a first incomplete training module, configured to perform, before the deep growth operation is performed on the third neural network structure to generate a target search space, an incomplete training operation on the third neural network structure based on a first preset number of training iterations, to obtain the trained third neural network structure.
On the basis of the above embodiment, the target search space generation unit includes:
an initial search space generation subunit, configured to generate an initial search space based on the number of stage neuron sets corresponding to the third neural network structure; the input resolutions corresponding to the stage neurons within the same stage neuron set are the same, the input resolutions corresponding to different stage neuron sets are different, and the initial search space comprises at least two third neural network structures;
the splitting type determining subunit is used for determining a target stage neuron set and a splitting type corresponding to each third neural network structure in the initial search space based on the identification serial number of the third neural network structure in the initial search space;
the first neural network structure determining subunit is used for executing deep growth operation on the third neural network structure based on the target stage neuron set and the split type to obtain a first neural network structure;
and the target search space generation subunit is used for generating a target search space based on the at least two first neural network structures.
On the basis of the above embodiment, the split type determination subunit is specifically configured to:
determining a target input resolution based on the identification serial number of the third neural network structure in the initial search space, and taking the stage neuron set corresponding to the target input resolution as the target stage neuron set;
if the identification serial number meets the preset serial number range, setting the splitting type corresponding to the third neural network structure as a channel splitting type;
and if the identification sequence number does not meet the preset sequence number range, setting the split type corresponding to the third neural network structure as the neuron split type.
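One natural decoding of the identification serial number, consistent with the description above, is one copy of the third neural network structure per (target stage, split type) pair, with channel splits occupying the preset serial number range. A hedged sketch; the actual range and ordering are not fixed by the text:

```python
def assign_split(index: int, num_stage_sets: int):
    """Map an identification serial number to (target stage, split type).
    Indices below num_stage_sets fall in the assumed 'preset serial
    number range' and become channel splits; the rest are neuron splits."""
    if index < num_stage_sets:
        return index, "channel_split"
    return index - num_stage_sets, "neuron_split"

# With 3 stage neuron sets, indices 0..5 cover each (stage, type) pair once:
print([assign_split(i, 3) for i in range(6)])
# [(0, 'channel_split'), (1, 'channel_split'), (2, 'channel_split'),
#  (0, 'neuron_split'), (1, 'neuron_split'), (2, 'neuron_split')]
```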
On the basis of the above embodiment, the first neural network structure determining subunit is specifically configured to:
if the splitting type is a channel splitting type, determining channel average gradients corresponding to all channels of all stages of neurons in the target stage neuron set respectively based on the verification data set and the third neural network structure;
and determining a growth channel set based on the average gradient of each channel, and respectively executing splitting operation on each channel in the growth channel set to obtain a first neural network structure.
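A sketch of the gradient-based screening follows; the loss function, the per-channel reduction (mean absolute gradient) and the top-k selection are all illustrative assumptions rather than details fixed by the text:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def channel_average_gradients(model: nn.Module, conv: nn.Conv2d,
                              val_loader) -> torch.Tensor:
    """Average, over the verification data set, of the gradient of the
    loss with respect to each output channel of `conv` (a layer of
    `model`)."""
    total = torch.zeros(conv.out_channels)
    batches = 0
    for x, y in val_loader:
        model.zero_grad()
        F.cross_entropy(model(x), y).backward()
        # reduce the per-weight gradient to one scalar per output channel
        total += conv.weight.grad.detach().abs().mean(dim=(1, 2, 3))
        batches += 1
    return total / max(batches, 1)

def growing_channel_set(avg_grads: torch.Tensor, k: int) -> list:
    """Assume channels with the largest average gradients benefit most
    from splitting; they form the growing channel set."""
    return avg_grads.topk(k).indices.tolist()
```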
On the basis of the above embodiment, the first neural network structure determining subunit is specifically configured to:
for each channel in the growing channel set, executing parallel operation on a first split channel and a second split channel corresponding to the channel, and generating a channel split structure corresponding to the channel based on a summation function and the parallel first split channel and second split channel;
based on the channel splitting structures, a first neural network structure is generated.
On the basis of the above embodiment, the first channel weight corresponding to the first split channel is a first preset proportion of the sum of the channel weight corresponding to the channel and the average gradient of the channel, and the second channel weight corresponding to the second split channel is a first preset proportion of the difference between the channel weight corresponding to the channel and the average gradient of the channel.
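With a first preset proportion of 0.5 (an assumption; the text leaves the proportion as a preset value), the two split-channel weights sum back to the original weight, so the parallel channels combined by the summation function initially reproduce the unsplit channel's output. A minimal sketch:

```python
import torch

def split_channel(weight: torch.Tensor, avg_grad: torch.Tensor,
                  ratio: float = 0.5):
    """Channel splitting weights as described above:
    first  = ratio * (w + g)
    second = ratio * (w - g)
    With ratio = 0.5 the summed outputs of the two parallel channels
    equal the original channel's output, since convolution is linear
    in its weights."""
    return ratio * (weight + avg_grad), ratio * (weight - avg_grad)

w, g = torch.randn(3, 3), torch.randn(3, 3)
w1, w2 = split_channel(w, g)
assert torch.allclose(w1 + w2, w)  # summation recovers the original channel
```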
On the basis of the above embodiment, the first neural network structure determining subunit is specifically configured to:
if the splitting type is the neuron splitting type, determining neuron average gradients corresponding to the neurons in each stage in the target stage neuron set based on the verification data set and the third neural network structure;
and determining a growing neuron set based on the average gradient of each neuron, and respectively executing splitting operation on at least one stage of neurons in the growing neuron set to obtain a first neural network structure.
On the basis of the above embodiment, the first neural network structure determining subunit is specifically configured to:
generating a neuron splitting structure corresponding to the stage neuron based on a series function, a summation function, and a first split neuron and a second split neuron corresponding to the stage neuron aiming at each stage neuron in the growing neuron set;
based on the neuron division structures, a first neural network structure is generated.
On the basis of the above embodiment, the first neuron weight corresponding to the first split neuron is a first preset proportion of the difference between the neuron average gradient and the neuron weight corresponding to the stage neuron, and the second neuron weight corresponding to the second split neuron is a series function value formed by a zero value and a first preset proportion of the sum of the neuron average gradient and the neuron weight corresponding to the stage neuron.
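Read literally, the neuron split produces one branch weighted by the difference and one branch that concatenates a zero block with the sum; the operand order and the role of the series (concatenation) function are taken from the translated text and may not match the original exactly, so the sketch below is only an illustration:

```python
import torch

def split_neuron(weight: torch.Tensor, avg_grad: torch.Tensor,
                 ratio: float = 0.5):
    """Neuron splitting weights as read from the text:
    first  = ratio * (g - w)
    second = concat(zeros, ratio * (g + w))   # the 'series function value'
    ratio = 0.5 is an assumed 'first preset proportion'."""
    first = ratio * (avg_grad - weight)
    second = torch.cat([torch.zeros_like(weight),
                        ratio * (avg_grad + weight)])
    return first, second
```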
On the basis of the foregoing embodiment, the target search space generation subunit is specifically configured to:
and respectively performing incomplete training operations on the at least two first neural network structures based on a second preset number of training iterations to obtain a target search space.
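The point of incomplete training is that the candidates only need to be comparable, not converged. A hedged sketch, with 2 epochs standing in for the second preset number of training iterations and train_one_epoch as a hypothetical helper:

```python
def incomplete_training(candidates, train_one_epoch, epochs: int = 2):
    """Train each first neural network structure for only a few epochs
    so that verification performance can be compared cheaply; the
    resulting set of partially trained structures is the target
    search space."""
    for net in candidates:
        for _ in range(epochs):
            train_one_epoch(net)
    return candidates
```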
The searching device of the neural network structure provided by the embodiment of the invention can execute the searching method of the neural network structure provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the executing method.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The electronic device 10 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 7, the electronic device 10 includes at least one processor 11, and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, and the like, wherein the memory stores a computer program executable by the at least one processor, and the processor 11 can perform various suitable actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from a storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data necessary for the operation of the electronic apparatus 10 can also be stored. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, or the like. The processor 11 performs the various methods and processes described above, such as the search method of the neural network structure.
In some embodiments, the neural network structure search method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the neural network structure search method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured by any other suitable means (e.g., by means of firmware) to perform the search method of the neural network structure.
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
The computer program for implementing the neural network structure search method of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program can execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine or entirely on a remote machine or server.
An embodiment of the present invention also provides a computer-readable storage medium storing computer instructions for causing a processor to execute a method for searching a neural network structure, the method including:
generating an initial neural network structure based on the training attribute parameters in response to the training attribute parameters of the obtained training data set;
executing a structure growing operation on the initial neural network structure to generate a target search space; the target search space comprises at least two first neural network structures, and the number of neurons in each first neural network structure is different and/or the number of channels of the neurons is different;
screening each first neural network structure based on verification performance parameters respectively corresponding to each first neural network structure in the target search space to obtain a second neural network structure;
and determining a target neural network structure based on the second neural network structure and the preset search condition.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the Internet.
The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the defects of high management difficulty and weak service expansibility of traditional physical hosts and VPS (Virtual Private Server) services.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solution of the present invention can be achieved.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (19)

1. A method for searching a neural network structure, comprising:
responding to a training attribute parameter of an obtained training data set, and generating an initial neural network structure based on the training attribute parameter;
executing a structure growing operation on the initial neural network structure to generate a target search space; the target search space comprises at least two first neural network structures, and the number of neurons and/or the number of channels of the neurons in each first neural network structure are different;
screening each first neural network structure based on verification performance parameters respectively corresponding to each first neural network structure in the target search space to obtain a second neural network structure;
and determining a target neural network structure based on the second neural network structure and a preset search condition.
2. The method of claim 1, wherein determining a target neural network structure based on the second neural network structure and a preset search condition comprises:
if the training performance parameters of the second neural network structure meet preset search conditions, taking the second neural network structure as a target neural network structure;
and if the training performance parameters of the second neural network structure do not meet the preset search conditions, taking the second neural network structure as an initial neural network structure, and returning to execute the step of executing the structure growth operation on the initial neural network structure to generate a target search space.
3. The method of claim 2, wherein performing a structure growing operation on the initial neural network structure to generate a target search space comprises:
executing stage growth operation on the initial neural network structure to obtain a third neural network structure; wherein the stage growing operation comprises an operation of adding new neurons in the initial neural network structure;
performing a deep growing operation on the third neural network structure to generate a target search space; wherein the deep growing operation comprises an operation to split neurons and channels in the third neural network structure.
4. The method of claim 3, wherein the step of returning to perform a structure growing operation on the initial neural network structure to generate a target search space comprises:
judging whether the saturation parameters of the initial neural network structure meet preset saturation conditions or not;
if yes, returning to the step of executing the stage growth operation on the initial neural network structure to obtain a third neural network structure;
if not, taking the initial neural network structure as a third neural network structure, and returning to execute the step of executing the deep growth operation on the third neural network structure to generate a target search space;
the preset saturation condition comprises that the current verification accuracy of the initial neural network structure is smaller than the last verification accuracy corresponding to the initial neural network structure obtained through the last iteration, and that the number of stage neuron sets in the initial neural network structure is smaller than the sum of the logarithm value of the image size corresponding to the training data set and a first numerical value; the input resolutions corresponding to the stage neurons within the same stage neuron set are the same, and the input resolutions corresponding to different stage neuron sets are different.
5. The method of claim 3, wherein performing a phase growth operation on the initial neural network structure to obtain a third neural network structure comprises:
generating a new neuron based on the resolution corresponding to the last neuron in the initial neural network structure; the initial resolution corresponding to the newly added neuron is a preset multiple of the resolution corresponding to the last neuron;
and generating a third neural network structure based on the newly added neurons, the newly added pooling layer and the initial neural network structure.
6. The method of claim 5, wherein the size of the convolution kernel of the newly added neuron is a preset initialization size, and the initialization weight of the convolution kernel is a unit matrix.
7. The method of claim 3, wherein prior to performing a deep growing operation on the third neural network structure to generate a target search space, the method further comprises:
and executing incomplete training operation on the third neural network structure based on the first preset training times to obtain the trained third neural network structure.
8. The method of claim 3, wherein performing a deep growing operation on the third neural network structure to generate a target search space comprises:
generating an initial search space based on the number of stage neuron sets corresponding to the third neural network structure; wherein the input resolutions corresponding to the stage neurons within the same stage neuron set are the same, the input resolutions corresponding to different stage neuron sets are different, and the initial search space comprises at least two third neural network structures;
for each third neural network structure in the initial search space, determining a target stage neuron set and a splitting type corresponding to the third neural network structure based on the identification serial number of the third neural network structure in the initial search space;
performing a deep growth operation on the third neural network structure based on the target stage neuron set and the split type to obtain a first neural network structure;
generating a target search space based on at least two of the first neural network structures.
9. The method of claim 8, wherein the determining the target stage neuron set and the split type corresponding to the third neural network structure based on the identification number of the third neural network structure in the initial search space comprises:
determining a target input resolution based on the identification serial number of the third neural network structure in the initial search space, and taking the stage neuron set corresponding to the target input resolution as the target stage neuron set;
if the identification serial number meets a preset serial number range, setting a splitting type corresponding to the third neural network structure as a channel splitting type;
and if the identification sequence number does not meet the preset sequence number range, setting the splitting type corresponding to the third neural network structure as the neuron splitting type.
10. The method of claim 9, wherein performing a deep growing operation on the third neural network structure based on the target stage neuron set and the split type to obtain a first neural network structure comprises:
if the splitting type is a channel splitting type, determining channel average gradients corresponding to channels of the neurons at each stage in the target stage neuron set respectively based on a verification data set and the third neural network structure;
and determining a growth channel set based on the average gradient of each channel, and respectively executing splitting operation on each channel in the growth channel set to obtain a first neural network structure.
11. The method of claim 10, wherein the performing a splitting operation on each channel in the growing channel set to obtain a first neural network structure comprises:
for each channel in the growing channel set, performing parallel operation on a first split channel and a second split channel corresponding to the channel, and generating a channel split structure corresponding to the channel based on a summation function and the parallel first split channel and second split channel;
a first neural network structure is generated based on each of the channel splitting structures.
12. The method of claim 11, wherein the first channel weight corresponding to the first split channel is a first predetermined ratio of a sum of the channel weight corresponding to the channel and a channel average gradient, and the second channel weight corresponding to the second split channel is a first predetermined ratio of a difference between the channel weight corresponding to the channel and the channel average gradient.
13. The method of claim 9, wherein performing a deep growth operation on the third neural network structure based on the target stage neuron set and the split type to obtain a first neural network structure comprises:
if the splitting type is a neuron splitting type, determining neuron average gradients corresponding to neurons in each stage in the target stage neuron set respectively based on a verification data set and the third neural network structure;
and determining a growing neuron set based on the average gradient of each neuron, and respectively executing splitting operation on at least one stage neuron in the growing neuron set to obtain a first neural network structure.
14. The method of claim 13, wherein the performing the splitting operation on at least one stage neuron in the set of growing neurons to obtain a first neural network structure comprises:
for each stage neuron in the growing neuron set, generating a neuron splitting structure corresponding to the stage neuron based on a series function, a summation function, and a first split neuron and a second split neuron corresponding to the stage neuron;
and generating a first neural network structure based on each neuron splitting structure.
15. The method of claim 14, wherein the first neuron weight corresponding to the first split neuron is a first predetermined ratio of a difference between the neuron average gradient and the neuron weight corresponding to the stage neuron, and wherein the second neuron weight corresponding to the second split neuron is a series function of a first predetermined ratio of a sum of the neuron average gradient and the neuron weight corresponding to the stage neuron and a zero value.
16. The method of claim 8, wherein generating a target search space based on at least two of the first neural network structures comprises:
and respectively executing incomplete training operation on at least two first neural network structures based on a second preset training frequency to obtain a target search space.
17. An apparatus for searching a neural network structure, comprising:
the initial neural network structure generating module is used for responding to the training attribute parameters of the obtained training data set and generating an initial neural network structure based on the training attribute parameters;
the target search space generation module is used for executing structure growth operation on the initial neural network structure to generate a target search space; the target search space comprises at least two first neural network structures, and the number of neurons and/or the number of channels of the neurons in each first neural network structure are different;
a second neural network structure determining module, configured to screen each first neural network structure in the target search space based on a verification performance parameter corresponding to each first neural network structure, so as to obtain a second neural network structure;
and the target neural network structure determining module is used for determining the target neural network structure based on the second neural network structure and a preset search condition.
18. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the content of the first and second substances,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the neural network structure searching method of any one of claims 1-16.
19. A computer-readable storage medium storing computer instructions for causing a processor to implement the neural network structure searching method of any one of claims 1 to 16 when executed.
CN202211387018.2A 2022-11-07 2022-11-07 Searching method, device, equipment and storage medium of neural network structure Pending CN115545171A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211387018.2A CN115545171A (en) 2022-11-07 2022-11-07 Searching method, device, equipment and storage medium of neural network structure


Publications (1)

Publication Number Publication Date
CN115545171A (en) 2022-12-30

Family

ID=84721326

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211387018.2A Pending CN115545171A (en) 2022-11-07 2022-11-07 Searching method, device, equipment and storage medium of neural network structure

Country Status (1)

Country Link
CN (1) CN115545171A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination