CN113326922B - Neural network generation method and device, electronic equipment and storage medium


Info

Publication number
CN113326922B
Authority
CN
China
Prior art keywords
path
trained
operator
training
oversized
Prior art date
Legal status
Active
Application number
CN202110602323.8A
Other languages
Chinese (zh)
Other versions
CN113326922A (en)
Inventor
苏修
游山
郑明凯
王飞
钱晨
Current Assignee
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN202110602323.8A
Publication of CN113326922A
Application granted
Publication of CN113326922B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a neural network generation method and device, an electronic device, and a storage medium. The method includes: determining a path to be trained from a path search space determined based on a plurality of oversized neural networks with identical structures; determining second operator internal parameters of each operator in the path to be trained based on first operator internal parameters of each operator in the path to be trained in the plurality of oversized neural networks and path weights of the path to be trained in the plurality of oversized neural networks; performing the current round of training on the plurality of oversized neural networks by using the second operator internal parameters; and generating a target neural network based on the plurality of oversized neural networks after multiple rounds of training.

Description

Neural network generation method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of deep learning, and in particular to a neural network generation method and device, an electronic device, and a storage medium.
Background
Automatic network structure search is a method of determining a neural network structure in a pre-constructed search space based on a search algorithm; it aims to relieve the high cost and the empirical bias of manually designed networks, so as to obtain a neural network structure with better performance.
In current automatic network structure search methods, the same operator uses the same operator parameters (namely, the weight parameters corresponding to that operator in the neural network) in different paths. This makes it difficult to accurately evaluate the performance of the different network structures corresponding to different paths, and increases the difficulty of network structure search.
Disclosure of Invention
Embodiments of the disclosure provide at least a neural network generation method and device, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a neural network generation method, including: determining a path to be trained from a path search space determined based on a plurality of oversized neural networks with identical structures; determining second operator internal parameters of each operator in the path to be trained based on first operator internal parameters of each operator in the path to be trained in the plurality of oversized neural networks and path weights of the path to be trained in the plurality of oversized neural networks; performing the current round of training on the plurality of oversized neural networks by using the second operator internal parameters; and generating a target neural network based on the plurality of oversized neural networks after multiple rounds of training.
In this way, in the process of training the oversized neural networks, the same operator is controlled to have different operator internal parameters in different paths, which ensures that the performance evaluation of the paths is more accurate and improves the performance of the generated neural network.
In a possible implementation manner, the determining the path to be trained from the path search space determined based on the plurality of oversized neural networks with identical structures includes: when the current round of training is the first round of training, randomly determining a plurality of paths to be trained from the path search space; when the current round of training is not the first round of training, determining, based on training results respectively corresponding to the plurality of paths to be trained determined in the previous round of training, a parent path for the current round of training from the plurality of paths to be trained determined in the previous round; and performing mutation and/or crossover processing on the parent path to obtain the paths to be trained corresponding to the current round of training.
Therefore, the number of paths to be trained can be greatly reduced, paths with poor performance can be filtered out gradually in the process of multiple rounds of training, and the training efficiency is higher.
In a possible implementation manner, the determining, based on the first operator internal parameters of each operator in the path to be trained in the plurality of oversized neural networks and the path weights of the path to be trained in the plurality of oversized neural networks, the second operator internal parameters of each operator in the path to be trained includes: for each operator in the path to be trained, weighting and summing the first operator internal parameters of the operator in the plurality of oversized neural networks by using the path weights of the path to be trained in the plurality of oversized neural networks respectively, so as to obtain the second operator internal parameters of the operator in the path to be trained.
In this way, different paths each utilize the first operator internal parameters in their own manner, ensuring that the same operator has different second operator internal parameters in different paths.
In a possible implementation manner, the performing the current round of training on the plurality of oversized neural networks by using the second operator internal parameters includes: processing training sample data by using the second operator internal parameters of each operator in the path to be trained to obtain a first processing result of the training sample data; determining a first loss based on the first processing result and label information of the training sample data; and adjusting the first operator internal parameters of each operator in the plurality of oversized neural networks respectively based on the first loss.
In this way, the plurality of oversized neural networks can be trained synchronously, so that the first operator internal parameters of the operators in different oversized neural networks are kept distinct, and the second operator internal parameters of the same operator in different paths are thereby controlled.
In a possible implementation manner, the processing the training sample data by using the second operator internal parameters of each operator in the path to be trained to obtain a first processing result of the training sample data includes: constructing a first neural network by using second operator internal parameters of each operator in the path to be trained; and processing the training sample data by using the first neural network to obtain a first processing result of the training sample data.
In a possible embodiment, the method further comprises: determining a path code of the path to be trained; and obtaining path weights of the path to be trained in the plurality of oversized neural networks respectively based on the path code.
In this way, the path is encoded and the path weights are obtained from the path code, making the manner of obtaining the path weights more objective and effective.
In a possible implementation manner, the determining the path code of the path to be trained includes: determining the one-hot code of each operator in the path to be trained; and obtaining the path code of the path to be trained based on the one-hot codes of all operators in the path to be trained.
In a possible implementation manner, the obtaining, based on the path code, path weights of the path to be trained in the plurality of oversized neural networks respectively includes: performing full-connection processing on the path code, and obtaining the path weights of the path to be trained in the plurality of oversized neural networks respectively based on the result of the full-connection processing.
In a possible embodiment, the method further comprises: determining a channel code of the path to be trained based on the number of output channels corresponding to each operator in the path to be trained; and the obtaining, based on the path code, path weights of the path to be trained in the plurality of oversized neural networks includes: performing data fusion processing on the path code and the channel code to obtain a fusion code; and obtaining the path weights of the path to be trained in the plurality of oversized neural networks respectively based on the fusion code.
In this way, the structure of the target neural network can be searched in the channel dimension, so that the generated target neural network has better performance.
In a possible implementation manner, the performing data fusion processing on the path code and the channel code to obtain a fusion code includes: performing first full-connection processing on the path code to obtain a transformed path code; performing second full-connection processing on the channel code to obtain a transformed channel code; and superposing the transformed path code and the transformed channel code to obtain the fusion code.
In a possible implementation manner, the obtaining, based on the fusion code, path weights of the path to be trained in the plurality of oversized neural networks respectively includes: performing third full-connection processing on the fusion code to obtain a transformed fusion code; and activating the transformed fusion code to obtain the path weights of the path to be trained in the plurality of oversized neural networks respectively.
In a possible implementation manner, the obtaining, based on the path code, path weights of the path to be trained in the plurality of oversized neural networks respectively includes: performing full-connection processing on the path code by using the weight prediction network corresponding to the current round of training, to obtain the path weights of the path to be trained in the plurality of oversized neural networks respectively.
In a possible embodiment, the method further comprises: when the current round of training is the first round of training, using a pre-generated initial weight prediction network as the weight prediction network corresponding to the current round of training.
In a possible embodiment, the method further comprises: when the current round of training is not the first round of training, training the weight prediction network corresponding to the previous round of training by using the first loss generated during the previous round of training of the plurality of oversized neural networks, to obtain the weight prediction network corresponding to the current round of training.
In a possible implementation manner, the generating a target neural network based on the plurality of oversized neural networks after multiple rounds of training includes: determining a target path from at least some paths based on verification accuracies respectively corresponding to the at least some paths in the plurality of oversized neural networks after multiple rounds of training; and generating the target neural network by using first operator internal parameters of each target operator in the target path in the plurality of oversized neural networks after multiple rounds of training and path weights of the target path in the plurality of oversized neural networks after multiple rounds of training.
In a possible implementation manner, the determining a target path from the at least some paths based on the verification accuracies respectively corresponding to the at least some paths in the plurality of oversized neural networks after multiple rounds of training includes: determining alternative paths from the plurality of oversized neural networks after multiple rounds of training, wherein the alternative paths include the paths to be trained determined in the last round of the multiple rounds of training; determining the verification accuracy of each alternative path using verification sample data; and determining the target path from the alternative paths based on the verification accuracies of the alternative paths.
In a second aspect, an embodiment of the present disclosure further provides a neural network generation device, including: a first determining module, configured to determine a path to be trained from a path search space determined based on a plurality of oversized neural networks with identical structures; a processing module, configured to determine second operator internal parameters of each operator in the path to be trained based on first operator internal parameters of each operator in the path to be trained in the plurality of oversized neural networks and path weights of the path to be trained in the plurality of oversized neural networks; a training module, configured to perform the current round of training on the plurality of oversized neural networks by using the second operator internal parameters; and a generating module, configured to generate a target neural network based on the plurality of oversized neural networks after multiple rounds of training.
In a possible implementation manner, when determining the path to be trained from the path search space determined based on the plurality of oversized neural networks with identical structures, the first determining module is configured to: when the current round of training is the first round of training, randomly determine a plurality of paths to be trained from the path search space; when the current round of training is not the first round of training, determine, based on training results respectively corresponding to the plurality of paths to be trained determined in the previous round of training, a parent path for the current round of training from the plurality of paths to be trained determined in the previous round; and perform mutation and/or crossover processing on the parent path to obtain the paths to be trained corresponding to the current round of training.
In a possible implementation manner, when determining the second operator internal parameters of each operator in the path to be trained based on the first operator internal parameters of each operator in the path to be trained in the plurality of oversized neural networks and the path weights of the path to be trained in the plurality of oversized neural networks, the processing module is configured to: for each operator in the path to be trained, weight and sum the first operator internal parameters of the operator in the plurality of oversized neural networks by using the path weights of the path to be trained in the plurality of oversized neural networks respectively, so as to obtain the second operator internal parameters of the operator in the path to be trained.
In a possible implementation manner, when performing the current round of training on the plurality of oversized neural networks by using the second operator internal parameters, the training module is configured to: process training sample data by using the second operator internal parameters of each operator in the path to be trained to obtain a first processing result of the training sample data; determine a first loss based on the first processing result and label information of the training sample data; and adjust the first operator internal parameters of each operator in the plurality of oversized neural networks respectively based on the first loss.
In a possible implementation manner, the training module is configured to, when processing training sample data by using second operator internal parameters of each operator in the path to be trained to obtain a first processing result of the training sample data: constructing a first neural network by using second operator internal parameters of each operator in the path to be trained; and processing the training sample data by using the first neural network to obtain a first processing result of the training sample data.
In a possible embodiment, the device further comprises: a second determining module, configured to determine a path code of the path to be trained, and obtain path weights of the path to be trained in the plurality of oversized neural networks respectively based on the path code.
In a possible implementation manner, when determining the path code of the path to be trained, the second determining module is configured to: determine the one-hot code of each operator in the path to be trained; and obtain the path code of the path to be trained based on the one-hot codes of all operators in the path to be trained.
In a possible implementation manner, when obtaining the path weights of the path to be trained in the plurality of oversized neural networks respectively based on the path code, the second determining module is configured to: perform full-connection processing on the path code, and obtain the path weights of the path to be trained in the plurality of oversized neural networks respectively based on the result of the full-connection processing.
In a possible implementation manner, the second determining module is further configured to: determine a channel code of the path to be trained based on the number of output channels corresponding to each operator in the path to be trained; and when obtaining the path weights of the path to be trained in the plurality of oversized neural networks respectively based on the path code, the second determining module is configured to: perform data fusion processing on the path code and the channel code to obtain a fusion code; and obtain the path weights of the path to be trained in the plurality of oversized neural networks respectively based on the fusion code.
In a possible implementation manner, when performing data fusion processing on the path code and the channel code to obtain the fusion code, the second determining module is configured to: perform first full-connection processing on the path code to obtain a transformed path code; perform second full-connection processing on the channel code to obtain a transformed channel code; and superpose the transformed path code and the transformed channel code to obtain the fusion code.
In a possible implementation manner, when obtaining the path weights of the path to be trained in the plurality of oversized neural networks respectively based on the fusion code, the second determining module is configured to: perform third full-connection processing on the fusion code to obtain a transformed fusion code; and activate the transformed fusion code to obtain the path weights of the path to be trained in the plurality of oversized neural networks respectively.
In a possible implementation manner, when obtaining the path weights of the path to be trained in the plurality of oversized neural networks respectively based on the path code, the second determining module is configured to: perform full-connection processing on the path code by using the weight prediction network corresponding to the current round of training, to obtain the path weights of the path to be trained in the plurality of oversized neural networks respectively.
In a possible implementation manner, the second determining module is further configured to: when the current round of training is the first round of training, use a pre-generated initial weight prediction network as the weight prediction network corresponding to the current round of training.
In a possible implementation manner, the second determining module is further configured to: when the current round of training is not the first round of training, train the weight prediction network corresponding to the previous round of training by using the first loss generated during the previous round of training of the plurality of oversized neural networks, to obtain the weight prediction network corresponding to the current round of training.
In a possible implementation manner, when generating the target neural network based on the plurality of oversized neural networks after multiple rounds of training, the generating module is configured to: determine a target path from at least some paths based on verification accuracies respectively corresponding to the at least some paths in the plurality of oversized neural networks after multiple rounds of training; and generate the target neural network by using the first operator internal parameters of each target operator in the target path in the plurality of oversized neural networks after multiple rounds of training and the path weights of the target path in the plurality of oversized neural networks after multiple rounds of training.
In a possible implementation manner, when determining the target path from the at least some paths based on the verification accuracies respectively corresponding to the at least some paths in the plurality of oversized neural networks after multiple rounds of training, the generating module is configured to: determine alternative paths from the plurality of oversized neural networks after multiple rounds of training, wherein the alternative paths include the paths to be trained determined in the last round of the multiple rounds of training; determine the verification accuracy of each alternative path using verification sample data; and determine the target path from the alternative paths based on the verification accuracies of the alternative paths.
In a third aspect, an optional implementation manner of the disclosure further provides a computer device, comprising a processor and a memory, where the memory stores machine-readable instructions executable by the processor, and the processor is configured to execute the machine-readable instructions stored in the memory; the machine-readable instructions, when executed by the processor, perform the steps in the first aspect or any possible implementation manner of the first aspect.
In a fourth aspect, an alternative implementation of the present disclosure further provides a computer readable storage medium having stored thereon a computer program which when executed performs the steps of the first aspect, or any of the possible implementation manners of the first aspect.
For the description of the effects of the neural network generation device, the computer device, and the computer-readable storage medium, reference is made to the description of the neural network generation method above, and it is not repeated here.
The foregoing objects, features and advantages of the disclosure will be more readily apparent from the following detailed description of the preferred embodiments taken in conjunction with the accompanying drawings.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for the embodiments are briefly described below. The drawings are incorporated in and constitute a part of the specification; they show embodiments consistent with the present disclosure and, together with the description, serve to illustrate the technical solutions of the present disclosure. It should be understood that the following drawings show only certain embodiments of the present disclosure and are therefore not to be considered limiting of its scope; a person of ordinary skill in the art may derive other related drawings from these drawings without inventive effort.
Fig. 1 shows a flowchart of a method for generating a neural network according to an embodiment of the present disclosure;
FIG. 2 illustrates a flowchart of a particular method for obtaining path weights of paths to be trained in a plurality of oversized neural networks, respectively, provided by an embodiment of the present disclosure;
FIG. 3 illustrates an example of a weight prediction network provided by embodiments of the present disclosure;
FIG. 4 is a flowchart of a specific method for performing the current round of training on the oversized neural networks using the second operator internal parameters, provided by an embodiment of the present disclosure;
FIG. 5 illustrates a flowchart of a particular method for generating a target neural network based on a plurality of oversized neural networks after multiple rounds of training, provided by embodiments of the present disclosure;
FIG. 6 shows a schematic diagram of a neural network generation device provided by an embodiment of the present disclosure;
fig. 7 shows a schematic diagram of a computer device provided by an embodiment of the disclosure.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only some embodiments of the present disclosure, but not all embodiments. The components of the disclosed embodiments generally described and illustrated herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be made by those skilled in the art based on the embodiments of this disclosure without making any inventive effort, are intended to be within the scope of this disclosure.
Research shows that when automatic network structure search is performed, if the neural network to be generated includes n network layers and each layer has m optional operators, the resulting path search space contains m^n paths. Current automatic network structure search methods generally use the m optional operators of each of the n network layers to generate one oversized neural network, repeatedly determine at least one path to be trained from the path search space corresponding to the oversized neural network, and train each determined path to be trained; after multiple rounds of such training, an oversized neural network after multiple rounds of training is obtained, and the network structure of the neural network is determined based on it. In this approach, for any optional operator of any network layer, all paths that include that operator share its operator internal parameters; that is, for any operator, its operator internal parameters are the same in all paths that contain the operator. However, the performance or effect of the same operator may differ between different paths. A neural network structure search method in which multiple paths share operator internal parameters may therefore evaluate path performance inaccurately, with the result that the determined network structure may not be optimal and the generated neural network performs poorly.
Based on the above research, the present disclosure provides a method for generating a neural network, which controls the same operator to have different operator internal references in different paths during the process of training the oversized neural network, thereby ensuring more accurate performance evaluation of the paths and improving the performance of the generated neural network.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
To facilitate understanding of the present embodiments, a detailed description is first given of the neural network generation method disclosed in the embodiments of the present disclosure. The execution subject of the neural network generation method provided in the embodiments of the present disclosure is generally a computer device having a certain computing capability, including, for example, a terminal device, a server, or another processing device. The terminal device may be user equipment (UE), a mobile device, a user terminal, a cellular telephone, a cordless telephone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, etc. In some possible implementations, the neural network generation method may be implemented by a processor invoking computer-readable instructions stored in a memory.
The method for generating the neural network provided by the embodiment of the present disclosure is described below.
Referring to fig. 1, a flowchart of a method for generating a neural network according to an embodiment of the disclosure is shown, where the method includes steps S101 to S104, where:
s101: determining a path to be trained from a path search space determined based on a plurality of oversized neural networks with the same structure;
s102: determining second operator internal parameters of each operator in the path to be trained based on first operator internal parameters of each operator in the path to be trained in the plurality of oversized neural networks and path weights of the path to be trained in the plurality of oversized neural networks;
s103: performing the current round of training on the plurality of oversized neural networks by using the second operator internal parameters;
s104: generating a target neural network based on the plurality of oversized neural networks after multiple rounds of training.
In the embodiments of the present disclosure, a plurality of oversized neural networks with identical structures are constructed. For each operator in the oversized neural networks, the operator has different first operator internal parameters in different oversized neural networks, and any path that includes the operator has a corresponding path weight in each of the different oversized neural networks. The first operator internal parameters of the operator in the different oversized neural networks and the path weights of a path that includes the operator are then used to generate the second operator internal parameters of the operator in that path, and the second operator internal parameters are used for the neural network structure search. This guarantees that each operator has path-specific operator internal parameters in different paths, solving the problem of low path evaluation accuracy caused by multiple paths sharing operator internal parameters, making path evaluation more accurate, and improving the performance of the generated target neural network. In addition, because path evaluation is more accurate, the difficulty of network structure search is reduced, the efficiency of supernet-based network structure search is improved, and the hardware resource consumption of network structure search is reduced.
The following describes each of the above-mentioned S101 to S104 in detail.
For S101 above: in the embodiments of the present disclosure, a plurality of oversized neural networks with identical structures are constructed, where identical structures means that the plurality of oversized neural networks include the same number of network layers and the optional operator types of each network layer are also the same. When generating the oversized neural networks, for example, one oversized neural network can be generated first and initialized, and the initialized oversized neural network can then be copied to obtain K oversized neural networks; in this case, the initial operator internal parameters of the same operator are identical across the K oversized neural networks. Alternatively, K oversized neural networks with identical structures can be generated directly and then initialized separately; in this case, the initial operator internal parameters of the same operator in the K oversized neural networks may or may not be identical. As the oversized neural networks undergo multiple rounds of training, the operator internal parameters of the same operator in different oversized neural networks come to differ.
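As a purely illustrative aid (not part of the disclosed embodiments), the following minimal Python sketch shows one way the K structurally identical oversized neural networks could be obtained by copying a single initialized supernet; the factory function make_supernet is a hypothetical placeholder.

```python
import copy
import torch.nn as nn

def build_supernets(make_supernet, k: int) -> nn.ModuleList:
    """Build K structurally identical oversized neural networks.

    make_supernet: hypothetical factory returning one initialized supernet.
    Deep-copying after initialization means every copy starts with identical
    first operator internal parameters; training then lets them diverge.
    """
    base = make_supernet()
    return nn.ModuleList([copy.deepcopy(base) for _ in range(k)])
```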
After the oversized neural networks are generated, a path search space can be generated from them. There are a plurality of paths in the path search space, where each path includes one optional operator from each network layer of the oversized neural network. All paths included in the oversized neural network form the path search space; performing a path search means determining a target path in the path search space, and the neural network formed by the operators in the target path is the target neural network to be generated.
For example, if the oversized neural network includes n network layers, each having m optional operators, the generated search space contains m^n paths.
Optional operators include, for example, at least one of a convolution operator, a pooling operator, an identity mapping operator, and a predetermined function block. Here, a predetermined function block refers to an already-trained neural network with a specific function, such as a MobileNetV2 block for classification, detection, and segmentation tasks, or a ShuffleNetV2 block, an extremely efficient convolutional neural network for mobile devices.
When the oversized neural networks are trained over multiple rounds, the paths to be trained in each round of training are determined from the path search space. The paths to be trained for the current round may be determined from the path search space by using an evolutionary algorithm, or by random sampling. During the current round of training, the first operator internal parameters corresponding to each operator in the paths to be trained are updated in the plurality of oversized neural networks.
When the evolutionary algorithm is used to determine the paths to be trained for each round of training from the path search space, for example, based on the training results respectively corresponding to the M paths to be trained determined in the previous round, the N paths with the highest accuracy among those M paths can be selected as parent paths for the current round, and mutation and crossover processing are then performed on the N parent paths to generate multiple paths to be trained for the current round. A preset number of rounds of training are performed to obtain the plurality of oversized neural networks after multiple rounds of training.
Illustratively, in the first round of training, M paths to be trained, a1 to aM, are determined from the path search space, and the M paths are trained using training sample data. The trained paths a1 to aM are then used to process verification sample data, yielding a verification accuracy for each of a1 to aM. N paths are then selected from a1 to aM in descending order of verification accuracy to serve as parent paths for the second round of training.
In the second round of training, mutation and/or crossover processing is performed on the N parent paths obtained in the first round to obtain M paths to be trained, b1 to bM, corresponding to the second round. The corresponding paths to be trained are trained using the training sample data for the second round. The trained paths b1 to bM are then used to process the verification sample data, yielding a verification accuracy for each of b1 to bM. N paths are then selected from b1 to bM in descending order of verification accuracy to serve as parent paths for the third round of training.
……
Cycling in this way, R rounds of training are performed, and the target path is obtained from the M paths to be trained determined in the R-th round of training.
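For illustration only, the loop below sketches this evolutionary sampling under stated assumptions: mutate, crossover, train_paths, and validate are hypothetical helpers, and the per-round supernet training of the sampled paths is elided.

```python
import random

def evolutionary_search(search_space, r_rounds, m, n,
                        train_paths, validate, mutate, crossover):
    # First round: M paths sampled at random from the path search space.
    paths = random.sample(search_space, m)
    for _ in range(r_rounds):
        train_paths(paths)  # per-round supernet training (elided here)
        # Keep the N most accurate paths as parent paths for the next round.
        parents = sorted(paths, key=validate, reverse=True)[:n]
        children = []
        while len(children) < m:  # refill to M paths via mutation/crossover
            if random.random() < 0.5:
                children.append(mutate(random.choice(parents)))
            else:
                children.append(crossover(*random.sample(parents, 2)))
        paths = children
    # Target path: best path among those determined in the final round.
    return max(paths, key=validate)
```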
The process of training the paths to be trained determined in each round is specifically described in the following embodiments and is not detailed here.
In addition, in each round of training, the determined multiple paths to be trained may or may not share the same operators.
When the different paths to be trained determined for the current round do not share any operator, the multiple paths to be trained can be trained synchronously or asynchronously. When different paths to be trained determined for the current round share the same operator, the paths sharing that operator can be trained asynchronously. For example, if paths to be trained a1 and a2 share the same operator s1, path a1 can be trained first, and then the other path a2 sharing operator s1 is trained; when path a2 is trained, the first operator internal parameters corresponding to s1 are, for example, those determined in the process of training a1.
For S102 above: when determining the second operator internal parameters of each operator in the path to be trained, for example, the path weights of the path to be trained in the plurality of oversized neural networks can be used to weight and sum the first operator internal parameters of each operator in the plurality of oversized neural networks, yielding the second operator internal parameters of each operator in the path to be trained.
For example, for any operator O in the oversized neural networks, a first operator internal parameter set of operator O may be maintained. Assuming the number of oversized neural networks is K, and the first operator internal parameters of operator O in the K oversized neural networks are respectively θ_1, …, θ_K, the first operator internal parameter set of operator O is represented as Θ = {θ_1, …, θ_K}. The first operator internal parameter set Θ can be shared by all the different paths that include operator O; however, because different paths have different path weights in the different oversized neural networks, each path uses the operator internal parameter set of operator O in its own manner, so that the same operator behaves differently in different paths.
For example, for the i-th path in the search space, the second operator internal parameter $w_i$ of operator O in that path satisfies the following formula (1):

$w_i = \Theta \lambda_i = \sum_{k=1}^{K} \lambda_{i,k}\,\theta_k \qquad (1)$

where $\lambda_i = [\lambda_{i,1}, \ldots, \lambda_{i,K}]$ denotes the path weights of the i-th path in the K oversized neural networks respectively, and $\lambda_{i,k}$ denotes the path weight of the i-th path in the k-th oversized neural network.
$\lambda_i$ belongs to the set $\Delta^{K-1}$, where $\Delta^{K-1}$ is expressed as:

$\Delta^{K-1} = \left\{ \lambda \in \mathbb{R}^{K} \,\middle|\, \lambda_k \ge 0,\ \sum_{k=1}^{K} \lambda_k = 1 \right\}$

$\Delta^{K-1}$ represents the set of path weights respectively corresponding to the i-th path in the path search space in the K oversized neural networks.
That is, for any path, the sum of its path weights over the K oversized neural networks is 1.
By considering all N paths that include operator O simultaneously, the second operator internal parameters of operator O in the N paths can be approximated as the following formula (2):

$W = [w_1, \ldots, w_N] \approx \Theta\Lambda \qquad (2)$

where $\Theta$ denotes the first operator internal parameter set of operator O, and $\Lambda$ collects, for the N paths $\{a_1, a_2, \ldots, a_N\}$ that include operator O, the path weight sets of each path $a_j$ in the K oversized neural networks.
In this way, for different paths that include the same operator, the operator internal parameters (second operator internal parameters) of that operator can differ between paths, so that the operator internal parameters of the same operator in different paths reflect the different characteristics of those paths.
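A minimal sketch of formulas (1) and (2), assuming the K first operator internal parameters of an operator are stacked into one tensor; the names and shapes are illustrative, not mandated by the disclosure.

```python
import torch

def second_operator_params(theta: torch.Tensor, lam: torch.Tensor) -> torch.Tensor:
    """theta: stacked first operator internal parameters, shape (K, *param_shape).
    lam: path weights of one path over the K supernets, shape (K,), on the simplex.
    Returns w_i = sum_k lam[k] * theta[k], i.e. formula (1)."""
    assert torch.all(lam >= 0) and torch.isclose(lam.sum(), torch.tensor(1.0))
    return torch.einsum('k,k...->...', lam, theta)
```

Applying this operator by operator along a path reproduces formula (2), W ≈ ΘΛ, column by column.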
The first operator internal parameters of each operator in the path to be trained in the plurality of oversized neural networks can be obtained by directly reading the maintained operator internal parameter set corresponding to each operator.
For the first round of training, the operator internal parameters of each operator in the plurality of oversized neural networks can be randomly initialized, and the operator internal parameter set of each operator is then generated based on the random initialization result.
During the multi-round training process, each round of training trains at least one path to be trained; that is, each round of training updates the first operator internal parameters, in the plurality of oversized neural networks, of each operator in the paths to be trained determined for that round. Before each round of training ends, the operator internal parameter sets of the operators in the paths to be trained are updated based on the updated first operator internal parameters.
The manner of updating the first operator internal parameters of each operator in the path to be trained in each round of training is described in the embodiments below and is not detailed here.
Referring to fig. 2, an embodiment of the present disclosure provides a specific method for acquiring path weights of paths to be trained in a plurality of oversized neural networks, including:
S201: determining a path code of a path to be trained;
in a specific implementation, when determining the path code of the path to be trained, for example, the following manner may be adopted: determining the single-hot coding of each operator in the path to be trained;
and obtaining the path code of the path to be trained based on the single-hot codes of all operators in the path to be trained.
Illustratively, for each network layer of the oversized neural network, assuming the network layer has m optional operators, each of the m optional operators is one-hot encoded. For example, if there are 6 optional operators, the one-hot codes corresponding to the 6 optional operators are: 000001, 000010, 000100, 001000, 010000, 100000.
After the one-hot codes corresponding to the operators in the path to be trained are obtained, the one-hot codes of the operators can be combined into a matrix or vector of 0s and 1s, which serves as the path code of the path to be trained.
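As an assumed illustration of this encoding step, the sketch below turns the operator index chosen in each of the n layers into a one-hot vector of length m and concatenates the results into the path code.

```python
import torch
import torch.nn.functional as F

def encode_path(op_indices, m: int) -> torch.Tensor:
    """op_indices: list of the operator index chosen in each of the n layers."""
    one_hot = F.one_hot(torch.tensor(op_indices), num_classes=m)  # shape (n, m)
    return one_hot.flatten().float()  # path code: 0/1 vector of length n*m
```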
S202: obtaining path weights of the path to be trained in the plurality of oversized neural networks respectively based on the path code.
When the path weights of the path to be trained in the plurality of oversized neural networks are obtained using the path code, for example, full-connection processing can be performed on the path code, and the path weights of the path to be trained in the plurality of oversized neural networks are obtained based on the result of the full-connection processing. Here, for example, an activation function may be used to perform activation processing on the result of the full-connection processing, yielding the path weights of the path to be trained in the plurality of oversized neural networks respectively.
For example, when performing full-connection processing on the path code and obtaining the path weights of the path to be trained in the plurality of oversized neural networks based on the result of the full-connection processing, either of the following modes I or II may be adopted:
I: for example, a weight prediction network may be constructed, including a fully-connected layer and an activation layer. The fully-connected layer performs full-connection processing on the path code; the activation layer performs activation processing on the full-connection result to obtain the path weights of the path to be trained in the plurality of oversized neural networks respectively. The fully-connected layer comprises, for example, a multi-layer perceptron (MLP); the activation layer activates the output of the multi-layer perceptron, for example, using the activation function Softmax.
Here, the weight prediction network is denoted π. For the i-th path in the path search space, the path weights λ_i corresponding to the path in the K oversized neural networks satisfy the following formula (3):

$\lambda_i = \pi(a_i;\sigma), \quad a_i \in A, \quad \lambda_i \in \Delta^{K-1} \qquad (3)$

where π(·) denotes the weight prediction network, σ denotes its network parameters, $a_i$ denotes the i-th path in the path search space, A denotes the path search space, and $\Delta^{K-1}$ denotes the set of path weights of path $a_i$ in the K oversized neural networks. The formula expresses that for a path $a_i$ determined from the path search space A, the path weights of $a_i$ in the plurality of oversized neural networks satisfy $\pi(a_i;\sigma)$.
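A hedged sketch of mode I, assuming the fully-connected layer is a two-layer MLP; formula (3) only requires some full-connection mapping followed by Softmax, so the hidden size and depth here are illustrative.

```python
import torch.nn as nn

class SimplexNet(nn.Module):
    """Weight prediction network pi: path code -> lambda on the (K-1)-simplex."""
    def __init__(self, code_dim: int, k: int, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(code_dim, hidden), nn.ReLU(), nn.Linear(hidden, k))
        self.softmax = nn.Softmax(dim=-1)  # guarantees lambda_i lies in Delta^{K-1}

    def forward(self, path_code):
        return self.softmax(self.mlp(path_code))  # path weights lambda_i
```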
II: the operators in a neural network typically have multiple output channels. For example, a convolution operator may perform convolution processing using multiple convolution kernels to yield an output with multiple channels. Besides the types of the operators in the neural network, the number of output channels of each network layer also has a certain influence on the performance of the neural network; generally, the greater the number of output channels, the more features are extracted (for example, different convolution kernels can extract different features of the input data). Therefore, in addition to determining the types of the operators in the generated target neural network, the number of output channels of each operator can also be searched at a finer granularity. In another embodiment of the present disclosure, paths with the same operators but different numbers of output channels are regarded as different paths, and the neural network structure search is performed on this basis, so that a better target path can be generated.
In a specific implementation, to search the neural network structure at the granularity of the number of channels, in another embodiment of the present disclosure, the channel code of the path to be trained may also be determined based on the output channels corresponding to each operator in the path to be trained. When the path weights of the path to be trained in the plurality of oversized neural networks are obtained based on the path code, for example, data fusion processing can be performed on the path code and the channel code to obtain a fusion code, and the path weights of the path to be trained in the plurality of oversized neural networks are then obtained based on the fusion code.
Here, for example, for each operator in the path to be trained, the number of output channels corresponding to the operator is encoded as the channel code of that operator. The channel codes corresponding to the operators in the path to be trained are then concatenated to form the channel code of the path to be trained.
When performing data fusion processing on the path code and the channel code, for example, first full-connection processing may be performed on the path code to obtain a transformed path code; second full-connection processing is performed on the channel code to obtain a transformed channel code; and the transformed path code and the transformed channel code are superposed to obtain the fusion code.
Then, when obtaining the path weights of the path to be trained in the plurality of oversized neural networks based on the fusion code, for example, third full-connection processing may be performed on the fusion code to obtain a transformed fusion code, and activation processing is performed on the transformed fusion code to obtain the path weights of the path to be trained in the plurality of oversized neural networks respectively.
Here, a weight prediction network may be pre-configured, and the weight prediction network may be used to perform weight prediction processing on the path code and the channel code, so as to obtain path weights corresponding to the paths to be trained in the plurality of oversized neural networks, respectively.
For example, referring to fig. 3, an embodiment of the present disclosure provides a specific example of a weight prediction network, including a first fully-connected layer, a second fully-connected layer, a superposition processing block, a third fully-connected layer, and an activation layer Softmax. The first fully-connected layer performs first full-connection processing on the path code to obtain a transformed path code; the second fully-connected layer performs second full-connection processing on the channel code to obtain a transformed channel code; the superposition processing block superposes the transformed path code and the transformed channel code to obtain the fusion code; the third fully-connected layer performs third full-connection processing on the fusion code to obtain a transformed fusion code; and the activation layer performs activation processing on the transformed fusion code to obtain the path weights [λ_1, λ_2, …, λ_K] of the path to be trained in the plurality of oversized neural networks respectively.
Then, using the weight prediction network and the first operator internal parameters corresponding to a convolution operator in the K oversized neural networks, the second operator internal parameters of the corresponding operator in the first neural network constructed based on the path to be trained are obtained, and convolution processing is performed on a first feature map based on the second operator internal parameters to obtain a second feature map.
The first full connection layer, the second full connection layer and the third full connection layer adopt an MLP network, for example; different full connection layers have different full connection parameters; the activation layer performs activation processing on the output of the MLP network corresponding to the third fully connected layer, for example, by using an activation function Softmax.
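The following sketch mirrors the Fig. 3 structure under stated assumptions (layer widths are illustrative): two full-connection branches transform the path code and channel code, the results are superposed, and a third full connection plus Softmax yields [λ_1, …, λ_K].

```python
import torch.nn as nn

class FusionSimplexNet(nn.Module):
    def __init__(self, path_dim: int, chan_dim: int, k: int, hidden: int = 128):
        super().__init__()
        self.fc_path = nn.Linear(path_dim, hidden)  # first fully-connected layer
        self.fc_chan = nn.Linear(chan_dim, hidden)  # second fully-connected layer
        self.fc_fuse = nn.Linear(hidden, k)         # third fully-connected layer
        self.softmax = nn.Softmax(dim=-1)           # activation layer

    def forward(self, path_code, chan_code):
        fused = self.fc_path(path_code) + self.fc_chan(chan_code)  # superposition
        return self.softmax(self.fc_fuse(fused))    # [lambda_1, ..., lambda_K]
```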
Here, the weight prediction network is denoted π. For the i-th path in the path search space, the path weights λ_i corresponding to the path in the K oversized neural networks satisfy the following formula (4):

$\lambda_i = \pi(a_i, c_i;\sigma), \quad a_i \in A, \quad \lambda_i \in \Delta^{K-1} \qquad (4)$

where $c_i$ denotes the output channel information of the i-th path; the other symbols have the same meanings as in formula (3) above and are not described again here. The formula expresses that for a path $a_i$ determined from the path search space A with output channel information $c_i$, the path weights of $a_i$ in the plurality of oversized neural networks satisfy $\pi(a_i, c_i;\sigma)$.
Here, for two paths having the same operators, if the output channel information corresponding to the two paths differs, the two paths are regarded as different paths.
For the weight prediction network in mode I or II above, the weight prediction network and the plurality of oversized neural networks may be jointly trained in each round of training of the oversized neural networks.
For each round of training, full-connection processing is performed on the path codes of the paths to be trained determined for that round by using the weight prediction network corresponding to that round, so as to obtain the path weights of the paths to be trained in the plurality of oversized neural networks in that round.
As the oversized neural networks and the weight prediction network are jointly trained, the path weights determined for the same path in the plurality of oversized neural networks may differ between rounds of training.
For example, in each round of training, when the current round is the first round of training, a pre-generated initial weight prediction network may be used as the weight prediction network corresponding to the current round.
When the current round is not the first round of training, the weight prediction network corresponding to the previous round is trained by using the first loss obtained during the previous round of training of the plurality of oversized neural networks, to obtain the weight prediction network corresponding to the current round.
The weight prediction network parameters σ^(τ+1) corresponding to the (τ+1)-th round of training satisfy, for example, the following formula (5):

$$\sigma^{(\tau+1)} = \sigma^{(\tau)} - \nabla_{\sigma}\,\mathcal{L}^{(\tau)}\Big(\hat{\Theta}^{(\tau)}\Big), \qquad \hat{\Theta}^{(\tau)} = \Theta^{(\tau)}\,\pi\big(A;\ \sigma^{(\tau)}\big) \tag{5}$$

wherein Θ^(τ) denotes the first operator internal parameters respectively corresponding, in the plurality of oversized neural networks, to each operator in the path to be trained determined in the τ-th round of training; π(A; σ^(τ)) denotes the weight prediction network determined in the τ-th round of training; Θ̂^(τ) denotes the second operator internal parameters corresponding to each operator in the path to be trained determined in the τ-th round of training; and L^(τ) denotes the first loss corresponding to the path to be trained determined in the τ-th round of training.
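For illustration of this joint update, the snippet below sketches one gradient step on the weight prediction network parameters σ using the first loss; the toy dimensions, the stand-in predictor, the frozen operator parameters, and the squared-norm stand-in loss are all assumptions for this sketch.

```python
import torch
import torch.nn as nn

K = 4                                   # number of oversized neural networks (assumed)
predictor = nn.Sequential(nn.Linear(8, K), nn.Softmax(dim=-1))   # stand-in for pi(.; sigma)
thetas = [torch.randn(16, 8) for _ in range(K)]   # toy first operator internal params (held fixed)
path_code = torch.randn(8)                        # toy path code

opt = torch.optim.SGD(predictor.parameters(), lr=0.01)
lam = predictor(path_code)                               # [lambda_1, ..., lambda_K]
theta_hat = sum(l * th for l, th in zip(lam, thetas))    # second operator internal params
loss = theta_hat.square().mean()                         # stand-in for the first loss
opt.zero_grad()
loss.backward()    # the gradient w.r.t. sigma flows through the path weights lambda
opt.step()         # yields the weight prediction network for the next round
```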
In addition, for case II above, in order to enable the weight prediction network to distinguish different paths that have the same operators but different channel data, a regularization term may be added to the objective function of the weight prediction network to further distinguish different channel widths.
Assume that two paths to be trained have the same operators but different channel widths, denoted c_1 and c_2 respectively.
The similarity between the two neural networks M1 and M2 respectively constructed based on the two paths to be trained satisfies, for example, the following formula (6):

$$\mathrm{sim}(M_1,\ M_2) = \big\langle \Theta\,\lambda_1,\ \Theta\,\lambda_2 \big\rangle = \lambda_1^{T}\,\Theta^{T}\Theta\,\lambda_2 \tag{6}$$

wherein (a, c_1) and (a, c_2) denote the two paths respectively; λ_1 denotes the path weights respectively corresponding to one path in the K oversized neural networks; λ_2 denotes the path weights respectively corresponding to the other path in the K oversized neural networks; and Θ denotes the first operator internal parameters respectively corresponding to each operator of the two paths in the plurality of oversized neural networks.
After transformation using the Cauchy–Schwarz inequality, the following formula (7) is obtained:

$$\mathrm{sim}(M_1,\ M_2) \le \big\|\Theta^{T}\Theta\big\| \cdot \big\langle \lambda_1,\ \lambda_2 \big\rangle \tag{7}$$
it can be seen that the similarity between the neural networks M1 and M2, which are respectively constructed based on the two paths, is mainly affected by the first operator internal reference of the operators in the paths in the plurality of oversized neural networks, and the internal product of the path weights of the two paths in the plurality of oversized neural networks, respectively.
Since the plurality of oversized neural networks and the weight prediction network are jointly trained, ||Θ^T Θ|| is fixed when the network parameters of the weight prediction network are updated in each round of training; therefore only λ_1 and λ_2 affect the similarity between the neural networks M1 and M2.
In addition, the L1 distance d(c_1, c_2) = ||c_1 − c_2||_1 may be used to determine the distance between any two sets of channel data; if d(c_1, c_2) is smaller, the corresponding c_1 and c_2 should be closer.
Furthermore, embodiments of the present disclosure set a distance threshold ε: if d(c_1, c_2) < ε, then c_1 and c_2 should have similar representations; otherwise, the two are regarded as different.
Using this distance, a conditional distribution function (8) for the path weights is determined:

$$P\big(c_j \mid c_k\big) = \frac{\mathbb{1}_{[j \neq k]}\ \exp\big(-d(c_j,\ c_k)/\eta\big)}{\sum_{i \neq k} \exp\big(-d(c_i,\ c_k)/\eta\big)} \tag{8}$$

wherein η is a hyper-parameter used to control the degree of concentration of the loss; and the indicator 1_[j≠k] takes the value 1 when j ≠ k and 0 otherwise.
From the above formula (7), a regularization direction is obtained by which different paths with the same operators but different channel widths can be distinguished. The learning objective includes optimizing the conditional probabilities of similar channels. The negative log-likelihood can be expressed as: r_c(σ) = −log P(c_i | c_k).
Further, the objective function of the weight prediction network satisfies, for example, the following formula (9):

$$\mathcal{L}_{\pi}(\sigma) = \mathcal{L}\big(\hat{\Theta}\big) + \alpha\, r_c(\sigma) \tag{9}$$

wherein α is a hyper-parameter.
The network parameters of the weight prediction network are then updated using the loss function.
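For concreteness, the following sketch computes a channel-distance regularization of the kind described above; the batch of channel-width vectors, the value of η, and the softmax form follow the reconstruction of formula (8) and are assumptions, not the disclosed code.

```python
import torch

def channel_regularization(channels, eta=0.1):
    """Negative log-likelihood r_c for a batch of channel-width vectors.

    channels: tensor of shape (N, L): N paths, L layers, entries are
    output channel counts. Distances use the L1 norm as described above."""
    d = torch.cdist(channels.float(), channels.float(), p=1)   # pairwise L1 distances
    logits = -d / eta                                          # concentration controlled by eta
    logits.fill_diagonal_(float("-inf"))                       # indicator 1[j != k]
    log_p = torch.log_softmax(logits, dim=-1)                  # log P(c_j | c_k)
    return -log_p                                              # entries of r_c(sigma)

channels = torch.tensor([[32, 64, 64], [32, 64, 96], [16, 32, 48]])
r_c = channel_regularization(channels)
alpha = 0.5   # hyper-parameter from formula (9)
# total objective (first loss assumed computed elsewhere):
# loss_total = first_loss + alpha * r_c[i, k]
print(r_c)
```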
After the first operator internal parameters of each operator in the path to be trained in the oversized neural networks and the path weights of the path to be trained in the oversized neural networks are determined in the above manner, the second operator internal parameters of each operator in the path to be trained can be determined.
For S103 described above:
referring to fig. 4, an embodiment of the present disclosure provides a method for performing this round of training on the plurality of oversized neural networks by using the second operator internal parameters, where the method may include:
S401: processing training sample data by using second operator internal parameters of each operator in the path to be trained to obtain a processing result;

S402: determining a first loss based on the processing result and tag information of the training sample data;

S403: adjusting first operator internal parameters of each operator in the plurality of oversized neural networks respectively based on the first loss.
In a specific implementation, when the training sample data is processed using the second operator internal parameters of each operator in the path to be trained, the first neural network may, for example, be constructed using the second operator internal parameters of each operator; the first neural network comprises the operators in the path to be trained, and the operator internal parameters of each operator are set to the second operator internal parameters corresponding to that operator.
And then inputting the training sample data into a first neural network, and processing the training sample data by using the first neural network to obtain a processing result of the training sample data.
Training sample data in the embodiments of the present disclosure, and the tag information of the training sample data, are related to the function of the target neural network to be generated. For example, if the target neural network is used for performing target object detection on images, the training sample data comprises images and the corresponding tag information comprises the position of the target object in each image; if the function of the target neural network is to perform type detection on the target object in an image, the training sample data comprises sample images and the corresponding tag information comprises the position of the target object in each sample image and the specific type of the target object; if the target neural network is used for performing key point recognition on human faces, the training sample data comprises sample face images and the corresponding tag information comprises the specific positions of the face key points in each sample face image; if the target neural network is used for converting voice data into text data, the training sample data comprises sample voice and the corresponding tag information comprises the text corresponding to the sample voice; if the target neural network is used for word segmentation of text data, the training sample data comprises sample texts and the corresponding tag information comprises the word segmentation result of each sample text. Specifically, the data type of the training sample data and the specific content of the corresponding tag information are determined according to the function of the target neural network to be generated, which is not limited in the embodiments of the present disclosure.
In adjusting the first operator internal parameters of each operator in the plurality of oversized neural networks based on the first loss, the following method may be adopted, for example:
For each oversized neural network of the plurality of oversized neural networks, the first operator internal parameters of each operator of the path to be trained in that oversized neural network are adjusted based on the first loss, generating new first operator internal parameters of each operator of the path to be trained in that oversized neural network.
In another embodiment, a first operator internal parameter set Θ = {θ_1, …, θ_K} is used to maintain the first operator internal parameters of any operator O in the different oversized neural networks; after new first operator internal parameters of the operator are obtained, the first operator internal parameter set Θ = {θ_1, …, θ_K} may be updated, for example.

When another path to be trained is trained, if that path to be trained also comprises the operator O, the first operator internal parameters respectively corresponding to the operator O in the plurality of oversized neural networks may be read from the updated first operator internal parameter set Θ = {θ_1, …, θ_K}.
When adjusting the first operator internal parameters of each operator in the plurality of oversized neural networks based on the first loss, for example, the following manner may be adopted:
For each oversized neural network in the oversized neural networks, constructing a second neural network by utilizing a first operator internal reference of each operator in the oversized neural network in the path to be trained;
and adjusting a first operator internal parameter of each operator in the second neural network by using the first loss.
When the first operator internal parameters of each operator in the second neural network are adjusted using the first loss, for example, the gradient of the first loss may be back-propagated to adjust the first operator internal parameters of each operator in the second neural network, obtaining new first operator internal parameters of each operator of the path to be trained in the plurality of oversized neural networks.
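Putting S401 to S403 together, the following sketch illustrates one round of training under stated assumptions: a toy path consisting of a single linear operator, K first-operator parameter sets maintained as the set Θ, path weights λ held fixed for the round, and a mean-squared-error stand-in for the first loss; none of these choices are prescribed by the disclosure.

```python
import torch

K, in_dim, out_dim = 4, 8, 8
# First operator internal params of one operator in the K oversized networks (the set Theta).
thetas = [torch.randn(out_dim, in_dim, requires_grad=True) for _ in range(K)]
lam = torch.softmax(torch.randn(K), dim=0)   # path weights for this round (fixed here)

x = torch.randn(2, in_dim)    # training sample data
y = torch.randn(2, out_dim)   # tag information

# S401: build the first neural network with second operator internal params and process the data.
theta_hat = sum(l * th for l, th in zip(lam, thetas))   # second operator internal params
pred = x @ theta_hat.T                                  # processing result

# S402: determine the first loss from the processing result and the tag information.
loss = torch.nn.functional.mse_loss(pred, y)

# S403: back-propagate the first loss to adjust the first operator internal params
# of the operator in each of the K oversized neural networks.
loss.backward()
with torch.no_grad():
    for th in thetas:
        th -= 0.1 * th.grad   # each theta_k receives a lambda_k-scaled gradient
        th.grad = None
```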
For S104 described above:
When the target neural network is generated based on the plurality of oversized neural networks after multiple rounds of training, an optimal target path is determined from the path search space constructed based on the plurality of oversized neural networks, and the target neural network is then generated using the target operators included in the target path.
Referring to fig. 5, an embodiment of the present disclosure provides a specific method for generating a target neural network based on a plurality of oversized neural networks after multiple rounds of training, including:
S501: determining a target path from the paths based on the verification accuracy corresponding to each path in the plurality of oversized neural networks after multiple rounds of training.

In a specific implementation, for example, in each round of training of the plurality of oversized neural networks, after the training of the path to be trained corresponding to that round is completed, the verification data set D_val may be used to verify the verification accuracy of the path to be trained after that round of training. This verification accuracy may be used as the verification accuracy corresponding to the path to be trained when determining the target neural network.

In addition, after the oversized neural networks have been trained for multiple rounds, each trained path may be determined from the oversized neural networks, and the verification data set D_val may be used to verify the verification accuracy of each such path.
After the verification accuracy corresponding to each path is determined, the path with the highest verification accuracy is determined as the target path.
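A minimal sketch of this selection step is given below; the candidate path list and the evaluate_accuracy helper are hypothetical stand-ins for evaluating a path on the verification data set D_val.

```python
from typing import Callable, List, Tuple

def select_target_path(
    paths: List[Tuple[int, ...]],
    evaluate_accuracy: Callable[[Tuple[int, ...]], float],
) -> Tuple[int, ...]:
    """Return the path with the highest verification accuracy on D_val."""
    return max(paths, key=evaluate_accuracy)

# Illustrative usage with a made-up accuracy table:
accuracies = {(0, 1, 2): 0.71, (1, 1, 0): 0.76, (2, 0, 1): 0.69}
target = select_target_path(list(accuracies), accuracies.get)
print(target)   # (1, 1, 0)
```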
For example, in the case where the weight prediction network described in I above performs full-connection processing on the path codes of the paths to be trained to obtain the path weights of the paths to be trained in the plurality of oversized neural networks respectively, the objective function for path search based on the plurality of oversized neural networks satisfies, for example, the following formula (10):

$$a^{*} = \underset{a \in A}{\arg\max}\ \mathrm{ACC}_{val}\Big(a;\ \hat{\Theta}^{*}\Big), \qquad \hat{\Theta}^{*} = \Theta^{*}\,\pi\big(a;\ \sigma^{*}\big) \tag{10}$$

$$\text{s.t.}\quad \big(\Theta^{*},\ \sigma^{*}\big) = \underset{\Theta,\ \sigma}{\arg\min}\ \mathbb{E}\Big[\mathcal{L}_{train}\big(a;\ \Theta\,\pi(a;\ \sigma)\big)\Big]$$

wherein A denotes the path search space; a denotes a path in the path search space; a* denotes the target path; σ denotes the network parameters of the weight prediction network; Θ denotes the first operator internal parameter set formed by the first operator internal parameters respectively corresponding to the operators in the plurality of oversized neural networks; Θ̂ denotes the second operator internal parameters; Θ* denotes the optimal first operator internal parameters of each operator in the plurality of oversized neural networks after multiple rounds of training; σ* denotes the optimal network parameters of the weight prediction network after multiple rounds of training; ACC_val denotes the verification accuracy of a path in the oversized neural networks on the verification data set D_val; L_train denotes the first loss of the path to be trained determined using the training sample data D_train in the oversized neural network training process; and the expectation E is taken over the set of first losses corresponding to the paths in the path search space.
In the case where the weight prediction network described in II above performs full-connection processing on the path code and the channel code of the path to be trained to obtain the path weights of the path to be trained in the plurality of oversized neural networks respectively, the objective function for path search based on the plurality of oversized neural networks satisfies, for example, the following formula (11):

$$\big(a^{*},\ c^{*}\big) = \underset{(a,\ c)}{\arg\max}\ \mathrm{ACC}_{val}\Big(a;\ \Theta^{*}\,\pi\big(a,\ c;\ \sigma^{*}\big)\Big)\quad \text{s.t.}\quad \big(\Theta^{*},\ \sigma^{*}\big) = \underset{\Theta,\ \sigma}{\arg\min}\ \mathbb{E}\Big[\mathcal{L}_{train}\big(a;\ \Theta\,\pi(a,\ c;\ \sigma)\big)\Big] \tag{11}$$

wherein c denotes the channel information of the path.
In another embodiment of the present disclosure, if an evolutionary algorithm is used for the path search process, the target path may be determined from at least some paths of the plurality of oversized neural networks after multiple rounds of training, based on the verification accuracy respectively corresponding to each of the at least some paths, for example in the following manner:
determining alternative paths from a plurality of oversized neural networks after multiple rounds of training; wherein the alternative path comprises: the path to be trained determined in the last training round in the multiple training rounds;
determining a verification accuracy of the alternative path using the verification sample data;
and determining a target path from the alternative paths based on the verification accuracy of the alternative paths.
In the search process, the evolutionary algorithm gradually screens out poorly-performing paths in each iteration, and the paths retained in each round are well-performing paths; therefore, after multiple rounds of training, the paths to be trained obtained in the last round of training belong, with high probability, to the well-performing paths. Accordingly, the paths to be trained determined in the last round of training among the multiple rounds of training may be determined as alternative paths, and the target path is determined from the alternative paths according to the verification accuracy respectively corresponding to each alternative path.
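For illustration, the following sketch shows a standard evolutionary screening loop of the kind described above; the path encoding, the mutation and crossover operators, and the fitness function standing in for verification accuracy are assumptions for this sketch, not the disclosed algorithm.

```python
import random

NUM_LAYERS, NUM_OPS = 6, 4      # assumed search-space shape

def mutate(path, p=0.2):
    return tuple(random.randrange(NUM_OPS) if random.random() < p else op
                 for op in path)

def crossover(parent_a, parent_b):
    cut = random.randrange(1, NUM_LAYERS)
    return parent_a[:cut] + parent_b[cut:]

def fitness(path):              # stand-in for verification accuracy on D_val
    return -sum((op - 1) ** 2 for op in path)

population = [tuple(random.randrange(NUM_OPS) for _ in range(NUM_LAYERS))
              for _ in range(16)]
for _ in range(10):             # each round keeps well-performing paths
    population.sort(key=fitness, reverse=True)
    parents = population[:8]    # screen out poorly-performing paths
    children = [mutate(crossover(*random.sample(parents, 2))) for _ in range(8)]
    population = parents + children

candidates = population         # paths from the final round: the alternative paths
target_path = max(candidates, key=fitness)
print(target_path)
```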
S502: and generating the target neural network by utilizing first operator internal parameters of each target operator in the target path in the plurality of oversized neural networks after multiple rounds of training and path weights of the target path in the plurality of oversized neural networks after multiple rounds of training.
Here, the operator internal parameters of each target operator in the target path in the target neural network can be obtained by using the above formula (1), for example.
It will be appreciated by those skilled in the art that in the above-described method of the specific embodiments, the written order of steps is not meant to imply a strict order of execution but rather should be construed according to the function and possibly inherent logic of the steps.
Based on the same inventive concept, the embodiments of the present disclosure further provide a device for generating a neural network, where the device in the embodiments of the present disclosure is similar to the method for generating a neural network according to the embodiments of the present disclosure in terms of the principle of solving the problem, so that the implementation of the device may refer to the implementation of the method, and the repetition is omitted.
Referring to fig. 6, a schematic diagram of a generating device of a neural network according to an embodiment of the disclosure is shown, where the device includes:
A first determining module 61, configured to determine a path to be trained from a path search space determined based on a plurality of oversized neural networks with the same structure;
a processing module 62, configured to determine, based on a first operator internal parameter of each operator in the path to be trained in the plurality of oversized neural networks, and path weights of each operator in the path to be trained in the plurality of oversized neural networks, a second operator internal parameter of each operator in the path to be trained;
the training module 63 is configured to perform this round of training on the plurality of oversized neural networks by using the second operator internal parameters;
the generating module 64 is configured to generate a target neural network based on the plurality of oversized neural networks after the training for multiple rounds.
In a possible implementation manner, the first determining module 61 is configured to, when determining a path to be trained from a path search space determined based on a plurality of oversized neural networks with identical structures:
aiming at the condition that the present round of training is the first round of training, randomly determining a plurality of paths to be trained from the path searching space;
aiming at the condition that the training of the present round is non-first round training, based on training results respectively corresponding to a plurality of to-be-trained paths determined by previous round training, determining a father path in the training of the present round from the plurality of to-be-trained paths determined by the previous round training;
And performing mutation and/or cross processing on the father path to obtain a path to be trained corresponding to the round of training.
In a possible implementation manner, the processing module 62 is configured to, when determining, based on the first operator internal parameters of each operator in the path to be trained in the plurality of oversized neural networks and the path weights of each operator in the path to be trained in the plurality of oversized neural networks, determine the second operator internal parameters of each operator in the path to be trained, where:
and for each operator in the path to be trained, weighting and summing the first operator internal parameters of the operator in the oversized neural networks by utilizing the path weights of the path to be trained in the oversized neural networks respectively, so as to obtain the second operator internal parameters of the operator in the path to be trained.
In a possible implementation manner, the training module 63 is configured to, when performing the present training on the plurality of oversized neural networks by using the second operator internal parameters:
processing training sample data by using second operator internal parameters of each operator in the path to be trained to obtain a first processing result of the training sample data;
Determining a first loss based on the first processing result and tag information of the training sample data;
and adjusting first operator internal parameters of each operator in the oversized neural networks respectively based on the first loss.
In a possible implementation manner, the training module 63 is configured to, when processing training sample data by using the second operator internal parameters of each operator in the path to be trained to obtain a first processing result of the training sample data:
constructing a first neural network by using second operator internal parameters of each operator in the path to be trained;
and processing the training sample data by using the first neural network to obtain a first processing result of the training sample data.
In a possible implementation, the apparatus further comprises: a second determining module 65, configured to determine a path code of the path to be trained;
and obtaining path weights of the paths to be trained in the super-large neural networks respectively based on the path codes.
In a possible implementation manner, the second determining module 65 is configured, when determining the path code of the path to be trained, to:
determining the one-hot code of each operator in the path to be trained;

and obtaining the path code of the path to be trained based on the one-hot codes of all operators in the path to be trained.
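As an illustration of this encoding, the sketch below builds a path code by concatenating the per-layer one-hot operator codes; the number of optional operators and the example path are assumptions.

```python
import torch

NUM_OPS = 4   # assumed number of optional operators per layer

def path_code_from_ops(op_indices):
    """Concatenate the one-hot code of each operator into a path code."""
    one_hots = torch.nn.functional.one_hot(
        torch.tensor(op_indices), num_classes=NUM_OPS)
    return one_hots.flatten().float()

code = path_code_from_ops([2, 0, 3, 1])   # a toy 4-layer path
print(code.shape)                         # torch.Size([16])
```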
In a possible implementation manner, the second determining module 65 is configured to, when obtaining path weights of the path to be trained in the plurality of oversized neural networks based on the path encoding:
and carrying out full connection processing on the path codes, and obtaining path weights of the paths to be trained in the ultra-large neural networks respectively based on the full connection processing result.
In a possible implementation manner, the second determining module 65 is further configured to:
determining channel codes of the path to be trained based on the channel number of the output channels respectively corresponding to each operator in the path to be trained;
the second determining module is configured to, when obtaining path weights of the path to be trained in the plurality of oversized neural networks based on the path encoding, respectively:
carrying out data fusion processing on the path code and the channel code to obtain a fusion code;
and obtaining path weights of the paths to be trained in the super-large neural networks respectively based on the fusion codes.
In a possible implementation manner, the second determining module 65 is configured to, when performing data fusion processing on the path code and the channel code to obtain a fusion code:
performing first full connection processing on the path code to obtain a transformation path code; performing second full connection processing on the channel codes to obtain transformation channel codes;
and superposing the transformation path code and the transformation channel code to obtain the fusion code.
In a possible implementation manner, the second determining module 65 is configured to, when obtaining path weights of the path to be trained in the plurality of oversized neural networks based on the fusion codes, respectively:
performing third full connection processing on the fusion code to obtain a transformation fusion code;
and activating the transformation fusion codes to obtain path weights of the paths to be trained in the super-large neural networks respectively.
In a possible implementation manner, the second determining module 65 is configured to, when obtaining path weights of the path to be trained in the plurality of oversized neural networks based on the path encoding:
And performing full connection processing on the path codes by using a weight prediction network corresponding to the round of training to obtain path weights of the paths to be trained in a plurality of oversized neural networks respectively.
In a possible implementation manner, the second determining module 65 is further configured to: aiming at the condition that the round of training is the first round of training, the pre-generated initial weight prediction network is used as the weight prediction network corresponding to the round of training.
In a possible implementation manner, the second determining module 65 is further configured to:
aiming at the condition that the training is not the first training, the weight prediction network corresponding to the previous training is trained by utilizing the first loss generated during the previous training of the oversized neural network, so as to obtain the weight prediction network corresponding to the training.
In a possible implementation manner, the generating module 64 is configured to, when generating the target neural network based on the plurality of oversized neural networks after the training for multiple rounds:
determining a target path from at least partial paths based on verification precision corresponding to at least partial paths in the oversized neural network after multiple rounds of training;
and generating the target neural network by utilizing first operator internal parameters of each target operator in the target path in the plurality of oversized neural networks after multiple rounds of training and path weights of the target path in the plurality of oversized neural networks after multiple rounds of training.
In a possible implementation manner, the generating module 64 is configured to, when determining the target path from at least some paths in the oversized neural networks after multiple rounds of training based on verification accuracy corresponding to the at least some paths respectively:
determining alternative paths from a plurality of oversized neural networks after multiple rounds of training; wherein the alternative path comprises: the path to be trained determined in the last training round in the multiple training rounds;
determining a verification accuracy of the alternative path using the verification sample data;
and determining a target path from the alternative paths based on the verification accuracy of the alternative paths.
The process flow of each module in the apparatus and the interaction flow between the modules may be described with reference to the related descriptions in the above method embodiments, which are not described in detail herein.
The embodiment of the disclosure further provides a computer device, as shown in fig. 7, which is a schematic structural diagram of the computer device provided by the embodiment of the disclosure, including:
a processor 71 and a memory 72; the memory 72 stores machine-readable instructions executable by the processor 71; the processor 71 is configured to execute the machine-readable instructions stored in the memory 72, and when the machine-readable instructions are executed by the processor 71, the processor 71 performs the following steps:
Determining a path to be trained from a path search space determined based on a plurality of oversized neural networks with the same structure;
determining second operator internal parameters of each operator in the path to be trained based on first operator internal parameters of each operator in the path to be trained in a plurality of oversized neural networks and path weights of each operator in the path to be trained in a plurality of oversized neural networks;
performing the training of the super-large neural networks by using the second operator internal parameters;
and generating a target neural network based on the plurality of oversized neural networks after the training for multiple rounds.
The memory 72 includes an internal memory 721 and an external memory 722; the internal memory 721 is used for temporarily storing operation data of the processor 71 and data exchanged with the external memory 722 such as a hard disk, and the processor 71 exchanges data with the external memory 722 via the internal memory 721.
The specific execution process of the above instruction may refer to steps of the neural network generation method described in the embodiments of the present disclosure, which are not described herein.
The disclosed embodiments also provide a computer readable storage medium having a computer program stored thereon, which when executed by a processor performs the steps of the neural network generation method described in the method embodiments above. Wherein the storage medium may be a volatile or nonvolatile computer readable storage medium.
The embodiments of the present disclosure further provide a computer program product, where the computer program product carries a program code, where instructions included in the program code may be used to perform steps of a method for generating a neural network described in the foregoing method embodiments, and specifically reference may be made to the foregoing method embodiments, which are not described herein.
Wherein the above-mentioned computer program product may be realized in particular by means of hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described system and apparatus may refer to corresponding procedures in the foregoing method embodiments, which are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described apparatus embodiments are merely illustrative, for example, the division of the units is merely a logical function division, and there may be other manners of division in actual implementation, and for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some communication interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present disclosure may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in essence or a part contributing to the prior art or a part of the technical solution, or in the form of a software product stored in a storage medium, including several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method described in the embodiments of the present disclosure. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Finally, it should be noted that: the foregoing examples are merely specific embodiments of the present disclosure, and are not intended to limit the scope of the disclosure, but the present disclosure is not limited thereto, and those skilled in the art will appreciate that while the foregoing examples are described in detail, it is not limited to the disclosure: any person skilled in the art, within the technical scope of the disclosure of the present disclosure, may modify or easily conceive changes to the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some of the technical features thereof; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the disclosure, and are intended to be included within the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (17)

1. A method for generating a neural network, comprising:
acquiring image sample data;
determining a path to be trained from a path search space determined based on a plurality of oversized neural networks with the same structure, wherein the path to be trained comprises an optional operator of each network layer in a plurality of network layers of the oversized neural network, and the operator is used for processing image sample data input to the corresponding network layer in the neural network formed by the path to be trained;
Determining second operator internal parameters of each operator in the path to be trained based on first operator internal parameters of each operator in the path to be trained in a plurality of oversized neural networks and path weights of each operator in the path to be trained in a plurality of oversized neural networks;
performing the training of the super-large neural networks by using the second operator internal parameters; wherein, this round of training includes at least the following steps:
constructing a first neural network by using second operator internal parameters of each operator in the path to be trained;
performing target object detection or type detection processing on the image sample data by using the first neural network to obtain a first processing result of the image sample data;
determining a first loss based on the first processing result and tag information of the image sample data corresponding to target object detection or type detection of the target object;
based on the first loss, adjusting first operator internal parameters of each operator in the oversized neural networks respectively;
and generating a target neural network based on the plurality of oversized neural networks after the training for multiple rounds.
2. The method according to claim 1, wherein determining the path to be trained from the path search space determined based on the plurality of structurally identical oversized neural networks comprises:
aiming at the condition that the present round of training is the first round of training, randomly determining a plurality of paths to be trained from the path searching space;
aiming at the condition that the training of the present round is non-first round training, based on training results respectively corresponding to a plurality of to-be-trained paths determined by previous round training, determining a father path in the training of the present round from the plurality of to-be-trained paths determined by the previous round training;
and performing mutation and/or cross processing on the father path to obtain a path to be trained corresponding to the round of training.
3. The generating method according to claim 1 or 2, wherein the determining the second operator internal parameters of each operator in the path to be trained based on the first operator internal parameters of each operator in the path to be trained in the plurality of oversized neural networks and the path weights of each operator in the path to be trained in the plurality of oversized neural networks, comprises:
and for each operator in the path to be trained, weighting and summing the first operator internal parameters of the operator in the oversized neural networks by utilizing the path weights of the path to be trained in the oversized neural networks respectively, so as to obtain the second operator internal parameters of the operator in the path to be trained.
4. A method of generating as claimed in any one of claims 1 to 3, further comprising: determining a path code of a path to be trained;
and obtaining path weights of the paths to be trained in the super-large neural networks respectively based on the path codes.
5. The method of generating of claim 4, wherein the determining the path code of the path to be trained comprises:
determining the one-hot code of each operator in the path to be trained;

and obtaining the path code of the path to be trained based on the one-hot codes of all operators in the path to be trained.
6. The generating method according to claim 4 or 5, wherein the obtaining path weights of the path to be trained in the plurality of oversized neural networks based on the path codes includes:
and carrying out full connection processing on the path codes, and obtaining path weights of the paths to be trained in the ultra-large neural networks respectively based on the full connection processing result.
7. The method of generating according to any one of claims 4 to 6, further comprising:
determining channel codes of the path to be trained based on the channel number of the output channels respectively corresponding to each operator in the path to be trained;
The obtaining path weights of the path to be trained in the oversized neural networks based on the path codes comprises the following steps:
carrying out data fusion processing on the path code and the channel code to obtain a fusion code;
and obtaining path weights of the paths to be trained in the super-large neural networks respectively based on the fusion codes.
8. The method of generating as claimed in claim 7, wherein said performing a data fusion process on said path code and said channel code to obtain a fusion code comprises:
performing first full connection processing on the path code to obtain a transformation path code; and
performing second full connection processing on the channel code to obtain a conversion channel code;
and superposing the transformation path code and the transformation channel code to obtain the fusion code.
9. The generating method according to claim 7 or 8, wherein the obtaining path weights of the path to be trained in the plurality of oversized neural networks based on the fusion codes includes:
performing third full connection processing on the fusion code to obtain a transformation fusion code;
And activating the transformation fusion codes to obtain path weights of the paths to be trained in the super-large neural networks respectively.
10. The generating method according to any one of claims 4-9, wherein the obtaining path weights of the path to be trained in the plurality of oversized neural networks based on the path codes includes:
and performing full connection processing on the path codes by using a weight prediction network corresponding to the round of training to obtain path weights of the paths to be trained in a plurality of oversized neural networks respectively.
11. The method of generating of claim 10, further comprising:
aiming at the condition that the round of training is the first round of training, the pre-generated initial weight prediction network is used as the weight prediction network corresponding to the round of training.
12. The generation method according to claim 10 or 11, characterized by further comprising:
aiming at the condition that the training is not the first training, the weight prediction network corresponding to the previous training is trained by utilizing the first loss generated during the previous training of the oversized neural network, so as to obtain the weight prediction network corresponding to the training.
13. The method of generating according to any one of claims 1 to 12, wherein the generating a target neural network based on the plurality of oversized neural networks after the plurality of rounds of training includes:
determining a target path from at least partial paths based on verification precision corresponding to at least partial paths in the oversized neural network after multiple rounds of training;
and generating the target neural network by utilizing first operator internal parameters of each target operator in the target path in the plurality of oversized neural networks after multiple rounds of training and path weights of the target path in the plurality of oversized neural networks after multiple rounds of training.
14. The method of generating according to claim 13, wherein determining a target path from at least some paths of the plurality of oversized neural networks after the multiple rounds of training based on verification accuracy corresponding to the at least some paths, respectively, comprises:
determining alternative paths from a plurality of oversized neural networks after multiple rounds of training; wherein the alternative path comprises: the path to be trained determined in the last training round in the multiple training rounds;
determining a verification accuracy of the alternative path using the verification sample data;
And determining a target path from the alternative paths based on the verification accuracy of the alternative paths.
15. A neural network generation apparatus, comprising:
an acquiring module, configured to acquire image sample data;
the first determining module is used for determining a path to be trained from a path search space determined based on a plurality of oversized neural networks with the same structure; the path to be trained comprises an optional operator of each network layer in a plurality of network layers of the oversized neural network, and the operator is used for processing image sample data input to the corresponding network layer in the neural network formed by the path to be trained;
the processing module is used for determining second operator internal parameters of each operator in the path to be trained based on first operator internal parameters of each operator in the path to be trained in a plurality of oversized neural networks and path weights of the path to be trained in a plurality of oversized neural networks;
the training module is used for performing the round of training on the oversized neural network by utilizing the second operator internal parameters; wherein, this round of training includes at least the following steps:
Constructing a first neural network by using second operator internal parameters of each operator in the path to be trained;
performing target object detection or type detection processing on the image sample data by using the first neural network to obtain a first processing result of the image sample data;
determining a first loss based on the first processing result and tag information of the image sample data corresponding to target object detection or type detection of the target object;
based on the first loss, adjusting first operator internal parameters of each operator in the oversized neural networks respectively;
and the generating module is used for generating a target neural network based on the plurality of oversized neural networks after the training for multiple rounds.
16. A computer device, comprising: a processor, a memory storing machine-readable instructions executable by the processor for executing the machine-readable instructions stored in the memory, which when executed by the processor, perform the steps of the method of generating a neural network as claimed in any one of claims 1 to 14.
17. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when run by a computer device, performs the steps of the method of generating a neural network according to any of claims 1 to 14.
CN202110602323.8A 2021-05-31 2021-05-31 Neural network generation method and device, electronic equipment and storage medium Active CN113326922B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110602323.8A CN113326922B (en) 2021-05-31 2021-05-31 Neural network generation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110602323.8A CN113326922B (en) 2021-05-31 2021-05-31 Neural network generation method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113326922A CN113326922A (en) 2021-08-31
CN113326922B true CN113326922B (en) 2023-06-13

Family

ID=77422827

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110602323.8A Active CN113326922B (en) 2021-05-31 2021-05-31 Neural network generation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113326922B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116306796B (en) * 2023-05-17 2023-09-15 北京智源人工智能研究院 Model self-growth training acceleration method and device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110689115A (en) * 2019-09-24 2020-01-14 上海寒武纪信息科技有限公司 Neural network model processing method and device, computer equipment and storage medium
CN110956262A (en) * 2019-11-12 2020-04-03 北京小米智能科技有限公司 Hyper network training method and device, electronic equipment and storage medium
WO2020082663A1 (en) * 2018-10-26 2020-04-30 北京图森未来科技有限公司 Structural search method and apparatus for deep neural network
WO2021043193A1 (en) * 2019-09-04 2021-03-11 华为技术有限公司 Neural network structure search method and image processing method and device
CN112686371A (en) * 2020-12-25 2021-04-20 深圳前海微众银行股份有限公司 Network structure search method, device, equipment, storage medium and program product
CN112700003A (en) * 2020-12-25 2021-04-23 深圳前海微众银行股份有限公司 Network structure search method, device, equipment, storage medium and program product

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11507846B2 (en) * 2018-03-26 2022-11-22 Nvidia Corporation Representing a neural network utilizing paths within the network to improve a performance of the neural network

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020082663A1 (en) * 2018-10-26 2020-04-30 北京图森未来科技有限公司 Structural search method and apparatus for deep neural network
WO2021043193A1 (en) * 2019-09-04 2021-03-11 华为技术有限公司 Neural network structure search method and image processing method and device
CN110689115A (en) * 2019-09-24 2020-01-14 上海寒武纪信息科技有限公司 Neural network model processing method and device, computer equipment and storage medium
CN110956262A (en) * 2019-11-12 2020-04-03 北京小米智能科技有限公司 Hyper network training method and device, electronic equipment and storage medium
CN112686371A (en) * 2020-12-25 2021-04-20 深圳前海微众银行股份有限公司 Network structure search method, device, equipment, storage medium and program product
CN112700003A (en) * 2020-12-25 2021-04-23 深圳前海微众银行股份有限公司 Network structure search method, device, equipment, storage medium and program product

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LOCALLY FREE WEIGHT SHARING FOR NETWORK WIDTH SEARCH; Xiu Su et al.; ICLR 2021; full text *
Target recognition method based on neural network architecture search; Bian Weiwei et al.; Journal of Air Force Engineering University; full text *

Also Published As

Publication number Publication date
CN113326922A (en) 2021-08-31

Similar Documents

Publication Publication Date Title
Ruthotto et al. An introduction to deep generative modeling
US11562244B2 (en) Robust pruned neural networks via adversarial training
Zhou et al. Context-aware variational trajectory encoding and human mobility inference
WO2022252455A1 (en) Methods and systems for training graph neural network using supervised contrastive learning
CN113822315A (en) Attribute graph processing method and device, electronic equipment and readable storage medium
CN113326922B (en) Neural network generation method and device, electronic equipment and storage medium
Vivekanandan et al. An intelligent genetic algorithm for mining classification rules in large datasets
Sun et al. Sparse attention with learning to hash
Benmeziane et al. Multi-objective hardware-aware neural architecture search with Pareto rank-preserving surrogate models
US11721413B2 (en) Method and system for performing molecular design using machine learning algorithms
Yang et al. Predictive clinical decision support system with RNN encoding and tensor decoding
Eyraud et al. TAYSIR Competition: Transformer+\textscrnn: Algorithms to Yield Simple and Interpretable Representations
Janakarajan et al. A fully differentiable set autoencoder
Kaloga et al. Multiview variational graph autoencoders for canonical correlation analysis
CN116471281A (en) Decentralised service combination method considering node selfiness
Liu et al. Semi‐supervised breast histopathological image classification with self‐training based on non‐linear distance metric
CN112396477B (en) Construction method and device of business prediction model
Chen et al. Unsupervised Speaker Verification Using Pre-Trained Model and Label Correction
Esfahanizadeh et al. InfoShape: Task-based neural data shaping via mutual information
Liu et al. Evolving transferable neural pruning functions
Ma et al. Augmenting Recurrent Graph Neural Networks with a Cache
US20230281964A1 (en) Deep metric learning model training with multi-target adversarial examples
Yang et al. ScrollNet: DynamicWeight Importance for Continual Learning
Wang et al. Pre-training a graph recurrent network for language representation
Amjad Applications of Information Theory and Factor Graphs for Machine Learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant