CN111563591B - Super network training method and device

Super network training method and device

Info

Publication number
CN111563591B
Authority
CN
China
Prior art keywords
network
super
cut
feature extraction
training
Prior art date
Legal status
Active
Application number
CN202010383356.3A
Other languages
Chinese (zh)
Other versions
CN111563591A (en)
Inventor
希滕
张刚
温圣召
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010383356.3A
Publication of CN111563591A
Application granted
Publication of CN111563591B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/08 Learning methods
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the field of artificial intelligence and discloses a training method and device for a super network. The method comprises: acquiring sample data; taking the super network to be trained as the initial current super network and iteratively performing clipping training operations until each feature extraction layer of the current super network retains exactly one connection; and training the clipped super network based on the sample data in response to determining that it has not reached a preset convergence condition. Each clipping training operation comprises: training the current super network; performing feature extraction on image data with the trained current super network to obtain a first feature map; clipping a feature extraction layer of the trained super network N times, once per connection, and performing feature extraction on the image data with each clipped super network to obtain N groups of second feature maps; and determining the clipped super network corresponding to the group of second feature maps with the smallest distance to the first feature map as the new current super network. The method improves the accuracy of the super network.

Description

Super network training method and device
Technical Field
Embodiments of the present disclosure relate to the field of computer technology, in particular to the field of artificial intelligence, and specifically to a training method and device for a super network.
Background
With the development of artificial intelligence and data storage technologies, deep neural networks have achieved significant results in many task domains. The structure of a neural network model has an important influence on its performance, and the design of neural network model structures has traditionally relied on expert knowledge.
NAS (Neural Architecture Search) is a technique that automatically searches for an optimal network structure by evaluating the performance of different network structures. Because NAS requires an independent evaluation of each candidate sub-network, its search efficiency is low. To address this, a super network containing a plurality of complete neural network structures may be trained, with all network structures sharing the parameters of the super network. However, since all network structures in the super network coexist, the performance of different structures can be mutually exclusive during super network training. So although super network training solves the search-efficiency problem, the performance of a sub-network obtained from a trained super network differs from that of the same sub-network trained independently, and the optimal model structure therefore cannot be accurately found based on the super network.
Disclosure of Invention
Embodiments of the present disclosure provide a training method and apparatus for a super network, an electronic device, and a computer-readable storage medium.
According to a first aspect, there is provided a training method of a super network, comprising: acquiring sample data; taking the super-network to be trained as an initial current super-network, and iteratively executing cutting training operations for a plurality of times until the connection number reserved by each feature extraction layer of the current super-network is 1, thereby obtaining a cut super-network; training the cut out super network based on sample data in response to determining that the cut out super network does not reach a preset convergence condition; wherein, the clipping training operation includes: training the current super network based on the sample data; performing feature extraction on the image data to be processed by using the trained current super network to obtain a first feature map; determining the number N of connections contained in the feature extraction layers aiming at each feature extraction layer of the current super network after training, respectively cutting the feature extraction layers in the super network after training for N times to obtain N cut super networks, respectively carrying out feature extraction on image data to be processed by utilizing the N cut super networks to obtain N corresponding groups of second feature graphs, wherein one of the N connections contained in the feature extraction layers is cut in each cut; determining the cut super network corresponding to the group of second feature images with the smallest distance between the N groups of second feature images and the first feature image as a new current super network; in response to determining that the number of connections of the feature extraction layer in the new current super network is greater than 1, a next crop training operation is performed.
According to a second aspect, there is provided a training device for a super network, comprising: an acquisition unit configured to acquire sample data; the first training unit is configured to take the super-network to be trained as an initial current super-network, and iteratively execute cutting training operations for a plurality of times until the connection numbers reserved by each feature extraction layer of the current super-network are 1, so as to obtain a cut super-network; a second training unit configured to train the cut out super network based on the sample data in response to determining that the cut out super network does not reach a preset convergence condition; wherein the first training unit comprises: a training subunit configured to perform the following steps in the crop training operation: training the current super network based on the sample data; a feature extraction subunit configured to perform the following steps in the crop training operation: performing feature extraction on the image data to be processed by using the trained current super network to obtain a first feature map; a clipping subunit configured to perform the following steps in a clipping training operation: determining the number N of connections contained in the feature extraction layers aiming at each feature extraction layer of the current super network after training, respectively cutting the feature extraction layers in the super network after training for N times to obtain N cut super networks, respectively carrying out feature extraction on image data to be processed by utilizing the N cut super networks to obtain N corresponding groups of second feature graphs, wherein one of the N connections contained in the feature extraction layers is cut in each cut; a determination subunit configured to perform the following steps in the crop training operation: determining the cut super network corresponding to the group of second feature images with the smallest distance between the N groups of second feature images and the first feature image as a new current super network; an iteration subunit configured to perform the following steps in the crop training operation: in response to determining that the number of connections of the feature extraction layer in the new current super network is greater than 1, a next crop training operation is performed.
According to a third aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of training the super network provided in the first aspect.
According to a fourth aspect, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the training method of the super network provided by the first aspect.
The method of the application improves the accuracy of the super network.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
Other features, objects and advantages of the present disclosure will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the following drawings:
FIG. 1 is a flow chart of one embodiment of a training method of the super network of the present disclosure;
FIG. 2 shows a schematic diagram of a cropped structure of a feature extraction layer of a super network;
FIG. 3 is a flow chart of another embodiment of a training method of the super network of the present disclosure;
FIG. 4 is a schematic diagram of an embodiment of a training device of the super network of the present disclosure;
fig. 5 is a block diagram of an electronic device used to implement the training method of the super network of an embodiment of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be noted that, for convenience of description, only the portions related to the present application are shown in the drawings.
It should be noted that, without conflict, the embodiments of the present disclosure and features of the embodiments may be combined with each other. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Exemplary embodiments of the present application will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present application are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The method or apparatus of the present disclosure may be applied to a terminal device or a server, or may be applied to a system architecture including a terminal device, a network, and a server. The medium used by the network to provide a communication link between the terminal device and the server may include various connection types, such as a wired, wireless communication link, or fiber optic cable, among others.
The terminal device may be a user end device on which various client applications may be installed. Such as image processing class applications, search applications, voice service class applications, etc. The terminal device may be hardware or software. When the terminal device is hardware, it may be a variety of electronic devices including, but not limited to, smartphones, tablets, electronic book readers, laptop and desktop computers, and the like. When the terminal device is software, it can be installed in the above-listed electronic device. Which may be implemented as a plurality of software or software modules, or as a single software or software module. The present invention is not particularly limited herein.
The server may be a server running various services, such as a server running a service based on object detection and recognition of data of images, video, voice, text, digital signals, etc., text or voice recognition, signal conversion, etc. The server may obtain various media data as training sample data for the deep learning task, such as image data, audio data, text data, and the like. The server can also train the super network by utilizing training sample data according to specific deep learning tasks, sample sub-networks from the super network for evaluation, and determine the structure and parameters of the neural network model for executing the deep learning tasks according to the evaluation results of the sub-networks.
The server can also send the determined data such as the structure and parameters of the neural network model to the terminal equipment. And the terminal equipment deploys and runs the neural network model locally according to the received data so as to execute the corresponding deep learning task.
The server may be hardware or software. When the server is hardware, the server may be implemented as a distributed server cluster formed by a plurality of servers, or may be implemented as a single server. When the server is software, it may be implemented as a plurality of software or software modules (e.g., a plurality of software or software modules for providing distributed services), or as a single software or software module. The present invention is not particularly limited herein.
It should be noted that, the method for training the super network provided by the embodiments of the present disclosure may be performed by a terminal device or a server, and accordingly, the training apparatus of the super network may be disposed in the terminal device or the server.
Referring to fig. 1, a flow 100 of one embodiment of a training method for a super network according to the present disclosure is shown. The training method of the super network comprises the following steps:
step 101, sample data is acquired.
In this embodiment, the execution subject of the training method of the super network may acquire sample data for training. The sample data may be image data, which may be collected in advance and stored in a database. The execution body may acquire sample data from a database. Alternatively, the sample data may be stored locally on the execution body, and the execution body may read the locally stored sample data.
And 102, taking the super network to be trained as an initial current super network, and iteratively executing cutting training operations for a plurality of times until the connection number reserved by each feature extraction layer of the current super network is 1, thereby obtaining the cut super network.
A super network to be trained may be acquired. The super network to be trained can be built from the optional structural units of each layer of a neural network model, and sampling one connection in each layer yields a complete neural network model. Here, a connection is the structure formed by wiring an optional structural unit of the layer above to an optional structural unit of the layer in question. For example, if layer C contains two optional structural units A and B, then connecting the optional structural unit G of the layer above layer C to A forms one connection of layer C, and connecting G to B forms another connection of layer C.
Each layer of the super network to be trained may contain at least one connection. The clipping training operation is performed iteratively: in the course of training the super network, the connections of each layer are cut in turn until every layer of the super network retains only one connection, at which point the clipping training operations stop. At that point each layer of the clipped super network contains exactly one connection, and the clipped structure can serve as a neural network model for performing deep learning tasks.
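To make the layer and connection structure concrete, the following is a minimal PyTorch-style sketch of a feature extraction layer holding several optional connections. The class name SuperLayer, the use of plain convolutions as structural units, and the averaging in forward are illustrative assumptions, not the patent's implementation.

```python
import torch
import torch.nn as nn

class SuperLayer(nn.Module):
    """One feature extraction layer of the super network.

    Each entry in `connections` is one optional connection, i.e. one
    structural unit wired to the output of the layer above.
    """
    def __init__(self, in_channels, out_channels, num_connections=3):
        super().__init__()
        self.connections = nn.ModuleList(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
            for _ in range(num_connections)
        )
        # Indices of the connections that have not been cut yet.
        self.active = list(range(num_connections))

    def forward(self, x):
        # Combine the feature maps of the surviving connections; the
        # description also allows a weighted combination instead of a mean.
        outs = [self.connections[i](x) for i in self.active]
        return torch.stack(outs, dim=0).mean(dim=0)

    def cut(self, idx):
        # Remove one connection; clipping stops once one connection remains.
        self.active = [i for i in self.active if i != idx]
```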
Specifically, the crop training operation includes the following steps 1021, 1022, 1023, 1024, and 1025:
at step 1021, the current super-network is trained based on the sample data.
In each clipping training operation, the current super network is first trained with the sample data. Specifically, a plurality of sub-networks may be sampled from the current super network and each sampled sub-network trained on the sample data; after the sub-networks converge, their performance is tested, the performance of the current super network is computed from the sub-network performances, and that performance is fed back to iteratively adjust the parameters of the current super network. The iterative update of the parameters stops when the parameter update rate falls below a preset range or the number of iterations reaches a threshold, yielding the trained current super network.
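As a hedged sketch of this training step, the sub-network sampling loop might look as follows; supernet.layers, the assumed helper supernet.forward_subnet, and the cross-entropy loss carry over from the SuperLayer sketch above and are not prescribed by the patent.

```python
import random
import torch.nn.functional as F

def train_supernet_step(supernet, optimizer, images, labels, num_samples=4):
    """One pass of super-network training by sub-network sampling."""
    for _ in range(num_samples):
        # Sampling a sub-network = picking one active connection per layer.
        choices = [random.choice(layer.active) for layer in supernet.layers]
        optimizer.zero_grad()
        logits = supernet.forward_subnet(images, choices)  # assumed helper
        loss = F.cross_entropy(logits, labels)
        loss.backward()  # gradients accumulate into the shared weights
        optimizer.step()
```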
And step 1022, performing feature extraction on the image data to be processed by using the trained current super network to obtain a first feature map.
The trained current super network may include a plurality of feature extraction layers, each used to extract features from the input image data, and each containing at least one connection. In this embodiment, feature extraction may be performed on the image data to be processed using all connections in the trained super network. For a feature extraction layer containing at least two connections, the extracted features may comprise the feature maps generated by each of the layer's connections, or a weighted sum or average of those feature maps.
In this embodiment, the current super network may include a nonlinear layer for image classification or regression according to the features extracted by the feature extraction layer. The nonlinear layer may be preceded by a fully connected layer that connects the individual connections of the last feature extraction layer of the super network. The feature map output by the last feature extraction layer may be used as the first feature map, or the feature map output by the full connection layer after the last feature extraction layer may be used as the first feature map.
Step 1023, determining the number N of connections contained in the feature extraction layers for each feature extraction layer of the trained current super network, respectively cutting the feature extraction layers in the trained super network for N times to obtain N cut super networks, and respectively carrying out feature extraction on image data to be processed by utilizing the N cut super networks to obtain N groups of corresponding second feature graphs.
The number of connections in each feature extraction layer of the trained current super network may be preset, or may be obtained by inspecting the structure of the trained current super network. For a feature extraction layer determined to contain N connections, clipping is executed N times, each time cutting a different one of the layer's N connections, so that the N clippings yield N clipped super networks.
Feature extraction is then performed on the image data with each of the N clipped super networks, and a group of second feature maps is extracted from each. Here, the feature map extracted by the last feature extraction layer of a clipped super network may serve as its second feature map, or the feature map output by the fully connected layer after the last feature extraction layer may serve as its second feature map.
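Continuing the sketch above, step 1023 could be realized as follows; extract_features is an assumed helper that returns a super network's feature map for the probe images, and the deep copy is used purely for clarity (an implementation could instead toggle layer.active in place).

```python
import copy
import torch

def candidate_crops(supernet, layer_idx, x):
    """Cut each active connection of one layer in turn (step 1023)."""
    layer = supernet.layers[layer_idx]
    candidates = []
    for conn in list(layer.active):
        cropped = copy.deepcopy(supernet)
        cropped.layers[layer_idx].cut(conn)
        with torch.no_grad():
            second_fmap = cropped.extract_features(x)  # assumed helper
        candidates.append((conn, cropped, second_fmap))
    return candidates
```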
FIG. 2 is a schematic diagram showing a cut-out of the feature extraction layer of the current super-network and the extraction of a second feature map based on the cut-out super-network during each cut-out training operation.
As shown in fig. 2, one feature extraction layer E of the super network includes three connections a, b, and c. Clipping a, b, and c respectively yields the clipped super networks X, Y, and Z, where layer E of the clipped super network X retains b and c, layer E of the clipped super network Y retains a and c, and layer E of the clipped super network Z retains a and b. The second feature maps extracted by the super networks X, Y, and Z are Fx, Fy, and Fz, respectively.
Step 1024, determining the cut super network corresponding to the group of the second feature maps with the smallest distance between the first feature maps in the N groups of the second feature maps as the new current super network.
The distance between the first feature map and the group of second feature maps corresponding to each clipped super network may be calculated. Here, the distance between feature maps can be measured by their similarity. Alternatively, the feature maps may be converted into feature vectors, and the distance between the feature vectors used as the distance between the feature maps.
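One possible realization of this distance, under the flattened-vector reading above, is the Euclidean distance between feature maps; the choice of norm is an assumption, and a similarity measure would serve equally well.

```python
import torch

def feature_map_distance(fmap_a, fmap_b):
    """Distance between two feature maps via their flattened vectors."""
    return torch.norm(fmap_a.flatten() - fmap_b.flatten()).item()
```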
After calculating the distance between each group of second feature maps and the first feature map, the clipped super network corresponding to the group with the smallest distance is selected as the new current super network, and the connections in that clipped super network are retained.
In each clipping training operation, the connection whose removal produces the smallest difference between the features extracted by the clipped super network and those extracted by the super network before clipping is selected for cutting. Compared with the other connections, the cut connection has the least influence on the performance of the super network, so continuing to train the super network on the retained connections reduces the dependence of the retained connections in a feature extraction layer on the cut one. This improves the coexistence of the sub-networks within the super network and raises the performance of all sub-networks after training.
In response to determining that the number of connections of the feature extraction layer in the new current super network is greater than 1, a next crop training operation is performed, step 1025.
After updating the current super-network, if it is determined that the number of connections of the feature extraction layer in the current super-network is greater than 1, step 1021 may be returned to perform the next clipping training operation.
For each feature extraction layer of the current super network, training and clipping proceed through steps 1021, 1022, 1023, 1024, and 1025 in turn until every feature extraction layer of the current super network includes only one connection, at which point the clipping training operations stop.
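Tying steps 1021 through 1025 together, one reading of the iterative clipping-training loop is sketched below. All helper names (train_supernet_step, candidate_crops, feature_map_distance, extract_features) come from the earlier sketches and are assumptions rather than the patent's API.

```python
import torch

def clipping_training(supernet, optimizer, sample_loader, probe_images):
    """Iterate clipping training until every layer keeps one connection."""
    while any(len(layer.active) > 1 for layer in supernet.layers):
        for images, labels in sample_loader:                      # step 1021
            train_supernet_step(supernet, optimizer, images, labels)
        with torch.no_grad():
            first_fmap = supernet.extract_features(probe_images)  # step 1022
        for layer_idx, layer in enumerate(supernet.layers):
            if len(layer.active) <= 1:
                continue
            cands = candidate_crops(supernet, layer_idx, probe_images)  # 1023
            conn, _, _ = min(                                           # 1024
                cands, key=lambda c: feature_map_distance(c[2], first_fmap))
            layer.cut(conn)  # keep the crop whose features moved least
    return supernet
```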
And step 103, training the cut out super network based on the sample data in response to determining that the cut out super network does not reach the preset convergence condition.
After step 102, the number of connections reserved by each feature extraction layer of the cut-out super-network is 1, and at this time, it may be determined whether the cut-out super-network meets a preset convergence condition, where the preset convergence condition may include that performance indexes such as accuracy, memory occupancy, or delay of the super-network reach a preset threshold, or may include that the number of parameter iterations of the super-network reaches a preset number of times threshold.
If the clipped super network has not reached the preset convergence condition, it can continue to be trained based on the sample data.
In the above embodiment, each feature extraction layer of the clipped super network retains only one connection, namely the connection whose removal would have changed that layer's extracted feature map the most. Such a connection is less susceptible to the influence of other structures or parameters in the super network than the connections that were cut, so the sub-network built from the retained connections is more independent of the performance of the other sub-networks of the unclipped super network. This narrows the gap between the performance of a sub-network obtained from super network training and that of the same sub-network trained independently, improving the accuracy of the super network. The network-structure search efficiency afforded by the super network is thus obtained while the performance of the searched sub-network is preserved.
In some optional implementations of the foregoing embodiments, the foregoing method for training a super network may further include: and in response to determining that the cut-out super-network reaches a preset convergence condition and the connection number of each feature extraction layer in the cut-out super-network is 1, constructing a target neural network model based on each feature extraction layer of the cut-out super-network.
When, after multiple clipping training operations, the clipped super network reaches the preset convergence condition and each feature extraction layer of the clipped super network retains exactly one connection, training of the super network can stop, yielding the trained super network.
Each feature extraction layer of the target neural network model can be built from the corresponding feature extraction layer of the trained super network, and the parameters of those layers can be synchronized into the target model. The other layers of the target model, such as fully connected layers and classifiers, can be sampled from the optional structures of the corresponding layers of the super network; the combination that performs best when stacked with the target model's feature extraction layers is searched out to form the complete target neural network model, and the corresponding parameters of these other layers can likewise be synchronized from the trained super network.
Therefore, since only the structures of the layers other than the feature extraction layers need to be searched from the super network, the efficiency of searching for the optimal target neural network model based on the super network is effectively improved; the performance of the model found this way is also improved and comes closer to that of an independently trained neural network model of the same structure.
The target neural network model may be further utilized to process image data to be processed in the deep learning task, or the structure and parameters of the target neural network model may be sent to devices (such as terminal devices) at other ends to deploy the target neural network model at the devices at other ends. Because the accuracy of the target neural network is higher, a more accurate image processing result can be obtained.
In some optional implementations of the above embodiment, the clipping training operation may further include: in response to determining that the number of connections of a feature extraction layer in the new current super network is 1, saving the weight parameters corresponding to the connection retained in that feature extraction layer. In that case, in step 103 above, the clipped super network may be trained as follows: take the saved weight parameters corresponding to the connection in each feature extraction layer of the clipped super network as the initial weight parameters of the clipped super network, and iteratively update the weight parameters of the clipped super network based on the sample data.
Specifically, when the clipping training operations have reduced every feature extraction layer of the super network to one connection, the weight parameter of the single connection kept by each layer may be saved as a weight parameter of the clipped super network. Training of the clipped super network then proceeds by sampling sub-networks from it, training the sub-networks, evaluating their performance to obtain the performance of the clipped super network, and iteratively updating its parameters based on that performance.
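A short sketch of this warm start, assuming the weights saved during clipping have been collected into a state-dict-like mapping (an assumption about the storage format):

```python
def finetune_clipped(supernet, saved_weights, optimizer, sample_loader):
    """Warm-start the clipped super network from saved weights (step 103)."""
    # Saved entries that match the surviving connections initialize training.
    supernet.load_state_dict(saved_weights, strict=False)
    for images, labels in sample_loader:
        train_supernet_step(supernet, optimizer, images, labels)
```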
Therefore, the parameters of the super network can be further optimized on the basis of the weight parameters obtained in the cutting training of the super network, the convergence speed of the super network can be improved, and therefore the calculation resources occupied by the super network training are reduced.
Referring to fig. 3, a flow chart of another embodiment of a training method of a super network according to the present disclosure is shown. As shown in fig. 3, a flow 300 of the training method of the super network of the present embodiment includes the following steps:
in step 301, sample data is acquired.
In this embodiment, the execution subject of the training method of the super network may acquire sample data for training. The sample data may be image data.
And 302, taking the super network to be trained as an initial current super network, and iteratively executing cutting training operations for a plurality of times until the connection number reserved by each feature extraction layer of the current super network is 1, thereby obtaining the cut super network.
The clipping training operation includes the following steps 3021, 3022, 3023, 3024, 3025, and 3026.
Step 3021, training a current super network based on sample data.
And 3022, performing feature extraction on the image data to be processed by using the trained current super network to obtain a first feature map.
Step 3023, determining, for each feature extraction layer of the trained current super network, the number of connections N included in the feature extraction layer, cutting the feature extraction layer in the trained super network N times to obtain N cut super networks, and performing feature extraction on image data to be processed by using the N cut super networks to obtain N sets of corresponding second feature graphs. Wherein each cropping is performed separately for one of the N connections comprised by the feature extraction layer.
And 3024, determining the cut super network corresponding to the group of the second feature maps with the smallest distance between the first feature maps in the N groups of the second feature maps as a new current super network.
In response to determining that the number of connections of the feature extraction layer in the new current super network is greater than 1, a next crop training operation is performed, step 3025.
The steps 301, 3021, 3022, 3023, 3024, and 3025 are identical to the steps 101, 1021, 1022, 1023, 1024, and 1025 of the foregoing embodiments, respectively, and are not described herein.
Step 3026, determining a group of second feature maps with the smallest distance from the first feature maps in the N groups of second feature maps as target second feature maps, and storing a weight parameter corresponding to one cut connection in the cut super network corresponding to the target second feature maps.
In this embodiment, after N sets of second feature maps are obtained for each feature extraction layer, a set of second feature maps with the smallest distance from the first feature map may be used as the target second feature map, and a parameter corresponding to a cut out connection in the cut-out super network corresponding to the target second feature map may be stored as a final parameter of the cut out connection obtained in the cutting training process.
As an example, referring back to fig. 2, feature extraction layer E of the super network is clipped N times to obtain the super networks X, Y, and Z, and the distances between the second feature maps Fx, Fy, and Fz extracted by X, Y, and Z and the first feature map extracted by the super network before clipping are calculated. If Fy is the second feature map with the smallest distance to the first feature map, the parameters of connection b, which was cut in the super network Y corresponding to Fy, are saved. Saving the weight parameters of cut connections in this way allows the structure of the super network before clipping to be restored after clipping training finishes, yielding a trained super network from the saved weights. This gradually reduces the number of parameters that must be iterated during clipping training and improves the training efficiency of the super network. Moreover, because a cut connection has high consistency with the overall performance of the super network, cutting it has little impact on that performance. In addition, some cut connections have already had their weight parameters trained and optimized over several clipping rounds, so the super network reconstructed from the saved weight parameters of the cut connections performs well.
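The bookkeeping of step 3026 could be sketched as follows; the dictionary keyed by (layer index, connection index) is an illustrative assumption about how the saved weights are organized.

```python
def save_cut_connection(store, supernet, layer_idx, conn_idx):
    """Save the weights of a connection just before it is cut (step 3026)."""
    module = supernet.layers[layer_idx].connections[conn_idx]
    store[(layer_idx, conn_idx)] = {
        k: v.detach().clone() for k, v in module.state_dict().items()
    }
```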
In step 303, training the cut out super network based on the sample data in response to determining that the cut out super network does not reach the preset convergence condition.
Step 303 in this embodiment corresponds to step 103 in the foregoing embodiment, and the specific implementation of step 303 may refer to the description of step 103, which is not repeated herein.
Optionally, the process 300 of the above training method of the super network may further include:
and step 304, generating a trained super network according to the saved weight parameters corresponding to the cut-out connection in each feature extraction layer of the super network to be trained and the weight parameters corresponding to the reserved connection in each feature extraction layer of the cut-out super network after the cut-out super network training is finished.
The structure of the super network before clipping training can be restored, with the weight parameters of each connection determined from the saved weight parameters of the cut connections in each feature extraction layer together with the weight parameters of the connections retained in each feature extraction layer of the clipped super network after its training finishes. A complete, trained super network is thereby obtained.
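Under the same assumed storage layout, restoring the full super network in step 304 amounts to re-activating each cut connection and loading its saved weights:

```python
def restore_full_supernet(store, supernet):
    """Rebuild the pre-clipping structure from saved weights (step 304)."""
    for (layer_idx, conn_idx), weights in store.items():
        layer = supernet.layers[layer_idx]
        layer.connections[conn_idx].load_state_dict(weights)
        layer.active = sorted(set(layer.active) | {conn_idx})
    return supernet
```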
The trained super network contains a plurality of sub-network structures and can be used to rapidly evaluate sub-network performance when searching for the optimal sub-network structure. Moreover, because the parameters of the trained super network have been iteratively updated many times during clipping training, the super network performs well, which makes the optimal sub-network search based on it more accurate.
Referring to fig. 4, as an implementation of the above-mentioned training method of the super network, the present disclosure provides an embodiment of a training apparatus of the super network, where the embodiment of the apparatus corresponds to the embodiments of the above-mentioned methods, and the apparatus may be specifically applied to various electronic devices.
As shown in fig. 4, the training apparatus 400 of the super network of the present embodiment includes an acquisition unit 401, a first training unit 402, and a second training unit 403. Wherein the acquisition unit 401 is configured to acquire sample data; the first training unit 402 is configured to iteratively perform a plurality of clipping training operations with the super-network to be trained as an initial current super-network until the connection numbers reserved by each feature extraction layer of the current super-network are all 1, thereby obtaining a clipping-completed super-network; the second training unit 403 is configured to train the cut out super network based on the sample data in response to determining that the cut out super network does not reach the preset convergence condition. The first training unit 402 includes: training subunit 4021 is configured to perform the following steps in a crop training operation: training the current super network based on the sample data; the feature extraction subunit 4022 is configured to perform the following steps in the crop training operation: performing feature extraction on the image data to be processed by using the trained current super network to obtain a first feature map; the clipping subunit 4023 is configured to perform the following steps in the clipping training operation: determining the number N of connections contained in the feature extraction layers aiming at each feature extraction layer of the current super network after training, respectively cutting the feature extraction layers in the super network after training for N times to obtain N cut super networks, respectively carrying out feature extraction on image data to be processed by utilizing the N cut super networks to obtain N corresponding groups of second feature graphs, wherein one of the N connections contained in the feature extraction layers is cut in each cut; a determination subunit 4024 configured to perform the following steps in the crop training operation: determining the cut super network corresponding to the group of second feature images with the smallest distance between the N groups of second feature images and the first feature image as a new current super network; an iteration subunit 4025 configured to perform the following steps in the crop training operation: in response to determining that the number of connections of the feature extraction layer in the new current super network is greater than 1, a next crop training operation is performed.
In some embodiments, the apparatus further comprises: a construction unit configured to: and in response to determining that the cut-out super-network reaches a preset convergence condition and the connection number of each feature extraction layer in the cut-out super-network is 1, constructing a target neural network model based on each feature extraction layer of the cut-out super-network.
In some embodiments, the first training unit 402 further includes: a first storage subunit configured to perform the following steps in a crop training operation: in response to determining that the number of connections of the feature extraction layer in the new current super network is 1, saving weight parameters corresponding to the connections in the feature extraction layer in the new current super network; and the second training unit 403 is configured to train the tailored super-network as follows: and taking the weight parameters corresponding to the connection in each feature extraction layer in the cut-out super network as initial weight parameters in the cut-out super network, and carrying out iterative updating on the weight parameters in the cut-out super network based on sample data.
In some embodiments, the first training unit 402 further includes: a second save subunit configured to perform the following steps in the crop training operation: and determining a group of second feature images with the smallest distance with the first feature images in the N groups of second feature images as target second feature images, and storing a weight parameter corresponding to one cut connection in the cut super network corresponding to the target second feature images.
In some embodiments, the apparatus further comprises: the generation unit is configured to generate a training-completed super network according to the saved weight parameters corresponding to the cut-out connection in each feature extraction layer of the super network to be trained and the weight parameters corresponding to the reserved connection in each feature extraction layer of the cut-out super network after the cut-out super network training is completed.
The above-described apparatus 400 corresponds to the steps in the method embodiments described above. Thus, the operations, features and technical effects that can be achieved by the above-described training method for the super network are equally applicable to the apparatus 400 and the units contained therein, and are not described herein again.
According to an embodiment of the present application, the present application also provides an electronic device and a readable storage medium.
As shown in fig. 5, a block diagram of an electronic device of a training method of a super network according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the applications described and/or claimed herein.
As shown in fig. 5, the electronic device includes: one or more processors 501, memory 502, and interfaces for connecting components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executing within the electronic device, including instructions stored in or on memory to display graphical information of the GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 501 is illustrated in fig. 5.
Memory 502 is a non-transitory computer readable storage medium provided by the present application. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method for training a super network provided by the present application. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method of training a super network provided by the present application.
The memory 502 is used as a non-transitory computer readable storage medium for storing a non-transitory software program, a non-transitory computer executable program, and modules, such as the program instructions/units/modules corresponding to the training method of the super network in the embodiment of the present application (e.g., the acquisition unit 401, the first training unit 402, and the second training unit 403 shown in fig. 4). The processor 501 executes various functional applications of the server and data processing, i.e., implements the super network training method in the above-described method embodiments, by running the non-transitory software programs, instructions, and modules stored in the memory 502.
Memory 502 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created from the use of the electronic device for generating the structure of the neural network, and the like. In addition, memory 502 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 502 may optionally include memory located remotely from processor 501, which may be connected via a network to an electronic device used to generate the architecture of the neural network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the training method of the super network may further include: an input device 503 and an output device 504. The processor 501, memory 502, input devices 503 and output devices 504 may be connected by a bus 505 or otherwise, in fig. 5 by way of example by bus 505.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device used to generate the neural network structure, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointer stick, one or more mouse buttons, a track ball, a joystick, and the like. The output device 504 may include a display device, an auxiliary lighting device (e.g., an LED), a haptic feedback device (e.g., a vibration motor), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also referred to as programs, software, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
The foregoing description covers only preferred embodiments of the present disclosure and explains the technical principles employed. Those skilled in the art will appreciate that the scope of the application referred to in this disclosure is not limited to the specific combination of the features described above, but also covers other technical solutions formed by combining those features or their equivalents in any way without departing from the spirit of the application, for example, solutions formed by substituting the features described above with technical features of similar function disclosed in (but not limited to) the present application.

Claims (12)

1. A method of training a super network, comprising:
acquiring sample data;
taking a super-network to be trained as an initial current super-network, and iteratively executing cutting training operations for a plurality of times until the connection number reserved by each feature extraction layer of the current super-network is 1, thereby obtaining a cut super-network;
training the cut out super network based on sample data in response to determining that the cut out super network does not reach a preset convergence condition;
wherein, the clipping training operation comprises:
training the current super network based on the sample data;
performing feature extraction on the image data to be processed by using the trained current super network to obtain a first feature map;
determining the number N of connections contained in the feature extraction layer for each feature extraction layer of the current super network after training, respectively cutting the feature extraction layer in the super network after training for N times to obtain N cut super networks, respectively carrying out feature extraction on image data to be processed by using the N cut super networks to obtain N corresponding groups of second feature graphs, wherein one of the N connections contained in the feature extraction layer is cut in each cut;
determining the cut super network corresponding to the group of second feature images with the smallest distance between the N groups of second feature images and the first feature image as a new current super network;
in response to determining that the number of connections of the feature extraction layer in the new current super network is greater than 1, a next crop training operation is performed.
2. The method of claim 1, wherein the method further comprises:
and in response to determining that the cut-out super-network reaches a preset convergence condition and the connection number of each feature extraction layer in the cut-out super-network is 1, constructing a target neural network model based on each feature extraction layer of the cut-out super-network.
3. The method of claim 1, wherein the crop training operation further comprises:
in response to determining that the number of connections of the feature extraction layer in the new current super network is 1, saving weight parameters corresponding to the connections in the feature extraction layer in the new current super network; and
training the tailored super network based on sample data, including:
and taking the weight parameters corresponding to the connection in each feature extraction layer in the cut-out super network as initial weight parameters in the cut-out super network, and carrying out iterative updating on the weight parameters in the cut-out super network based on the sample data.
4. The method of claim 1, wherein the clipping training operation further comprises:
determining, among the N groups of second feature maps, the group having the smallest distance to the first feature map as a target group of second feature maps, and saving the weight parameter corresponding to the connection that is cut out in the clipped super network corresponding to the target group.
5. The method of claim 4, wherein the method further comprises:
after the clipped super network is trained, generating a trained super network according to the saved weight parameters corresponding to the connections cut out from each feature extraction layer of the super network to be trained and the weight parameters corresponding to the connections retained in each feature extraction layer of the clipped super network.
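One possible reading of claims 4 and 5, again with the hypothetical toy classes above: a cut_log list of (layer index, saved weight) pairs collects the weight of every connection at the moment it is cut, and the trained super network is rebuilt from those saved weights plus the trained weights of the retained connections.

import copy

def rebuild_trained_super_network(template, cut_log, trained_clipped):
    # Start from the original topology but discard its untrained weights.
    full = copy.deepcopy(template)
    for layer in full.layers:
        layer.weights = []
    # Restore each cut connection with the weight saved when it was cut ...
    for layer_idx, saved_weight in cut_log:
        full.layers[layer_idx].weights.append(saved_weight.copy())
    # ... and add each retained connection with its finally trained weight.
    for layer_idx, layer in enumerate(trained_clipped.layers):
        full.layers[layer_idx].weights.append(layer.weights[0].copy())
    return full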
6. An apparatus for training a super network, comprising:
an acquisition unit configured to acquire sample data;
a first training unit configured to take a super network to be trained as an initial current super network, and to iteratively perform a clipping training operation a plurality of times until the number of connections retained in each feature extraction layer of the current super network is 1, to obtain a clipped super network;
a second training unit configured to, in response to determining that the clipped super network does not reach a preset convergence condition, train the clipped super network based on the sample data;
wherein the first training unit comprises:
a training subunit configured to perform the following step in the clipping training operation: training the current super network based on the sample data;
a feature extraction subunit configured to perform the following step in the clipping training operation: performing feature extraction on image data to be processed by using the trained current super network to obtain a first feature map;
a clipping subunit configured to perform the following steps in the clipping training operation: for each feature extraction layer of the trained current super network, determining a number N of connections contained in the feature extraction layer, and clipping the feature extraction layer in the trained super network N times to obtain N clipped super networks, wherein each clipping cuts out a different one of the N connections contained in the feature extraction layer; and performing feature extraction on the image data to be processed by using each of the N clipped super networks to obtain N corresponding groups of second feature maps;
a determination subunit configured to perform the following step in the clipping training operation: determining, among the N groups of second feature maps, the group of second feature maps having the smallest distance to the first feature map, and taking the clipped super network corresponding to that group as a new current super network; and
an iteration subunit configured to perform the following step in the clipping training operation: in response to determining that the number of connections of a feature extraction layer in the new current super network is greater than 1, performing a next clipping training operation.
7. The apparatus of claim 6, wherein the apparatus further comprises:
a construction unit configured to: in response to determining that the clipped super network reaches the preset convergence condition and the number of connections in each feature extraction layer of the clipped super network is 1, construct a target neural network model based on the feature extraction layers of the clipped super network.
8. The apparatus of claim 6, wherein the first training unit further comprises:
a first saving subunit configured to perform the following step in the clipping training operation: in response to determining that the number of connections of a feature extraction layer in the new current super network is 1, saving the weight parameter corresponding to the connection retained in that feature extraction layer; and
the second training unit is configured to train the clipped super network as follows:
taking the saved weight parameters corresponding to the connections in the feature extraction layers of the clipped super network as initial weight parameters of the clipped super network, and iteratively updating the weight parameters of the clipped super network based on the sample data.
9. The apparatus of claim 6, wherein the first training unit further comprises:
a second saving subunit configured to perform the following step in the clipping training operation: determining, among the N groups of second feature maps, the group having the smallest distance to the first feature map as a target group of second feature maps, and saving the weight parameter corresponding to the connection that is cut out in the clipped super network corresponding to the target group.
10. The apparatus of claim 9, wherein the apparatus further comprises:
a generation unit configured to: after the clipped super network is trained, generate a trained super network according to the saved weight parameters corresponding to the connections cut out from each feature extraction layer of the super network to be trained and the weight parameters corresponding to the connections retained in each feature extraction layer of the clipped super network.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-5.
CN202010383356.3A 2020-05-08 2020-05-08 Super network training method and device Active CN111563591B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010383356.3A CN111563591B (en) 2020-05-08 2020-05-08 Super network training method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010383356.3A CN111563591B (en) 2020-05-08 2020-05-08 Super network training method and device

Publications (2)

Publication Number Publication Date
CN111563591A CN111563591A (en) 2020-08-21
CN111563591B true CN111563591B (en) 2023-10-20

Family

ID=72073381

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010383356.3A Active CN111563591B (en) 2020-05-08 2020-05-08 Super network training method and device

Country Status (1)

Country Link
CN (1) CN111563591B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113554104B (en) * 2021-07-28 2022-09-30 哈尔滨工程大学 Image classification method based on deep learning model

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110503192A (en) * 2018-05-18 2019-11-26 Baidu (USA) LLC Resource-efficient neural architecture
CN110782010A (en) * 2019-10-18 2020-02-11 Beijing Xiaomi Intelligent Technology Co., Ltd. Neural network construction method and device and storage medium
EP3629246A1 (en) * 2018-09-27 2020-04-01 Swisscom AG Systems and methods for neural architecture search
CN110956262A (en) * 2019-11-12 2020-04-03 Beijing Xiaomi Intelligent Technology Co., Ltd. Hyper network training method and device, electronic equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110889487A (en) * 2018-09-10 2020-03-17 Fujitsu Ltd. Neural network architecture search apparatus and method, and computer-readable recording medium
CN111105029B (en) * 2018-10-29 2024-04-16 Beijing Horizon Robotics Technology Research and Development Co., Ltd. Neural network generation method, generation device and electronic equipment
US11604992B2 (en) * 2018-11-02 2023-03-14 Microsoft Technology Licensing, Llc Probabilistic neural network architecture generation

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110503192A (en) * 2018-05-18 2019-11-26 Baidu (USA) LLC Resource-efficient neural architecture
EP3629246A1 (en) * 2018-09-27 2020-04-01 Swisscom AG Systems and methods for neural architecture search
CN110782010A (en) * 2019-10-18 2020-02-11 Beijing Xiaomi Intelligent Technology Co., Ltd. Neural network construction method and device and storage medium
CN110956262A (en) * 2019-11-12 2020-04-03 Beijing Xiaomi Intelligent Technology Co., Ltd. Hyper network training method and device, electronic equipment and storage medium

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Hieu Pham et al. Efficient Neural Architecture Search via Parameter Sharing. Proceedings of the 35th International Conference on Machine Learning, 2018, Vol. 80, pp. 4095-4104. *
Thomas Elsken et al. Neural Architecture Search: A Survey. Journal of Machine Learning Research, 2019, Vol. 20, No. 1, pp. 1997-2017. *
Xu Qiang et al. An Automated Design Method for Convolutional Neural Network Structures under Cost Constraints Based on Reinforcement Learning. Journal of Integration Technology, 2019, No. 3, pp. 42-54. *
Zhang Xuanyang. Optimization and Design of Deep Neural Network Architectures. China Masters' Theses Full-text Database, Information Science and Technology, 2020, No. 1, I140-260. *

Also Published As

Publication number Publication date
CN111563591A (en) 2020-08-21

Similar Documents

Publication Publication Date Title
CN111582453B (en) Method and device for generating neural network model
CN111539514B (en) Method and apparatus for generating a structure of a neural network
CN111582454B (en) Method and device for generating neural network model
CN111539479B (en) Method and device for generating sample data
CN111639710A (en) Image recognition model training method, device, equipment and storage medium
CN111582477B (en) Training method and device for neural network model
CN111950254B (en) Word feature extraction method, device and equipment for searching samples and storage medium
EP3905146A1 (en) Method, apparatus, device and storage medium for constructing knowledge graph
CN112559870B (en) Multi-model fusion method, device, electronic equipment and storage medium
CN111967569B (en) Neural network structure generation method and device, storage medium and electronic equipment
CN111104514A (en) Method and device for training document label model
CN111695519B (en) Method, device, equipment and storage medium for positioning key point
CN111639753B (en) Method, apparatus, device and storage medium for training image processing super network
CN111563592B (en) Neural network model generation method and device based on super network
EP3836141A2 (en) Method and apparatus for extracting video clip
CN111652354B (en) Method, apparatus, device and storage medium for training super network
CN112507090A (en) Method, apparatus, device and storage medium for outputting information
CN111782785B (en) Automatic question and answer method, device, equipment and storage medium
JP2022006189A (en) Image processing method, pre-training model training method, equipment, and electronic device
CN111563591B (en) Super network training method and device
CN113792876A (en) Backbone network generation method, device, equipment and storage medium
CN111582452A (en) Method and device for generating neural network model
CN111767990A (en) Neural network processing method and device
CN111553283B (en) Method and device for generating model
CN111339344B (en) Indoor image retrieval method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant