CN113642730A - Convolutional network pruning method and device and electronic equipment - Google Patents

Convolutional network pruning method and device and electronic equipment

Info

Publication number
CN113642730A
Authority
CN
China
Prior art keywords
network
model
sub
pruning
sample
Prior art date
Legal status
Withdrawn
Application number
CN202111007647.3A
Other languages
Chinese (zh)
Inventor
黄晨荃
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202111007647.3A priority Critical patent/CN113642730A/en
Publication of CN113642730A publication Critical patent/CN113642730A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

The application relates to a convolutional network pruning method, which comprises the following steps: acquiring a convolutional neural network model; decomposing the model into independent pruneable models according to the dependency relationship among its operators, and decomposing each pruneable model into a corresponding substructure set according to a matched decomposition mode; randomly selecting, for each pruneable model, a target substructure from the corresponding substructure set to form different sub-networks, training each sub-network according to training data, and adjusting the network parameters of the sub-networks until the currently trained sub-network converges; obtaining target model scale parameters, and selecting target sub-networks from the sub-networks to form a current population; screening the current population according to verification data to obtain a candidate population, selecting samples for mutation and interactive breeding operations to obtain updated samples, adding the updated samples into the candidate population to obtain an updated current population, and returning to the step of screening the current population according to the verification data until a convergence condition is reached, so as to obtain a target pruning network. The method improves the processing efficiency of the neural network.

Description

Convolutional network pruning method and device and electronic equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a convolutional network pruning method and apparatus, an electronic device, and a computer-readable storage medium.
Background
Neural networks perform excellently in tasks such as face recognition, speech recognition, and image processing, but they usually require huge computing resources, which limits their application on mobile terminals. Therefore, a series of technologies have been developed to reduce the computation of neural networks, for example pruning, which makes the network lightweight, improves its operating efficiency, reduces power consumption, and allows the model to run on devices with lower computing power.
Traditional convolutional network pruning methods search all models in the search space for the optimal model that meets the conditions, and therefore often suffer from low processing efficiency.
Disclosure of Invention
The embodiment of the application provides a convolutional network pruning method and device, electronic equipment and a computer readable storage medium, and improves the processing efficiency of a neural network.
A convolutional network pruning method, comprising:
acquiring a convolutional neural network model;
decomposing the convolutional neural network model into independent pruneable models according to the dependency relationship among operators of the convolutional neural network model, and decomposing each pruneable model into a corresponding substructure set according to a matched decomposition mode;
randomly selecting, for each pruneable model, a target substructure from the corresponding substructure set to form different sub-networks, training each sub-network according to training data, and adjusting network parameters of the corresponding sub-network until the currently trained sub-network converges;
obtaining target model scale parameters, and selecting target sub-networks from the sub-networks based on the target model scale parameters to form a current population;
screening the current population according to verification data to obtain a candidate population, selecting samples from the candidate population for mutation and interactive breeding operations that change the number of output channels of the samples to obtain updated samples, adding the updated samples into the candidate population to obtain an updated current population, and returning to the step of screening the current population according to the verification data until a convergence condition is reached, so as to obtain a target pruning network.
A convolutional network pruning device, comprising:
the acquiring module is used for acquiring a convolutional neural network model;
the decomposition module is used for decomposing the convolutional neural network model into independent pruneable models according to the dependency relationship among operators of the convolutional neural network model, and decomposing each pruneable model into a corresponding substructure set according to a matched decomposition mode;
the training module is used for randomly selecting, for each pruneable model, a target substructure from the corresponding substructure set to form different sub-networks, training each sub-network according to training data, and adjusting network parameters of the corresponding sub-network until the currently trained sub-network converges;
the target pruning network determining module is used for obtaining a target model scale parameter, selecting target sub-networks from the sub-networks based on the target model scale parameter to form a current population, screening the current population according to verification data to obtain a candidate population, selecting samples from the candidate population for mutation and interactive breeding operations that change the number of output channels of the samples to obtain updated samples, adding the updated samples into the candidate population to obtain an updated current population, and returning to the step of screening the current population according to the verification data until a convergence condition is reached, so as to obtain the target pruning network.
An electronic device comprising a memory and a processor, the memory having stored therein a computer program that, when executed by the processor, causes the processor to perform the steps of:
acquiring a convolutional neural network model;
decomposing the convolutional neural network model into independent pruneable models according to the dependency relationship among operators of the convolutional neural network model, and decomposing each pruneable model into a corresponding substructure set according to a matched decomposition mode;
randomly selecting, for each pruneable model, a target substructure from the corresponding substructure set to form different sub-networks, training each sub-network according to training data, and adjusting network parameters of the corresponding sub-network until the currently trained sub-network converges;
obtaining target model scale parameters, and selecting target sub-networks from the sub-networks based on the target model scale parameters to form a current population;
screening the current population according to verification data to obtain a candidate population, selecting samples from the candidate population for mutation and interactive breeding operations that change the number of output channels of the samples to obtain updated samples, adding the updated samples into the candidate population to obtain an updated current population, and returning to the step of screening the current population according to the verification data until a convergence condition is reached, so as to obtain a target pruning network.
A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, causes the processor to perform the steps of:
acquiring a convolutional neural network model;
decomposing the convolutional neural network model into independent pruneable models according to the dependency relationship among operators of the convolutional neural network model, and decomposing each pruneable model into a corresponding substructure set according to a matched decomposition mode;
randomly selecting, for each pruneable model, a target substructure from the corresponding substructure set to form different sub-networks, training each sub-network according to training data, and adjusting network parameters of the corresponding sub-network until the currently trained sub-network converges;
obtaining target model scale parameters, and selecting target sub-networks from the sub-networks based on the target model scale parameters to form a current population;
screening the current population according to verification data to obtain a candidate population, selecting samples from the candidate population for mutation and interactive breeding operations that change the number of output channels of the samples to obtain updated samples, adding the updated samples into the candidate population to obtain an updated current population, and returning to the step of screening the current population according to the verification data until a convergence condition is reached, so as to obtain a target pruning network.
According to the convolutional network pruning method and device, the electronic device, and the computer-readable storage medium, a convolutional neural network model is acquired and decomposed into independent pruneable models according to the dependency relationship among its operators, and each pruneable model is decomposed into a corresponding substructure set according to a matched decomposition mode; for each pruneable model a target substructure is randomly selected from the corresponding substructure set to form different sub-networks, each sub-network is trained according to training data, and the network parameters of the corresponding sub-network are adjusted until the currently trained sub-network converges; target model scale parameters are obtained, and target sub-networks are selected from the sub-networks based on the target model scale parameters to form a current population; the current population is screened according to verification data to obtain a candidate population, samples are selected from the candidate population for mutation and interactive breeding operations that change the number of output channels of the samples to obtain updated samples, the updated samples are added into the candidate population to obtain an updated current population, and the process returns to the step of screening the current population according to the verification data until a convergence condition is reached, so that a target pruning network is obtained. Decomposing the model into independent pruneable models and further into different sub-networks realizes super-networking: training the super-network alone simultaneously trains all sub-networks in the super-network space. Each sub-network in the super-network space in fact represents a pruning strategy for the original model, and convergence of the super-network represents convergence of the models pruned by all the pruning strategies in the space, which improves the efficiency of network pruning development. By performing mutation and interactive breeding operations on the samples, an evolutionary search is realized, so that the target sub-network can be found quickly in a very large search space, the working time and complexity are greatly reduced, and the processing efficiency of the neural network is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a diagram of an application environment of a convolutional network pruning method in one embodiment;
FIG. 2 is a schematic flow chart diagram illustrating a pruning method for convolutional networks in one embodiment;
FIG. 3 is a diagram illustrating a decomposition of a pruneable model into corresponding sets of substructures according to the number of output channels, according to one embodiment;
FIG. 4 is a schematic diagram of model super-networking in one embodiment;
FIG. 5 is a schematic diagram of a sub-network sampled from a super-network in one embodiment;
FIG. 6 is a network layer diagram illustrating the existence of dependencies in one embodiment;
FIG. 7 is a network layer diagram illustrating the existence of dependencies in another embodiment;
FIG. 8 is a flow diagram illustrating obtaining updated samples in one embodiment;
FIG. 9 is a diagram illustrating an example of an updated sample obtained from a mutation operation in one embodiment;
FIG. 10 is a diagram illustrating an example of an update sample obtained from an interactive breeding operation in one embodiment;
FIG. 11 is a schematic flow chart diagram illustrating a method for pruning a convolutional network in accordance with an exemplary embodiment;
FIG. 12 is a block diagram of the architecture of the convolutional network pruning device in one embodiment;
fig. 13 is a block diagram showing an internal configuration of an electronic device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Fig. 1 is a diagram of an application environment of a convolutional network pruning method in one embodiment. As shown in fig. 1, the application environment includes a terminal 110 and a server 120, and the convolutional network pruning method may be performed by the terminal 110 or the server 120 independently, or by the terminal 110 and the server 120 in cooperation. The terminal 110 or the server 120 acquires a convolutional neural network model; decomposes the convolutional neural network model into independent pruneable models according to the dependency relationship among operators of the convolutional neural network model, and decomposes each pruneable model into a corresponding substructure set according to a matched decomposition mode; randomly selects, for each pruneable model, a target substructure from the corresponding substructure set to form different sub-networks, trains each sub-network according to training data, and adjusts the network parameters of the corresponding sub-network until the currently trained sub-network converges; obtains target model scale parameters, and selects target sub-networks from the sub-networks based on the target model scale parameters to form a current population; screens the current population according to verification data to obtain a candidate population, selects samples from the candidate population for mutation and interactive breeding operations to obtain updated samples, adds the updated samples into the candidate population to obtain an updated current population, and returns to the step of screening the current population according to the verification data until a convergence condition is reached, so as to obtain the target pruning network. The terminal 110 may be a terminal device such as a mobile phone, a tablet computer, a PDA (Personal Digital Assistant), a vehicle-mounted computer, or a wearable device. The terminal device may run convolutional neural networks serving various business functions, including but not limited to face recognition, speech recognition, and image processing. The server 120 may be a single server or a server cluster.
FIG. 2 is a flow diagram of a method of pruning a convolutional network in one embodiment. The convolutional network pruning method shown in fig. 2 can be applied to the terminal 110 or the server 120, and includes:
Step 202, a convolutional neural network model is obtained.
A Convolutional Neural Network (CNN) is a type of feed-forward neural network that contains convolution calculations and has a deep structure, and a convolutional neural network model is a model that includes a convolutional neural network. Convolutional neural network models can have different functions and different network structures, and the number of neural network layers, the number of output channels of each layer, and the weight precision of each layer may all differ. Here, the convolutional neural network model is the network model to be pruned. Network pruning removes nodes or connections in a network according to a certain strategy, which reduces the parameter quantity and complexity of the network, makes the network lightweight, improves its operating efficiency, reduces power consumption, and allows the model to run on devices with lower computing power.
Step 204, decomposing the convolutional neural network model into independent pruneable models according to the dependency relationship among operators of the convolutional neural network model, and decomposing each pruneable model into a corresponding substructure set according to a matched decomposition mode.
The network model is a directed graph structure, and the operators in the network depend on one another; certain operators impose dependencies on feature sizes. For example, the output feature sizes of the different input operators feeding an addition operation must be consistent, and the numbers of output channels of the corresponding convolutional layers must be consistent, which creates a dependency relationship. An operator is a mapping from a function space to a function space; an operator may be, for example, a convolutional layer.
Specifically, the network structure of the convolutional neural network model is analyzed to determine which operators depend on one another. Operators with a dependency relationship must be pruned at the same time, otherwise a size-conflict error occurs. According to the model structure, operators with dependency relationships are found and integrated into a whole to form an independent pruneable model. In one embodiment, instructions sent by a user can be received and the layers with shape dependencies integrated according to the instructions; the user can input the corresponding instructions on an interface, according to the visualized network structure, to manually form the independent pruneable models.
Super-networking converts a single model into a special structure containing a series of sub-models. Decomposing each pruneable model into a corresponding substructure set according to a matched decomposition mode is exactly this super-networking, which mainly decomposes the channel counts of the model: each pruneable model is decomposed into a corresponding substructure set according to the number of output channels. The specific decomposition mode can be customized. As shown in fig. 3, if the convolutional layer originally has 12 channels, the weights of 3, 6, 9, and 12 channels can respectively be taken to form new convolutional layers, or the weights of 4, 8, and 12 channels can respectively be taken to form new convolutional layers. The weight is the parameter of the convolutional layer and is a matrix with the shape [output channels, input channels, convolution kernel width, convolution kernel height]. For example, if a convolutional layer has 3 input channels, 8 output channels, and a convolution width and height of 3, the shape of its weight parameter is [8, 3, 3, 3]. It can be appreciated that the new convolutional layers resulting from the decomposition share weights.
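As an illustration of this decomposition, the following sketch (an assumption about one possible implementation, using numpy; the array names are illustrative) shows how sub-layers keeping 3, 6, 9 or 12 output channels can be taken as views of a single 12-channel weight tensor, so that the substructures share weights.

```python
import numpy as np

# Weight of the original convolutional layer, with shape
# [output channels, input channels, kernel height, kernel width].
full_weight = np.random.randn(12, 3, 3, 3)

# Decompose the pruneable layer into substructures that keep the first
# 3, 6, 9 or 12 output channels. Each substructure is a view of the
# full weight tensor, so the substructures share weights.
channel_options = [3, 6, 9, 12]
substructures = {k: full_weight[:k] for k in channel_options}

assert substructures[3].shape == (3, 3, 3, 3)
# Weight sharing: the 3-channel weights are contained in the 6-channel weights.
assert np.shares_memory(substructures[3], substructures[6])
```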
Step 206, randomly selecting, for each pruneable model, a target substructure from the corresponding substructure set to form different sub-networks, training each sub-network according to training data, and adjusting the network parameters of the corresponding sub-network until the currently trained sub-network converges.
Specifically, the first pruneable model selects a target substructure A1 from its corresponding first substructure set, the second pruneable model selects a target substructure B1 from its corresponding second substructure set, and so on, until each pruneable model has selected a target substructure from its corresponding substructure set; the randomly selected target substructures are then assembled into a sub-network in the order of the original pruneable models. These steps are repeated to form the next sub-network, and so on, until the combinations are exhausted and every sub-network has been formed.
All layers of the pruneable models in the network are super-networked according to the above decomposition mode, so that the whole convolutional neural network model is super-networked. After the convolutional neural network model is super-networked, the overall structure of the network is unchanged. Assuming that a convolutional neural network model contains 10 pruneable models whose shapes are mutually independent, and each pruneable whole is decomposed into a super-network structure containing 4 substructures according to the above super-networking strategy, the whole network contains 4^10 = 1048576 different combinations, that is, the super-network space contains 1048576 different sub-networks.
Fig. 4 is a schematic diagram of model super-networking: the first layer of the original pruneable model has 5 channels and is decomposed after super-networking into three optional channel counts [3, 4, 5]; the second layer is decomposed from 8 channels into three optional channel counts [3, 6, 8]; the other layers are similar, and the resulting super-network contains 3 × 3 × 3 × 3 = 81 sub-networks in total. Fig. 5 shows a sub-network sampled from the super-network, the channels sampled at each layer being shown in the dashed boxes in fig. 4.
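As a small illustration, the sketch below (illustrative Python, not the patent's code) uses the per-layer channel options given for fig. 4 and in the specific embodiment later in this description, computes the size of the resulting super-network space, and samples one sub-network from it.

```python
import random

# Selectable output channels per layer after super-networking (fig. 4 example).
layer_options = [[3, 4, 5], [3, 6, 8], [3, 4, 5], [1, 2, 3]]

# The super-network space size is the product of the per-layer option counts:
# 3 * 3 * 3 * 3 = 81 different sub-networks.
space_size = 1
for options in layer_options:
    space_size *= len(options)

# Sampling a sub-network means picking one channel count per layer.
subnet = [random.choice(options) for options in layer_options]
print(space_size, subnet)  # e.g. 81 [3, 6, 4, 2]
```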
After the convolutional neural network model is super-networked, the network needs to be trained. The training strategy is as follows: before each batch of parameter adjustment, a sub-network is randomly sampled for the selected sample data, forward-propagation prediction is performed with that sub-network, the loss function is calculated and the weights are updated, and only the weights that participated in forward propagation are updated. The next batch of sample data is then acquired and the next randomly sampled sub-network is trained, adjusting the network parameters of the corresponding sub-network; different sub-networks are trained in turn on different sample data until the currently trained sub-network converges, at which point the super-network is judged to have converged. Convergence of the super-network means that all of its sub-networks have converged simultaneously. The network parameters are parameters that independently reflect the network characteristics; different sub-networks correspond to different network parameters, and adjusting the network parameters through back propagation changes the performance of the sub-networks. The condition for sub-network convergence can be customized, for example, judging according to the value of the loss function whether the sub-network has converged, or whether a preset number of iterations has been reached.
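A minimal sketch of this sampling-based training strategy is given below, assuming a PyTorch-style implementation with a single shared convolutional weight (the layer structure, loss, and dummy data are illustrative, not the patent's): each batch samples a random channel count, runs forward propagation with the sliced weight, and only the weights that participated in forward propagation receive gradients.

```python
import torch
import torch.nn.functional as F

# One shared convolutional weight of the super-network (12 output channels).
weight = torch.nn.Parameter(torch.randn(12, 3, 3, 3))
optimizer = torch.optim.SGD([weight], lr=0.01)
channel_options = [3, 6, 9, 12]

for step in range(4):                         # one random sub-network per batch
    x = torch.randn(8, 3, 32, 32)             # dummy batch of training samples
    target = torch.randn(8, 1)                # dummy labels
    k = channel_options[torch.randint(len(channel_options), (1,)).item()]

    out = F.conv2d(x, weight[:k], padding=1)  # forward with k output channels
    pred = out.mean(dim=(1, 2, 3)).unsqueeze(1)
    loss = F.mse_loss(pred, target)

    optimizer.zero_grad()
    loss.backward()       # gradients exist only for weight[:k]
    optimizer.step()      # only the weights used in forward propagation change
```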
Step 208, acquiring target model scale parameters, and selecting target sub-networks from the sub-networks based on the target model scale parameters to form the current population.
The target model scale parameters are used to set the size of the searched target model; the size of the searched model is smaller than or equal to the target model scale parameters. The target model scale parameters may include one or more different types of parameters and can be customized: they may be weight-related parameters, operation-count-related parameters, and so on, and may take different forms, such as thresholds or vectors.
Specifically, a target sub-network is selected from the sub-networks based on the target model scale parameter, the model scale of the target sub-network conforms to the target model scale parameter, and the target sub-network satisfying the target model scale parameter can be randomly selected from the sub-networks. The specific number can be customized, for example, the preset number or all the target sub-networks meeting the target model scale parameters are screened out to form the current population.
Step 210, screening the current population according to verification data to obtain a candidate population, selecting samples from the candidate population for mutation and interactive breeding operations that change the number of output channels of the samples to obtain updated samples, adding the updated samples into the candidate population to obtain an updated current population, and returning to the step of screening the current population according to the verification data until a convergence condition is reached, so as to obtain a target pruning network.
Specifically, the verification data refers to data containing real labels, i.e., test data and the labels corresponding to the test data. During model verification, the test data are input into the model to generate prediction data, and the prediction data and the corresponding labels are then used to judge whether the model predicts correctly, so that performance such as prediction accuracy is calculated as the verification result. Both the verification data and the verification method can be customized. Samples meeting the conditions are screened out of the current population according to the verification result to obtain the candidate population. Models with different functions use different types of verification data; for example, for an image subject classification model, the test data and the corresponding labels are respectively a test image and the subject classes in the image.
Mutation means that the number of channels in a certain layer of the network changes, thereby producing a new network; for example, the number of output channels of the third layer of one sub-network in the candidate population is mutated from 3 to 5. The number of mutated samples can be customized or adjusted adaptively. Breeding means that two parents exchange gene information to produce the next generation; in this scheme, the genes are the per-layer channel counts of the two parent networks, and a new sample randomly selects, for each layer, a channel count from one of the two parents, thereby producing a new network.
The updated samples are added into the candidate population to obtain the updated current population, and the process returns to the step of screening the current population according to the verification data until the convergence condition is reached, so as to obtain the target pruning network. The search process repeats step 210, eliminating low-index samples and generating new samples from the remaining high-index samples. The new samples generated in each iteration become better and better, so that the goal of search optimization is achieved. The convergence condition can be customized; for example, it may be defined as completing n iterations (n can be set, for example n = 20) without producing a better updated sample, which indicates that the evolutionary search is complete, and the optimal network in the population at that point is the target pruning network.
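The overall search loop can be outlined by the following sketch (illustrative Python; the helpers screen, mutate, breed and the initial population are hypothetical stand-ins for the operations described above, not the patent's API).

```python
import random

def evolutionary_search(initial_population, screen, mutate, breed, n=20):
    """Repeat screening plus mutation/breeding for n iterations (a simple
    convergence condition) and return the best remaining sub-network."""
    population = initial_population              # current population (step 208)
    for _ in range(n):
        candidates = screen(population)          # keep high-index samples, ranked
        parent1, parent2 = random.sample(candidates, 2)
        updated = [mutate(parent1), breed(parent1, parent2)]
        population = candidates + updated        # updated current population
    return screen(population)[0]                 # best sample = target pruning net
```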
With the convolutional network pruning method of this embodiment, the convolutional neural network model is acquired and decomposed into independent pruneable models according to the dependency relationship among its operators, and each pruneable model is decomposed into a corresponding substructure set according to a matched decomposition mode; for each pruneable model a target substructure is randomly selected from the corresponding substructure set to form different sub-networks, each sub-network is trained according to training data, and the network parameters of the corresponding sub-network are adjusted until the currently trained sub-network converges; target model scale parameters are obtained, and target sub-networks are selected from the sub-networks based on the target model scale parameters to form a current population; the current population is screened according to verification data to obtain a candidate population, samples are selected from the candidate population for mutation and interactive breeding operations to obtain updated samples, the updated samples are added into the candidate population to obtain an updated current population, and the process returns to the screening step until a convergence condition is reached, so as to obtain a target pruning network. Decomposing the model into independent pruneable models and further into different sub-networks realizes super-networking, and training the super-network alone simultaneously trains all sub-networks in the super-network space: each sub-network in the super-network space in fact represents a pruning strategy for the original model, and convergence of the super-network represents convergence of the models pruned by all the pruning strategies in the space, which improves the efficiency of network pruning development. By performing mutation and interactive breeding operations on the samples, an evolutionary search is realized, so that the target sub-network can be found quickly in a very large search space, the working time and complexity are greatly reduced, and the processing efficiency of the neural network is improved.
In one embodiment, step 204 includes at least one of the following:
the first mode is as follows: acquiring a first operator and a second operator which are connected, wherein the output of the first operator is the input of the second operator; if the output of the first operator is the same as the output channel number of the second operator, judging that the first operator and the second operator have a dependency relationship; and (4) classifying operators with dependency relationship into the same pruning model.
Specifically, the first operator and the second operator are connected operators, a result obtained through the first operator is input into the second operator for operation, and if the output of the first operator is the same as the output channel number of the second operator, it is determined that the first operator and the second operator have a dependency relationship; and (4) classifying operators with dependency relationship into the same pruning model. As shown in fig. 6, the number of output channels of the network Layer behind the Convolutional Layer is consistent with the number of output channels of the Convolutional Layer before the Convolutional Layer, for example, the number of output channels of the BatchNorm network Layer is consistent with the number of output channels of a conv1(Convolutional Layer) network Layer, where the BatchNorm network Layer is a network Layer frequently used in a deep network to accelerate neural network training, and the number of output channels of the deep separation Convolutional network Layer needs to be consistent with the number of output channels of the Convolutional Layer before the Convolutional Layer, that is, the number of output channels of the deep separation Convolutional network Layer is consistent with the number of output channels of the conv1 network Layer.
The second mode is as follows: acquiring a first operator and a second operator that participate in a joint operation; if the number of output channels after the operation is the same as the number of output channels of the first operator and of the second operator, determining that the first operator and the second operator have a dependency relationship, and grouping operators with a dependency relationship into the same pruneable model.
Specifically, when the first operator and the second operator participate in a joint operation, their numbers of output channels are required to be the same, and the number of output channels after the operation is also the same as theirs, so the first operator and the second operator have a dependency relationship, and operators with a dependency relationship are grouped into the same pruneable model. As shown in fig. 7, the output feature sizes of the inputs conv1 and conv2 of the addition operation should be consistent, and the number of output channels after the operation should also be consistent with them, so there is a dependency relationship between conv1 and conv2.
In the embodiment, the dependency relationship among the operators is quickly judged by comparing the number of output channels among the operators in different modes, so that the operators with the dependency relationship are classified into the same pruning model, and the convolutional neural network model is efficiently decomposed into independent pruning models.
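Under the assumption of a very small graph representation (the operator dictionary and field names below are hypothetical, and the graph follows the fig. 6 / fig. 7 examples), the two modes can be sketched as a grouping pass that places connected operators with equal output-channel counts into the same pruneable model:

```python
from collections import defaultdict

# Hypothetical mini-graph: each operator lists its input operators and its
# number of output channels.
ops = {
    "conv1":     {"inputs": [],                 "out_channels": 8},
    "batchnorm": {"inputs": ["conv1"],          "out_channels": 8},
    "depthwise": {"inputs": ["batchnorm"],      "out_channels": 8},
    "conv2":     {"inputs": [],                 "out_channels": 8},
    "add":       {"inputs": ["conv1", "conv2"], "out_channels": 8},
}

# Union-find over operators: connected operators whose output-channel counts
# must stay equal (mode one and mode two) end up in the same pruneable model.
parent = {name: name for name in ops}

def find(x):
    while parent[x] != x:
        x = parent[x]
    return x

for name, op in ops.items():
    for src in op["inputs"]:
        if ops[src]["out_channels"] == op["out_channels"]:
            parent[find(src)] = find(name)       # merge the two groups

groups = defaultdict(list)
for name in ops:
    groups[find(name)].append(name)
print(list(groups.values()))  # each group forms one independent pruneable model
```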
In one embodiment, step 204 includes: acquiring the number of selectable output channels of the pruning model, and decomposing the pruning model into at least two substructures based on the number of the selectable output channels; the network weight is shared between at least two substructures.
Specifically, the number of selectable output channels of the pruneable model is N, and the pruneable model is decomposed into at least two substructures based on that number; the specific numbers of output channels of the resulting substructures can be customized, and the at least two substructures share network weights. That is, among the substructures obtained by the decomposition, if the number of output channels of the second substructure is greater than that of the first substructure, the network weights of the second substructure include the network weights of the first substructure. For example, if the number of selectable output channels of the pruneable model is 9, three of the channels are taken to form a new convolutional layer 1 as the first substructure and six of the channels are taken to form a new convolutional layer 2 as the second substructure; the weights of the new convolutional layer 1 are the same as the first three weights of the new convolutional layer 2, that is, the network weights of the second substructure include those of the first substructure. Alternatively, three of the channels may form a new convolutional layer 1 (the first substructure), six a new convolutional layer 2 (the second substructure), and nine a new convolutional layer 3 (the third substructure), in which case the network weights of the second substructure include those of the first substructure and the network weights of the third substructure include those of the second substructure. The manner of decomposing the pruneable model can be configured freely, and so on; it is not described further here.
As shown in fig. 3, the convolutional layer of a pruneable model originally has 12 channels; the convolutional layer is decomposed by taking the weights of 3, 6, 9, and 12 channels to form new convolutional layers, so that the convolutional layer changes from one operator into 4 operators. It should be noted that the weights are shared between these channel counts, i.e. 3 ⊂ 6 ⊂ 9 ⊂ 12, meaning that the weights of the convolutional layer formed from the 3-channel weights are included in the weights of the convolutional layer formed from the 6-channel weights.
In this embodiment, the pruneable model is decomposed into at least two substructures based on the number of selectable output channels, and between each substructure, if the number of output channels of the second substructure is greater than the number of output channels of the first substructure, the network weight of the second substructure includes the network weight of the first substructure, so that weight sharing between each substructure is ensured, and a subsequent more efficient training network is facilitated.
In one embodiment, step 206 includes: acquiring a currently trained subnetwork; acquiring current training data corresponding to a currently trained subnetwork; inputting current training data into a currently trained subnetwork, and adjusting network parameters of the currently trained subnetwork according to an output result of the currently trained subnetwork and the label data; and acquiring the next subnetwork as the subnetwork to be trained currently, and entering the step of acquiring the current training data corresponding to the subnetwork to be trained currently until the subnetwork to be trained converges.
Specifically, the training data includes a plurality of different sets of training samples, different sub-networks may use the same or different training data, and a specific selection strategy may be customized. For each training sample data, randomly sampling a sub-network, performing forward propagation prediction by using the sub-network, calculating a loss function, then updating the weight, and only updating the weight participating in forward propagation during weight updating. And then acquiring next batch of sample data, training the next randomly sampled sub-network, adjusting the network parameters of the corresponding sub-network, sequentially training different sub-networks through different sample data until the currently trained sub-network is converged, judging that the super-network is converged, and when the super-network is converged, representing that all the sub-networks therein are simultaneously converged. The condition of the sub-network convergence can be customized, for example, whether the sub-network converges or iterates to a preset number is judged according to the value of the loss function.
As can be seen from the super-networking strategy of fig. 3 and this scheme, the sub-layers share weights, and this weight-sharing strategy has the property that the larger the number of channels, the better the corresponding model performance. This property biases the model toward compressing the weights during updating: the important weights are concentrated in the low-channel part, which is friendlier to the convergence of the small models in the low-channel part and yields a better convergence effect than training the corresponding small models from scratch.
In this embodiment, all subnetworks of the piconet space can be trained simultaneously by just training the piconet. In fact, each sub-network in the super-network space represents a pruning strategy for the original model, and the training convergence of the super-network represents the training convergence of the model after the pruning strategies in the space are pruned, so that the efficiency of network pruning development is improved.
In one embodiment, the verification data includes test data and a corresponding tag, step 210 includes: inputting test data into a sub-network in the current population, and obtaining a prediction result according to the output of the sub-network; comparing the prediction result with the corresponding label to obtain a verification result; and sequencing the sub-networks in the current population according to the verification result, and forming the sub-networks meeting the performance condition into a candidate population according to the sequencing result.
Specifically, the verification data refers to data including a real tag, wherein the data includes test data and a tag corresponding to the test data, when the model is verified, the test data is input into the model to generate prediction data, then the prediction data and the corresponding tag are used for judging whether the model is predicted correctly, so that the prediction accuracy of the model is calculated as a verification result, sub-networks in the current population are sorted according to the prediction accuracy, the sub-networks with the prediction accuracy meeting an accuracy threshold value form a candidate population, or a preset number of sub-networks are selected from a large prediction accuracy to a small prediction accuracy to form a candidate population.
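A minimal sketch of this screening step follows (illustrative; accuracy_on_verification_data is a hypothetical evaluation function standing in for the verification procedure described above):

```python
def screen_population(population, accuracy_on_verification_data, keep_top=10):
    """Rank sub-networks by prediction accuracy on the verification data and
    keep the best ones as the candidate population."""
    ranked = sorted(population, key=accuracy_on_verification_data, reverse=True)
    return ranked[:keep_top]
```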
In one embodiment, a predictor of model performance is constructed so that the performance of a model can be predicted directly from its structural information: various model structures and the corresponding performance indexes are collected, and the prediction model is trained with the collected data. The performance index of a model is then estimated by this prediction model, and the sub-networks in the current population are ranked according to the estimated performance. In this way, for each network, high-quality sub-networks whose performance meets the requirements can be obtained quickly to form the candidate population without running the verification data, which further improves efficiency.
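One possible form of such a predictor is sketched below (an assumption, not the patent's implementation): a simple linear model fitted with numpy that maps a sub-network's per-layer channel configuration to its measured performance index, so that new sub-networks can be ranked without running the verification data.

```python
import numpy as np

# Collected model structures (per-layer output-channel counts) and the
# corresponding measured performance indexes (dummy values for illustration).
configs = np.array([[3, 6, 2, 3, 4],
                    [4, 6, 3, 4, 2],
                    [5, 8, 4, 5, 3],
                    [3, 3, 2, 3, 2]], dtype=float)
accuracies = np.array([0.71, 0.74, 0.78, 0.65])

X = np.hstack([configs, np.ones((len(configs), 1))])   # add a bias column
coef, *_ = np.linalg.lstsq(X, accuracies, rcond=None)  # fit the predictor

def predict_performance(config):
    return float(np.append(config, 1.0) @ coef)

print(predict_performance([4, 8, 3, 4, 3]))   # estimated performance index
```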
In this embodiment, the sub-networks meeting the performance condition are grouped into the candidate population through the sorting result, and the candidate population composed of the sub-networks can be rapidly screened to obtain a candidate population with higher quality.
In one embodiment, as shown in fig. 8, the selecting a sample from the candidate population to perform mutation and interactive breeding operations to change the number of output channels of the sample in step 210, and obtaining an updated sample includes:
step 212, obtaining the number of output channels corresponding to the network layer of the sub-network in the candidate population.
Step 214, arranging the gene segment corresponding to the sub-network in the connection order of the network layers, using the numbers of output channels as genes.
Specifically, the sub-network has N network layers, the number of output channels corresponding to the i-th layer is Xi, and i ranges from 1 to N. For example, a sub-network has 5 layers, where the number of output channels of the first layer is 3, that of the second layer is 6, that of the third layer is 2, that of the fourth layer is 3, and that of the fifth layer is 4. Taking the numbers of output channels as genes and arranging them in the connection order of the network layers, the gene segment corresponding to this sub-network is (3, 6, 2, 3, 4).
And step 216, carrying out mutation and interactive breeding operation on the sample based on the gene segments to obtain an updated sample.
Specifically, mutation refers to a change in an individual gene segment; in this embodiment it means that the number of output channels of a network layer changes, thereby producing a new network. For example, the number of output channels of the second layer of the network changes from 6 to 9, producing a new network. Breeding refers to two parent samples exchanging gene information to produce the next generation. In this scheme the genes are the per-layer output-channel counts of the two parent networks, and a new sample randomly selects, for each layer, a channel count from one of the two parents, thereby producing a new network.
In the embodiment, the number of the model layer output channels is represented as the gene segments, so that the functions of variation and interactive breeding are realized, the function of evolutionary search is further realized, the target subnet can be quickly found in an ultra-large search space, and the working time and the working complexity are greatly reduced.
In one example, step 216 includes: changing the genes in the gene segments within the range of the corresponding selectable output channel number to obtain variant gene segments; and taking the number of output channels corresponding to the variant gene segments as the number of output channels of each network layer of the updated sample to obtain the updated sample.
Specifically, the gene segment corresponding to the sub-network is (3, 6, 2, 3, 4), and the selectable output channel numbers corresponding to the second layer are (3, 6, 9). The value 9, which lies within the range of selectable output channel numbers of the second layer, replaces the gene value 6 corresponding to the second layer in the gene segment, giving the mutated gene segment (3, 9, 2, 3, 4); the numbers of output channels of the network layers represented by (3, 9, 2, 3, 4) then give the updated sample, as shown in fig. 9.
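A sketch of this mutation on the gene segment follows (illustrative Python; the selectable channel sets for layers other than the second are assumptions made for the example).

```python
import random

def mutate(genes, selectable_channels):
    """Replace one layer's output-channel gene with another value from that
    layer's selectable output-channel set."""
    genes = list(genes)
    layer = random.randrange(len(genes))
    choices = [c for c in selectable_channels[layer] if c != genes[layer]]
    if choices:
        genes[layer] = random.choice(choices)
    return genes

parent = [3, 6, 2, 3, 4]
selectable = [(3, 4, 5), (3, 6, 9), (2, 3, 4), (3, 4, 5), (2, 4, 6)]
print(mutate(parent, selectable))   # e.g. [3, 9, 2, 3, 4], as in fig. 9
```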
In the embodiment, the updated sample is quickly obtained through gene variation, and the method is convenient and efficient.
In one embodiment, step 216 includes: acquiring a first gene segment corresponding to a first sub-network; acquiring a second gene segment corresponding to a second sub-network; selecting target genes from the first gene segment and the second gene segment respectively to form updated gene segments; and taking the number of output channels corresponding to the updated gene segments as the number of output channels of each network layer of the updated sample to obtain the updated sample.
Specifically, target genes are selected from the first gene segment and the second gene segment respectively to form the updated gene segment; the selection mode can be customized, for example, genes may be selected randomly from the first and second gene segments. As shown in fig. 10, the first sub-network serves as parent network 1 and the second sub-network as parent network 2; the updated sample in the figure inherits the output-channel genes of layers 1, 3 and 4 from parent network 1 and the output-channel genes of layers 2 and 5 from parent network 2, thereby producing a sub-network containing the gene information of both parent networks, changing the numbers of output channels of the sample, and obtaining the updated sample.
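A corresponding sketch of the interactive breeding (crossover) operation is given below (illustrative; the parent gene segments are made-up examples).

```python
import random

def breed(parent1, parent2):
    """For each layer, the child randomly inherits the output-channel gene
    from one of the two parent sub-networks."""
    return [random.choice(pair) for pair in zip(parent1, parent2)]

child = breed([3, 6, 2, 3, 4], [4, 3, 4, 5, 2])
print(child)   # e.g. [3, 3, 2, 5, 4]
```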
In the embodiment, the function of evolutionary search is further realized through the interactive breeding function, the target subnet can be quickly found in the super-large search space, and the working time and the working complexity are greatly reduced.
In a specific embodiment, a convolutional network pruning method is provided, and as shown in fig. 11, the convolutional network pruning process is specifically as follows:
1. A convolutional neural network model is obtained and decomposed into independent pruneable models according to the dependency relationship among its operators, and each pruneable model is decomposed into a corresponding substructure set according to a matched decomposition mode. For example, the model is decomposed into an independent 4-layer pruneable model, each layer corresponding to a selectable substructure set, that is, the selectable output channel numbers of each layer are respectively: the first layer originally has 5 output channels and is decomposed after super-networking into three optional channel counts (3, 4, 5); the second layer originally has 8 output channels and is decomposed into three optional channel counts (3, 6, 8); the third layer originally has 5 output channels and is decomposed into three optional channel counts (3, 4, 5); and the fourth layer originally has 3 output channels and is decomposed into three optional channel counts (1, 2, 3).
2. Each pruneable model randomly selects a target substructure from its corresponding substructure set to form different sub-networks, each sub-network is trained according to training data, and the network parameters of the corresponding sub-network are adjusted until the currently trained sub-network converges. For example, the output channel counts (3, 6, 3, 1) are selected for the respective layers to form one sub-network, which is trained according to training data while its network parameters are adjusted; then output channels are selected randomly to form the next sub-network and training data are acquired to train it, and so on until the currently trained sub-network converges.
3. Target model scale parameters are obtained, and target sub-networks are selected from the sub-networks based on the target model scale parameters to form the current population. First, the preceding node of each layer in the network is obtained according to the graph structure corresponding to the convolutional neural network model; the output channels of the preceding node are the input channels of the current node, and the channel configuration of the current node gives its output channels. The model scale of the current layer, including the parameter count, MACs and FLOPs, is then calculated from the input channels, output channels, input feature size, output feature size and weight size, using the following formulas:
parameter count = input channels × output channels × weight size;
FLOPs = input feature size × input channels × output channels;
MACs = FLOPs / 2.
The data of each layer are then added up to obtain the model scale parameters of the whole sub-network. Sub-networks are generated by random sampling from the search space, their model scale parameters are compared with the target model scale parameters, and the sub-networks that satisfy the conditions of the target model scale parameters are selected as target sub-networks to form the current population (see the sketch below).
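The following sketch applies the above formulas layer by layer and compares a sampled sub-network against a target scale (an illustrative reading of the formulas; the example layer values and the FLOPs threshold are assumptions).

```python
def layer_scale(in_channels, out_channels, weight_size, input_feature_size):
    params = in_channels * out_channels * weight_size
    flops = input_feature_size * in_channels * out_channels
    macs = flops / 2
    return params, flops, macs

def subnet_scale(layers):
    """Sum the per-layer scales to obtain the scale of the whole sub-network."""
    totals = [0, 0, 0]
    for layer in layers:
        for i, value in enumerate(layer_scale(**layer)):
            totals[i] += value
    return totals

# Example sub-network: each layer's input channels are the previous node's
# output channels; weight_size here is kernel height times kernel width.
subnet = [
    {"in_channels": 3, "out_channels": 3, "weight_size": 9, "input_feature_size": 32 * 32},
    {"in_channels": 3, "out_channels": 6, "weight_size": 9, "input_feature_size": 32 * 32},
]
params, flops, macs = subnet_scale(subnet)
target_flops = 1e6                     # one target model scale parameter
print(params, flops, macs, flops <= target_flops)
```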
In one embodiment, during the search the model scale is represented by the parameter count, MACs and FLOPs. If the pruned model is required to meet an actual requirement on operation latency, an estimation strategy for the operation latency of the model can be added and this index used as a threshold condition, i.e., as one of the target model scale parameters, so that the pruned model better meets the actual latency requirement. The operation latency of each layer of the model can be predicted directly from information such as the layer's input channels, output channels, input feature size, output feature size and weight size, and the operation latency of the whole model is then obtained by accumulation. Alternatively, configuration information of various operators and the corresponding operation latencies can be collected, a prediction model trained with these data, and that model used to estimate the operation latency of the model.
4. Screening the current population according to verification data to obtain a candidate population, wherein the verification data is data containing real labels and comprises test data and labels corresponding to the test data, inputting the test data into the sub-networks to generate prediction data during model verification, judging whether the sub-networks predict correctly or not by using the prediction data and the corresponding labels, calculating the performances such as prediction accuracy of the sub-networks and the like, sequencing the sub-networks according to the performances, and eliminating partial tail samples to obtain the candidate population.
5. Samples are randomly selected from the candidate population, and mutation and interactive breeding operations are performed to change the numbers of output channels of the samples and generate updated samples; the updated samples are combined with the candidate population to obtain the updated current population, and the process returns to step 4, repeating the screening and the mutation and interactive breeding operations until a convergence condition is reached. The convergence condition can be customized; for example, if n iterations (n can be customized) are performed without generating a better sample, the evolutionary search is complete.
6. The optimal sub-network in the current population searched according to the steps is the target pruning network, and fine tuning optimization can be performed.
In this embodiment, the model is decomposed into independent pruneable models, which are further decomposed to form different sub-networks, realizing super-networking; all sub-networks in the super-network space can be trained simultaneously just by training the super-network. In fact, each sub-network in the super-network space represents a pruning strategy for the original model, and convergence of the super-network represents convergence of the models pruned by all the pruning strategies in the space, which improves the efficiency of network pruning development. By performing mutation and interactive breeding operations on the samples, an evolutionary search is realized, so that the target sub-network can be found quickly in a very large search space, the working time and complexity are greatly reduced, and the processing efficiency of the neural network is improved.
It should be understood that although the steps in the flowcharts of fig. 2, 8 and 11 are shown in sequence as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, there is no strict order restriction on these steps, and they may be performed in other orders. Moreover, at least some of the steps in fig. 2, 8 and 11 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Fig. 12 is a block diagram of a convolutional network pruning device 300 according to an embodiment. As shown in fig. 12, a convolutional network pruning device 300 includes: an acquisition module 302, a decomposition module 304, a training module 306, and a target pruning network determination module 308. Wherein:
an obtaining module 302, configured to obtain a convolutional neural network model.
The decomposition module 304 is configured to decompose the convolutional neural network model into independent pruneable models according to the dependency relationships among operators of the convolutional neural network model, and to decompose each pruneable model into a corresponding substructure set according to a matching decomposition manner.
The training module 306 is configured to randomly select target substructures from the corresponding substructure set for each pruneable model to form different sub-networks, train each sub-network according to training data, and adjust the network parameters of the corresponding sub-network until the currently trained sub-network converges.
The target pruning network determining module 308 is configured to obtain a target model scale parameter, select a target sub-network from the sub-networks based on the target model scale parameter to form a current population, screen the current population according to verification data to obtain a candidate population, select a sample from the candidate population to perform mutation and interactive breeding operations to change the number of output channels of the sample to obtain an updated sample, add the updated sample to the candidate population to obtain an updated current population, and return to the step of screening the current population according to the verification data until a convergence condition is reached to obtain a target pruning network.
The convolutional network pruning device 300 in this embodiment obtains a convolutional neural network model, decomposes it into independent pruneable models according to the dependency relationships among its operators, and decomposes each pruneable model into a corresponding substructure set according to a matching decomposition manner; each pruneable model randomly selects target substructures from its substructure set to form different sub-networks, each sub-network is trained with training data, and the network parameters of the corresponding sub-network are adjusted until the currently trained sub-network converges; target model scale parameters are obtained, and target sub-networks are selected from the sub-networks based on these parameters to form a current population; the current population is screened according to verification data to obtain a candidate population, a sample is selected from the candidate population for mutation and interactive breeding operations to obtain an updated sample, the updated sample is added to the candidate population to obtain an updated current population, and the screening step is repeated until a convergence condition is reached, yielding the target pruning network. Decomposing the model into independent pruneable models and further into different sub-networks realizes a super-network, and all sub-networks in the super-network space can be trained simultaneously simply by training the super-network. In fact, each sub-network in the super-network space represents one pruning strategy for the original model, and convergence of the super-network training represents convergence of the models pruned with each strategy in that space, which improves the efficiency of network pruning development. Mutation and interactive breeding of samples realize the function of evolutionary search, so that the target sub-network can be found quickly in a very large search space, greatly reducing working time and complexity and improving the processing efficiency of the neural network.
In one embodiment, the decomposition module 304 is further configured to obtain the independent pruneable models in at least one of the following ways:
Way one: acquiring a first operator and a second operator that are connected, wherein the output of the first operator is the input of the second operator; if the number of output channels of the first operator is the same as the number of output channels of the second operator, it is judged that the first operator and the second operator have a dependency relationship, and operators with a dependency relationship are classified into the same pruneable model;
Way two: acquiring a first operator and a second operator whose outputs are combined by an operation; if the number of output channels after the operation is the same as the number of output channels of the first operator and of the second operator (as is the case for an element-wise addition), it is judged that the first operator and the second operator have a dependency relationship, and operators with a dependency relationship are classified into the same pruneable model.
The convolutional network pruning device 300 in this embodiment quickly determines the dependency relationships among operators by comparing their numbers of output channels in the different ways above, so that operators with a dependency relationship are classified into the same pruneable model and the convolutional neural network model is efficiently decomposed into independent pruneable models.
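As a non-limiting sketch of how these two rules might be applied, operators can be grouped with a simple union-find structure; the dictionary-based operator representation (a name, an output-channel count, an input list, and an optional elementwise flag) is an assumption introduced purely for illustration and is not the disclosed data structure.

```python
class PruneGroups:
    # Union-find over operator names: operators in one group form one pruneable model.
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def group_operators(ops):
    # ops: list of dicts like {"name": ..., "out_channels": ..., "inputs": [producer names]},
    # with an optional "elementwise" flag for operations such as addition.
    groups = PruneGroups()
    by_name = {op["name"]: op for op in ops}
    for op in ops:
        # Way one: connected operators whose numbers of output channels must match.
        for src_name in op["inputs"]:
            if by_name[src_name]["out_channels"] == op["out_channels"]:
                groups.union(src_name, op["name"])
        # Way two: an element-wise operation forces all of its input operators to keep
        # the same number of output channels, so they belong to the same pruneable model.
        if op.get("elementwise") and len(op["inputs"]) > 1:
            for other in op["inputs"][1:]:
                groups.union(op["inputs"][0], other)
    return groups

ops = [
    {"name": "conv1", "out_channels": 64, "inputs": []},
    {"name": "bn1",   "out_channels": 64, "inputs": ["conv1"]},
    {"name": "conv2", "out_channels": 64, "inputs": ["bn1"]},
    {"name": "add",   "out_channels": 64, "inputs": ["bn1", "conv2"], "elementwise": True},
]
g = group_operators(ops)
print(g.find("conv1") == g.find("bn1"))  # True: conv1 and bn1 must be pruned together
```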
In one embodiment, the decomposition module 304 is further configured to obtain the number of selectable output channels of the pruneable model; decompose the pruneable model into at least two substructures based on the numbers of selectable output channels; and, among the decomposed substructures, if the number of output channels of a second substructure is greater than the number of output channels of a first substructure, the network weights of the second substructure include the network weights of the first substructure.
In the convolutional network pruning device 300 of this embodiment, the pruneable model is decomposed into at least two substructures based on the numbers of selectable output channels, and between any two substructures, if the number of output channels of the second substructure is greater than that of the first substructure, the network weights of the second substructure include the network weights of the first substructure. This ensures weight sharing among the substructures and facilitates more efficient training of the network later.
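A minimal sketch of this weight-sharing arrangement, assuming a single shared weight tensor from which each substructure takes a prefix slice; the class name and the NumPy representation are illustrative, not the disclosed implementation.

```python
import numpy as np

class SharedConvSubstructures:
    # One pruneable model whose substructures differ only in output-channel count and
    # share a single weight tensor: the larger substructure's weights contain the smaller's.
    def __init__(self, in_channels, channel_choices, kernel_size=3):
        self.channel_choices = sorted(channel_choices)
        max_out = self.channel_choices[-1]
        self.weight = np.random.randn(max_out, in_channels, kernel_size, kernel_size)

    def weights_for(self, out_channels):
        assert out_channels in self.channel_choices
        # A substructure with fewer output channels uses a prefix of the shared tensor.
        return self.weight[:out_channels]

block = SharedConvSubstructures(in_channels=32, channel_choices=[16, 32, 48, 64])
w_small, w_large = block.weights_for(16), block.weights_for(48)
assert np.shares_memory(w_small, w_large)  # the larger slice contains the smaller one
```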
In one embodiment, the training module 306 is further configured to obtain a currently trained sub-network; obtain current training data corresponding to the currently trained sub-network; input the current training data into the currently trained sub-network and adjust its network parameters according to its output result and the label data; and obtain the next sub-network as the currently trained sub-network and return to the step of obtaining current training data corresponding to the currently trained sub-network, until the currently trained sub-network converges.
In this embodiment, all sub-networks in the super-network space can be trained simultaneously simply by training the super-network. In fact, each sub-network in the super-network space represents one pruning strategy for the original model, and convergence of the super-network training represents convergence of the models obtained by pruning with each strategy in that space, which improves the efficiency of network pruning development.
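A rough sketch of such a training loop, in which each step trains one randomly sampled sub-network on a batch of training data; the sample_subnet, loss_fn, and update callables are placeholders for whatever sampling, loss, and optimizer an implementation actually uses, and are assumptions rather than the disclosed code.

```python
def train_supernet(sample_subnet, data_loader, loss_fn, update, num_steps):
    # Weight-sharing training: every step trains one randomly sampled sub-network, and
    # because the substructures share weights, all sub-networks in the super-network
    # space are trained at the same time.
    step = 0
    while step < num_steps:
        for batch, labels in data_loader:
            subnet = sample_subnet()         # random target substructures -> one sub-network
            outputs = subnet(batch)          # forward the current training data
            loss = loss_fn(outputs, labels)  # compare the output result with the label data
            update(subnet, loss)             # adjust the network parameters of this sub-network
            step += 1
            if step >= num_steps:
                return
```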
In one embodiment, the verification data includes test data and corresponding labels, and the target pruning network determination module 308 is further configured to input the test data into the sub-networks in the current population and obtain prediction results from the outputs of the sub-networks; compare the prediction results with the corresponding labels to obtain verification results; and rank the sub-networks in the current population according to the verification results and, based on the ranking, form the sub-networks that satisfy a performance condition into the candidate population.
In the convolutional network pruning device 300 of this embodiment, the sub-networks that satisfy the performance condition are formed into the candidate population according to the ranking result, so that a candidate population of higher quality can be obtained by fast screening.
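For illustration only, the screening described here might be sketched as follows, assuming each sub-network exposes a predict method and the verification data is a sequence of (test input, label) pairs; the accuracy metric and the retained fraction are illustrative choices, not fixed by the embodiment.

```python
def screen_population(population, val_data, keep_ratio=0.5):
    # Rank sub-networks by validation accuracy and keep only the head of the ranking.
    def accuracy(subnet):
        correct = sum(1 for x, y in val_data if subnet.predict(x) == y)
        return correct / len(val_data)

    ranked = sorted(population, key=accuracy, reverse=True)
    keep = max(1, int(len(ranked) * keep_ratio))
    # The sub-networks meeting the performance condition form the candidate population.
    return ranked[:keep]
```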
In one embodiment, the target pruning network determining module 308 is further configured to obtain the number of output channels corresponding to each network layer of a sub-network in the candidate population; use the numbers of output channels as genes and arrange them according to the connection order of the network layers to form the gene segment corresponding to the sub-network; and perform mutation and interactive breeding operations on the sample based on the gene segments to obtain the updated sample.
The convolutional network pruning device 300 in this embodiment represents the numbers of output channels of the model layers as gene segments, thereby realizing the mutation and interactive breeding functions and, in turn, the function of evolutionary search, so that the target sub-network can be found quickly in a very large search space, greatly reducing working time and complexity.
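A trivial sketch of this encoding and its inverse, with the dictionary-based layer description assumed for illustration.

```python
def encode_genes(layers):
    # Gene segment: the output-channel count of each network layer, in connection order.
    return [layer["out_channels"] for layer in layers]

def decode_genes(genes, template_layers):
    # Write the gene values back into a layer template to describe the updated sample.
    return [{**layer, "out_channels": g} for layer, g in zip(template_layers, genes)]

layers = [{"name": "conv1", "out_channels": 32},
          {"name": "conv2", "out_channels": 64}]
print(encode_genes(layers))  # [32, 64]
```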
In one embodiment, the target pruning network determination module 308 is further configured to vary the genes in the gene segments within a range corresponding to the number of the selectable output channels to obtain variant gene segments; and taking the number of output channels corresponding to the variant gene segments as the number of output channels of each network layer of the updated sample to obtain the updated sample.
In this embodiment, the updated sample is obtained quickly through gene mutation, which is convenient and efficient.
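A minimal sketch of such a mutation, where channel_choices[i] is assumed to hold the selectable output-channel counts of the i-th network layer; the mutation probability is an illustrative choice.

```python
import random

def mutate(genes, channel_choices, p=0.2):
    # Vary each gene within its layer's range of selectable output-channel counts.
    return [random.choice(channel_choices[i]) if random.random() < p else g
            for i, g in enumerate(genes)]

channel_choices = [[16, 32, 48, 64], [32, 64, 96, 128]]
print(mutate([32, 128], channel_choices))
```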
In one embodiment, the target pruning network determination module 308 is further configured to obtain a first gene segment corresponding to a first sub-network; obtain a second gene segment corresponding to a second sub-network; select target genes from the first gene segment and the second gene segment respectively to form an updated gene segment; and use the numbers of output channels corresponding to the updated gene segment as the numbers of output channels of the network layers of the updated sample to obtain the updated sample.
In this embodiment, the interactive breeding function further realizes the function of evolutionary search, so that the target sub-network can be found quickly in a very large search space, greatly reducing working time and complexity.
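A minimal sketch of the interactive breeding operation on two gene segments; the parent gene values are illustrative.

```python
import random

def interactive_breed(genes_a, genes_b):
    # For each network layer, take the output-channel count from one of the two parents.
    return [random.choice(pair) for pair in zip(genes_a, genes_b)]

print(interactive_breed([64, 32, 48, 16], [32, 32, 64, 64]))
```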
For specific limitations of the convolutional network pruning device, reference may be made to the above limitations of the convolutional network pruning method and the neural network training method, which are not repeated here. Each module in the convolutional network pruning device may be implemented wholly or partially by software, hardware, or a combination thereof. Each module may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to the modules.
Fig. 13 is a schematic diagram of the internal structure of an electronic device in one embodiment. As shown in Fig. 13, the electronic device includes a processor and a memory connected by a system bus. The processor provides computing and control capability and supports the operation of the entire electronic device. The memory may include a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The computer program can be executed by the processor to implement the convolutional network pruning method provided by the above embodiments. The internal memory provides a cached execution environment for the operating system and the computer program in the non-volatile storage medium. The electronic device may be a mobile phone, a server, or the like.
The implementation of each module in the convolutional network pruning device provided in the embodiment of the present application may be in the form of a computer program. The computer program may be run on a terminal or a server. The program modules constituted by the computer program may be stored on the memory of the terminal or the server. The computer program, when executed by a processor, implements the convolutional network pruning method described in the embodiments of the present application.
Embodiments of the present application also provide a computer-readable storage medium: one or more non-transitory computer-readable storage media containing computer-executable instructions that, when executed by one or more processors, cause the processors to perform the convolutional network pruning method described in the embodiments of the present application.
Embodiments of the present application also provide a computer program product containing instructions which, when run on a computer, cause the computer to perform the convolutional network pruning method described in the embodiments of the present application.
Any reference to memory, storage, a database, or another medium used herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The above embodiments express only several implementations of the present application, and their descriptions are relatively specific and detailed, but they should not therefore be construed as limiting the scope of the present application. It should be noted that several variations and improvements can be made by those skilled in the art without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (11)

1. A convolutional network pruning method is characterized by comprising the following steps:
acquiring a convolutional neural network model;
decomposing the convolutional neural network model into independent pruneable models according to the dependency relationship among operators of the convolutional neural network model, and decomposing each pruneable model into a corresponding substructure set according to a matched decomposition mode;
randomly selecting target substructures from the corresponding substructure set by each pruneable model to form different sub-networks, training each sub-network according to training data, and adjusting network parameters of the corresponding sub-network until the currently trained sub-network converges;
obtaining target model scale parameters, and selecting a target sub-network from each sub-network based on the target model scale parameters to form a current population;
screening the current population according to verification data to obtain a candidate population, selecting a sample from the candidate population to perform variation and interactive breeding operation to change the number of output channels of the sample to obtain an updated sample, adding the updated sample into the candidate population to obtain an updated current population, returning to the step of screening the current population according to the verification data until a convergence condition is reached, and obtaining a target pruning network.
2. The method of claim 1, wherein decomposing the convolutional neural network model into independent pruneable models according to the dependency relationships between operators of the convolutional neural network model comprises at least one of:
acquiring a first operator and a second operator which are connected, wherein the output of the first operator is the input of the second operator; if the number of output channels of the first operator is the same as the number of output channels of the second operator, judging that the first operator and the second operator have a dependency relationship, and classifying operators with a dependency relationship into the same pruneable model;
acquiring a first operator and a second operator whose outputs are combined by an operation; if the number of output channels after the operation is the same as the number of output channels of the first operator and of the second operator, judging that the first operator and the second operator have a dependency relationship, and classifying operators with a dependency relationship into the same pruneable model.
3. The method of claim 1, wherein decomposing each pruneable model into a corresponding set of substructures according to a matching decomposition comprises:
acquiring the number of selectable output channels of the pruneable model;
decomposing a pruneable model into at least two sub-structures based on the number of selectable output channels;
the at least two substructures share a network weight therebetween.
4. The method of claim 1, wherein training each sub-network according to the training data, adjusting network parameters of the corresponding sub-network until the currently trained sub-network converges, comprises:
acquiring a currently trained subnetwork;
acquiring current training data corresponding to a currently trained subnetwork;
inputting the current training data into a currently trained subnetwork, and adjusting the network parameters of the currently trained subnetwork according to the output result of the currently trained subnetwork and the label data;
and acquiring the next subnetwork as the subnetwork to be trained currently, and entering the step of acquiring the current training data corresponding to the subnetwork to be trained currently until the subnetwork to be trained converges.
5. The method of claim 1, wherein the validation data comprises test data and corresponding tags, and the screening the current population according to the validation data to obtain the candidate population comprises:
inputting test data into a sub-network in the current population, and obtaining a prediction result according to the output of the sub-network;
comparing the prediction result with the corresponding label to obtain a verification result;
and sequencing the sub-networks in the current population according to the verification result, and forming the sub-networks meeting the performance condition into the candidate population according to the sequencing result.
6. The method of claim 1, wherein selecting a sample from the candidate population for mutation and interactive breeding operations to obtain an updated sample comprises:
acquiring the number of output channels corresponding to the network layer of the sub-network in the candidate population;
the number of output channels is used as genes, and gene segments corresponding to the sub-networks are formed by arranging according to the connection sequence of the network layer;
and carrying out mutation and interactive breeding operation on the sample based on the gene segments to obtain the updated sample.
7. The method of claim 6, wherein performing mutation and interactive breeding operations on the sample based on the gene segments to change the number of output channels of the sample and obtain the updated sample comprises:
changing the genes in the gene segments within the range of the corresponding selectable output channel number to obtain variant gene segments;
and taking the number of output channels corresponding to the variant gene segments as the number of output channels of each network layer of the updated sample to obtain the updated sample.
8. The method of claim 6, wherein performing mutation and interactive breeding operations on the sample based on the gene segments to change the number of output channels of the sample and obtain the updated sample comprises:
acquiring a first gene segment corresponding to a first sub-network;
acquiring a second gene segment corresponding to a second sub-network;
selecting target genes from the first gene segment and the second gene segment respectively to form updated gene segments;
and taking the number of output channels corresponding to the updated gene segments as the number of output channels of each network layer of the updated sample to obtain the updated sample.
9. A convolutional network pruning device, comprising:
the acquiring module is used for acquiring a convolutional neural network model;
the decomposition module is used for decomposing the convolutional neural network model into independent pruneable models according to the dependency relationships among operators of the convolutional neural network model, and decomposing each pruneable model into a corresponding substructure set according to a matched decomposition mode;
the training module is used for randomly selecting target substructures from the corresponding substructure set by each pruneable model to form different sub-networks, training each sub-network according to training data, and adjusting network parameters of the corresponding sub-network until the currently trained sub-network converges;
the target pruning network determining module is used for obtaining a target model scale parameter, selecting a target sub-network from various sub-networks based on the target model scale parameter to form a current population, screening the current population according to verification data to obtain a candidate population, selecting a sample from the candidate population to perform variation and interactive breeding operations to change the number of output channels of the sample to obtain an updated sample, adding the updated sample into the candidate population to obtain an updated current population, and returning to the step of screening the current population according to the verification data until a convergence condition is reached to obtain the target pruning network.
10. An electronic device comprising a memory and a processor, the memory having stored therein a computer program that, when executed by the processor, causes the processor to perform the convolutional network pruning method of any of claims 1 to 8.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the convolutional network pruning method of any one of claims 1 to 8.
CN202111007647.3A 2021-08-30 2021-08-30 Convolutional network pruning method and device and electronic equipment Withdrawn CN113642730A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111007647.3A CN113642730A (en) 2021-08-30 2021-08-30 Convolutional network pruning method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN113642730A true CN113642730A (en) 2021-11-12

Family

ID=78424490

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111007647.3A Withdrawn CN113642730A (en) 2021-08-30 2021-08-30 Convolutional network pruning method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113642730A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110363198A (en) * 2019-07-04 2019-10-22 武汉科技大学 A kind of neural network weight matrix fractionation and combined method
US20210081765A1 (en) * 2019-09-16 2021-03-18 Qualcomm Incorporated Efficient inferencing with fast pointwise convolution
CN111353313A (en) * 2020-02-25 2020-06-30 四川翼飞视科技有限公司 Emotion analysis model construction method based on evolutionary neural network architecture search
CN112561039A (en) * 2020-12-26 2021-03-26 上海悠络客电子科技股份有限公司 Improved search method of evolutionary neural network architecture based on hyper-network
CN112396181A (en) * 2020-12-31 2021-02-23 之江实验室 Automatic pruning method and platform for general compression architecture of convolutional neural network
CN112734622A (en) * 2021-03-30 2021-04-30 深圳大学 Image steganalysis method and terminal based on Tucker decomposition

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
艾毅: "超网络可靠性研究", 中国优秀硕士学位论文全文数据库基础科学辑(月刊), no. 9, pages 002 - 58 *
陈加敏, 徐杨: "基于注意力拆分卷积残差网络的表情识别", 激光与光电子学进展, vol. 59, no. 18, pages 1815009 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114925821A (en) * 2022-01-05 2022-08-19 华为技术有限公司 Compression method of neural network model and related system
CN116362316A (en) * 2023-05-29 2023-06-30 成都阿加犀智能科技有限公司 Model conversion method and device, storage medium and electronic equipment
CN116362316B (en) * 2023-05-29 2023-12-12 成都阿加犀智能科技有限公司 Model conversion method and device, storage medium and electronic equipment
CN117171577A (en) * 2023-11-02 2023-12-05 之江实验室 Dynamic decision method and device for high-performance operator selection
CN117171577B (en) * 2023-11-02 2024-03-22 之江实验室 Dynamic decision method and device for high-performance operator selection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20211112