CN116050500A - Network pruning method, data processing method and device, processing core and electronic equipment - Google Patents

Network pruning method, data processing method and device, processing core and electronic equipment

Info

Publication number
CN116050500A
CN116050500A
Authority
CN
China
Prior art keywords
importance
neural network
convolution
network
convolution layer
Prior art date
Legal status
Pending
Application number
CN202111247906.XA
Other languages
Chinese (zh)
Inventor
赵荣臻
吴臻志
Current Assignee
Beijing Lynxi Technology Co Ltd
Original Assignee
Beijing Lynxi Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Lynxi Technology Co Ltd filed Critical Beijing Lynxi Technology Co Ltd
Priority to CN202111247906.XA priority Critical patent/CN116050500A/en
Publication of CN116050500A publication Critical patent/CN116050500A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a network pruning method, which includes: in the training process of a neural network, determining importance weights of convolution kernels of the neural network relative to a sample data set according to N batches of sample data in the sample data set, wherein the neural network comprises at least one convolution layer, each convolution layer comprises a plurality of convolution kernels, and N is an integer greater than 1; and pruning the convolution kernels of the trained neural network according to the importance weights to obtain a pruned target network, wherein the trained neural network is obtained by training on the N batches of sample data. The disclosure also provides a data processing method and device, a processing core and electronic equipment.

Description

Network pruning method, data processing method and device, processing core and electronic equipment
Technical Field
The disclosure relates to the technical field of processors, and in particular to a network pruning method, a data processing method and device, a processing core, an electronic device, and a computer-readable medium.
Background
Convolutional Neural Networks (CNNs) are among the representative network structures of deep learning technology and have been widely applied and developed in fields such as image processing, speech recognition, and natural language processing.
As deep learning tasks grow more complex, the redundancy of convolutional neural networks (e.g., parameter complexity and operation complexity) also increases. To reduce network redundancy, the convolutional neural network needs to be compressed through network pruning, which simplifies the network structure and increases the network operation speed.
Disclosure of Invention
The disclosure provides a network pruning method, a data processing method and device, a processing core, an electronic device, and a computer-readable medium.
In a first aspect, the present disclosure provides a network pruning method, including: in the training process of a neural network, determining importance weights of convolution kernels of the neural network relative to a sample data set according to N batches of sample data in the sample data set, wherein the neural network comprises at least one convolution layer, each convolution layer comprises a plurality of convolution kernels, and N is an integer greater than 1; and pruning the convolution kernels of the trained neural network according to the importance weights to obtain a pruned target network, wherein the trained neural network is obtained by training on the N batches of sample data.
In a second aspect, the present disclosure provides a data processing method, the method comprising: processing data to be processed through a target network to obtain a processing result of the data to be processed, wherein the target network is obtained according to the network pruning method of the first aspect.
In a third aspect, the present disclosure provides a network pruning device, including: a computing module, used for determining, in the training process of the neural network, the importance weights of the convolution kernels of the neural network relative to the sample data set according to N batches of sample data in the sample data set, wherein the neural network comprises at least one convolution layer, each convolution layer comprises a plurality of convolution kernels, and N is an integer greater than 1; and a pruning module, used for pruning the convolution kernels of the trained neural network according to the importance weights to obtain a pruned target network, wherein the trained neural network is obtained by training on the N batches of sample data.
In a fourth aspect, the present disclosure provides a data processing apparatus, configured to process data to be processed through a target network to obtain a processing result of the data to be processed, where the target network is obtained by the network pruning device of the third aspect.
In a fifth aspect, the present disclosure provides a processing core, which includes the network pruning device of the third aspect and the data processing device of the fourth aspect.
In a sixth aspect, the present disclosure provides a processing core for loading a neural network model to complete a deep learning process, where the convolution kernel in the neural network model is a convolution kernel obtained according to the network pruning method of the first aspect.
In a seventh aspect, the present disclosure provides an electronic device comprising: a plurality of processing cores; and a network on chip configured to exchange data among the plurality of processing cores and with external data; wherein one or more processing cores store one or more instructions, the one or more instructions being executable by the one or more processing cores to enable the one or more processing cores to perform the network pruning method of the first aspect and the data processing method of the second aspect described above.
In an eighth aspect, the present disclosure provides a computer readable medium having stored thereon a computer program, wherein the computer program when executed by a processing core implements the network pruning method of the first aspect and the data processing method of the second aspect described above.
According to the network pruning method, the data processing method and device, the processing core, and the electronic equipment of the present disclosure, pruning can be performed on the convolution kernels of the trained neural network according to the importance weights of the convolution kernels relative to the sample data set, determined during neural network training. Because the importance of each convolution kernel is evaluated as a whole on the basis of the global sample data set according to its importance weight, the universality of the importance evaluation result of the convolution kernels over the global sample data set is improved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The accompanying drawings are included to provide a further understanding of the disclosure, and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure, without limitation to the disclosure. The above and other features and advantages will become more readily apparent to those skilled in the art by describing in detail exemplary embodiments with reference to the attached drawings, in which:
FIG. 1 is a flowchart of a network pruning method provided in an embodiment of the present disclosure;
FIG. 2 is a specific flowchart for determining importance weights of convolution kernels relative to a sample data set provided by an embodiment of the present disclosure;
FIG. 3 is a specific flowchart of the squeeze-excitation transformation provided by an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of the squeeze-excitation module provided by an embodiment of the present disclosure;
FIG. 5 is a flowchart of a process for updating importance weights provided by an embodiment of the present disclosure;
FIG. 6 is a specific flowchart of the pruning process according to an embodiment of the present disclosure;
FIG. 7 is a process flow of a network pruning method according to an exemplary embodiment of the present disclosure;
FIG. 8 is a flowchart of a data processing method according to an embodiment of the present disclosure;
FIG. 9 is a block diagram of a network pruning device according to an embodiment of the present disclosure;
FIG. 10 is a block diagram of a data processing apparatus provided by an embodiment of the present disclosure;
FIG. 11 is a block diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
For a better understanding of the technical solutions of the present disclosure, exemplary embodiments of the present disclosure will be described below with reference to the accompanying drawings, in which various details of the embodiments of the present disclosure are included to facilitate understanding, and they should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Embodiments of the disclosure and features of embodiments may be combined with each other without conflict.
As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The terms "connected" or "connected," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
In the embodiments of the present disclosure, network pruning is a neural network compression method: by deleting the connections, nodes, and convolution kernels (Convolution Kernel) that have little influence on the output result and are relatively redundant and unimportant, it reduces the network's computation and memory requirements, simplifies the network structure, and accelerates network operation.
In some scenarios, pruning of convolutional neural networks can be categorized into weight-based pruning and feature-based pruning according to a culling basis.
For weight-based pruning methods: for example, weights whose value by some measure (such as a norm) is smaller than a threshold can be set to 0, an approach often applied after model training; alternatively, a regularization term can be added so that sparse weights are learned during training, likewise based on some value of the weights. However, weight-based pruning methods often require additional techniques to ensure that the topology of the pruned weights is structured; otherwise they impede model deployment and hinder model acceleration on the GPU.
For feature-based pruning methods: certain statistics (such as norms) of the features of a channel are computed on a data set, and if they are smaller than a threshold, the corresponding weights/convolution kernels are set to 0. Such methods can directly realize structured channel pruning, and are likewise often applied after model training.
The embodiment of the disclosure provides a network pruning method, a data processing method and device, a processing core and electronic equipment, which can prune in a training process according to an input sample data set to obtain a pruned target network.
In some embodiments, the target network is used to perform data processing tasks, including any of image processing tasks, voice processing tasks, text processing tasks, and video processing tasks. That is, the target network of the embodiments of the present disclosure may find wide application in various fields. The network pruning method of the embodiments of the present disclosure is not limited to any specific use scenario: as long as a deep learning algorithm with a convolution layer is used, the network pruning method provided herein can be applied when processing the data of any data processing task, whatever its specific type.
Fig. 1 is a flowchart of a network pruning method according to an embodiment of the present disclosure.
Referring to fig. 1, an embodiment of the present disclosure provides a network pruning method including the following steps.
S110, in the training process of the neural network, determining importance weights of convolution kernels of the neural network relative to the sample data set according to N batches of sample data in the sample data set, wherein the neural network comprises at least one convolution layer, each convolution layer comprises a plurality of convolution kernels, and N is an integer greater than 1.
S120, pruning is carried out on the convolution kernel of the trained neural network according to the importance weight to obtain a pruned target network, and the trained neural network is obtained by training according to the sample data of N batches.
According to the network pruning method of the embodiments of the present disclosure, during neural network training, pruning can be performed on the convolution kernels of the trained neural network according to the importance weights of the convolution kernels relative to the sample data set. In the embodiments of the present disclosure, the importance of a convolution kernel can be evaluated as a whole on the basis of the global sample data set according to its importance weight; the importance weight of a convolution kernel is valued not on a single sample but on the whole sample data set. Therefore, the universality of the importance evaluation result over the global sample data set is improved, and a lighter, more efficient network with more accurate and reliable pruning results is obtained.
Fig. 2 illustrates a particular flow chart for determining importance weights of convolution kernels relative to a sample data set provided by an embodiment of the present disclosure. Referring to fig. 2, in some embodiments, in S110, the step of determining importance weights of convolution kernels of the neural network with respect to the sample data set according to N batches of sample data in the sample data set may specifically include the following sub-steps.
S11, for the ith batch of neural network training, performing squeeze-excitation transformation on the output data of the mth convolution layer to obtain the importance sub-weights of the plurality of convolution kernels of the mth convolution layer relative to the ith batch, where the input data of the 1st convolution layer is the sample data of the ith batch, 1 ≤ i ≤ N, m ≥ 1, and i and m are integers.
In this step, the convolution layer may be used to perform a convolution operation on the input data, where the input data of the 1st convolution layer is the sample data of the ith batch, and the inputs of the convolution layers other than the 1st convolution layer are the outputs of the previous convolution layer or of other network layers.
In an embodiment of the present disclosure, the output data of the mth convolution layer may be squeeze-excitation transformed by a Squeeze-and-Excite (SE) module.
The squeeze-excitation transformation is used to explicitly model the interdependencies among feature channels so as to adaptively recalibrate the feature responses of the channels. The importance of each feature channel is thereby acquired automatically, in a network self-learning manner; features that are useful for the current task are then promoted according to this importance, and features that are less useful for the current task are suppressed. Through this processing mechanism, the network can learn to use global information to selectively emphasize informative features, enhancing its representational capacity.
S12, updating the importance weights of the mth convolution layer relative to the previous i-1 batches according to the importance sub-weights of the plurality of convolution kernels of the mth convolution layer relative to the ith batch, to obtain the importance weights of the mth convolution layer relative to the previous i batches, where the importance weights of the mth convolution layer relative to the previous 0 batches are the initial importance weights of the plurality of convolution kernels of the mth convolution layer.
S13, determining importance weights of a plurality of convolution kernels of the mth convolution layer relative to the sample data set according to the importance weights of the mth convolution layer relative to the previous N batches.
Through the above steps S11-S13, the importance sub-weights of the plurality of convolution kernels of the mth convolution layer relative to the ith batch can be calculated through the squeeze-excitation transformation, and the importance weights over the previous i-1 batches are updated using the importance sub-weights of the ith batch; iterating over the N batches yields the importance weights of the mth convolution layer relative to the previous N batches, and finally the importance weights of the plurality of convolution kernels of the mth convolution layer relative to the sample data set, thereby capturing the importance characteristics of these convolution kernels relative to the global samples.
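To make this batch-by-batch flow of S11-S13 concrete, the following is a minimal PyTorch-flavored sketch, not the patent's actual implementation; the SE module se (a sketch of one is given later, with FIG. 4), the get_feat hook exposing the data at the mth convolution layer, and the channel count c_out are assumed helpers introduced for illustration.

```python
import torch

def train_with_importance(model, se, loader, optimizer, loss_fn, c_out, get_feat):
    """Run the N training batches while accumulating the layer's importance weight a."""
    a = torch.zeros(c_out, 1, 1)                       # importance weight w.r.t. the previous 0 batches
    for step_cnt, (x, y) in enumerate(loader):         # the ith batch corresponds to step_cnt = i - 1
        out = model(x)
        feat = get_feat()                              # (b, ci, hi, wi) data at the mth convolution layer
        a1 = se(feat)                                  # importance sub-weights for this batch (S11)
        a = a + a1.detach().mean(0) / (step_cnt + 1)   # running update toward expression (2) below (S12)
        loss = loss_fn(out, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return a                                           # importance weight w.r.t. the sample data set (S13)
```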
FIG. 3 illustrates a specific flowchart of the squeeze-excitation transformation provided by an embodiment of the present disclosure. As shown in FIG. 3, in some embodiments, the squeeze-excitation transformation consists essentially of a global average pooling process and a fully connected process; the above step S11 may specifically include the following substeps.
S31, determining the feature vector of the input data of the mth convolution layer.
S32, performing global average pooling on the feature vector of the input data of the mth convolution layer.
S33, performing at least one fully connected process on the globally average-pooled feature vector to obtain the importance sub-weights of the plurality of convolution kernels of the mth convolution layer relative to the ith batch.
Through the above steps S31-S33, the importance of the plurality of convolution kernels of the mth convolution layer with respect to the ith batch can be characterized by their importance sub-weights; in the subsequent model training process, the importance weights over the previous i-1 batches are updated using the importance sub-weights of the ith batch, providing the computational basis for finally obtaining the importance characteristics of the plurality of convolution kernels of the mth convolution layer relative to the global samples.
In order to better understand the processing of the importance sub-weights in the embodiments of the present disclosure, the specific process of performing the squeeze-excitation transformation on the output data of the mth convolution layer to obtain the importance sub-weights of the plurality of convolution kernels of the mth convolution layer relative to the ith batch will be described in detail with reference to FIG. 4.
FIG. 4 shows a schematic structural diagram of the squeeze-excitation module provided by an embodiment of the present disclosure. In FIG. 4, the squeeze-excitation module includes one global average pooling (Global Average Pooling) layer and two fully connected layers.
As shown in FIG. 4, the input to the SE module is the four-dimensional tensor data a(b, ci, hi, wi). This tensor comprises 4 factors: the factor b represents the batch dimension of the data at the mth convolution layer for the ith batch of neural network training (if m=1, the input data of the 1st convolution layer is the sample data of the ith batch); the factor ci represents the number of channels of each sample; the factor hi represents the height; and the factor wi represents the width. Here i denotes the serial number of the batch.
In fig. 4, the processing flow for the four-dimensional tensor data a (b, ci, hi, wi) includes a squeeze operation and an excitation operation.
S401, Squeeze operation.
In this step, global average pooling (Global Average Pooling) may be employed as the Squeeze operation. Specifically, the input four-dimensional tensor data may be globally average pooled to squeeze the feature channels: the four-dimensional tensor data a(b, ci, hi, wi) is compressed in the spatial dimensions, yielding the squeezed four-dimensional tensor data (b, ci, 1, 1).
In the Squeeze operation, the output dimension matches the number of input feature channels and characterizes the global distribution of responses over the feature channels.
S402, Excitation operation.
In this step, the Excitation operation can be implemented by a structure of two fully connected (FC) layers: the correlation between channels is modeled by the two fully connected layers, and a number of weights equal to the number of input feature channels is output.
For example, the squeezed four-dimensional tensor data (b, ci, 1, 1) may be reduced in dimension through the first fully connected layer to obtain the four-dimensional tensor data (b, cm, 1, 1), where cm is less than ci; then the reduced four-dimensional tensor data (b, cm, 1, 1) is expanded in dimension through the second fully connected layer to obtain the four-dimensional tensor data a1(b, co, 1, 1), where co may be equal to ci.
In some embodiments, the above squeeze-excitation transformation process may be expressed as the following expression (1):
a1 = Squeeze_Excite(Xi)    (1)
In expression (1), Xi represents the input data of the ith batch at the mth convolution layer, Squeeze_Excite represents the squeeze-excitation transformation, and a1 represents the importance sub-weights of the plurality of convolution kernels of the mth convolution layer for the input data of the ith batch, e.g., a1(b, co, 1, 1). The input data Xi of the 1st convolution layer is the sample data of the ith batch.
In some embodiments, a first data conversion process may be performed on the importance sub-weight a1 obtained by the squeeze-excitation transformation, so as to limit the value of each element of a1 to the interval [-1, 1]. This bounds the values obtained when the importance weights over the previous i-1 batches are updated with the importance sub-weights of the ith batch.
Illustratively, in this first data conversion process, element values of a1 greater than 1 are set to 1, element values less than -1 are set to -1, and values between -1 and 1 inclusive are kept unchanged.
It should be understood that the specific value ranges of the value intervals are merely illustrative, and may be set in a customized manner according to needs in a specific application scenario.
In the processing of the two fully connected layers, a weight can be generated for each feature channel through specific parameters, which are used to explicitly model the correlation between feature channels; the weights output after the squeeze and excitation operations can be regarded as the importance of each feature channel after feature selection.
It should be understood that in the case where the number of the above-described fully connected layers is 1, the processing steps of the 1-layer fully connected layer are equivalent to the processing steps of the above-described two fully connected layers.
In the embodiments of the present disclosure, through the above steps S401 and S402, the output data of the mth convolution layer may be squeeze-excitation transformed for the ith batch of neural network training, so as to obtain the importance sub-weights of the plurality of convolution kernels of the mth convolution layer relative to the ith batch, such that the importance of these convolution kernels relative to the ith batch is represented by the importance sub-weights.
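As a concrete illustration of FIG. 4 and expression (1), the following is a minimal PyTorch-flavored sketch of an SE module, not the patent's actual implementation; the reduction ratio and the use of clamp for the first data conversion are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Sketch of expression (1): global average pooling plus two FC layers."""
    def __init__(self, ci: int, reduction: int = 4):
        super().__init__()
        cm = max(1, ci // reduction)   # reduced channel count, cm < ci (assumed ratio)
        self.fc1 = nn.Linear(ci, cm)   # first FC layer: dimension reduction
        self.fc2 = nn.Linear(cm, ci)   # second FC layer: back to co == ci channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = x.mean(dim=(2, 3))         # Squeeze: (b, ci, hi, wi) -> (b, ci)
        e = self.fc2(self.fc1(s))      # Excitation: (b, ci) -> (b, cm) -> (b, co)
        e = e.clamp(-1.0, 1.0)         # first data conversion: limit elements to [-1, 1]
        return e[:, :, None, None]     # importance sub-weights a1 of shape (b, co, 1, 1)

# Usage: a1 = SqueezeExcite(ci=64)(torch.randn(8, 64, 32, 32))  # a1.shape == (8, 64, 1, 1)
```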
FIG. 5 illustrates a flow chart of a process for updating importance weights provided by an embodiment of the present disclosure. As shown in fig. 5, in some embodiments, the training of sample data for each batch of the neural network includes at least one iterative step; the step S12 may specifically include the following substeps.
S51, calculating the average of the importance sub-weights of the mth convolution layer for the ith batch over the number of samples in the ith batch.
S52, at each current iteration step of the neural network training process, taking the ratio of the average value to the step index of the current iteration step as the importance sub-weight of the mth convolution layer relative to the ith batch at the current iteration step.
S53, accumulating the importance sub-weights of the mth convolution layer relative to the ith batch over the iteration steps to obtain the importance sub-weight of the mth convolution layer relative to the ith batch.
S54, adding the importance sub-weight of the mth convolution layer relative to the ith batch to the importance weight over the previous i-1 batches to obtain the importance weight of the mth convolution layer relative to the previous i batches.
Illustratively, the process of obtaining the importance weight of the mth convolution layer relative to the previous i batches via S51-S54 may be expressed as the following expression (2):
a = a + a1.mean(0)[..., None] / (step_cnt + 1)    (2)
In expression (2), step_cnt denotes the step index within the entire sequence of training iteration steps; step_cnt typically counts from 0, i.e. step_cnt = i - 1, with step_cnt = 0 being the 1st step of the iterative network training. a1 is the importance sub-weight of the mth convolution layer for the ith batch, and a1.mean(0)[..., None] means that a1 is averaged along the dimension of sample b, with the other dimensions left unchanged. In addition, i = step_cnt + 1 represents the ith step of the entire sequence of training iteration steps; since the step count starts from 0, it is incremented by 1 to prevent the divisor in expression (2) from being 0.
It should be understood that when step_cnt is counted from 1, (step_cnt+1) in the above expression may be replaced with step_cnt.
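As a minimal sketch of this update, assuming PyTorch tensors with the shapes used above (a1 of shape (b, co, 1, 1); running weight a of shape (co, 1, 1)) and step_cnt counting from 0; this is not the patent's exact code:

```python
import torch

def update_importance(a: torch.Tensor, a1: torch.Tensor, step_cnt: int) -> torch.Tensor:
    """One application of expression (2) for the batch with index step_cnt = i - 1."""
    batch_mean = a1.mean(0)                 # average along the sample dimension b -> (co, 1, 1)
    return a + batch_mean / (step_cnt + 1)  # step_cnt counts from 0, so +1 avoids a zero divisor
```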
In the embodiments of the present disclosure, using the squeeze-excitation transformation process described above in steps S11, S31-S33, S401 and S402, the importance sub-weights (e.g., the four-dimensional tensor data a1(b, co, 1, 1)) are obtained by squeeze-excitation transforming the data characteristics of the sample data of different batches at the current layer (e.g., the four-dimensional tensor data a(b, ci, hi, wi) at the mth convolution layer).
Since the features a of different input samples at the current layer undergo the squeeze-excitation transformation, different sub-weights a1 are obtained with high probability. To be able to prune the convolution kernels that are redundant over the whole data set, the basic principle of the moving average (moving_mean) in BatchNorm, an algorithm used to accelerate network convergence in deep networks, can be drawn on.
That is, in order to remain stable during network training, the mean value may be updated as a moving average. A moving average means that when the current value is updated, a proportion of the previous value is retained. In this way, the feature distribution learned earlier is preserved after each data normalization, the normalization operation can still be completed, network training is accelerated, and the calculated importance weight of a convolution kernel depends not on a single sample but on the whole sample data set.
In the schemes of the embodiments of the present disclosure, there is typically little correlation between the first and the Nth batch of samples fed to the neural network. In order to measure the importance of the convolution kernels of the mth convolution layer over the whole sample set, and finally to prune the convolution kernels that are redundant over the whole data set, the solution of the importance sub-weight of the mth convolution layer relative to one batch averages a over the sample dimension, so that all samples of the batch are taken into account; over the whole training process, the samples of the different batches used in training together constitute the complete sample set, i.e., the finally calculated importance weights of the plurality of convolution kernels in a convolution layer constitute an overall importance assessment relative to all samples globally.
Fig. 6 shows a specific flowchart of the pruning process according to an embodiment of the present disclosure.
As shown in fig. 6, in some embodiments, in step S120, the step of pruning the convolution kernel of the trained neural network according to the importance weight may specifically include the following sub-steps.
S61, performing a dot-product calculation between the importance weights and the convolution kernels of the trained neural network to obtain a value for each convolution kernel of the trained neural network; S62, subtracting the convolution kernels whose values are smaller than a predetermined threshold.
Through the above steps S61 and S62, the convolution kernels to be subtracted in the embodiments of the present disclosure do not depend on a single sample; the subtraction is performed only after the importance weights of the convolution kernels relative to the sample data set have been accumulated, so that the subtracted convolution kernels are representative over the whole sample data set. This avoids removing kernels based on the importance weights of a single sample, which might subtract kernels that are unimportant only for the current sample yet important for others, and thus ensures that a lighter and more efficient network with more accurate and reliable pruning results is obtained after pruning.
In some embodiments, before step S61, a second data conversion process is performed on the calculated importance weights of the convolution kernels of the neural network relative to the sample data set, limiting the values of the importance weights to the interval [0, 1].
Illustratively, in this second data conversion process, importance-weight values greater than 1 are set to 1, values less than 0 are set to 0, and values between 0 and 1 inclusive are kept unchanged.
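A minimal sketch of steps S61-S62 together with the second data conversion, assuming PyTorch conventions; the text does not fix how a kernel's tensor is reduced to a single value, so the absolute-sum below is an assumption made for illustration:

```python
import torch

def prune_kernels(weight: torch.Tensor, a: torch.Tensor, threshold: float) -> torch.Tensor:
    """weight: trained kernels of one conv layer, shape (co, c_in, k, k);
    a: importance weight per output kernel, shape (co,)."""
    a = a.clamp(0.0, 1.0)                        # second data conversion: limit to [0, 1]
    scored = weight * a[:, None, None, None]     # S61: dot-multiply importance into each kernel
    values = scored.abs().flatten(1).sum(dim=1)  # one value per kernel (assumed reduction)
    keep = values >= threshold                   # S62: mark kernels at or above the threshold
    return weight[keep]                          # surviving kernels (structured pruning)
```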
In some embodiments, pruning is performed on the convolution kernels of the trained neural network, and the importance weights used are importance weights after regularization; in this embodiment, before step S61, the method further includes: regularizing the importance weight to obtain the importance weight after regularization.
In the embodiments of the present disclosure, in order to guide the squeeze-excitation transformation toward importance weights that are as sparse as possible, a regularization can be introduced so that as many elements of the importance weight as possible are 0; by tuning the regularization strength, the pruning force can be balanced adaptively, trading off network performance against efficiency.
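The text does not fix the form of the regularization; as an assumed illustration, an L1 penalty on the importance weights is a common sparsity-inducing choice:

```python
import torch

def regularized_loss(task_loss: torch.Tensor, a: torch.Tensor, strength: float = 1e-4) -> torch.Tensor:
    """Add a sparsity penalty on the importance weights a to the training loss.
    strength is the regularization strength balancing pruning force and performance."""
    return task_loss + strength * a.abs().sum()
```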
Fig. 7 shows a process flow of a network pruning method according to an exemplary embodiment of the present disclosure. As shown in fig. 7, the process flow may include the following steps.
First, during the training of the neural network, for the input ith batch of samples xi, its four-dimensional tensor data a(b, ci, hi, wi) is determined.
Next, the importance sub-weights a1(b, co, 1, 1) of the plurality of convolution kernels of the mth convolution layer relative to the ith batch are obtained through the squeeze-excitation transformation; and the importance weights of the mth convolution layer relative to the previous i-1 batches are updated according to these importance sub-weights, yielding the importance weights a2(co, 1, 1) of the mth convolution layer relative to the previous i batches, so as to determine the importance weights of the plurality of convolution kernels of the mth convolution layer relative to the sample data set.
Then, after the training of the neural network is finished, a dot-product calculation is performed between the importance weights a2(co, 1, 1) of the mth convolution layer relative to the previous i batches and the convolution kernels of the trained neural network to obtain the value of each convolution kernel of the trained neural network, and the convolution kernels whose values are smaller than a predetermined threshold are subtracted.
Finally, the ith batch of samples xi is convolved by the pruned neural network, obtaining the convolution result of the ith batch of samples xi based on the pruned neural network.
According to the network pruning method described in the above embodiments of the present disclosure, in the training process of the neural network, the importance weights of the convolution kernels relative to the sample data set can be determined for each batch. The dimension of the importance weight equals the number co of convolution kernels, and each dimension corresponds to one convolution kernel and can be used to represent that kernel's degree of importance or redundancy. Across different batches, the importance weights change dynamically; when training is finished, redundant convolution kernels can be determined from the co convolution kernels according to the finally obtained importance weights, and these redundant kernels are removed through re-parameterization, yielding the trained neural network. In this way, the importance weight of a convolution kernel can depend on the whole sample data set, improving the universality of the importance evaluation result over the global sample data set, and producing, after pruning, a lighter and more efficient network with more accurate and reliable pruning results.
In the embodiments of the present disclosure, re-parameterization is understood as reconstructing the network parameters of the network inference module through a certain transformation of the network parameters of the network training module. Re-parameterization serves two purposes: on the one hand, it can improve the performance of the model; on the other hand, it changes the structure of the model to achieve certain goals. Thus, in the embodiments of the present disclosure, the convolution-kernel subtraction may be performed within the re-parameterization process, thereby obtaining the pruned network.
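As an assumed PyTorch-style illustration of removing convolution kernels during re-parameterization (the patent does not prescribe this exact procedure), a smaller convolution layer can be rebuilt from the surviving kernels:

```python
import torch
import torch.nn as nn

def reparameterize_conv(conv: nn.Conv2d, keep: torch.Tensor) -> nn.Conv2d:
    """Rebuild a conv layer holding only the kernels marked True in the boolean
    mask keep (length co); the redundant kernels are dropped structurally."""
    pruned = nn.Conv2d(conv.in_channels, int(keep.sum()),
                       kernel_size=conv.kernel_size, stride=conv.stride,
                       padding=conv.padding, bias=conv.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(conv.weight[keep])   # keep surviving convolution kernels
        if conv.bias is not None:
            pruned.bias.copy_(conv.bias[keep])
    return pruned
```

In a full network, the input channels of the layer that follows would be reduced accordingly.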
FIG. 8 shows a flowchart of a data processing method according to an embodiment of the present disclosure. As shown in FIG. 8, the data processing method includes the following step.
S810, processing the data to be processed through a target network to obtain a processing result of the data to be processed, where the target network is obtained according to the network pruning method of the embodiments of the present disclosure.
According to the data processing method of the embodiments of the present disclosure, the target network is obtained by the network pruning method described above, and performing data processing with this target network can significantly reduce the computational cost, while the processing results of the pruned model are more reliable over the whole data set.
Fig. 9 is a block diagram of a network pruning device according to an embodiment of the present disclosure.
Referring to fig. 9, an embodiment of the present disclosure provides a network pruning device 900, and the network pruning device 900 may include the following modules.
The calculating module 910 is configured to determine, during training of the neural network, importance weights of convolution kernels of the neural network relative to the sample data set according to N batches of sample data in the sample data set, where the neural network includes at least one convolution layer, each convolution layer includes a plurality of convolution kernels, and N is an integer greater than 1.
The pruning module 920 is configured to prune the convolution kernel of the trained neural network according to the importance weight to obtain a pruned target network, where the trained neural network is obtained by training according to N batches of sample data.
In some embodiments, the calculation module 910 includes the following units.
The weight calculation unit is used for performing squeeze-excitation transformation on the input data of the mth convolution layer for the ith batch of neural network training, to obtain the importance sub-weights of the plurality of convolution kernels of the mth convolution layer relative to the ith batch, where the input data of the 1st convolution layer is the sample data of the ith batch, 1 ≤ i ≤ N, m ≥ 1, and i and m are integers.
The weight updating unit is used for updating the importance weights of the mth convolution layer relative to the previous i-1 batches according to the importance sub-weights of the mth convolution layer relative to the ith batch, to obtain the importance weights of the mth convolution layer relative to the previous i batches, where the importance weights of the mth convolution layer relative to the previous 0 batches are the initial importance weights of the plurality of convolution kernels of the mth convolution layer.
A weight determining unit, configured to determine importance weights of a plurality of convolution kernels of the mth convolution layer with respect to the sample data set according to importance weights of the mth convolution layer with respect to the first N batches.
In some embodiments, the squeeze-excitation transformation includes a global average pooling process and a fully connected process; the weight calculation unit is specifically configured to: determine the feature vector of the input data of the mth convolution layer; perform global average pooling on the feature vector of the input data of the mth convolution layer; and perform at least one fully connected process on the globally average-pooled feature vector to obtain the importance sub-weights of the plurality of convolution kernels of the mth convolution layer relative to the ith batch.
In some embodiments, the training of sample data for each batch of the neural network includes at least one iterative step; the weight determining unit includes the following sub-units.
The average calculating subunit is used for calculating the average of the importance sub-weights of the mth convolution layer for the ith batch over the number of samples in the ith batch.
The ratio calculating subunit is used for, at each current iteration step of the neural network training process, taking the ratio of the average value to the step index of the current iteration step as the importance sub-weight of the mth convolution layer relative to the ith batch at the current iteration step.
The accumulation unit is used for accumulating the importance sub-weights of the mth convolution layer relative to the ith batch over the iteration steps to obtain the importance sub-weight of the mth convolution layer relative to the ith batch.
The accumulation unit is further configured to add the importance sub-weight of the mth convolution layer relative to the ith batch to the importance weight over the previous i-1 batches, to obtain the importance weight of the mth convolution layer relative to the previous i batches.
In some embodiments, the pruning module 920 is specifically configured to perform a dot-product calculation between the importance weights and the convolution kernels of the trained neural network to obtain the value of each convolution kernel of the trained neural network, and to subtract the convolution kernels whose values are less than a predetermined threshold.
In some embodiments, the importance weight is a regularized importance weight; the network pruning device 900 further includes: the regularization module is used for regularizing the importance weights before performing point multiplication calculation on the importance weights and the convolution kernels of the trained neural network to obtain the regularized importance weights.
In some embodiments, the target network is used to perform data processing tasks, including any of image processing tasks, voice processing tasks, text processing tasks, video processing tasks.
According to the network pruning device of the embodiments of the present disclosure, in the training process of the neural network, pruning is performed on the convolution kernels of the trained neural network according to the importance weights of the convolution kernels relative to the sample data set, which improves the universality of the importance evaluation result over the global sample data set and further yields a lighter and more efficient network with more accurate and reliable pruning results.
It should be understood that the present disclosure is not limited to the particular arrangements and processes described in the foregoing embodiments and illustrated in the drawings. For convenience and brevity of description, detailed descriptions of known methods are omitted herein, and specific working processes of the systems, modules and units described above may refer to corresponding processes in the foregoing method embodiments, which are not repeated herein.
Fig. 10 is a block diagram of a data processing apparatus according to an embodiment of the present disclosure.
Referring to FIG. 10, an embodiment of the present disclosure provides a data processing apparatus 1000, and the data processing apparatus 1000 may include the following module.
The calculating module 1010 is configured to process the data to be processed through a target network to obtain a processing result of the data to be processed, where the target network is obtained according to the network pruning method described in the foregoing embodiments of the disclosure.
According to the data processing apparatus of the embodiments of the present disclosure, data processing can be performed with the target network obtained by the network pruning method described above, which improves data processing efficiency, reduces computational cost, and makes the processing results of the pruned model more reliable.
The embodiment of the disclosure also provides a processing core, which comprises the network pruning device or the data processing device.
The embodiment of the disclosure also provides a processing core for loading the neural network model to complete the deep learning processing, wherein the convolution core in the neural network model is the convolution core obtained according to the network pruning method.
Fig. 11 is a block diagram of an electronic device according to an embodiment of the disclosure.
Referring to FIG. 11, an embodiment of the present disclosure provides an electronic device including a plurality of processing cores 1101 and a network-on-chip 1102, wherein the plurality of processing cores 1101 are each connected to the network-on-chip 1102, and the network-on-chip 1102 is configured to exchange data among the plurality of processing cores and with external data.
One or more instructions are stored in the one or more processing cores 1101, where the one or more instructions are executed by the one or more processing cores 1101, so that the one or more processing cores 1101 can perform the network pruning method or the data processing method described above.
Furthermore, the embodiment of the present disclosure also provides a computer readable medium, on which a computer program is stored, where the computer program, when executed by a processing core, implements the network pruning method or the data processing method described above.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, functional modules/units in the apparatus, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purpose of limitation. In some instances, it will be apparent to one skilled in the art that features, characteristics, and/or elements described in connection with a particular embodiment may be used alone or in combination with other embodiments unless explicitly stated otherwise. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the disclosure as set forth in the appended claims.

Claims (13)

1. A network pruning method, comprising:
in the training process of the neural network, determining importance weights of convolution kernels of the neural network relative to a sample data set according to N batches of sample data in the sample data set, wherein the neural network comprises at least one convolution layer, each convolution layer comprises a plurality of convolution kernels, and N is an integer greater than 1;
pruning the convolution kernels of the trained neural network according to the importance weights to obtain a pruned target network, wherein the trained neural network is obtained by training on the N batches of sample data.
2. The method of claim 1, wherein the determining importance weights of the convolution kernels of the neural network relative to the sample data set from the N batches of sample data in the sample data set comprises:
for the ith batch of neural network training, performing squeeze-excitation transformation on the input data of the mth convolution layer to obtain importance sub-weights of a plurality of convolution kernels of the mth convolution layer relative to the ith batch, wherein the input data of the 1st convolution layer is sample data of the ith batch, 1 ≤ i ≤ N, m ≥ 1, and i and m are integers;
updating the importance weights of the mth convolution layer relative to the previous i-1 batches according to the importance sub-weights of the mth convolution layer relative to the i batches to obtain the importance weights of the mth convolution layer relative to the previous i batches, wherein the importance weights of the mth convolution layer relative to the previous 0 batches are initial importance weights of a plurality of convolution kernels of the mth convolution layer;
and determining importance weights of a plurality of convolution kernels of the mth convolution layer relative to the sample data set according to the importance weights of the mth convolution layer relative to the previous N batches.
3. The method of claim 2, wherein each batch of sample data training of the neural network comprises at least one iterative step; the updating the importance weight of the mth convolution layer relative to the previous i-1 batch according to the importance sub-weight of the mth convolution layer relative to the ith batch, to obtain the importance weight of the mth convolution layer relative to the previous i batch, comprising:
calculating the average of the importance sub-weights of the mth convolution layer for the ith batch over the number of samples in the ith batch;
at each current iteration step of the neural network training process, taking the ratio of the average value to the step index of the current iteration step as the importance sub-weight of the mth convolution layer relative to the ith batch at the current iteration step;
accumulating the importance sub-weights of the mth convolution layer relative to the ith batch over the iteration steps to obtain the importance sub-weight of the mth convolution layer relative to the ith batch;
and adding the importance sub-weight of the mth convolution layer relative to the ith batch to the importance weight over the previous i-1 batches to obtain the importance weight of the mth convolution layer relative to the previous i batches.
4. The method of claim 1, wherein pruning the convolution kernel of the trained neural network according to the importance weights comprises:
performing a dot-product calculation between the importance weights and the convolution kernels of the trained neural network to obtain a value for each convolution kernel of the trained neural network;
and subtracting the convolution kernels whose values are less than a predetermined threshold.
5. The method of claim 4, wherein the importance weight is a regularized importance weight; before performing the dot product calculation on the importance weight and the trained convolution kernel of the neural network, the method further comprises:
and regularizing the importance weight to obtain the importance weight after regularization.
6. The method of any of claims 1-5, wherein the target network is configured to perform data processing tasks including any of image processing tasks, voice processing tasks, text processing tasks, video processing tasks.
7. A data processing method, comprising:
processing data to be processed through a target network to obtain a processing result of the data to be processed, wherein the target network is obtained by processing according to the network pruning method of any one of claims 1-6.
8. A network pruning device, comprising:
the computing module is used for determining importance weights of convolution kernels of the neural network relative to the sample data set according to N batches of sample data in the sample data set in the neural network training process, wherein the neural network comprises at least one convolution layer, each convolution layer comprises a plurality of convolution kernels, and N is an integer larger than 1;
and the pruning module is used for pruning the convolution kernel of the trained neural network according to the importance weight to obtain a pruned target network, wherein the trained neural network is obtained by training according to the N batches of sample data.
9. A data processing device, configured to process data to be processed through a target network, to obtain a processing result of the data to be processed, where the target network is obtained by processing by the network pruning device according to claim 8.
10. A processing core comprising the network pruning device of claim 8 or the data processing device of claim 9.
11. A processing core for loading a neural network model to complete a deep learning process, wherein the convolution core in the neural network model is a convolution core obtained according to the network pruning method of any one of claims 1-6.
12. An electronic device, comprising:
a plurality of processing cores; and
a network on chip configured to interact data between the plurality of processing cores and external data;
one or more of the processing cores have one or more instructions stored therein that are executable by one or more of the processing cores to enable one or more of the processing cores to perform the network pruning method of any one of claims 1-6 or the data processing method of claim 7.
13. A computer readable medium having stored thereon a computer program, wherein the computer program when executed by a processing core implements the network pruning method of any one of claims 1-6 or the data processing method of claim 7.
CN202111247906.XA 2021-10-26 2021-10-26 Network pruning method, data processing method and device, processing core and electronic equipment Pending CN116050500A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111247906.XA CN116050500A (en) 2021-10-26 2021-10-26 Network pruning method, data processing method and device, processing core and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111247906.XA CN116050500A (en) 2021-10-26 2021-10-26 Network pruning method, data processing method and device, processing core and electronic equipment

Publications (1)

Publication Number Publication Date
CN116050500A (en) 2023-05-02

Family

ID=86129967

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111247906.XA Pending CN116050500A (en) 2021-10-26 2021-10-26 Network pruning method, data processing method and device, processing core and electronic equipment

Country Status (1)

Country Link
CN (1) CN116050500A (en)

Similar Documents

Publication Publication Date Title
CN109978142B (en) Neural network model compression method and device
EP3926623B1 (en) Speech recognition method and apparatus, and neural network training method and apparatus
CN108345939B (en) Neural network based on fixed-point operation
WO2022006919A1 (en) Activation fixed-point fitting-based method and system for post-training quantization of convolutional neural network
US11604987B2 (en) Analytic and empirical correction of biased error introduced by approximation methods
US20200389182A1 (en) Data conversion method and apparatus
CN110995487A (en) Multi-service quality prediction method and device, computer equipment and readable storage medium
WO2017031172A1 (en) Order statistic techniques for neural networks
CN109800853B (en) Matrix decomposition method and device fusing convolutional neural network and explicit feedback and electronic equipment
CN111242285A (en) Deep learning model training method, system, device and storage medium
US20240061889A1 (en) Systems and Methods for Weighted Quantization
CN112598062A (en) Image identification method and device
US20230185998A1 (en) System and method for ai-assisted system design
CN111353534A (en) Graph data category prediction method based on adaptive fractional order gradient
US20120173240A1 (en) Subspace Speech Adaptation
CN113011532A (en) Classification model training method and device, computing equipment and storage medium
CN116050500A (en) Network pruning method, data processing method and device, processing core and electronic equipment
CN113449840A (en) Neural network training method and device and image classification method and device
CN114372539B (en) Machine learning framework-based classification method and related equipment
JP7279225B2 (en) METHOD, INFORMATION PROCESSING DEVICE, AND PROGRAM FOR TRANSFER LEARNING WHILE SUPPRESSING CATASTIC FORGETTING
CN112418388A (en) Method and device for realizing deep convolutional neural network processing
CN114611673A (en) Neural network compression method, device, equipment and readable storage medium
CN112328784A (en) Data information classification method and device
US20240157969A1 (en) Methods and systems for determining control decisions for a vehicle
CN111767204B (en) Spill risk detection method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination