CN111062477A - Data processing method, device and storage medium - Google Patents

Data processing method, device and storage medium Download PDF

Info

Publication number
CN111062477A
CN111062477A (application CN201911303249.9A; granted as CN111062477B)
Authority
CN
China
Prior art keywords
channel
neural network
network model
conditional entropy
channels
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911303249.9A
Other languages
Chinese (zh)
Other versions
CN111062477B (en)
Inventor
高雨婷
胡易
余宗桥
孙星
彭湃
郭晓威
黄小明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Cloud Computing Beijing Co Ltd
Original Assignee
Tencent Cloud Computing Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Cloud Computing Beijing Co Ltd filed Critical Tencent Cloud Computing Beijing Co Ltd
Priority to CN201911303249.9A priority Critical patent/CN111062477B/en
Publication of CN111062477A publication Critical patent/CN111062477A/en
Application granted granted Critical
Publication of CN111062477B publication Critical patent/CN111062477B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a data processing method, apparatus, and storage medium, relates to the field of neural networks, and aims to reduce model parameters and the amount of model computation. In the method, the relative information among channels is determined by calculating the conditional entropy between the feature maps corresponding to the channels in an initial neural network model, and the initial neural network model is pruned by removing channels that have little influence on the other channels. Pruning the initial neural network model in this way reduces its parameter count and its amount of computation, so that the model runs faster while its effect is preserved, and the requirements on deployment devices are reduced.

Description

Data processing method, device and storage medium
Technical Field
The present application relates to the field of neural networks, and in particular, to a data processing method, apparatus, and storage medium.
Background
With the development of deep learning, neural networks have become deeper and deeper, their computation and parameter counts keep growing, and they are difficult to deploy on devices with limited computing capability. Many studies have shown that the huge number of parameters contains a large amount of redundancy, yet those parameters are necessary during training optimization: they make the optimization problem simpler and allow the model to converge to a better solution.
Therefore, after model training converges, pruning unimportant channels from the model is crucial for reducing the model parameters and the amount of model computation.
Disclosure of Invention
The embodiments of the present application provide a data processing method, apparatus, and storage medium, which are used to reduce model parameters and the amount of model computation, so that the model runs faster while its effect is preserved and the requirements on deployment devices are reduced.
In a first aspect, a data processing method is provided, including:
acquiring a training sample of data to be processed, inputting the training sample into a trained initial neural network model, and respectively acquiring the feature maps output by each convolutional layer in the initial neural network model;
for each convolutional layer of the initial neural network model, calculating the conditional entropy between the feature map corresponding to each channel in the convolutional layer and the feature maps corresponding to the other channels;
for each convolutional layer of the initial neural network model, averaging the conditional entropies between the feature map corresponding to each channel in the convolutional layer and the feature maps corresponding to the other channels, to obtain the average conditional entropy between the feature map corresponding to each channel and the feature maps corresponding to the other channels;
for each convolutional layer of the initial neural network model, pruning the initial neural network model according to the average conditional entropy corresponding to each channel in the convolutional layer, to obtain an optimized neural network model;
and inputting the data to be processed into the optimized neural network model to obtain processed data output by the optimized neural network model.
In a second aspect, a data processing apparatus is provided, including:
a feature map acquisition module, configured to acquire a training sample of data to be processed, input the training sample into a trained initial neural network model, and respectively acquire the feature maps output by each convolutional layer in the initial neural network model;
a first conditional entropy acquisition module, configured to calculate, for each convolutional layer of the initial neural network model, the conditional entropy between the feature map corresponding to each channel in the convolutional layer and the feature maps corresponding to the other channels;
an average conditional entropy acquisition module, configured to average, for each convolutional layer of the initial neural network model, the conditional entropies between the feature map corresponding to each channel in the convolutional layer and the feature maps corresponding to the other channels, to obtain the average conditional entropy between the feature map corresponding to each channel and the feature maps corresponding to the other channels;
a pruning module, configured to prune, for each convolutional layer of the initial neural network model, the initial neural network model according to the average conditional entropy corresponding to each channel in the convolutional layer, to obtain an optimized neural network model;
and a data acquisition module, configured to input the data to be processed into the optimized neural network model and obtain the processed data output by the optimized neural network model.
In an embodiment, the pruning module is specifically configured to prune a preset number of channels in ascending order of the average conditional entropy corresponding to each channel.
In one embodiment, the conditional entropy determining unit comprises:
a first probability-sum determining subunit, configured to sum the probabilities of the first channel in each interval to obtain the probability sum of the first channel;
a second probability-sum determining subunit, configured to sum, over each interval, the product of the conditional probability of the second channel relative to the first channel in that interval and the logarithm of that conditional probability, to obtain the probability sum of the second channel;
and a conditional entropy determining subunit, configured to take the negative of the product of the probability sum of the first channel and the probability sum of the second channel as the conditional entropy of the second channel with respect to the first channel.
In one embodiment, the apparatus further comprises:
a second conditional entropy acquisition module, configured to, after the first conditional entropy acquisition module calculates, for each convolutional layer of the initial neural network model, the conditional entropy between the feature map corresponding to each channel in the convolutional layer and the feature maps corresponding to the other channels, acquire the conditional entropy between the feature map corresponding to each channel and the feature map corresponding to that channel itself;
a conditional entropy matrix generation module, configured to generate a conditional entropy matrix according to the conditional entropy between the feature map corresponding to each channel and the feature maps corresponding to the other channels and the conditional entropy between the feature map corresponding to each channel and the feature map corresponding to that channel itself; the numbers of rows and columns of the conditional entropy matrix are both the number of channels of the convolutional layer, and the conditional entropies in each row of the conditional entropy matrix are the conditional entropies between the feature map corresponding to one channel and the feature maps corresponding to the other channels.
In one embodiment, the apparatus further comprises:
and a training module, configured to train the optimized neural network model after the pruning module prunes the initial neural network model, for each convolutional layer of the initial neural network model, according to the average conditional entropy corresponding to each channel in the convolutional layer to obtain the optimized neural network model.
In a third aspect, a computing device is provided, comprising at least one processing unit, and at least one memory unit, wherein the memory unit stores a computer program that, when executed by the processing unit, causes the processing unit to perform the steps of any of the data processing methods described above.
In one embodiment, the computing device may be a server or a terminal device.
In a fourth aspect, a computer-readable medium is provided, which stores a computer program executable by a terminal device and, when the program runs on the terminal device, causes the terminal device to perform the steps of any one of the data processing methods described above.
According to the data processing method, apparatus, and storage medium, the relative information among channels is determined by calculating the conditional entropy between the feature maps corresponding to the channels in the initial neural network model, and the initial neural network model is pruned by removing channels that have little influence on the other channels. Pruning the initial neural network model in this way reduces its parameter count and its amount of computation, so that the model runs faster while its effect is preserved, and the requirements on deployment devices are reduced.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow chart of a data processing method in an embodiment of the present application;
FIG. 2 is a schematic flow chart illustrating a data processing method according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating a data processing method according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a terminal device in an embodiment of the present application.
Detailed Description
In order to reduce the number of model parameters and the amount of model computation, and thereby preserve the model's effect while increasing its running speed and reducing the requirements on deployment devices, the embodiments of the present application provide a data processing method, apparatus, and storage medium. For a better understanding of the technical solution provided by the embodiments of the present application, the basic principle of the solution is briefly described below:
in order to facilitate those skilled in the art to better understand the technical solutions in the embodiments of the present application, the following description illustrates terms related to the embodiments of the present application.
Feature map: the image features generated by convolving a training sample with a convolution kernel in a neural network model. In the embodiments of the present application, one feature map is obtained for each combination of a training sample and a channel of a convolutional layer.
Conditional entropy: the uncertainty of a random variable Y given that a random variable X is known. In the embodiments of the present application, conditional entropy measures the influence of one channel on the other channels.
Model pruning: evaluating and cutting away unimportant parts of a model, so as to reduce the model parameters and the amount of computation. Pruning a channel means deleting that channel.
The following briefly introduces the design concept of the embodiments of the present application.
As mentioned above, after model training converges, pruning unimportant channels from the model is crucial for reducing the model parameters and the amount of model computation. In the prior art, model pruning mostly considers the importance of each channel independently: the importance of each channel in the model is evaluated, and unimportant channels are pruned according to their importance. The prior art therefore ignores the correlation between different channels. In view of this, the embodiments of the present application provide a data processing method, apparatus, and storage medium. In the method, the relative information among channels is determined by calculating the conditional entropy between the feature maps corresponding to the channels in the initial neural network model, and the initial neural network model is pruned by removing channels that have little influence on the other channels. Pruning the initial neural network model in this way reduces its parameter count and its amount of computation, so that the model runs faster while its effect is preserved, and the requirements on deployment devices are reduced.
The data processing method, apparatus, and storage medium can therefore increase the model running speed and reduce the requirements on deployment devices while preserving the effect. For example, when a user browses web articles on a smart terminal, the user does not like to see advertisements in them. Therefore, before web articles are recommended to the user, the articles to be recommended need to be identified by a recognition neural network model, the articles determined to be advertisements need to be filtered out, and the remaining articles are recommended to the user. However, if the processor of the smart terminal is not fast, the recognition takes too long and harms the user experience. With the method provided by the embodiments of the present application, the recognition neural network model is pruned so that recognition can be performed quickly, which improves the user experience. Similarly, for pedestrian re-identification, the method provided by the embodiments of the present application is used to prune the recognition neural network model so that the pruned model can be deployed on a camera; the recognition work is then completed directly on the camera, which improves recognition efficiency.
For the convenience of understanding, the technical solutions provided in the present application are further described below with reference to the accompanying drawings.
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines can perceive, reason, and make decisions.
Artificial intelligence technology is a comprehensive subject covering a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Machine Learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how computers simulate or realize human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve their performance. Machine learning is the core of artificial intelligence, is the fundamental way to make computers intelligent, and is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from teaching.
In the embodiment of the present application, the flow of the data processing method provided by the present application, shown in fig. 1, consists of model pre-training, channel pruning based on conditional entropy, and training the pruned model.
Model pre-training trains a neural network model and is the preparatory stage of the present application; any trained neural network model with multiple channels can be used as the initial neural network model of the present application.
Channel pruning based on conditional entropy is the scheme emphasized in the present application, and this part is described in detail below.
In the embodiment of the present application, in order to implement optimization on the initial neural network model, as shown in fig. 2, the method specifically includes the following steps:
step 201: and acquiring a training sample of the data to be processed, inputting the training sample into the trained initial neural network model, and respectively acquiring the characteristic diagram output by each convolutional layer in the initial neural network model.
In an embodiment of the present application, the initial neural network model has at least one convolutional layer, and each convolutional layer has a plurality of channels. One training sample and one channel yield one feature map. For example, if the initial neural network model has 3 convolutional layers and each convolutional layer has 10 channels, inputting 5 training samples into the initial neural network model yields 150 feature maps in total.
It should be noted that the number of channels in each convolutional layer may be the same or different; the present application is not limited in this respect. A sketch of collecting these per-layer feature maps is given below.
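The embodiments do not prescribe a particular framework for extracting the per-layer feature maps. The following is a minimal sketch, assuming a PyTorch model, that records the output of every convolutional layer with forward hooks; the function name `collect_feature_maps` and the restriction to `nn.Conv2d` layers are illustrative assumptions, not part of the patent.

```python
import torch
import torch.nn as nn

def collect_feature_maps(model: nn.Module, samples: torch.Tensor) -> dict:
    """Run samples through the model and record the output of every Conv2d layer.

    Returns a dict mapping layer name -> tensor of shape
    (num_samples, num_channels, height, width).
    """
    feature_maps = {}
    hooks = []

    def make_hook(name):
        def hook(module, inputs, output):
            feature_maps[name] = output.detach()
        return hook

    for name, module in model.named_modules():
        if isinstance(module, nn.Conv2d):
            hooks.append(module.register_forward_hook(make_hook(name)))

    with torch.no_grad():
        model(samples)

    for h in hooks:
        h.remove()
    return feature_maps
```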
Step 202: and calculating the conditional entropy between the feature map corresponding to each channel and the feature maps corresponding to other channels in each convolutional layer of the initial neural network model.
In the embodiment of the application, the conditional entropy between the feature map corresponding to each channel and the feature maps corresponding to other channels is determined by determining the two-Norm (L2 Norm) of each feature map. Specifically, the method can be implemented as steps A1-A3:
step A1: and aiming at each convolutional layer of the initial neural network model, determining a characteristic diagram corresponding to each channel in the convolutional layer, and calculating two norms of the characteristic diagrams corresponding to each channel.
In the embodiment of the present application, the two-norm is calculated by the following formula:

||x||_2 = sqrt( ∑_{i=1}^{k} x_i^2 ); (1)

where ||x||_2 denotes the two-norm and x_i denotes the i-th element of the feature map (treated as a vector of length k), i = 1, 2, 3, ..., k. Calculating the two-norm of a feature map thus reduces its dimensionality to a single value and reduces the amount of subsequent computation.
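As an illustration of formula (1), the following sketch (assuming NumPy arrays and the hypothetical helper name `feature_map_norms`) reduces each feature map to a single two-norm value:

```python
import numpy as np

def feature_map_norms(feature_maps: np.ndarray) -> np.ndarray:
    """Reduce each (sample, channel) feature map to its two-norm, per formula (1).

    feature_maps: array of shape (num_samples, num_channels, height, width).
    Returns an array of shape (num_samples, num_channels), one scalar per map.
    """
    num_samples, num_channels = feature_maps.shape[:2]
    flat = feature_maps.reshape(num_samples, num_channels, -1)
    return np.sqrt(np.sum(flat ** 2, axis=-1))
```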
Step A2: dividing the two-norms of the feature maps corresponding to each channel into a preset number of intervals, and determining the probability of each channel in each interval according to the number of two-norms of that channel's feature maps falling into each interval.
In the embodiment of the present application, after the two-norm of each feature map is obtained, the two-norms are grouped by the channel they belong to, and the two-norms of each channel are divided into intervals according to their values, so that the probability of each channel in each interval is determined.
In one embodiment, suppose a convolutional layer has 2 channels and produces 40 feature maps, i.e., 20 feature maps per channel. The two-norms of the 40 feature maps are calculated, and the intervals are divided per channel according to the values of its two-norms.
If the maximum of the two-norms corresponding to the first channel is 4.0 and the minimum is 1.0, the range is divided with 1.0 and 4.0 as boundaries; divided into 3 intervals, the intervals are [1.0, 2.0), [2.0, 3.0), and [3.0, 4.0]. The probability of the first channel in each of the 3 intervals is determined from the number of its two-norms falling into that interval. For example, if the first interval [1.0, 2.0) contains 5 two-norms, the probability of the first interval is 0.25; if the second interval [2.0, 3.0) contains 5 two-norms, the probability of the second interval is 0.25; if the third interval [3.0, 4.0] contains 10 two-norms, the probability of the third interval is 0.5.
If the maximum of the two-norms corresponding to the second channel is 6.5 and the minimum is 0.5, the range is divided with 0.5 and 6.5 as boundaries; divided into 3 intervals, the intervals are [0.5, 2.5), [2.5, 4.5), and [4.5, 6.5]. The probability of the second channel in each of the 3 intervals is determined from the number of its two-norms falling into that interval. For example, if the first interval [0.5, 2.5) contains 8 two-norms, the probability of the first interval is 0.4; if the second interval [2.5, 4.5) contains 2 two-norms, the probability of the second interval is 0.1; if the third interval [4.5, 6.5] contains 10 two-norms, the probability of the third interval is 0.5.
This yields the probability of each channel in each interval. It should be noted that the number of intervals must be the same for every channel.
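The per-channel interval probabilities of step A2 can be sketched as follows, assuming NumPy and equal-width intervals between each channel's minimum and maximum two-norm, as in the example above; the helper name is illustrative:

```python
import numpy as np

def channel_interval_probabilities(norms: np.ndarray, num_bins: int = 3):
    """Bin one channel's two-norms into equal-width intervals between its minimum
    and maximum value, and return (interval_edges, probabilities).

    norms: shape (num_samples,), the two-norms of this channel's feature maps.
    """
    edges = np.linspace(norms.min(), norms.max(), num_bins + 1)
    counts, _ = np.histogram(norms, bins=edges)   # last interval is closed, as in the example
    return edges, counts / norms.size
```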
Step A3: determining the conditional entropy between the feature map corresponding to each channel and the feature maps corresponding to the other channels according to the probability of each channel in each interval and the conditional entropy formula.
In the embodiment of the present application, the conditional entropy is determined from the probabilities obtained above and the conditional entropy formula. Specifically, this can be implemented as steps B1-B3:
Step B1: summing the probabilities of the first channel in each interval to obtain the probability sum of the first channel.
Step B2: summing, over each interval, the product of the conditional probability of the second channel relative to the first channel in that interval and the logarithm of that conditional probability, to obtain the probability sum of the second channel.
Step B3: taking the negative of the product of the probability sum of the first channel and the probability sum of the second channel as the conditional entropy of the second channel with respect to the first channel.
In the embodiment of the present application, the conditional entropy is determined according to the following formula:

H(Y|X) = -∑_{x∈X} p(x) ∑_{y∈Y} p(y|x) log p(y|x); (2)

where X denotes the first channel, Y denotes the second channel, p(x) denotes the probability of the first channel in interval x obtained above, and p(y|x) denotes the conditional probability of the second channel relative to the first channel.
In one embodiment, to determine the conditional probability of the second channel relative to the first channel, the number of the second channel's two-norms falling into each of the first channel's intervals needs to be determined. For example, the first channel is divided into 3 intervals, [1.0, 2.0), [2.0, 3.0), and [3.0, 4.0], and the second channel has 20 two-norms. The conditional probability of the second channel relative to the first channel is determined from the number of the second channel's two-norms falling into each of those intervals. For example, if the first interval [1.0, 2.0) contains 2 of the second channel's two-norms, the conditional probability of the second channel relative to the first channel in the first interval is 0.1; if the second interval [2.0, 3.0) contains 4, the conditional probability in the second interval is 0.2; if the third interval [3.0, 4.0] contains 8, the conditional probability in the third interval is 0.4.
In this way, the conditional entropy of the second channel with respect to the first channel is determined according to the conditional entropy formula.
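A sketch of formula (2) following steps B1-B3 is given below. It assumes, as in the worked example, that p(x) is estimated from the first channel's own intervals and that p(y|x) is estimated by binning the second channel's two-norms with the first channel's interval boundaries; the patent does not spell out this estimate, so it is an assumption of the sketch.

```python
import numpy as np

def conditional_entropy(norms_x: np.ndarray, norms_y: np.ndarray,
                        num_bins: int = 3, eps: float = 1e-12) -> float:
    """Sketch of formula (2): H(Y|X) = -sum_x p(x) sum_y p(y|x) log p(y|x),
    computed as in steps B1-B3 for one pair of channels."""
    # Intervals and probabilities of the first channel X (step A2)
    edges = np.linspace(norms_x.min(), norms_x.max(), num_bins + 1)
    p_x, _ = np.histogram(norms_x, bins=edges)
    p_x = p_x / norms_x.size
    # Conditional probabilities of the second channel Y in X's intervals
    counts_y, _ = np.histogram(norms_y, bins=edges)
    p_y_given_x = counts_y / norms_y.size
    inner = p_y_given_x * np.log(p_y_given_x + eps)   # p(y|x) * log p(y|x)
    # Steps B1-B3: negative of (probability sum of X) * (probability sum of Y)
    return float(-np.sum(p_x) * np.sum(inner))
```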
In the embodiment of the present application, besides the two-norm (L2 norm), the feature map may also be represented by the one-norm (L1 norm) or by global average pooling, which is not limited in the present application.
Step 203: for each convolutional layer of the initial neural network model, averaging the conditional entropies between the feature map corresponding to each channel in the convolutional layer and the feature maps corresponding to the other channels, to obtain the average conditional entropy between the feature map corresponding to each channel and the feature maps corresponding to the other channels.
In one embodiment, if there are 4 channels in the convolutional layer, and the conditional entropy of the first channel to the second channel is-0.5; the conditional entropy of the first channel to the third channel is-0.3; the conditional entropy of the first channel to the fourth channel is-0.1; the average conditional entropy of the first channel over the other channels is-0.3.
Step 204: for each convolutional layer of the initial neural network model, pruning the initial neural network model according to the average conditional entropy corresponding to each channel in the convolutional layer, to obtain an optimized neural network model.
In the embodiment of the present application, a fixed number of channels may be pruned to obtain the optimized neural network model. Specifically, this can be implemented as: pruning a preset number of channels in ascending order of the average conditional entropy corresponding to each channel.
In one embodiment, suppose there are 4 channels in the convolutional layer; the average conditional entropy of the first channel with respect to the other channels is -0.3, that of the second channel is -0.5, that of the third channel is -0.2, and that of the fourth channel is -0.4. If 2 channels need to be pruned from the convolutional layer, the second channel (corresponding to -0.5) and the fourth channel (corresponding to -0.4) are pruned in ascending order, thereby completing the model optimization and obtaining the optimized neural network model.
In this way, limiting the number of pruned channels ensures that every convolutional layer prunes the same number of channels.
Of course, besides pruning a fixed number of channels, a threshold may be set: when the average conditional entropy of a channel is determined to be not greater than the threshold, the channel is considered unimportant and is pruned. In one embodiment, suppose there are 4 channels in the convolutional layer; the average conditional entropy of the first channel with respect to the other channels is -0.3, that of the second channel is -0.5, that of the third channel is -0.2, and that of the fourth channel is -0.4. If the threshold is -0.4, each average conditional entropy is compared with the threshold, and the second channel (corresponding to -0.5) and the fourth channel (corresponding to -0.4) are pruned, thereby completing the model optimization and obtaining the optimized neural network model.
It should be noted that if the average conditional entropy of every channel is greater than the threshold, no channel is pruned.
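Both pruning strategies of step 204, pruning a preset number of channels in ascending order of average conditional entropy or pruning every channel whose average conditional entropy does not exceed a threshold, can be sketched as follows (illustrative helper, NumPy assumed):

```python
import numpy as np

def channels_to_prune(avg_cond_entropy: np.ndarray,
                      num_prune=None, threshold=None) -> np.ndarray:
    """Pick the channel indices to prune for one convolutional layer.

    avg_cond_entropy: shape (num_channels,), average conditional entropy of each
    channel with respect to the other channels.
    Either prune the num_prune channels with the smallest values, or prune every
    channel whose value is not greater than threshold.
    """
    if num_prune is not None:
        return np.argsort(avg_cond_entropy)[:num_prune]
    if threshold is not None:
        return np.flatnonzero(avg_cond_entropy <= threshold)
    return np.array([], dtype=int)
```

With the example above, `channels_to_prune(np.array([-0.3, -0.5, -0.2, -0.4]), num_prune=2)` selects the second and fourth channels, and `threshold=-0.4` selects the same two.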
Step 205: and inputting the data to be processed into the optimized neural network model to obtain the processed data output by the optimized neural network model.
Therefore, by pruning the initial neural network model, the parameter quantity of the initial neural network model is reduced, and the calculation quantity of the initial neural network model is reduced, so that the model operation speed is increased while the effect is ensured, and the requirement on deployment equipment is reduced.
To display the obtained conditional entropies more intuitively, they can be arranged in a matrix, which can be implemented as steps C1-C2:
step C1: and acquiring the conditional entropy between the characteristic diagram corresponding to each channel and the characteristic diagram corresponding to the channel.
In the embodiment of the present application, the conditional entropy between the feature map corresponding to each channel and the feature map corresponding to itself is 0.
Step C2: generating a conditional entropy matrix according to the conditional entropy between the feature diagram corresponding to each channel and the feature diagrams corresponding to other channels and the conditional entropy between the feature diagram corresponding to each channel and the feature diagram corresponding to the channel; the rows and columns of the conditional entropy matrix are the number of channels of the convolutional layer, and the conditional entropy in each row of the conditional entropy matrix is the conditional entropy between the feature map corresponding to the same channel and the feature maps corresponding to other channels.
In one embodiment, if there are 4 channels in the convolutional layer, 16 conditional entropies can be obtained, which are: h (1|1), H (1|2), H (1|3), H (1|4), H (2|1), H (2|2), H (2|3), H (2|4), H (3|1), H (3|2), H (3|3), H (3|4), H (4|1), H (4|2), H (4|3), H (4| 4). Thus, a 4 × 4 conditional entropy matrix is generated:
[H(1|1),H(1|2),H(1|3),H(1|4);
H(2|1),H(2|2),H(2|3),H(2|4);
H(3|1),H(3|2),H(3|3),H(3|4);
H(4|1),H(4|2),H(4|3),H(4|4)];
In this way, when calculating the average conditional entropy of each channel with respect to the other channels, it is only necessary to sum the conditional entropy matrix row by row and divide by the number of conditional entropies of that channel with respect to the other channels.
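A sketch of steps C1-C2 together with the row-wise averaging of step 203 is given below, reusing the `conditional_entropy` helper sketched earlier; the diagonal entries H(i|i) are set to 0 as stated above, and all names are illustrative:

```python
import numpy as np

def average_conditional_entropy(norms: np.ndarray, num_bins: int = 3) -> np.ndarray:
    """Build the per-layer conditional entropy matrix and average it row by row.

    norms: shape (num_samples, num_channels), e.g. the output of feature_map_norms.
    Entry [i, j] holds H(channel i | channel j); channel j supplies the intervals.
    Returns one average conditional entropy per channel.
    """
    num_channels = norms.shape[1]
    matrix = np.zeros((num_channels, num_channels))   # diagonal H(i|i) stays 0
    for i in range(num_channels):
        for j in range(num_channels):
            if i != j:
                matrix[i, j] = conditional_entropy(norms[:, j], norms[:, i], num_bins)
    # Row-wise sum divided by the number of other channels
    return matrix.sum(axis=1) / (num_channels - 1)
```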
Having introduced channel pruning based on conditional entropy, the training of the pruned model is explained below.
To improve the effect of the optimized neural network model, it can be trained further; specifically: training the optimized neural network model.
Specifically, the optimized neural network model may be fine-tuned, i.e., the parameters in the optimized neural network model are adjusted slightly by retraining it.
In the embodiment of the present application, the method for retraining the optimized neural network model is the same as the pre-training of the model in the first part: a plurality of labelled training samples are input into the optimized neural network model to obtain output results, the parameters in the optimized neural network model are adjusted according to the output results and the labels, and when the output results of the optimized neural network model meet the requirements, the retraining is complete. Retraining the optimized neural network model in this way further improves its effect.
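A minimal retraining (fine-tuning) loop is sketched below, assuming a PyTorch model, a supervised classification task with labelled samples, and illustrative hyperparameters; the patent only requires that the pruned model be retrained in the same way as the pre-training stage.

```python
import torch
import torch.nn as nn

def finetune(model: nn.Module, loader, epochs: int = 5, lr: float = 1e-4) -> nn.Module:
    """Retrain the pruned (optimized) model on labelled samples."""
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for inputs, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(inputs), labels)
            loss.backward()
            optimizer.step()
    return model
```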
As shown in fig. 3, an embodiment of the present application further provides a complete method for data processing, including:
step 301: and acquiring a training sample of the data to be processed, inputting the training sample into the trained initial neural network model, and respectively acquiring the characteristic diagram output by each convolutional layer in the initial neural network model.
Step 302: and aiming at each convolutional layer of the initial neural network model, determining a characteristic diagram corresponding to each channel in the convolutional layer, and calculating two norms of the characteristic diagrams corresponding to each channel.
Step 303: dividing the two norms of the feature maps corresponding to the channels into intervals with preset number, and determining the probability of each channel in each interval according to the number of the two norms of the feature maps corresponding to the channels in each interval.
Step 304: and determining the conditional entropy between the feature graph corresponding to each channel and the feature graphs corresponding to other channels according to the probability of each channel in each interval and a conditional entropy formula.
Step 305: and acquiring the conditional entropy between the characteristic diagram corresponding to each channel and the characteristic diagram corresponding to the channel.
Step 306: generating a conditional entropy matrix according to the conditional entropy between the feature diagram corresponding to each channel and the feature diagrams corresponding to other channels and the conditional entropy between the feature diagram corresponding to each channel and the feature diagram corresponding to the channel; the rows and columns of the conditional entropy matrix are the number of channels of the convolutional layer, and the conditional entropy in each row of the conditional entropy matrix is the conditional entropy between the feature map corresponding to the same channel and the feature maps corresponding to other channels.
Step 307: and averaging the conditional entropy between the feature map corresponding to each channel and the feature maps corresponding to other channels in the convolutional layer aiming at each convolutional layer of the initial neural network model to obtain the average conditional entropy between the feature map corresponding to each channel and the feature maps corresponding to other channels.
Step 308: and (3) pruning channels corresponding to the average conditional entropy in a preset number in the convolutional layers according to the sequence from small to large aiming at each convolutional layer of the initial neural network model to obtain an optimized neural network model.
Step 309: and training the optimized neural network model.
Step 310: and inputting the data to be processed into the trained optimized neural network model to obtain the processed data output by the trained optimized neural network model.
Based on the same inventive concept, the embodiment of the application also provides a data processing device. As shown in fig. 4, the apparatus includes:
a feature map acquisition module 401, configured to acquire a training sample of data to be processed, input the training sample into a trained initial neural network model, and respectively acquire the feature maps output by each convolutional layer in the initial neural network model;
a first conditional entropy acquisition module 402, configured to calculate, for each convolutional layer of the initial neural network model, the conditional entropy between the feature map corresponding to each channel in the convolutional layer and the feature maps corresponding to the other channels;
an average conditional entropy acquisition module 403, configured to average, for each convolutional layer of the initial neural network model, the conditional entropies between the feature map corresponding to each channel in the convolutional layer and the feature maps corresponding to the other channels, to obtain the average conditional entropy between the feature map corresponding to each channel and the feature maps corresponding to the other channels;
a pruning module 404, configured to prune, for each convolutional layer of the initial neural network model, the initial neural network model according to the average conditional entropy corresponding to each channel in the convolutional layer, to obtain an optimized neural network model;
a data acquisition module 405, configured to input the data to be processed into the optimized neural network model and obtain the processed data output by the optimized neural network model.
In an embodiment, the pruning module 404 is specifically configured to prune a preset number of channels in ascending order of the average conditional entropy corresponding to each channel.
In one embodiment, the first conditional entropy acquisition module 402 includes:
a two-norm determining unit, configured to determine, for each convolutional layer of the initial neural network model, the feature map corresponding to each channel in the convolutional layer, and calculate the two-norm of the feature map corresponding to each channel;
a probability determining unit, configured to divide the two-norms of the feature maps corresponding to each channel into a preset number of intervals, and determine the probability of each channel in each interval according to the number of two-norms of that channel's feature maps falling into each interval;
and a conditional entropy determining unit, configured to determine the conditional entropy between the feature map corresponding to each channel and the feature maps corresponding to the other channels according to the probability of each channel in each interval and the conditional entropy formula.
In one embodiment, the conditional entropy determining unit comprises:
a first probability-sum determining subunit, configured to sum the probabilities of the first channel in each interval to obtain the probability sum of the first channel;
a second probability-sum determining subunit, configured to sum, over each interval, the product of the conditional probability of the second channel relative to the first channel in that interval and the logarithm of that conditional probability, to obtain the probability sum of the second channel;
and a conditional entropy determining subunit, configured to take the negative of the product of the probability sum of the first channel and the probability sum of the second channel as the conditional entropy of the second channel with respect to the first channel.
In one embodiment, the apparatus further comprises:
a second conditional entropy acquisition module, configured to, after the first conditional entropy acquisition module 402 calculates, for each convolutional layer of the initial neural network model, the conditional entropy between the feature map corresponding to each channel in the convolutional layer and the feature maps corresponding to the other channels, acquire the conditional entropy between the feature map corresponding to each channel and the feature map corresponding to that channel itself;
a conditional entropy matrix generation module, configured to generate a conditional entropy matrix according to the conditional entropy between the feature map corresponding to each channel and the feature maps corresponding to the other channels and the conditional entropy between the feature map corresponding to each channel and the feature map corresponding to that channel itself; the numbers of rows and columns of the conditional entropy matrix are both the number of channels of the convolutional layer, and the conditional entropies in each row of the conditional entropy matrix are the conditional entropies between the feature map corresponding to one channel and the feature maps corresponding to the other channels.
In one embodiment, the apparatus further comprises:
a training module, configured to train the optimized neural network model after the pruning module 404 prunes the initial neural network model, for each convolutional layer of the initial neural network model, according to the average conditional entropy corresponding to each channel in the convolutional layer and obtains the optimized neural network model.
Based on the same technical concept, the present application further provides a terminal device 500, and referring to fig. 5, the terminal device 500 is configured to implement the methods described in the above various method embodiments, for example, implement the embodiment shown in fig. 2, and the terminal device 500 may include a memory 501, a processor 502, an input unit 503, and a display panel 504.
A memory 501 for storing the computer program executed by the processor 502. The memory 501 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required by at least one function, and the like, and the data storage area may store data created according to the use of the terminal device 500, and the like. The processor 502 may be a central processing unit (CPU), a digital processing unit, or the like. The input unit 503 may be used to obtain user instructions input by the user. The display panel 504 is configured to display information input by the user or provided to the user; in this embodiment of the present application, the display panel 504 is mainly used to display the display interface of each application program in the terminal device and the controls displayed in each display interface. Optionally, the display panel 504 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.
The embodiment of the present application does not limit the specific connection medium among the memory 501, the processor 502, the input unit 503, and the display panel 504. In the embodiment of the present application, the memory 501, the processor 502, the input unit 503, and the display panel 504 are connected by the bus 505 in fig. 5, the bus 505 is represented by a thick line in fig. 5, and the connection manner between other components is merely illustrative and not limited thereto. The bus 505 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 5, but this is not intended to represent only one bus or type of bus.
The memory 501 may be a volatile memory, such as a random-access memory (RAM); the memory 501 may also be a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or the memory 501 may be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 501 may also be a combination of the above memories.
The processor 502 is configured to implement the embodiment shown in fig. 2, and is specifically configured to invoke the computer program stored in the memory 501 to perform the embodiment shown in fig. 2.
The embodiment of the present application further provides a computer-readable storage medium that stores the computer-executable instructions required by the processor, i.e., the program to be executed by the processor.
In some possible embodiments, aspects of the data processing method provided in the present application may also be implemented in the form of a program product, which includes program code for causing a terminal device to perform the steps of the data processing method according to the various exemplary embodiments of the present application described above in this specification, when the program product is run on the terminal device. For example, the terminal device may perform the embodiment shown in fig. 2.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product of an embodiment of the present application may employ a portable compact disc read-only memory (CD-ROM), include program code, and be run on a computing device. However, the program product of the present application is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such division is merely exemplary and not mandatory. Indeed, according to embodiments of the present application, the features and functions of two or more units described above may be embodied in one unit. Conversely, the features and functions of one unit described above may be further divided and embodied by a plurality of units.
Further, while the operations of the methods of the present application are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A method of data processing, the method comprising:
acquiring a training sample of data to be processed, inputting the training sample into a trained initial neural network model, and respectively acquiring a feature map output by each convolutional layer in the initial neural network model;
for each convolutional layer of the initial neural network model, calculating the conditional entropy between the feature map corresponding to each channel in the convolutional layer and the feature maps corresponding to the other channels;
for each convolutional layer of the initial neural network model, averaging the conditional entropies between the feature map corresponding to each channel in the convolutional layer and the feature maps corresponding to the other channels, to obtain an average conditional entropy between the feature map corresponding to each channel and the feature maps corresponding to the other channels;
for each convolutional layer of the initial neural network model, pruning the initial neural network model according to the average conditional entropy corresponding to each channel in the convolutional layer, to obtain an optimized neural network model; and inputting the data to be processed into the optimized neural network model to obtain processed data output by the optimized neural network model.
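Purely as an illustration of the per-layer procedure in claim 1 (and of the ordering in claim 2 below), the following is a minimal Python/NumPy sketch. The histogram-based entropy estimator, the bin count, the direction of the conditional entropy, and the number of channels to prune are assumptions introduced for this example; they are not details fixed by the claims.

```python
import numpy as np

def conditional_entropy(x, y, num_bins=10):
    """Estimate H(Y | X) from two 1-D samples (one value per training sample)
    using a joint histogram over a preset number of intervals. The estimator
    itself is an assumption made for this sketch."""
    joint, _, _ = np.histogram2d(x, y, bins=num_bins)
    p_xy = joint / joint.sum()                     # joint interval probabilities
    p_x = p_xy.sum(axis=1, keepdims=True)          # marginal probabilities of X
    with np.errstate(divide="ignore", invalid="ignore"):
        p_y_given_x = np.where(p_x > 0, p_xy / p_x, 0.0)
        log_term = np.where(p_y_given_x > 0, np.log(p_y_given_x), 0.0)
    return float(-(p_xy * log_term).sum())

def prune_indices_for_layer(feature_maps, num_prune, num_bins=10):
    """feature_maps: array of shape (num_samples, num_channels, H, W) holding
    the feature maps of one convolutional layer over the training samples.
    Returns the indices of the channels selected for pruning."""
    n_samples, n_channels = feature_maps.shape[:2]
    # Two-norm of each channel's feature map for every sample (claim 3).
    norms = np.linalg.norm(feature_maps.reshape(n_samples, n_channels, -1), axis=2)
    # Average conditional entropy of the other channels given each channel.
    avg_ce = np.zeros(n_channels)
    for i in range(n_channels):
        avg_ce[i] = np.mean([conditional_entropy(norms[:, i], norms[:, j], num_bins)
                             for j in range(n_channels) if j != i])
    # Prune the channels with the smallest average conditional entropy (claim 2).
    return np.argsort(avg_ce)[:num_prune]

# Hypothetical usage with random activations standing in for real feature maps:
fmap = np.random.rand(128, 16, 8, 8)    # 128 samples, 16 channels, 8x8 maps
print(prune_indices_for_layer(fmap, num_prune=4))
```

In a real pipeline the returned indices would be used to remove the corresponding filters from the convolutional layer before the model is run or retrained.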
2. The method of claim 1, wherein pruning the initial neural network model according to the average conditional entropy corresponding to each channel in the convolutional layer comprises:
pruning a preset number of channels in ascending order of the average conditional entropy corresponding to each channel.
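A tiny worked example of the ordering in claim 2, with hypothetical per-channel averages and a hypothetical preset count of two channels to prune:

```python
import numpy as np

avg_ce = np.array([0.42, 0.07, 0.31, 0.19, 0.55])   # hypothetical average conditional entropies
num_prune = 2                                        # the preset number of channels to prune
pruned = np.argsort(avg_ce)[:num_prune]              # ascending average conditional entropy
kept = np.setdiff1d(np.arange(avg_ce.size), pruned)
print(pruned, kept)                                  # -> [1 3] [0 2 4]
```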
3. The method of claim 1, wherein calculating, for each convolutional layer of the initial neural network model, a conditional entropy between a feature map corresponding to each channel in the convolutional layer and feature maps corresponding to other channels comprises:
for each convolutional layer of the initial neural network model, determining the feature map corresponding to each channel in the convolutional layer, and calculating the two-norm of the feature map corresponding to each channel;
dividing the two-norms of the feature maps corresponding to the channels into a preset number of intervals, and determining the probability of each channel in each interval according to the number of two-norms of the feature maps corresponding to that channel falling in each interval;
and determining the conditional entropy between the feature map corresponding to each channel and the feature maps corresponding to the other channels according to the probability of each channel in each interval and a conditional entropy formula.
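A sketch of the probability estimate described in claim 3, assuming the feature maps of one layer have been collected over the training samples into a NumPy array; the number of intervals and the use of bin edges shared across channels are assumptions of this sketch:

```python
import numpy as np

def channel_interval_probabilities(feature_maps, num_bins=10):
    """feature_maps: array of shape (num_samples, num_channels, H, W).
    Returns probs with probs[c, k] = fraction of training samples whose
    two-norm for channel c falls into interval k, plus the bin edges."""
    n_samples, n_channels = feature_maps.shape[:2]
    norms = np.linalg.norm(feature_maps.reshape(n_samples, n_channels, -1), axis=2)
    edges = np.linspace(norms.min(), norms.max(), num_bins + 1)   # preset number of intervals
    probs = np.stack([np.histogram(norms[:, c], bins=edges)[0] / n_samples
                      for c in range(n_channels)])
    return probs, edges
```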
4. The method according to claim 3, wherein determining the conditional entropy between the feature map corresponding to each channel and the feature maps corresponding to the other channels according to the probability of each channel in each interval and a conditional entropy formula comprises:
summing the probabilities of a first channel over the intervals to obtain a probability sum of the first channel; and
summing, over the intervals, the products of the conditional probability of a second channel in each interval given the probability of the first channel in each interval and the logarithm of that conditional probability, to obtain a probability sum of the second channel;
and taking the negative of the product of the probability sum of the first channel and the probability sum of the second channel as the conditional entropy of the second channel given the first channel.
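Read literally, claim 4 computes the conditional entropy of a second channel c2 given a first channel c1 from the per-interval probabilities as follows; the grouping of terms below is one interpretation of the claim wording, not a formula quoted from the specification:

```latex
H(c_2 \mid c_1) \;=\; -\Bigl(\sum_{k=1}^{K} p_{c_1}(k)\Bigr)\,
                      \Bigl(\sum_{k=1}^{K} p(c_2 = k \mid c_1)\,\log p(c_2 = k \mid c_1)\Bigr)
```

Here K is the preset number of intervals and p_{c_1}(k) is the probability of the first channel falling in interval k. When those probabilities sum to one over a full set of intervals, the first factor reduces to 1 and the expression is the entropy of the second channel's conditional distribution given the first channel.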
5. The method of claim 1, wherein after calculating, for each convolutional layer of the initial neural network model, a conditional entropy between a feature map corresponding to each channel in the convolutional layer and feature maps corresponding to other channels, the method further comprises:
acquiring the conditional entropy between the feature map corresponding to each channel and the feature map corresponding to that same channel;
generating a conditional entropy matrix according to the conditional entropies between the feature map corresponding to each channel and the feature maps corresponding to the other channels and the conditional entropy between the feature map corresponding to each channel and the feature map corresponding to that same channel; wherein the numbers of rows and columns of the conditional entropy matrix are both equal to the number of channels of the convolutional layer, and each row of the conditional entropy matrix holds the conditional entropies between the feature map corresponding to one channel and the feature maps corresponding to the other channels.
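A compact sketch of the matrix described in claim 5. The histogram-based estimator is the same assumption used above and is repeated so the snippet stands alone; the diagonal entries hold the conditional entropy of each channel's feature map given itself:

```python
import numpy as np

def conditional_entropy(x, y, num_bins=10):
    """H(Y | X) estimated from a joint histogram (assumed estimator)."""
    joint, _, _ = np.histogram2d(x, y, bins=num_bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        p_y_x = np.where(p_x > 0, p_xy / p_x, 0.0)
        log_p = np.where(p_y_x > 0, np.log(p_y_x), 0.0)
    return float(-(p_xy * log_p).sum())

def conditional_entropy_matrix(norms, num_bins=10):
    """norms: (num_samples, num_channels) two-norms of the per-channel feature
    maps. Returns a (num_channels x num_channels) matrix whose row i holds the
    conditional entropies of every channel given channel i, including the
    diagonal entry for channel i itself."""
    n_channels = norms.shape[1]
    matrix = np.zeros((n_channels, n_channels))
    for i in range(n_channels):
        for j in range(n_channels):
            matrix[i, j] = conditional_entropy(norms[:, i], norms[:, j], num_bins)
    return matrix
```

One natural use of such a matrix is to average each row over its off-diagonal entries, which yields the per-channel average conditional entropies used for pruning.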
6. The method according to claim 1, wherein after pruning the initial neural network model for each convolutional layer according to the average conditional entropy corresponding to each channel in that convolutional layer to obtain the optimized neural network model, the method further comprises:
training the optimized neural network model.
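Claim 6 only states that the optimized model is trained again after pruning; a minimal fine-tuning loop, assuming a PyTorch model and data loader (both hypothetical here, since the claims do not prescribe a framework), might look like this:

```python
import torch
import torch.nn as nn

def finetune(pruned_model, train_loader, epochs=10, lr=1e-3, device="cuda"):
    """Retrain the optimized (pruned) neural network model (claim 6)."""
    pruned_model.to(device).train()
    optimizer = torch.optim.SGD(pruned_model.parameters(), lr=lr, momentum=0.9)
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for inputs, labels in train_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(pruned_model(inputs), labels)
            loss.backward()
            optimizer.step()
    return pruned_model
```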
7. A data processing apparatus, the apparatus comprising:
a feature map acquisition module, configured to acquire a training sample of data to be processed, input the training sample into a trained initial neural network model, and respectively acquire a feature map output by each convolutional layer in the initial neural network model;
a first conditional entropy acquisition module, configured to calculate, for each convolutional layer of the initial neural network model, the conditional entropy between the feature map corresponding to each channel in the convolutional layer and the feature maps corresponding to the other channels;
an average conditional entropy acquisition module, configured to average, for each convolutional layer of the initial neural network model, the conditional entropies between the feature map corresponding to each channel in the convolutional layer and the feature maps corresponding to the other channels, to obtain an average conditional entropy between the feature map corresponding to each channel and the feature maps corresponding to the other channels;
a pruning module, configured to prune, for each convolutional layer of the initial neural network model, the initial neural network model according to the average conditional entropy corresponding to each channel in the convolutional layer, to obtain an optimized neural network model;
and a data acquisition module, configured to input the data to be processed into the optimized neural network model and acquire the processed data output by the optimized neural network model.
8. The apparatus according to claim 7, wherein the first conditional entropy acquisition module comprises:
a two-norm determining unit, configured to determine, for each convolutional layer of the initial neural network model, the feature map corresponding to each channel in the convolutional layer, and calculate the two-norm of the feature map corresponding to each channel;
a probability determining unit, configured to divide the two-norms of the feature maps corresponding to the channels into a preset number of intervals, and determine the probability of each channel in each interval according to the number of two-norms of the feature maps corresponding to each channel falling in each interval;
and a conditional entropy determining unit, configured to determine the conditional entropy between the feature map corresponding to each channel and the feature maps corresponding to the other channels according to the probability of each channel in each interval and a conditional entropy formula.
9. A computer-readable medium having stored thereon computer-executable instructions for performing the method of any one of claims 1-6.
10. A computing device, comprising:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
CN201911303249.9A 2019-12-17 2019-12-17 Data processing method, device and storage medium Active CN111062477B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911303249.9A CN111062477B (en) 2019-12-17 2019-12-17 Data processing method, device and storage medium

Publications (2)

Publication Number Publication Date
CN111062477A true CN111062477A (en) 2020-04-24
CN111062477B CN111062477B (en) 2023-12-08

Family

ID=70302124

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911303249.9A Active CN111062477B (en) 2019-12-17 2019-12-17 Data processing method, device and storage medium

Country Status (1)

Country Link
CN (1) CN111062477B (en)

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060041203A1 (en) * 2004-08-20 2006-02-23 Duke University Methods, systems, and computer program products for neural channel selection in a multi-channel system
CN103299307A (en) * 2011-08-23 2013-09-11 华为技术有限公司 Estimator for estimating a probability distribution of a quantization index
WO2016061742A1 (en) * 2014-10-21 2016-04-28 Intellectual Ventures Hong Kong Limited Automatic profiling framework of cross-vm covert channel capacity
CN106656477A (en) * 2015-10-30 2017-05-10 财团法人工业技术研究院 Secret key generating device and method based on vector quantization
CN106611283A (en) * 2016-06-16 2017-05-03 四川用联信息技术有限公司 Manufacturing material purchasing analysis method based on decision tree algorithm
CN106611295A (en) * 2016-06-28 2017-05-03 四川用联信息技术有限公司 Decision tree-based evolutionary programming algorithm for solving material purchasing problem in manufacturing industry
CN107403345A (en) * 2017-09-22 2017-11-28 北京京东尚科信息技术有限公司 Best-selling product Forecasting Methodology and system, storage medium and electric terminal
CN108304974A (en) * 2018-02-26 2018-07-20 中国民航信息网络股份有限公司 A kind of civil aviation NOSHOW predictions based on optimization C5.0 and Apriori and strong factor-analysis approach
CN110474786A (en) * 2018-05-10 2019-11-19 上海大唐移动通信设备有限公司 Method and device based on random forest analysis VoLTE network failure reason
CN109389043A (en) * 2018-09-10 2019-02-26 中国人民解放军陆军工程大学 A kind of crowd density estimation method of unmanned plane picture
CN109658241A (en) * 2018-11-23 2019-04-19 成都知道创宇信息技术有限公司 A kind of screw-thread steel forward price ups and downs probability forecasting method
CN109344921A (en) * 2019-01-03 2019-02-15 湖南极点智能科技有限公司 A kind of image-recognizing method based on deep neural network model, device and equipment
CN110097187A (en) * 2019-04-29 2019-08-06 河海大学 It is a kind of based on activation-entropy weight hard cutting CNN model compression method
CN110119811A (en) * 2019-05-15 2019-08-13 电科瑞达(成都)科技有限公司 A kind of convolution kernel method of cutting out based on entropy significance criteria model
CN110533383A (en) * 2019-07-24 2019-12-03 平安科技(深圳)有限公司 Item supervises and manage method, apparatus, computer equipment and storage medium
CN110516748A (en) * 2019-08-29 2019-11-29 泰康保险集团股份有限公司 Method for processing business, device, medium and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MIN C et al.: "Two-phase filter pruning based on conditional entropy", arXiv preprint arXiv:1809.02220, pages 1-3 *
Han Zhi: "Design and FPGA verification of a TNN-based traffic sign recognition algorithm", China Master's Theses Full-text Database, Engineering Science and Technology II, no. 5, pages 034-570 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116304677A (en) * 2023-01-30 2023-06-23 格兰菲智能科技有限公司 Channel pruning method and device for model, computer equipment and storage medium

Also Published As

Publication number Publication date
CN111062477B (en) 2023-12-08

Similar Documents

Publication Publication Date Title
US20210150372A1 (en) Training method and system for decision tree model, storage medium, and prediction method
US9990558B2 (en) Generating image features based on robust feature-learning
CN110458107B (en) Method and device for image recognition
US20200020102A1 (en) Method and device for semantic segmentation of image
EP3732631A1 (en) Neural architecture search for dense image prediction tasks
CN111275107A (en) Multi-label scene image classification method and device based on transfer learning
CN109376844A (en) The automatic training method of neural network and device recommended based on cloud platform and model
CN110476173B (en) Hierarchical device placement with reinforcement learning
CN112418292B (en) Image quality evaluation method, device, computer equipment and storage medium
CN111652378B (en) Learning to select vocabulary for category features
CN113039555B (en) Method, system and storage medium for classifying actions in video clips
CN110462638B (en) Training neural networks using posterior sharpening
KR102250728B1 (en) Sample processing method and device, related apparatus and storage medium
CN113761868B (en) Text processing method, text processing device, electronic equipment and readable storage medium
CN110276081B (en) Text generation method, device and storage medium
CN114333062A (en) Pedestrian re-recognition model training method based on heterogeneous dual networks and feature consistency
CN111062477A (en) Data processing method, device and storage medium
CN111126501B (en) Image identification method, terminal equipment and storage medium
Wang et al. Voxel-wise cross-volume representation learning for 3D neuron reconstruction
CN114241411B (en) Counting model processing method and device based on target detection and computer equipment
CN114550307A (en) Motion positioning model training method and device and motion positioning method
CN111914201B (en) Processing method and device of network page
CN114332561A (en) Super-resolution model training method, device, equipment and medium
CN111091198B (en) Data processing method and device
CN113569018A (en) Question and answer pair mining method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code
Ref country code: HK
Ref legal event code: DE
Ref document number: 40021095
Country of ref document: HK

GR01 Patent grant