WO2020048389A1

WO2020048389A1 - Method for compressing neural network model, device, and computer apparatus

Info

Publication number: WO2020048389A1
Application number: PCT/CN2019/103511
Authority: WO
Inventors: 金玲玲; 饶东升; 何文玮
Original assignee: 深圳灵图慧视科技有限公司
Priority date: 2018-09-05
Filing date: 2019-08-30
Publication date: 2020-03-12
Also published as: CN109165720A

Abstract

A method for compressing a neural network model, a device, a computer apparatus, and a computer readable medium. The method comprises: acquiring a first trained neural network model (S202); selecting one or more layers from layers of the first neural network model as layers to be compressed (S204); sorting the layers to be compressed according to a pre-determined rule (S206); and compressing, according to a sequential order from the sorting and by means of a genetic algorithm, a portion or all of the layers to be compressed, and obtaining a second neural network model (S208), wherein the accuracy of the second neural network model based on a pre-configured training sample is not less than a pre-determined accuracy value. The method, the device, the computer apparatus, and the computer readable medium compress a trained neural network model by means of a genetic algorithm, thereby reducing a computational load and storage space of the neural network model, and providing applicability of the same to apparatuses having limited memory and computational resources without compromising accuracy or compression of the neural network model.

Description

Neural network model compression method, device and computer equipment

Technical field

The present application relates to the field of computer application technology, and in particular, to a method and a device for compressing a neural network model, a computer device, and a computer-readable medium.

Background technique

In recent years, with the development of artificial intelligence, neural network (NN) algorithms have been widely used in image processing, speech recognition, natural language processing, and other fields. However, deep neural networks with good performance often have a large number of nodes (neurons) and model parameters, which not only have a large amount of calculation but also occupy a large part of the space in actual deployment, which limits its application to both storage and computing resources. Restricted equipment. Therefore, how to compress the neural network model is particularly important. Especially the compressed neural network model will be compressed, which will help to apply the trained neural network model to application scenarios such as embedded devices and integrated hardware devices.

Summary of the Invention

In view of the above problems, embodiments of the present invention provide a method and apparatus for compressing a neural network model, a computer device, and a computer-readable medium, which can compress a trained neural network model, thereby reducing the amount of calculation and storage of the neural network model. Space, enabling neural network models to be applied to devices with limited storage and computing resources.

A method for compressing a neural network model according to an embodiment of the present invention includes: obtaining a trained first neural network model; selecting at least one layer from each layer of the first neural network model as a layer to be compressed; and treating it according to a preset rule The compression layer is sorted; according to the order of sorting, the genetic algorithm is used to perform compression processing on part or all of the compression layer to obtain a second neural network model, wherein the accuracy of the second neural network model based on the preset training samples is not lower than Preset precision.

A neural network model compression device according to an embodiment of the present invention includes: an acquisition module for acquiring a trained first neural network model; a selection module for selecting at least one layer from each layer of the first neural network model As a layer to be compressed; a sorting module for sorting the compressed layers according to a preset rule; a compression module for performing a compression process on a part or all of the compressed layers using a genetic algorithm according to the sequencing order to obtain a second neural network Model, wherein the accuracy of the second neural network model based on the preset training samples is not lower than the preset accuracy.

A computer device according to an embodiment of the invention includes: a processor; and a memory on which executable instructions are stored, wherein the executable instructions, when executed, cause the processor to perform the aforementioned method.

A computer-readable medium according to an embodiment of the present invention has executable instructions stored thereon, wherein the executable instructions, when executed, cause a computer to perform the aforementioned method.

It can be seen from the above description that the solution of the embodiment of the present invention uses genetic algorithms to compress the trained neural network model, reduces the calculation amount and storage space of the neural network model, and enables it to be applied to storage and computing resources. Both are restricted devices. In addition, the solution of the embodiment of the present invention can simultaneously take into account the accuracy and compression of the neural network model.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is an exemplary architecture diagram to which an embodiment of the present invention can be applied; FIG.

2 is a flowchart of a neural network model compression method according to an embodiment of the present invention;

FIG. 3 is a flowchart of a method for performing a compression process on a compression layer using a genetic algorithm according to an embodiment of the present invention; FIG.

Figure 3a is an example diagram of a neural network structure;

4 is a flowchart of a neural network model compression apparatus according to an embodiment of the present invention;

5 is a schematic diagram of a computer device according to an embodiment of the present invention;

FIG. 6 is a block diagram of an exemplary computer device suitable for use in implementing embodiments of the present invention, according to one embodiment of the present invention.

detailed description

The subject matter described herein will now be discussed with reference to example embodiments. It should be understood that these embodiments are discussed only to enable those skilled in the art to better understand and implement the subject matter described herein, and are not intended to limit the scope, applicability, or examples set forth in the claims. Changes may be made in the function and arrangement of elements in question without departing from the scope of protection of the present disclosure. Various examples can omit, substitute, or add various procedures or components as needed. For example, the methods described may be performed in a different order than that described, and various steps may be added, omitted, or combined. In addition, the features described with respect to some examples may be combined in other examples.

As used herein, the term "including" and variations thereof mean open terms, meaning "including but not limited to." The term "based on" means "based at least in part on." The terms "one embodiment" and "an embodiment" mean "at least one embodiment." The term "another embodiment" means "at least one other embodiment." The terms "first", "second", etc. may refer to different or the same objects. Other definitions can be included below, either explicitly or implicitly. Unless the context clearly indicates otherwise, the definition of a term is consistent throughout the specification.

The embodiment of the present invention uses a genetic algorithm to compress a neural network model. The genetic algorithm and the neural network are briefly described below.

Genetic algorithm (GA) is a kind of randomized search method that evolved from the evolutionary laws of the biological world (survival of the fittest, genetic mechanism of survival of the fittest). It was first proposed by Professor J. Holland of the United States in 1975. Its main feature is to directly operate on structural objects, and there are no restrictions on derivative and function continuity; it has inherent implicit parallelism and better global optimization Ability; using a probabilistic optimization method, it can automatically obtain and guide the optimized search space, adaptively adjust the search direction, and no need to determine the rules. These properties of genetic algorithms have been widely used in the fields of combinatorial optimization, machine learning, signal processing, adaptive control, and artificial life. It is a key technology in modern intelligent computing.

Neural Network (Neural Network, NN) is a research hotspot that has emerged in the field of artificial intelligence since the 1980s. It abstracts the human brain neuron network from the perspective of information processing, establishes some simple model, and forms different networks according to different connection methods. A neural network is a computing model that consists of a large number of nodes (or neurons) connected to each other. Each node represents a specific output function, called an activation function. Each connection between two nodes represents a weighted value for signals passing through the connection, which is called the connection weight. The output of the network is different depending on the connection mode, connection weight and incentive function of the network. The structural information of the neural network includes information such as nodes and connection rights.

FIG. 1 illustrates an exemplary system architecture 100 to which a neural network model compression method or a neural network model compression apparatus of an embodiment of the present invention can be applied.

As shown in FIG. 1, the system architecture 100 may include

servers

102, 104 and a network 106. The network 106 is a medium that provides a communication link between the server 102 and the server 104. The network 106 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

The server 102 may be a server that provides various services, such as a data storage server that stores a trained neural network model.

The server 104 may be a server providing various services, such as a server for compressing a neural network model. The server 104 may obtain the trained neural network model from the server 102, analyze the neural network model, and perform processing such as analysis, and store the processing result (for example, the compressed neural network model).

It should be noted that the neural network model compression method in the embodiment of the present invention is generally executed by the server 104, and accordingly, the neural network model compression device is generally disposed in the server 104.

It should be noted that if the neural network model obtained by the server 104 is stored locally in advance, the system architecture may not include the server 102.

It should be understood that the number of servers and networks in FIG. 1 is merely exemplary. According to actual needs, there can be any number of servers and networks.

FIG. 2 shows a flowchart of a neural network model compression method according to an embodiment of the present invention. The method 200 shown in FIG. 2 may be performed by a computer or an electronic device with computing capabilities (such as the server 104 shown in FIG. 1). In addition, those skilled in the art will understand that any system that performs the method 200 is within the scope and spirit of embodiments of the present invention.

As shown in FIG. 2, in step S202, a trained first neural network model is obtained. In this embodiment, an electronic device (for example, the server 104 shown in FIG. 1) on which the neural network model compression method is run may be connected to a server (for example, The server 102) obtains a first neural network model to be compressed. Of course, if the first neural network model is stored locally in the electronic device in advance, the electronic device may also obtain the first neural network model locally.

In this embodiment, the first neural network model has been previously trained on a training sample, and its accuracy has met a preset accuracy requirement. The first neural network model in this embodiment may be any general neural network model, for example, it may be a back propagation neural network (BPNN: Back Propagation Neural Network) model, a convolutional neural network (CNN: Convolutional Neural Network) model, based on Convolutional neural network (RCNN: Region, Basic, Neural, Network) model of regional information, recurrent neural network (RNN: Recurrent Neural Network) model, long short-term memory model (LSTM: Long Short-Term Memory), or gated recurring unit (GRU) : Gated Recurrent Unit), in addition, it can also be other types of neural network models or cascade neural network models combined by multiple neural networks.

In step S204, at least one layer is selected from the layers of the first neural network model as a layer to be compressed. In this embodiment, the electronic device may select at least one layer from each layer of the obtained first neural network model as a layer to be compressed. For example, the above electronic device may select each layer of the first neural network model as a layer to be compressed.

In some optional implementations of this embodiment, if the first neural network model includes a convolution layer and a fully connected layer (FC), the above electronic device may select at least one convolution layer and at least one A fully connected layer acts as the layer to be compressed.

In step S206, the compressed layers are sorted according to a preset rule. In this embodiment, after the electronic device selects a layer to be compressed from the obtained first neural network model, the electronic device may sort the compressed layers according to a preset rule.

In an optional implementation manner of this embodiment, the foregoing electronic device may sort the compressed layers in order from the order of the number of levels of the layers to be compressed in the first neural network model. The first neural network model may include, for example, at least one input layer, at least one hidden layer, and at least one output layer. Each layer of the first neural network model may have a corresponding number of layers. As an example, it is assumed that the first neural network model includes an input layer, a hidden layer, and an output layer. The input layer may be at the first layer of the first neural network model, and the number of levels of the input layer may be 1. The hidden layer It can be at the second layer of the first neural network model, the number of levels of the hidden layer can be 2; the output layer can be at the third layer of the first neural network model, and the number of levels of the output layer can be 3; The order of the numbers from large to small is: output layer, hidden layer, and input layer.

In another optional implementation manner of this embodiment, the electronic device may also sort the layers to be compressed according to the contribution of the layer to be compressed to the loss of the first neural network model. Among them, the loss of the first neural network model can be transmitted to each layer of the first neural network model through a back propagation method (Back Propagation, BP), and then the contribution degree of each layer to the network loss is calculated, and then according to the contribution degree, Sort from small to large to the compression layer.

In some optional implementations of this embodiment, the layer to be compressed may be represented by a connection matrix, such as an N × N matrix C = (c _ij ) N × N represents a network structure with N nodes, where c _ij The value represents the connection right from node i to node j; c _ij = 0 means no connection from node i to node j; c _ii represents the bias of node i. Then the contribution of the layer to be compressed can be calculated using the following formula:

Where | c _ij | is the absolute value of the connection weight of the k-th layer to be compressed from node i to node j; i = 1, 2, 3, ..., N; j = 1, 2, 3, ..., N . The larger G _k indicates that the error generated by the k-th layer to be compressed has a greater impact on the performance of the entire neural network.

In step S208, according to the sequencing order, a part or all of the compression layer to be compressed is performed by using a genetic algorithm to obtain a second neural network model, wherein the accuracy of the second neural network model based on the preset training samples is not lower than the Set the accuracy. In this embodiment, the genetic algorithm is used to perform compression processing on the compression layer. The principle of its implementation is based on the principle of "the survival of the fittest" of the genetic algorithm, and taking into account the accuracy of the neural network model, using the "compression layer to be compressed" as a criterion. Various genetic operations are performed on the layer to be compressed, and finally a structure to be compressed is obtained. In specific implementation, based on the compressed fitness value, a chromosome individual that meets the requirements can be selected to perform genetic operations to generate a chromosome individual with the best network simplification (that is, the most simplified structure). A compression-treated layer to be compressed is obtained. In this embodiment, the compression-based fitness value refers to a fitness value that can reflect network simplification (or network complexity). For example, the larger the fitness value, the higher the network simplification, that is, effective compression is achieved. ; The smaller the fitness value, the lower the network simplification, that is, no effective compression is achieved. When the genetic algorithm is used to compress the neural network model, chromosomal individuals with a large fitness value can be selected to perform genetic operations. Finally, the chromosome with the highest fitness value among the chromosome individuals generated in the Nth generation population is the optimal chromosome individual.

It should be noted that in other embodiments of the present invention, the larger the fitness value, the higher the network complexity, that is, the effective compression is not achieved; the smaller the fitness value, the lower the network complexity, that is, the realization For effective compression, when using a genetic algorithm to compress a neural network model, a chromosome with a small fitness value can be selected to perform a genetic operation, and the chromosome with the smallest fitness value among the chromosome individuals generated in the Nth generation population is optimal Individual chromosomes.

In some optional implementations of this embodiment, in order to balance the accuracy and compression of the first neural network model, a preset accuracy can be set to constrain the compression of the first neural network model. It should be noted that the preset accuracy It can be the original accuracy of the first neural network model, or a value slightly lower than the original accuracy. The preset accuracy may be set manually, or may be set by the foregoing electronic device based on a preset algorithm, and the preset accuracy may be adjusted according to actual needs, which is not limited in this embodiment.

In some optional implementations of this embodiment, the compression process includes deleting at least one node of the layer to be compressed and its corresponding connection, and / or deleting at least one connection of the layer to be compressed, so as to reduce the network complexity of the layer to be compressed Degree, that is, to improve the network simplification of the layer to be compressed.

In some optional implementations of this embodiment, after performing compression processing on one of the layers to be compressed each time by using a genetic algorithm, the preset neural network model is used to train the current neural network model; if the current neural network model is used, The accuracy of is not lower than the preset accuracy. When there are uncompressed layers to be compressed, the compression processing of the next to-be-compressed layer is continued according to the sequence. When the compression processing has been performed on all the to-be-compressed layers, The current neural network model is determined as the second neural network model after compression processing; if the accuracy of the current neural network model is lower than a preset accuracy, the neural network model after performing compression processing on the previous layer to be compressed is determined as The second neural network model after compression processing.

As an example, it is assumed that the number of layers to be compressed is N. After sorting the N layers to be compressed, the obtained sequence is as follows: layer 1 to be compressed, layer 2 to be compressed, layer 3 to be compressed, ..., layer N to be compressed. First use genetic algorithm to perform compression processing on the layer 1 to be compressed, and then replace the uncompressed layer 1 to be compressed in the first neural network model with the compressed layer 1 to be compressed. The neural network model is trained to obtain the accuracy of the current neural network model, determine whether the accuracy is lower than the preset accuracy, and if it is not lower than the preset accuracy, continue to perform compression processing on the compression layer 2 and repeat the same steps, and so on , Until after the compression processing is performed on the layer N to be compressed, the accuracy of the current neural network model is still not lower than the preset accuracy, then the current neural network model (all layers to be compressed are replaced by the layer to be compressed after compression processing) is determined The second neural network model after compression processing. If compression processing is performed on a layer to be compressed, for example, layer 3 to be compressed, the current neural network model (at this time, the layers to be compressed 1, 2 and 3 of the first neural network model have been replaced with the compression to be compressed) Layer), if the accuracy is lower than the preset accuracy, the neural network model after the compression processing is performed on the previous layer to be compressed (that is, the layers 1 and 2 to be compressed of the first neural network model are replaced by the compression to be compressed layers) Determined as the second neural network model after compression processing.

It should be noted that, when the above-mentioned electronic device trains the neural network model after compression processing, the current neural network model can be fine-tuning. In this way, the neural network model that is slightly lower than the preset accuracy can be fine-tuned to meet the preset accuracy requirement, so that the neural network model can be further compressed.

In this embodiment, the electronic device may store a second neural network model obtained through compression processing, for example, a server that is stored locally (for example, a hard disk or a memory) of the electronic device or is remotely connected to the electronic device.

As can be seen from the above description, the solution provided by the embodiment of the present invention utilizes a genetic algorithm to compress a trained neural network model, reduces the calculation amount and storage space of the neural network model, and enables it to be applied to storage and computing resources Both are restricted devices. Further, the solution of the embodiment of the present invention can simultaneously take into account the accuracy and compression of the neural network model.

FIG. 3 shows a flowchart of a method for performing a compression process on a compression layer using a genetic algorithm according to an embodiment of the present invention. The method 300 shown in FIG. 3 may be implemented by a computer or an electronic device having computing capabilities (for example, as shown in FIG. 1). Server 104). In addition, those skilled in the art will understand that any system that performs the method 300 is within the scope and spirit of embodiments of the present invention.

As shown in FIG. 3, in step S302, network structure information of a layer to be compressed is acquired. The network structure can be represented by a connection matrix, such as an N × N matrix C = (c _ij ) N × N represents a network structure with N nodes, where the value of c _ij represents the connection right from node i to node j; c _ij = 0 indicates no connection from node i to node j; c _ii indicates the offset of node i.

In step S304, according to the network structure information of the layer to be compressed, the layer to be compressed is encoded to obtain a chromosome. The structure of the neural network needs to be expressed as a genetic algorithm's individual chromosome code in order to be able to perform calculations with the genetic algorithm. In one embodiment, it is assumed that there are N neurons in the layer to be compressed, and the nodes are numbered from 1 to N. An N × N matrix may be used to represent the network structure of the layer to be compressed. The neural network structure with 7 nodes shown in FIG. 3a is taken as an example to illustrate the method for encoding the neural network model in this embodiment. Table 1 is the node connection relationship of the neural network structure. In Table 1, the element corresponding to (i, j) in the matrix represents the connection relationship from the i-th node to the j-th node. Because the embodiment of the present invention does not involve changing the connection right of the neural network model to be compressed when compressing the neural network model, this embodiment expresses the connection relationship of the nodes as a form of 0, 1, -1, where "0 "1" indicates no connection; "1" indicates that the connection weight is 1, which has an excitation effect, which is indicated by a solid line in Figure 3a; "-1" indicates that the connection weight is -1, which has an inhibitory effect, as shown in the figure. 3a is indicated by a dotted line. It can be seen that Table 1 is equivalent to the structure shown in Figure 3a.

Table 1. Example neural network structure connection relationship in this embodiment

According to the node connection relationship shown in Table 1, the coding of the neural network can be expressed as a digital string form composed of 0, 1, and -1, from element (3,1) to element (7,6) from left to right, Connected from top to bottom, they form the following chromosome code:

In step S306, based on the chromosomes obtained above, population initialization is performed to generate an initial population. In this embodiment, during specific implementation, a replication operation may be performed on the chromosomes obtained above, a predetermined number of chromosome individuals are randomly generated, and the set of these chromosome individuals is used as an initial population. The size of the initial population is determined by the population size M, which may be, for example, but not limited to, 10-100. Because of the replication operation, all chromosomal individuals in the initial population are the same.

In step S308, the fitness value of individual chromosomes in the population is calculated. In some optional implementations of this embodiment, the fitness function may use the following formula:

or

Among them, f (i, t) represents the fitness of the i-th individual of the t-th generation; E (i, t) represents the network error of the neural network model corresponding to the i-th individual of the t-th generation; H (i, t) represents Network simplification of the i-th individual in the t-th generation.

In specific implementation, E (i, t) can be calculated using the following formula:

among them,

The neural network model corresponding to the i-th individual of the t-th generation is based on the expected output value and the actual output value of the preset q-th training sample. The smaller the network error value, the higher the accuracy.

H (i, t) can be calculated using the following formula:

Among them, m (i, t) is the number of nodes of the i-th individual in the t-th generation. The fewer the number of nodes, the larger the network simplification value, the higher the network simplification, and the simpler the neural network model.

In this implementation manner, the network error E (i, t) is used to constrain the compression process of the neural network model to be compressed, and both accuracy and compression can be taken into account at the same time. The smaller the network error E (i, t), the higher the accuracy of the neural network model after compression processing. The larger the network simplification value, the simpler the structure of the neural network model after compression processing. Therefore, in this embodiment, the smaller the network error and the larger the simplification degree of the network is, the larger the fitness value is.

In other optional implementations of this embodiment, the fitness function may also use the following formula:

In this embodiment, the fitness function includes formula ① and formula ②. Among them, formula ① is a fitness function based on network errors, which reflects the accuracy of the neural network model; formula ② is a fitness function based on network simplification, which reflects the compression of neural network models. Therefore, in this embodiment, the accuracy-based fitness value and the compression-based fitness value of the individual chromosome are calculated separately.

In step S310, it is determined whether a termination condition is reached. The termination condition may include a preset threshold for the number of iterations or a set convergence condition. The number of iterations can be set to, for example, but not limited to, 500, but it is determined that the termination condition is reached when the number of iterations reaches 500. The convergence condition may be set, for example, but not limited to, when the fitness value meets a certain condition, it is determined that the termination condition is reached. For example, the fitness value may be set to be greater than a preset threshold.

In step S312, if it is determined in step S310 that the termination condition is not met, then using the fitness value as a standard, select a chromosome individual whose fitness value meets the requirements and perform genetic operations such as replication, crossover, or mutation to generate a new generation of population. Then, it returns to step S308. According to the fitness value function of S308, this embodiment selects a chromosomal individual with a relatively large fitness value to perform a genetic operation, and eliminates some chromosomal individuals with a small fitness value.

When the accuracy-based fitness value and the compression-based fitness value of the individual chromosomes are separately calculated, the selection criteria of this embodiment may adopt the following steps: (1) Calculate the accuracy-based fitness of each individual chromosome individual in the population by formula ① Value, and then calculate the first selection probability of the individual being selected, and select the first chromosome individual according to the first selection probability; (2) Calculate the fitness value of each chromosome individual in the population based on the compression, and then calculate the individual The selected second selection probability is to select a second chromosome individual from the first chromosome individuals selected in step (1) according to the second selection probability. Optionally, before selecting chromosome individuals according to the selection probability, the chromosome individuals with the highest and lowest fitness values in the current population can be found, the best chromosome individuals are retained and directly entered into the next generation, and the worst chromosome individuals are eliminated, which can ensure Pass on good genes to the next generation. The selection strategy of this embodiment can restrict the compression process of the compression layer through accuracy constraints, and can ensure that chromosome individuals with small network errors and large network simplifications enter the next generation.

In some optional implementations of this embodiment, the fitness-proportion selection method (roulette selection method) can be used as a commonly used selection method, which means that the higher the fitness, the greater the probability of being selected, that is, :

Among them, p (i, t) is the selection probability of the i-th individual in the t-th generation, f (i, t) is the fitness of the i-th individual in the t-th generation, and f (sum, t) is the total fitness of the t-th population. .

Perform replication, crossover, or mutation operations on selected chromosomes. Among them, the replication operation refers to directly copying the selected parental chromosome individuals from the current generation to the new generation of individuals without any change. Cross operation refers to randomly selecting two parental chromosome individuals from the population according to the above-mentioned selection method, replacing some components of the two parental chromosome individuals with each other to form a new offspring chromosome individual. The mutation operation refers to randomly selecting a parent chromosome individual from the population according to the selection method described above, and then randomly selecting a node as a mutation point in the expression of the individual, and changing the value of the mutation point gene to another Valid values, forming new offspring chromosome individuals.

Whether a cross operation occurs can be determined according to the cross probability P _c . The method is to randomly generate a random number P between 0 and 1. When P ≤ P _c , the cross operation occurs, and when P> P _c , the cross does not occur. Similarly, whether the mutation operation occurs can also be determined according to the mutation probability P _m . Since it is the prior art, the description thereof is omitted here.

In this embodiment, when performing a crossover operation, a crossover point may be randomly selected in each parent chromosome individual with a certain probability, and the lower part of the crossover point is referred to as a crossover segment. After the first parental chromosome individual deletes its crossover segment, the second parental chromosome individual's crossover segment is inserted at his intersection, so that the first offspring chromosome individual is generated. Similarly, after the second parent chromosome individual deletes its cross section, the first parent chromosome individual's cross section is inserted at his intersection to form a second offspring chromosome individual. In this case, if the two parents of the selected chromosomes are the same, but due to their different intersections, the resulting offspring chromosomes are also different, which effectively avoids inbreeding and improves the global search ability.

In this embodiment, when performing the mutation operation, one of the following operations may be adopted randomly: (a) delete at least one node in the hidden layer of the neural network model and its corresponding connection; (b) delete the hidden layer of the neural network model At least one of the connections; (c) Randomly repair the deleted node or connection with a certain probability; (d) Add hidden layer nodes to randomly generate corresponding connection weights. Among them, deleting nodes always precedes adding nodes, and the number of added nodes should not be greater than the number of deleted nodes. At the same time, only when the deleted nodes cannot produce a good child, the nodes are added. Such a mutation operation can guarantee the method Always go in the direction of compressing the neural network model.

In step S314, if the determination result in step S210 is that the termination condition is reached, the chromosome individual with the best fitness value is output, so as to obtain the compressed layer to be compressed.

In some optional implementations of this embodiment, the optimal chromosome individual may be set to max (f, i), and the chromosome individual having the greatest fitness when the termination condition is reached is regarded as the optimal chromosome individual. By performing the decoding operation on the optimal chromosome individual, the optimal network structure of the layer to be compressed can be obtained.

FIG. 4 shows a schematic diagram of a neural network model compression device according to an embodiment of the present invention. The device 400 shown in FIG. 4 corresponds to the above-mentioned neural network model compression method. Since the embodiment of the device 400 is basically similar to the method embodiment, it is described relatively simply. For the relevant part, refer to the description of the method embodiment. The device 400 may be implemented in software, hardware, or a combination of software and hardware, and may be installed in a computer or other suitable electronic device with computing capabilities.

As shown in FIG. 4, the device 400 may include an acquisition module 402, a selection module 404, a sorting module 406, and a compression module 408. The obtaining module 402 is configured to obtain a trained first neural network model. The selection module 404 is configured to select at least one layer from each layer of the first neural network model as a layer to be compressed. The sorting module 406 is configured to sort the compressed layers according to a preset rule. The compression module 408 is configured to perform a compression process on a part or all of the compression layer using a genetic algorithm to obtain a second neural network model according to the sequencing order, wherein the accuracy of the second neural network model based on the preset training samples is not lower than Preset precision.

In an embodiment of the apparatus 400, the ranking module 406 is specifically configured to sort the layers to be compressed according to the number of levels of the layers to be compressed in the first neural network model.

In another implementation of the apparatus 400, the ranking module 406 is specifically configured to sort the layers to be compressed according to the contribution of the layer to be compressed to the loss of the first neural network model.

In yet another embodiment of the apparatus 400, the compression module 408 includes a training unit and a determination unit. The training unit is configured to train a current neural network model using a preset training sample after performing compression processing on one of the layers to be compressed each time by using a genetic algorithm. The determining unit is configured to: if the accuracy of the current neural network model is not lower than a preset accuracy, when there is a layer to be compressed that has not been compressed yet, continue to perform compression processing on the next layer to be compressed, and When the compression layer performs compression processing, the current neural network model is determined as the second neural network model obtained after the compression processing; if the accuracy of the current neural network model is lower than the preset accuracy, the previous layer to be compressed is compressed. The processed neural network model is determined as the second neural network model obtained after compression processing.

In still another embodiment of the apparatus 400, the compression module 408 further includes an acquisition unit, a coding unit, an initialization unit, a calculation unit, a judgment unit, a genetic operation unit, and an output unit. The obtaining unit is configured to obtain network structure information of a layer to be compressed. The encoding unit is configured to encode the layer to be compressed according to the network structure information of the layer to be compressed to obtain a chromosome. The initialization unit is configured to perform population initialization according to a chromosome obtained to generate an initial population. The calculation unit is used to calculate the fitness value of the individual chromosomes in the population. The judging unit is used to judge whether the termination condition is reached. The genetic operation unit is used to select a chromosome individual whose fitness value meets the requirements based on the fitness value if the termination condition is not reached, and perform replication, crossover or mutation operations to generate a new generation of population. The output unit is used to output the chromosome individual with the best fitness value if the termination condition is reached, so as to obtain the compressed layer to be compressed.

In still another embodiment of the apparatus 400, the calculation unit is further configured to calculate the precision-based and compression-based fitness values of the individual chromosomes in the population, respectively. Correspondingly, the genetic operation unit is further configured to obtain the first selection probability of the chromosome individuals in the population according to the fitness value based on the accuracy, select the first chromosome individual according to the first selection probability, and obtain the compression value based on the fitness value. The second selection probability of the chromosomal individuals in the population, and the second chromosomal individual is selected from the first chromosomal individuals according to the second selection probability; the second chromosomal individuals are copied, crossed or mutated to generate a new generation of population.

Figure 5 shows a schematic diagram of a computer device according to an embodiment of the invention. As shown in FIG. 5, the computer device 500 may include a processor 502 and a memory 504, where the memory 502 stores executable instructions, where the executable instructions, when executed, cause the processor 502 to execute the instructions shown in FIG. The method 200 or the method 300 shown in FIG. 3.

FIG. 6 shows a block diagram of an exemplary computer device suitable for use in implementing embodiments of the present invention. The computer device 600 shown in FIG. 6 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present invention.

As shown in FIG. 6, the computer device 600 is implemented in the form of a general-purpose computing device. The components of the computer device 600 may include, but are not limited to, a processor 602, a system memory 604, and a bus 606 connecting different system components (including the processor 602 and the system memory 604).

The bus 606 represents one or more of several types of bus structures, including a memory bus or a memory controller, a peripheral bus, a graphics acceleration port, a processor, or a local area bus using any of a variety of bus structures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MAC) bus, the enhanced ISA bus, the Video Electronics Standards Association (VESA) local area bus, and peripheral component interconnects ( PCI) bus.

Computer device 600 typically includes a variety of computer system-readable media. These media can be any available media that can be accessed by the computer device 600, including volatile and non-volatile media, removable and non-removable media.

System memory 604 may include computer system-readable media in the form of volatile memory, such as random access memory (RAM) 608 and / or cache memory 610. Computer device 600 may further include other removable / non-removable, volatile / nonvolatile computer system storage media. For example only, the storage system 612 may be used to read and write non-removable, non-volatile magnetic media (not shown in FIG. 6 and is commonly referred to as a "hard drive"). Although not shown in FIG. 6, a disk drive for reading and writing to a removable non-volatile disk (such as a "floppy disk"), and a removable non-volatile optical disk (such as a CD-ROM, DVD-ROM, etc.) may be provided. Or other optical media). In these cases, each drive may be connected to the bus 606 through one or more data medium interfaces. The system memory 604 may include at least one program product having a set (for example, at least one) of program modules configured to perform the functions of the embodiment of FIG. 1 or FIG. 2 of the present invention.

A program / utility tool 614 having a set (at least one) of program modules 616 may be stored in, for example, system memory 604. Such program modules 616 include, but are not limited to, an operating system, one or more application programs, other program modules, and programs Data, each or some combination of these examples may include an implementation of the network environment. The program module 616 generally performs the functions and / or methods in the embodiment of FIG. 1 or FIG. 2 described in the present invention.

The computer device 600 may also communicate with one or more external devices 700 (such as a keyboard, pointing device, display 800, etc.), and may also communicate with one or more devices that enable a user to interact with the computer device 600, and / or with Any device (eg, network card, modem, etc.) that enables the computer device 600 to communicate with one or more other computing devices. This communication can take place through an input / output (I / O) interface 618. Moreover, the computer device 600 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and / or a public network, such as the Internet) through the network adapter 520. As shown, the network adapter 620 communicates with other modules of the computer device 600 through the bus 606. It should be understood that although not shown in the figures, other hardware and / or software modules may be used in conjunction with the computer device 600, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives And data backup storage systems.

The processor 602 executes various functional applications and data processing by running a program stored in the system memory 604, for example, implementing the neural network model compression method shown in the foregoing embodiment.

An embodiment of the present invention further provides a computer-readable medium having executable instructions stored thereon, where the executable instructions, when executed, cause a computer to execute the method 200 shown in FIG. 2 or the method shown in FIG. 3 300.

The computer-readable medium of this embodiment may include the RAM 608 and / or the cache memory 610 and / or the storage system 612 in the system memory 604 in the embodiment shown in FIG. 6.

With the development of science and technology, the spread of computer programs is no longer limited to tangible media, and can also be downloaded directly from the network or obtained by other means. Therefore, the computer-readable medium in this embodiment may include not only a tangible medium but also an intangible medium.

The computer-readable medium of this embodiment may adopt any combination of one or more computer-readable media. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. The computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (non-exhaustive list) of computer-readable storage media include: electrical connections with one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), Erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the foregoing. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device.

The computer-readable signal medium may include a data signal in baseband or propagated as part of a carrier wave, which carries a computer-readable program code. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing. The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and the computer-readable medium may send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device .

Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for performing the operations of the present invention may be written in one or more programming languages, or combinations thereof, including programming languages such as Java, Smalltalk, C ++, and also conventional Procedural programming language, such as "C" or similar programming language. The program code can be executed entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer, partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or wide area network (WAN), or it can be connected to an external computer (such as through the Internet using an Internet service provider) connection).

Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, an apparatus, or a computer program product. Therefore, the embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, etc.) containing computer-usable program code.

Embodiments of the present invention are described with reference to flowcharts and / or block diagrams of methods, apparatuses, and computer program products according to embodiments of the present invention. It should be understood that each process and / or block in the flowcharts and / or block diagrams, and combinations of processes and / or blocks in the flowcharts and / or block diagrams can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device Means are generated for implementing the functions specified in one or more of the flowcharts and / or one or more of the block diagrams.

The specific embodiments described above in conjunction with the drawings describe exemplary embodiments, but do not represent all embodiments that can be implemented or fall within the scope of protection of the claims. The term "exemplary" used throughout this specification means "serving as an example, instance, or illustration" and does not mean "preferred" or "having an advantage" over other embodiments. The specific embodiments include specific details for the purpose of providing an understanding of the techniques described. However, these techniques can be implemented without these specific details. In some examples, to avoid obscuring the concept of the described embodiments, well-known structures and devices are shown in block diagram form.

The foregoing description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. It will be apparent to those skilled in the art that various modifications can be made to the present disclosure, and the general principles defined herein can also be applied to other variations without departing from the scope of protection of the present disclosure. . Accordingly, the disclosure is not limited to the examples and designs described herein, but is in accordance with the broadest scope consistent with the principles and novelty features disclosed herein.

Claims

Neural network model compression methods, including:

Obtaining a trained first neural network model;

Selecting at least one layer from each layer of the first neural network model as a layer to be compressed;

Sort the compressed layers according to a preset rule;

According to the sequencing order, the genetic algorithm is used to perform compression processing on part or all of the compression layer to obtain a second neural network model, wherein the accuracy of the second neural network model based on the preset training samples is not lower than the preset accuracy.
The method according to claim 1, wherein sorting the compressed layers according to a preset rule comprises:

Sort the layers to be compressed according to the number of levels of the layers to be compressed in the first neural network model.
The method according to claim 1, wherein sorting the compressed layers according to a preset rule comprises:

The layers to be compressed are ranked in order of their contribution to the loss of the first neural network model from small to large.
The method according to any one of claims 1 to 3, wherein, according to an order of sorting, using a genetic algorithm to perform compression processing on part or all of the compression layer to obtain a second neural network model, comprising:

After each time using a genetic algorithm to perform compression processing on one of the layers to be compressed, the current neural network model is trained using preset training samples;

If the accuracy of the current neural network model is not lower than the preset accuracy, when there are remaining layers to be compressed that have not yet been compressed, the compression processing will continue to be performed on the next to-be-compressed layer according to the sequencing order. When performing compression processing on the layer to be compressed, the current neural network model is determined as the second neural network model after the compression processing; if the accuracy of the current neural network model is lower than a preset accuracy, the previous layer to be compressed is compressed. The processed neural network model is determined as the second neural network model after compression processing.
The method according to claim 1, wherein using a genetic algorithm to perform compression processing on part or all of the compression layer comprises:

Obtain the network structure information of the layer to be compressed;

Encode the layer to be compressed according to the network structure information of the layer to be compressed to obtain a chromosome;

According to a chromosome obtained, perform population initialization to generate an initial population;

Calculate the fitness value of individual chromosomes in the population;

Determine whether the termination conditions are met;

If the termination condition is not met, then use the fitness value as a standard, select some chromosome individuals whose fitness value meets the requirements, perform replication, crossover or mutation operations to generate a new generation of population, and then return to calculate the fitness value of the chromosome individuals in the population step;

If the termination condition is reached, a chromosome individual with the best fitness value is output, so as to obtain a layer to be compressed after compression processing.
The method according to claim 5, wherein calculating the fitness value of the individual chromosome in the population comprises:

Calculate the precision-based and compression-based fitness values of individual chromosomes in the population;

Correspondingly, using the fitness value as a standard, select some chromosomal individuals whose fitness value meets the requirements, and perform replication, crossover or mutation operations to generate a new generation of population, including:

A first selection probability of chromosomal individuals in the population is obtained according to the fitness value based on precision, a first chromosome individual is selected according to the first selection probability, and a second selection probability of chromosome individuals in the population is obtained according to the fitness value based on compression , Selecting a second chromosome individual from the first chromosome individual according to the second selection probability; performing replication, crossover, or mutation operations on the second chromosome individual to generate a new generation of population.
Neural network model compression device, including:

An acquisition module, configured to acquire a trained first neural network model;

A selection module for selecting at least one layer from each layer of the first neural network model as a layer to be compressed;

A sorting module for sorting compressed layers according to a preset rule;

The compression module is configured to perform a compression process on a part or all of the compression layer using a genetic algorithm to obtain a second neural network model according to the sequencing order, wherein the accuracy of the second neural network model based on preset training samples is not lower than Preset precision.
The apparatus according to claim 7, wherein the sorting module is specifically configured to:

Sort the layers to be compressed according to the number of levels of the layers to be compressed in the first neural network model.
The apparatus according to claim 7, wherein the sorting module is specifically configured to:

The layers to be compressed are ranked in order of their contribution to the loss of the first neural network model from small to large.
The apparatus according to any one of claims 7-9, wherein the compression module includes:

A training unit, configured to train a current neural network model using a preset training sample after performing compression processing on one of the layers to be compressed each time by using a genetic algorithm;

A determining unit configured to, if the accuracy of the current neural network model is not lower than a preset accuracy, continue to perform compression processing on the next to-be-compressed layer according to the sequence when there are remaining to-be-compressed layers that have not been compressed yet When compression processing has been performed on all layers to be compressed, the current neural network model is determined as the second neural network model obtained after compression processing; if the accuracy of the current neural network model is lower than the preset accuracy, the previous neural network model is The neural network model after the compression layer performs the compression process is determined as the second neural network model obtained after the compression process.
The apparatus according to claim 7, wherein the compression module further comprises:

An obtaining unit, configured to obtain network structure information of a layer to be compressed;

A coding unit, configured to code the layer to be compressed according to the network structure information of the layer to be compressed to obtain a chromosome;

An initialization unit, configured to initialize a population based on a chromosome obtained to generate an initial population;

A calculation unit for calculating fitness values of individual chromosomes in a population;

A judging unit, configured to judge whether a termination condition is reached;

The genetic operation unit is used to select a chromosome individual whose fitness value meets the requirements based on the fitness value if the termination condition is not reached, and perform replication, crossover or mutation operations to generate a new generation of population;

An output unit is configured to output a chromosome individual with an optimal fitness value if a termination condition is reached, so as to obtain a layer to be compressed after compression processing.
The apparatus according to claim 11, wherein the computing unit is further configured to:

Calculate the precision-based and compression-based fitness values of individual chromosomes in the population;

Accordingly, the genetic operation unit is further used for:

A first selection probability of chromosomal individuals in the population is obtained according to the fitness value based on precision, a first chromosome individual is selected according to the first selection probability, and a second selection probability of chromosome individuals in the population is obtained according to the fitness value based on compression , Selecting a second chromosome individual from the first chromosome individual according to the second selection probability; performing replication, crossover, or mutation operations on the second chromosome individual to generate a new generation of population.
A computer device including:

Processor; and

The memory stores executable instructions thereon, wherein the executable instructions, when executed, cause the processor to perform the method according to any one of claims 1-4.
A computer-readable medium having executable instructions stored thereon, wherein the executable instructions, when executed, cause a computer to perform the method of any one of claims 1-4.