WO2020048389A1 - Neural network model compression method, device and computer apparatus - Google Patents

Neural network model compression method, device and computer apparatus Download PDF

Info

Publication number
WO2020048389A1
WO2020048389A1 · PCT/CN2019/103511 · CN2019103511W
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
network model
compressed
layer
compression
Prior art date
Application number
PCT/CN2019/103511
Other languages
English (en)
Chinese (zh)
Inventor
金玲玲
饶东升
何文玮
Original Assignee
深圳灵图慧视科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳灵图慧视科技有限公司 filed Critical 深圳灵图慧视科技有限公司
Publication of WO2020048389A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/086 Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N 3/006 Artificial life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Definitions

  • the present application relates to the field of computer application technology, and in particular, to a method and a device for compressing a neural network model, a computer device, and a computer-readable medium.
  • In recent years, with the development of artificial intelligence, neural network (NN) algorithms have been widely used in image processing, speech recognition, natural language processing, and other fields.
  • Deep neural networks with good performance often have a large number of nodes (neurons) and model parameters, which not only require a large amount of computation but also occupy considerable storage space in actual deployment. This limits their application on devices whose storage and computing resources are both restricted. How to compress a neural network model is therefore particularly important: compressing a trained neural network model helps apply it in scenarios such as embedded devices and integrated hardware devices.
  • Embodiments of the present invention provide a method and apparatus for compressing a neural network model, a computer device, and a computer-readable medium, which can compress a trained neural network model, thereby reducing the amount of computation and storage space of the neural network model and enabling it to be applied to devices with limited storage and computing resources.
  • A method for compressing a neural network model includes: obtaining a trained first neural network model; selecting at least one layer from the layers of the first neural network model as a layer to be compressed; sorting the layers to be compressed according to a preset rule; and, in the sorted order, performing compression processing on part or all of the layers to be compressed using a genetic algorithm to obtain a second neural network model, wherein the accuracy of the second neural network model on the preset training samples is not lower than a preset accuracy.
  • A neural network model compression device includes: an acquisition module for acquiring a trained first neural network model; a selection module for selecting at least one layer from the layers of the first neural network model as a layer to be compressed; a sorting module for sorting the layers to be compressed according to a preset rule; and a compression module for performing, in the sorted order, compression processing on part or all of the layers to be compressed using a genetic algorithm to obtain a second neural network model, wherein the accuracy of the second neural network model on the preset training samples is not lower than the preset accuracy.
  • a computer device includes: a processor; and a memory on which executable instructions are stored, wherein the executable instructions, when executed, cause the processor to perform the aforementioned method.
  • a computer-readable medium has executable instructions stored thereon, wherein the executable instructions, when executed, cause a computer to perform the aforementioned method.
  • The solution of the embodiment of the present invention uses a genetic algorithm to compress a trained neural network model, reducing the amount of computation and storage space of the neural network model and enabling it to be applied to devices whose storage and computing resources are both restricted.
  • the solution of the embodiment of the present invention can simultaneously take into account the accuracy and compression of the neural network model.
  • FIG. 1 is an exemplary architecture diagram to which an embodiment of the present invention can be applied;
  • FIG. 2 is a flowchart of a neural network model compression method according to an embodiment of the present invention
  • FIG. 3 is a flowchart of a method for performing a compression process on a compression layer using a genetic algorithm according to an embodiment of the present invention
  • Figure 3a is an example diagram of a neural network structure
  • FIG. 4 is a flowchart of a neural network model compression apparatus according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of a computer device according to an embodiment of the present invention.
  • FIG. 6 is a block diagram of an exemplary computer device suitable for implementing embodiments of the present invention.
  • the term “including” and variations thereof mean open terms, meaning “including but not limited to.”
  • the term “based on” means “based at least in part on.”
  • the terms “one embodiment” and “an embodiment” mean “at least one embodiment.”
  • the term “another embodiment” means “at least one other embodiment.”
  • the terms “first”, “second”, etc. may refer to different or the same objects. Other definitions can be included below, either explicitly or implicitly. Unless the context clearly indicates otherwise, the definition of a term is consistent throughout the specification.
  • the embodiment of the present invention uses a genetic algorithm to compress a neural network model.
  • the genetic algorithm and the neural network are briefly described below.
  • A genetic algorithm is a randomized search method that borrows from the evolutionary laws of the biological world (survival of the fittest as a genetic mechanism). It was first proposed by Professor J. Holland of the United States in 1975. Its main features are that it operates directly on structural objects, with no restrictions of differentiability or function continuity; it has inherent implicit parallelism and good global optimization ability; and, using a probabilistic optimization method, it can automatically acquire and guide the optimized search space and adaptively adjust the search direction without requiring predetermined rules. Owing to these properties, genetic algorithms have been widely applied in combinatorial optimization, machine learning, signal processing, adaptive control, and artificial life, and they are a key technology in modern intelligent computing.
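  • The generational loop sketched above (selection, crossover, mutation, repeat) can be illustrated with a minimal toy example. The sketch below maximizes the number of 1-bits in a bitstring; the function name and all parameter values are illustrative, not from this publication, which applies the same loop to neural network structures rather than bitstrings.

```python
import random

def genetic_algorithm(pop_size=20, length=16, generations=60,
                      p_cross=0.8, p_mut=0.05, seed=0):
    """Minimal generic GA maximizing the count of 1-bits (illustrative)."""
    rng = random.Random(seed)

    def fitness(c):
        return sum(c)

    # random initial population of bitstrings
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        total = sum(fitness(c) for c in pop) or 1

        def select():
            # fitness-proportionate (roulette-wheel) selection
            r = rng.uniform(0, total)
            acc = 0.0
            for c in pop:
                acc += fitness(c)
                if acc >= r:
                    return c
            return pop[-1]

        nxt = []
        while len(nxt) < pop_size:
            a, b = select()[:], select()[:]
            if rng.random() < p_cross:          # one-point crossover
                pt = rng.randrange(1, length)
                a, b = a[:pt] + b[pt:], b[:pt] + a[pt:]
            for c in (a, b):
                for i in range(length):
                    if rng.random() < p_mut:    # bit-flip mutation
                        c[i] ^= 1
                nxt.append(c)
        pop = nxt[:pop_size]
    return max(pop, key=fitness)
```

In the neural-network setting described below, the bitstring is replaced by a chromosome encoding of the layer structure and the fitness rewards both low network error and high network simplification.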
  • Neural Network (NN) research has been a hotspot in the field of artificial intelligence since the 1980s. It abstracts the neuron network of the human brain from the perspective of information processing, establishes a simple model, and forms different networks according to different connection methods.
  • A neural network is a computing model consisting of a large number of nodes (or neurons) connected to each other. Each node represents a specific output function, called an activation function. Each connection between two nodes carries a weighted value for signals passing through the connection, called the connection weight. The output of the network differs depending on the connection mode, the connection weights, and the activation functions.
  • The structural information of the neural network includes information such as nodes and connection weights.
  • FIG. 1 illustrates an exemplary system architecture 100 to which a neural network model compression method or a neural network model compression apparatus of an embodiment of the present invention can be applied.
  • the system architecture 100 may include servers 102, 104 and a network 106.
  • the network 106 is a medium that provides a communication link between the server 102 and the server 104.
  • the network 106 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
  • the server 102 may be a server that provides various services, such as a data storage server that stores a trained neural network model.
  • the server 104 may be a server providing various services, such as a server for compressing a neural network model.
  • The server 104 may obtain the trained neural network model from the server 102, perform processing such as compression on the neural network model, and store the processing result (for example, the compressed neural network model).
  • the neural network model compression method in the embodiment of the present invention is generally executed by the server 104, and accordingly, the neural network model compression device is generally disposed in the server 104.
  • the system architecture may not include the server 102.
  • FIG. 1 the number of servers and networks in FIG. 1 is merely exemplary. According to actual needs, there can be any number of servers and networks.
  • FIG. 2 shows a flowchart of a neural network model compression method according to an embodiment of the present invention.
  • the method 200 shown in FIG. 2 may be performed by a computer or an electronic device with computing capabilities (such as the server 104 shown in FIG. 1).
  • any system that performs the method 200 is within the scope and spirit of embodiments of the present invention.
  • In step S202, a trained first neural network model is obtained.
  • In this embodiment, an electronic device (for example, the server 104 shown in FIG. 1) may obtain the trained first neural network model from a server (for example, the server 102 shown in FIG. 1).
  • The electronic device may also obtain the first neural network model locally.
  • the first neural network model has been previously trained on a training sample, and its accuracy has met a preset accuracy requirement.
  • The first neural network model in this embodiment may be any general neural network model. For example, it may be a back propagation neural network (BPNN: Back Propagation Neural Network) model, a convolutional neural network (CNN: Convolutional Neural Network) model, a region-based convolutional neural network (RCNN: Region-based Convolutional Neural Network) model, a recurrent neural network (RNN: Recurrent Neural Network) model, a long short-term memory (LSTM: Long Short-Term Memory) model, or a gated recurrent unit (GRU: Gated Recurrent Unit) model; in addition, it can also be another type of neural network model or a cascade neural network model combining multiple neural networks.
  • In step S204, at least one layer is selected from the layers of the first neural network model as a layer to be compressed.
  • the electronic device may select at least one layer from each layer of the obtained first neural network model as a layer to be compressed.
  • the above electronic device may select each layer of the first neural network model as a layer to be compressed.
  • The above electronic device may select at least one convolutional layer and at least one fully connected layer as layers to be compressed.
  • In step S206, the layers to be compressed are sorted according to a preset rule.
  • In this embodiment, the electronic device may sort the layers to be compressed according to a preset rule.
  • For example, the electronic device may sort the layers to be compressed in descending order of their level number in the first neural network model.
  • the first neural network model may include, for example, at least one input layer, at least one hidden layer, and at least one output layer. Each layer of the first neural network model may have a corresponding number of layers. As an example, it is assumed that the first neural network model includes an input layer, a hidden layer, and an output layer.
  • the input layer may be at the first layer of the first neural network model, and the number of levels of the input layer may be 1.
  • The hidden layer may be at the second layer of the first neural network model, and the number of levels of the hidden layer may be 2; the output layer may be at the third layer of the first neural network model, and the number of levels of the output layer may be 3.
  • The order of the level numbers from large to small is: output layer, hidden layer, input layer.
  • the electronic device may also sort the layers to be compressed according to the contribution of the layer to be compressed to the loss of the first neural network model.
  • The loss of the first neural network model can be propagated to each layer of the first neural network model through back propagation (BP); the contribution of each layer to the network loss is then calculated, and the layers to be compressed are sorted by contribution from small to large.
  • In step S208, compression processing is performed, in the sorted order, on part or all of the layers to be compressed using a genetic algorithm to obtain a second neural network model, wherein the accuracy of the second neural network model on the preset training samples is not lower than the preset accuracy.
  • In this embodiment, the genetic algorithm is used to perform compression processing on the layers to be compressed.
  • The implementation is based on the "survival of the fittest" principle of the genetic algorithm while taking the accuracy of the neural network model into account, with the simplification of the layer to be compressed as the criterion.
  • Various genetic operations are performed on the layer to be compressed, and finally a compressed structure is obtained.
  • Chromosome individuals that meet the requirements can be selected to perform genetic operations, generating the chromosome individual with the best network simplification (that is, the most simplified structure).
  • The compressed version of the layer to be compressed is thus obtained.
  • The compression-based fitness value is a fitness value that reflects network simplification (or network complexity). For example, the larger the fitness value, the higher the network simplification, that is, effective compression is achieved; the smaller the fitness value, the lower the network simplification, that is, no effective compression is achieved.
  • When the genetic algorithm is used to compress the neural network model, chromosome individuals with large fitness values can be selected to perform genetic operations.
  • the chromosome with the highest fitness value among the chromosome individuals generated in the Nth generation population is the optimal chromosome individual.
  • Alternatively, chromosome individuals with small fitness values can be selected to perform genetic operations, in which case the chromosome with the smallest fitness value among the chromosome individuals generated in the N-th generation population is the optimal chromosome individual.
  • In order to balance the accuracy and the compression of the first neural network model, a preset accuracy can be set to constrain the compression of the first neural network model. It should be noted that the preset accuracy can be the original accuracy of the first neural network model, or a value slightly lower than the original accuracy. The preset accuracy may be set manually, or may be set by the foregoing electronic device based on a preset algorithm, and it may be adjusted according to actual needs, which is not limited in this embodiment.
  • The compression processing includes deleting at least one node of the layer to be compressed and its corresponding connections, and/or deleting at least one connection of the layer to be compressed, so as to reduce the network complexity of the layer to be compressed, that is, to improve its network simplification.
  • After a layer to be compressed is compressed, the preset training samples are used to train the current neural network model.
  • If the accuracy of the current neural network model is not lower than the preset accuracy, the compression processing of the next layer to be compressed is continued according to the sorted order.
  • After all layers to be compressed have been processed, the current neural network model is determined as the second neural network model after compression processing; if the accuracy of the current neural network model is lower than the preset accuracy, the neural network model obtained after performing compression processing on the previous layer to be compressed is determined as the second neural network model after compression processing.
  • For example, suppose the number of layers to be compressed is N.
  • The sorted order is: layer 1 to be compressed, layer 2 to be compressed, layer 3 to be compressed, ..., layer N to be compressed.
  • First, the genetic algorithm is used to perform compression processing on layer 1 to be compressed, and the uncompressed layer 1 in the first neural network model is then replaced with the compressed layer 1.
  • The neural network model is then trained to obtain the accuracy of the current neural network model, and it is determined whether this accuracy is lower than the preset accuracy. If it is not lower, compression processing continues on layer 2 to be compressed, the same steps are repeated, and so on. If, after compression processing has been performed on layer N, the accuracy of the current neural network model is still not lower than the preset accuracy, the current neural network model (in which all layers to be compressed have been replaced by their compressed versions) is determined as the second neural network model after compression processing.
  • If, for example, after compression processing is performed on layer 3, the accuracy of the current neural network model (in which layers 1, 2, and 3 of the first neural network model have been replaced with their compressed versions) is lower than the preset accuracy, then the neural network model obtained after compression processing of the previous layer (that is, with only layers 1 and 2 replaced by their compressed versions) is determined as the second neural network model.
  • After a layer to be compressed is compressed, the current neural network model can be fine-tuned.
  • A neural network model whose accuracy is slightly lower than the preset accuracy can be fine-tuned to meet the preset accuracy requirement, so that the neural network model can be further compressed.
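  • The layer-by-layer procedure described above (compress a layer, retrain, keep the result only while accuracy stays at or above the preset accuracy, otherwise roll back) can be sketched as follows. The callables `compress_layer`, `evaluate`, and `fine_tune` are hypothetical stand-ins for the genetic-algorithm compression, the accuracy measurement on the preset training samples, and optional fine-tuning; none of them is defined by the text, so this is an illustration of the control flow only.

```python
def compress_model(model, layers_to_compress, compress_layer, evaluate,
                   preset_accuracy, fine_tune=None):
    """Sketch of the layer-by-layer compression loop (illustrative).

    `layers_to_compress` is assumed to be already sorted by the preset
    rule; `compress_layer(model, layer)` returns a candidate model with
    that layer compressed; `evaluate(model)` returns its accuracy.
    """
    accepted = model
    for layer in layers_to_compress:
        candidate = compress_layer(accepted, layer)
        if fine_tune is not None:
            candidate = fine_tune(candidate)     # optional fine-tuning step
        if evaluate(candidate) >= preset_accuracy:
            accepted = candidate                 # keep; try the next layer
        else:
            break                                # roll back to last good model
    return accepted
```

The rollback is implicit: when a candidate fails the accuracy check, the loop stops and the model from the previous successful layer is returned as the second neural network model.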
  • The electronic device may store the second neural network model obtained through compression processing, for example, locally on the electronic device (for example, on a hard disk or in memory) or on a server remotely connected to the electronic device.
  • The solution provided by the embodiment of the present invention uses a genetic algorithm to compress a trained neural network model, reducing the amount of computation and storage space of the neural network model and enabling it to be applied to devices whose storage and computing resources are both restricted. Further, the solution of the embodiment of the present invention can simultaneously take into account the accuracy and the compression of the neural network model.
  • FIG. 3 shows a flowchart of a method for performing a compression process on a compression layer using a genetic algorithm according to an embodiment of the present invention.
  • the method 300 shown in FIG. 3 may be implemented by a computer or an electronic device having computing capabilities (for example, as shown in FIG. 1). Server 104).
  • any system that performs the method 300 is within the scope and spirit of embodiments of the present invention.
  • In step S302, network structure information of a layer to be compressed is acquired.
  • In step S304, the layer to be compressed is encoded according to its network structure information to obtain a chromosome.
  • The structure of the neural network needs to be expressed as the chromosome code of a genetic-algorithm individual in order to be able to perform calculations with the genetic algorithm.
  • Suppose there are N neurons in the layer to be compressed, with the nodes numbered from 1 to N.
  • An N × N matrix may be used to represent the network structure of the layer to be compressed.
  • the neural network structure with 7 nodes shown in FIG. 3a is taken as an example to illustrate the method for encoding the neural network model in this embodiment.
  • Table 1 is the node connection relationship of the neural network structure.
  • The element corresponding to (i, j) in the matrix represents the connection relationship from the i-th node to the j-th node. Because the embodiment of the present invention does not involve changing the connection weights of the neural network model to be compressed when compressing it, this embodiment expresses the connection relationship of the nodes in the form of 0, 1, and -1, where "0" indicates no connection; "1" indicates a connection weight of 1, which has an excitatory effect and is indicated by a solid line in FIG. 3a; and "-1" indicates a connection weight of -1, which has an inhibitory effect and is indicated by a dotted line in FIG. 3a. It can be seen that Table 1 is equivalent to the structure shown in FIG. 3a.
  • Thus the coding of the neural network can be expressed as a digital string composed of 0, 1, and -1: reading from element (3, 1) to element (7, 6), left to right and top to bottom, the elements form the chromosome code.
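  • As a sketch, the flattening just described (reading the 0/1/-1 connection matrix from element (3, 1) onward, left to right and top to bottom) might be implemented as follows. The choice to start at node 3 (that is, to assume the first two nodes have no encoded incoming connections) is inferred from the 7-node example, whose figure is not reproduced here, and may not hold for other structures.

```python
def encode_layer(conn, first_node=3):
    """Flatten a connection matrix into a chromosome (illustrative).

    `conn` is an N x N list of lists holding 0, 1, or -1 (no connection /
    excitatory / inhibitory). Elements are read row by row starting at
    row `first_node`, taking columns 1 .. i-1 of each row i, matching
    the (3, 1) .. (7, 6) reading order of the 7-node example.
    """
    n = len(conn)
    chromosome = []
    for i in range(first_node - 1, n):   # rows first_node .. N (0-based index)
        for j in range(i):               # columns 1 .. i-1
            chromosome.append(conn[i][j])
    return chromosome
```

For a 7-node network this yields 2 + 3 + 4 + 5 + 6 = 20 genes per chromosome.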
  • In step S306, population initialization is performed based on the chromosome obtained above to generate an initial population.
  • A replication operation may be performed on the chromosome obtained above to generate a predetermined number of chromosome individuals, and the set of these chromosome individuals is used as the initial population.
  • The size of the initial population is determined by the population size M, which may be, for example but not limited to, 10-100. Because of the replication operation, all chromosome individuals in the initial population are identical.
  • In step S308, the fitness value of each chromosome individual in the population is calculated.
  • the fitness function may use the following formula:
  • where f(i, t) represents the fitness of the i-th individual of the t-th generation, E(i, t) represents the network error of the neural network model corresponding to the i-th individual of the t-th generation, and H(i, t) represents the network simplification of the i-th individual of the t-th generation.
  • E (i, t) can be calculated using the following formula:
  • E(i, t) is the network error of the neural network model corresponding to the i-th individual of the t-th generation, calculated from the expected output values and the actual output values on the preset q training samples. The smaller the network error value, the higher the accuracy.
  • H (i, t) can be calculated using the following formula:
  • where m(i, t) is the number of nodes of the i-th individual in the t-th generation. The fewer the nodes, the larger the network simplification value, the higher the network simplification, and the simpler the neural network model.
  • The network error E(i, t) is used to constrain the compression processing of the neural network model to be compressed, so that accuracy and compression can both be taken into account.
  • the fitness function may also use the following formula:
  • where f(i, t) represents the fitness of the i-th individual of the t-th generation, E(i, t) represents the network error of the neural network model corresponding to the i-th individual of the t-th generation, and H(i, t) represents the network simplification of the i-th individual of the t-th generation.
  • the fitness function includes formula 1 and formula 2.
  • formula 1 is a fitness function based on network errors, which reflects the accuracy of the neural network model
  • Formula 2 is a fitness function based on network simplification, which reflects the compression of the neural network model. Therefore, in this embodiment, the accuracy-based fitness value and the compression-based fitness value of each chromosome individual are calculated separately.
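  • The exact formulas for E(i, t), H(i, t), and the fitness functions are not reproduced in this text (they appear as images in the original). The sketch below therefore uses common stand-ins that merely respect the stated monotonic properties: the error term shrinks as accuracy rises, and the simplification term grows as the node count m(i, t) falls. Every function here is an assumption for illustration, not the formula of this publication.

```python
def network_error(expected, actual):
    # Stand-in for E(i, t): mean squared error over the q preset samples
    # (assumed form; the original formula is not shown in this text).
    q = len(expected)
    return sum((e - a) ** 2 for e, a in zip(expected, actual)) / q

def network_simplification(num_nodes):
    # Stand-in for H(i, t): grows as the node count m(i, t) falls.
    return 1.0 / num_nodes

def accuracy_fitness(expected, actual):
    # Formula-1 stand-in: larger fitness for smaller network error.
    return 1.0 / (1.0 + network_error(expected, actual))

def compression_fitness(num_nodes):
    # Formula-2 stand-in: larger fitness for a simpler structure.
    return network_simplification(num_nodes)
```

These stand-ins preserve the two selection signals described above: accuracy_fitness ranks individuals by error, and compression_fitness ranks them by structural simplicity.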
  • the termination condition may include a preset threshold for the number of iterations or a set convergence condition.
  • The number of iterations can be set to, for example but not limited to, 500, and it is determined that the termination condition is reached when the number of iterations reaches 500.
  • The convergence condition may be set, for example but not limited to, such that the termination condition is determined to be reached when the fitness value meets a certain condition, for example, when the fitness value is greater than a preset threshold.
  • In step S312, if it is determined in step S310 that the termination condition is not met, then, using the fitness value as the standard, chromosome individuals whose fitness values meet the requirements are selected to perform genetic operations such as replication, crossover, or mutation, generating a new generation of the population. The process then returns to step S308.
  • This embodiment selects chromosome individuals with relatively large fitness values to perform genetic operations and eliminates some chromosome individuals with small fitness values.
  • The selection criteria of this embodiment may adopt the following steps: (1) calculate the accuracy-based fitness value of each chromosome individual in the population by formula 1, then calculate the first selection probability of each individual being selected, and select first chromosome individuals according to the first selection probability; (2) calculate the compression-based fitness value of each chromosome individual in the population by formula 2, then calculate the second selection probability of each individual being selected, and select second chromosome individuals from the first chromosome individuals selected in step (1) according to the second selection probability.
  • In addition, the chromosome individuals with the highest and lowest fitness values in the current population can be found; the best chromosome individual is retained and enters the next generation directly, and the worst chromosome individual is eliminated, which ensures that good genes are passed on to the next generation.
  • The selection strategy of this embodiment can restrict the compression processing of the layer to be compressed through accuracy constraints, and can ensure that chromosome individuals with small network errors and large network simplification enter the next generation.
  • The fitness-proportionate selection method, a commonly used selection method, can be adopted: the higher the fitness, the greater the probability of being selected, that is, p(i, t) = f(i, t) / f(sum, t), where p(i, t) is the selection probability of the i-th individual in the t-th generation, f(i, t) is the fitness of the i-th individual in the t-th generation, and f(sum, t) is the total fitness of the t-th generation population.
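  • A direct implementation of fitness-proportionate (roulette-wheel) selection, where individual i is drawn with probability p(i, t) = f(i, t) / f(sum, t), might look like the following sketch; the function name and interface are illustrative.

```python
import random

def roulette_select(fitnesses, rng):
    """Return the index of one individual, selected with probability
    proportional to its fitness (roulette-wheel selection)."""
    total = sum(fitnesses)               # f(sum, t)
    r = rng.uniform(0.0, total)          # spin the wheel
    acc = 0.0
    for i, f in enumerate(fitnesses):
        acc += f
        if acc >= r:
            return i
    return len(fitnesses) - 1            # guard against float rounding
```

Over many draws, an individual with three times the fitness of another is selected roughly three times as often.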
  • the replication operation refers to directly copying the selected parental chromosome individuals from the current generation to the new generation of individuals without any change.
  • The crossover operation refers to randomly selecting two parent chromosome individuals from the population according to the above selection method and exchanging some components of the two parent chromosome individuals with each other to form new offspring chromosome individuals.
  • The mutation operation refers to randomly selecting a parent chromosome individual from the population according to the selection method described above, randomly selecting a node in the expression of the individual as a mutation point, and changing the value of the gene at the mutation point to another valid value, forming a new offspring chromosome individual.
  • Whether a crossover operation occurs can be determined according to the crossover probability Pc.
  • The method is to randomly generate a random number P between 0 and 1.
  • When P < Pc, the crossover operation occurs; when P > Pc, it does not.
  • Similarly, whether the mutation operation occurs can be determined according to the mutation probability Pm. Since this is prior art, its description is omitted here.
  • When performing a crossover operation, a crossover point may be randomly selected in each parent chromosome individual with a certain probability, and the part below the crossover point is referred to as the crossover segment.
  • The first parent chromosome individual deletes its crossover segment, and the second parent's crossover segment is inserted at the first parent's crossover point, generating the first offspring chromosome individual.
  • Likewise, the second parent chromosome individual deletes its crossover segment, and the first parent's crossover segment is inserted at the second parent's crossover point, forming the second offspring chromosome individual.
  • Even if the two selected parent chromosome individuals are the same, their different crossover points make the resulting offspring chromosomes different, which effectively avoids inbreeding and improves the global search ability.
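  • The crossover just described can be sketched as follows, with an independent crossover point chosen in each parent so that even two identical parents can produce distinct offspring; the names and interface are illustrative.

```python
import random

def crossover(parent1, parent2, rng):
    """One crossover with a separate crossover point per parent.

    Each parent's segment below its own point is swapped into the other
    parent, so offspring lengths may differ from the parents' when the
    two points differ (an illustration of the scheme described above).
    """
    pt1 = rng.randrange(1, len(parent1))      # crossover point of parent 1
    pt2 = rng.randrange(1, len(parent2))      # crossover point of parent 2
    child1 = parent1[:pt1] + parent2[pt2:]    # parent 1 keeps its head,
    child2 = parent2[:pt2] + parent1[pt1:]    # receives parent 2's segment
    return child1, child2
```

Note that the total number of genes is conserved across the pair of offspring, even though an individual child may be longer or shorter than its parents.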
  • During mutation, one of the following operations may be adopted randomly: (a) delete at least one node in the hidden layer of the neural network model and its corresponding connections; (b) delete at least one connection of the hidden layer of the neural network model; (c) randomly repair a deleted node or connection with a certain probability; (d) add a hidden layer node and randomly generate the corresponding connection weights.
  • Deleting nodes always precedes adding nodes, and the number of added nodes should not be greater than the number of deleted nodes; nodes are added only when deletion cannot produce a good offspring.
  • Such a mutation operation ensures that the method always proceeds in the direction of compressing the neural network model.
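  • A minimal sketch of a compression-oriented mutation on the 0/1/-1 connection matrix follows. Only operations (a) and (b) above are shown; the repair and node-addition operations (c) and (d), and the delete-before-add bookkeeping, are omitted, so this is an illustration rather than the operator of this publication.

```python
import random

def mutate(conn, rng):
    """Apply one compression-oriented mutation in place and return conn.

    `conn` is an N x N list of lists with values 0, 1, or -1. Deleting
    a node is modelled by zeroing its row and column; both operations
    can only remove structure, so the mutation never grows the network.
    """
    n = len(conn)
    if rng.random() < 0.5:
        # (b) delete one existing connection, if any remain
        links = [(i, j) for i in range(n) for j in range(n) if conn[i][j] != 0]
        if links:
            i, j = rng.choice(links)
            conn[i][j] = 0
    else:
        # (a) delete one node and all of its corresponding connections
        k = rng.randrange(n)
        for i in range(n):
            conn[i][k] = 0
            conn[k][i] = 0
    return conn
```

Because both branches only set entries to zero, repeated application monotonically reduces the number of connections, matching the stated goal of always moving toward a more compressed model.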
  • in step S314, if the determination result in step S210 is that the termination condition has been reached, the chromosome individual with the best fitness value is output, thereby obtaining the compressed layer to be compressed.
  • for example, the optimal chromosome individual may be defined as max(f_i), i.e., the chromosome individual having the greatest fitness value when the termination condition is reached is taken as the optimal chromosome individual.
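Selecting the best-fitness individual at termination reduces to a maximum over the population (the function names are assumptions for illustration):

```python
def best_individual(population, fitness):
    """Return the chromosome with the greatest fitness value, i.e. the
    optimal individual output when the termination condition is reached.

    `fitness` is any callable mapping a chromosome to its fitness value.
    """
    return max(population, key=fitness)
```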
  • FIG. 4 shows a schematic diagram of a neural network model compression device according to an embodiment of the present invention.
  • the device 400 shown in FIG. 4 corresponds to the neural network model compression method described above. Since the device embodiment is basically similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the description of the method embodiment.
  • the device 400 may be implemented in software, hardware, or a combination of software and hardware, and may be installed in a computer or other suitable electronic device with computing capabilities.
  • the device 400 may include an acquisition module 402, a selection module 404, a sorting module 406, and a compression module 408.
  • the obtaining module 402 is configured to obtain a trained first neural network model.
  • the selection module 404 is configured to select at least one layer from each layer of the first neural network model as a layer to be compressed.
  • the sorting module 406 is configured to sort the layers to be compressed according to a preset rule.
  • the compression module 408 is configured to perform compression processing, using a genetic algorithm and in the sorted order, on some or all of the layers to be compressed to obtain a second neural network model, wherein the accuracy of the second neural network model on the preset training samples is not lower than a preset accuracy.
  • the ranking module 406 is specifically configured to sort the layers to be compressed according to the number of levels of the layers to be compressed in the first neural network model.
  • the ranking module 406 is specifically configured to sort the layers to be compressed according to the contribution of the layer to be compressed to the loss of the first neural network model.
  • the compression module 408 includes a training unit and a determination unit.
  • the training unit is configured to train the current neural network model using the preset training samples each time the genetic algorithm finishes compressing one of the layers to be compressed.
  • the determining unit is configured to: if the accuracy of the current neural network model is not lower than the preset accuracy, continue compressing the next layer to be compressed when uncompressed layers remain, or determine the current neural network model as the second neural network model when all layers to be compressed have been processed; and if the accuracy of the current neural network model is lower than the preset accuracy, determine the neural network model obtained after compressing the previous layer as the second neural network model.
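The training/determination logic above amounts to a layer-by-layer loop with an accuracy guard; a minimal sketch, where all function names and the model representation are assumptions (`compress_layer` stands in for one genetic-algorithm pass plus retraining, `accuracy_of` for evaluation on the preset training samples):

```python
import copy

def compress_model(model, layers_to_compress, compress_layer, accuracy_of,
                   min_accuracy):
    """Compress layers one by one, rolling back when accuracy drops.

    If compressing a layer drops accuracy below `min_accuracy`, the
    model obtained after the previous layer's compression is returned
    as the second neural network model.
    """
    current = model
    for layer in layers_to_compress:        # already sorted by the preset rule
        candidate = compress_layer(copy.deepcopy(current), layer)
        if accuracy_of(candidate) < min_accuracy:
            return current                  # roll back to the previous model
        current = candidate                 # accept and continue
    return current
```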
  • the compression module 408 further includes an acquisition unit, a coding unit, an initialization unit, a calculation unit, a judgment unit, a genetic operation unit, and an output unit.
  • the obtaining unit is configured to obtain network structure information of a layer to be compressed.
  • the encoding unit is configured to encode the layer to be compressed according to the network structure information of the layer to be compressed to obtain a chromosome.
  • the initialization unit is configured to perform population initialization according to a chromosome obtained to generate an initial population.
  • the calculation unit is used to calculate the fitness value of the individual chromosomes in the population.
  • the judging unit is used to judge whether the termination condition is reached.
  • the genetic operation unit is used to select a chromosome individual whose fitness value meets the requirements based on the fitness value if the termination condition is not reached, and perform replication, crossover or mutation operations to generate a new generation of population.
  • the output unit is used to output the chromosome individual with the best fitness value if the termination condition is reached, so as to obtain the compressed layer to be compressed.
  • the calculation unit is further configured to calculate the precision-based and compression-based fitness values of the individual chromosomes in the population, respectively.
  • the genetic operation unit is further configured to: obtain a first selection probability for the chromosome individuals in the population according to the accuracy-based fitness values, and select first chromosome individuals according to the first selection probability; obtain a second selection probability according to the compression-based fitness values, and select second chromosome individuals from the first chromosome individuals according to the second selection probability; and copy, cross, or mutate the second chromosome individuals to generate a new generation of the population.
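A sketch of this two-stage selection — fitness-proportionate (roulette-wheel) selection first by accuracy-based fitness, then by compression-based fitness among the survivors. The function names, sampling with replacement, positive fitness values, and the pool sizes k1/k2 are illustrative assumptions:

```python
import random

def roulette_select(population, fitness, k):
    """Fitness-proportionate selection of k individuals (with replacement).

    Assumes all fitness values are positive.
    """
    total = sum(fitness(ind) for ind in population)
    weights = [fitness(ind) / total for ind in population]
    return random.choices(population, weights=weights, k=k)

def two_stage_select(population, acc_fitness, comp_fitness, k1, k2):
    """Select by accuracy-based fitness first (first selection
    probability), then by compression-based fitness among the survivors
    (second selection probability)."""
    first = roulette_select(population, acc_fitness, k1)
    second = roulette_select(first, comp_fitness, k2)
    return second
```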
  • Figure 5 shows a schematic diagram of a computer device according to an embodiment of the invention.
  • the computer device 500 may include a processor 502 and a memory 504, where the memory 504 stores executable instructions that, when executed, cause the processor 502 to execute the method shown in FIG.
  • FIG. 6 shows a block diagram of an exemplary computer device suitable for use in implementing embodiments of the present invention.
  • the computer device 600 shown in FIG. 6 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present invention.
  • the computer device 600 is implemented in the form of a general-purpose computing device.
  • the components of the computer device 600 may include, but are not limited to, a processor 602, a system memory 604, and a bus 606 connecting different system components (including the processor 602 and the system memory 604).
  • the bus 606 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures.
  • by way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
  • Computer device 600 typically includes a variety of computer system-readable media. These media can be any available media that can be accessed by the computer device 600, including volatile and non-volatile media, removable and non-removable media.
  • System memory 604 may include computer system-readable media in the form of volatile memory, such as random access memory (RAM) 608 and / or cache memory 610.
  • Computer device 600 may further include other removable / non-removable, volatile / nonvolatile computer system storage media.
  • the storage system 612 may be used to read and write non-removable, non-volatile magnetic media (not shown in FIG. 6, commonly referred to as a "hard drive").
  • although not shown in FIG. 6, a disk drive for reading and writing a removable non-volatile magnetic disk (such as a "floppy disk"), and an optical disc drive for reading and writing a removable non-volatile optical disc (such as a CD-ROM or DVD-ROM), may also be provided.
  • in these cases, each drive may be connected to the bus 606 through one or more data medium interfaces.
  • the system memory 604 may include at least one program product having a set (for example, at least one) of program modules configured to perform the functions of the embodiment of FIG. 1 or FIG. 2 of the present invention.
  • a program/utility 614 having a set (at least one) of program modules 616 may be stored in, for example, the system memory 604.
  • such program modules 616 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each or some combination of these examples may include an implementation of a network environment.
  • the program module 616 generally performs the functions and / or methods in the embodiment of FIG. 1 or FIG. 2 described in the present invention.
  • the computer device 600 may also communicate with one or more external devices 700 (such as a keyboard, a pointing device, or a display 800), with one or more devices that enable a user to interact with the computer device 600, and/or with any device (e.g., a network card or modem) that enables the computer device 600 to communicate with one or more other computing devices. Such communication can take place through an input/output (I/O) interface 618.
  • the computer device 600 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through the network adapter 620. As shown, the network adapter 620 communicates with the other modules of the computer device 600 through the bus 606.
  • the processor 602 executes various functional applications and data processing by running a program stored in the system memory 604, for example, implementing the neural network model compression method shown in the foregoing embodiment.
  • An embodiment of the present invention further provides a computer-readable medium having executable instructions stored thereon, where the executable instructions, when executed, cause a computer to execute the method 200 shown in FIG. 2 or the method 300 shown in FIG. 3.
  • the computer-readable medium of this embodiment may include the RAM 608 and / or the cache memory 610 and / or the storage system 612 in the system memory 604 in the embodiment shown in FIG. 6.
  • the computer-readable medium in this embodiment may include not only a tangible medium but also an intangible medium.
  • the computer-readable medium of this embodiment may adopt any combination of one or more computer-readable media.
  • the computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium.
  • the computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device.
  • the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and the computer-readable medium may send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device .
  • Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • Computer program code for performing the operations of the present invention may be written in one or more programming languages, or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as "C" or similar programming languages.
  • the program code can execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • the embodiments of the present invention may be provided as a method, an apparatus, or a computer program product. Therefore, the embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, etc.) containing computer-usable program code.
  • Embodiments of the present invention are described with reference to flowcharts and / or block diagrams of methods, apparatuses, and computer program products according to embodiments of the present invention. It should be understood that each process and / or block in the flowcharts and / or block diagrams, and combinations of processes and / or blocks in the flowcharts and / or block diagrams can be implemented by computer program instructions.
  • These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal device generate means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Physiology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a neural network model compression method, a device, a computer apparatus, and a computer-readable medium. The method comprises: acquiring a trained first neural network model (S202); selecting one or more layers from the layers of the first neural network model as layers to be compressed (S204); sorting the layers to be compressed according to a preset rule (S206); and compressing, in the sorted order and by means of a genetic algorithm, some or all of the layers to be compressed to obtain a second neural network model (S208), wherein the accuracy of the second neural network model on a preset training sample is not lower than a preset accuracy. The method, device, computer apparatus, and computer-readable medium compress a trained neural network model by means of a genetic algorithm, thereby reducing the computational load and storage space of the neural network model and making it applicable to apparatuses with limited memory and computing resources, without compromising the accuracy of the neural network model.
PCT/CN2019/103511 2018-09-05 2019-08-30 Procédé de compression de modèle de réseau neuronal, dispositif et appareil d'ordinateur WO2020048389A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811037330.2 2018-09-05
CN201811037330.2A CN109165720A (zh) 2018-09-05 2018-09-05 神经网络模型压缩方法、装置和计算机设备

Publications (1)

Publication Number Publication Date
WO2020048389A1 true WO2020048389A1 (fr) 2020-03-12

Family

ID=64894255

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/103511 WO2020048389A1 (fr) 2018-09-05 2019-08-30 Procédé de compression de modèle de réseau neuronal, dispositif et appareil d'ordinateur

Country Status (2)

Country Link
CN (1) CN109165720A (fr)
WO (1) WO2020048389A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024098373A1 (fr) * 2022-11-11 2024-05-16 Nvidia Corporation Techniques de compression de réseaux neuronaux

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165720A (zh) * 2018-09-05 2019-01-08 深圳灵图慧视科技有限公司 神经网络模型压缩方法、装置和计算机设备
WO2020155091A1 (fr) * 2019-02-01 2020-08-06 华为技术有限公司 Procédé, appareil, dispositif et support de quantification de réseau neuronal profond
CN110175671B (zh) * 2019-04-28 2022-12-27 华为技术有限公司 神经网络的构建方法、图像处理方法及装置
CN110135498A (zh) * 2019-05-17 2019-08-16 电子科技大学 一种基于深度进化神经网络的图像识别方法
CN110276448B (zh) * 2019-06-04 2023-10-24 深圳前海微众银行股份有限公司 一种模型压缩方法及装置
CN110309911B (zh) * 2019-07-05 2021-01-05 安徽寒武纪信息科技有限公司 神经网络模型验证方法、装置、计算机设备和存储介质
CN112445823A (zh) * 2019-09-04 2021-03-05 华为技术有限公司 神经网络结构的搜索方法、图像处理方法和装置
CN112784952B (zh) * 2019-11-04 2024-03-19 珠海格力电器股份有限公司 一种卷积神经网络运算系统、方法及设备
CN111028226A (zh) * 2019-12-16 2020-04-17 北京百度网讯科技有限公司 算法移植的方法及装置
CN111338816B (zh) * 2020-02-18 2023-05-12 深圳鲲云信息科技有限公司 基于神经网络的指令交互方法、系统、设备及存储介质
CN111275190B (zh) * 2020-02-25 2023-10-10 北京百度网讯科技有限公司 神经网络模型的压缩方法及装置、图像处理方法及处理器
CN112529278B (zh) * 2020-12-02 2021-08-31 中国人民解放军93209部队 基于联结矩阵寻优的航路网规划方法及装置
CN114239792B (zh) * 2021-11-01 2023-10-24 荣耀终端有限公司 利用量化模型进行图像处理的系统、装置及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101599138A (zh) * 2009-07-07 2009-12-09 武汉大学 基于人工神经网络的土地评价方法
CN103971162A (zh) * 2014-04-04 2014-08-06 华南理工大学 一种基于遗传算法改进bp神经网络的方法
CN106503802A (zh) * 2016-10-20 2017-03-15 上海电机学院 一种利用遗传算法优化bp神经网络系统的方法
CN108038546A (zh) * 2017-12-29 2018-05-15 百度在线网络技术(北京)有限公司 用于压缩神经网络的方法和装置
CN109165720A (zh) * 2018-09-05 2019-01-08 深圳灵图慧视科技有限公司 神经网络模型压缩方法、装置和计算机设备

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7313550B2 (en) * 2002-03-27 2007-12-25 Council Of Scientific & Industrial Research Performance of artificial neural network models in the presence of instrumental noise and measurement errors
CN108229646A (zh) * 2017-08-08 2018-06-29 北京市商汤科技开发有限公司 神经网络模型压缩方法、装置、存储介质和电子设备

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101599138A (zh) * 2009-07-07 2009-12-09 武汉大学 基于人工神经网络的土地评价方法
CN103971162A (zh) * 2014-04-04 2014-08-06 华南理工大学 一种基于遗传算法改进bp神经网络的方法
CN106503802A (zh) * 2016-10-20 2017-03-15 上海电机学院 一种利用遗传算法优化bp神经网络系统的方法
CN108038546A (zh) * 2017-12-29 2018-05-15 百度在线网络技术(北京)有限公司 用于压缩神经网络的方法和装置
CN109165720A (zh) * 2018-09-05 2019-01-08 深圳灵图慧视科技有限公司 神经网络模型压缩方法、装置和计算机设备

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024098373A1 (fr) * 2022-11-11 2024-05-16 Nvidia Corporation Techniques de compression de réseaux neuronaux

Also Published As

Publication number Publication date
CN109165720A (zh) 2019-01-08

Similar Documents

Publication Publication Date Title
WO2020048389A1 (fr) Procédé de compression de modèle de réseau neuronal, dispositif et appareil d'ordinateur
CN110674880B (zh) 用于知识蒸馏的网络训练方法、装置、介质与电子设备
US11487954B2 (en) Multi-turn dialogue response generation via mutual information maximization
CN108536679B (zh) 命名实体识别方法、装置、设备及计算机可读存储介质
CN110366734B (zh) 优化神经网络架构
CN110929515B (zh) 基于协同注意力和自适应调整的阅读理解方法及系统
CN111128137A (zh) 一种声学模型的训练方法、装置、计算机设备和存储介质
CN109919221B (zh) 基于双向双注意力机制图像描述方法
WO2020238783A1 (fr) Procédé et dispositif de traitement d'informations et support de stockage
CN112508085A (zh) 基于感知神经网络的社交网络链路预测方法
US20200134471A1 (en) Method for Generating Neural Network and Electronic Device
CN111538827A (zh) 基于内容和图神经网络的判例推荐方法、装置及存储介质
CN116049459B (zh) 跨模态互检索的方法、装置、服务器及存储介质
CN112766496B (zh) 基于强化学习的深度学习模型安全性保障压缩方法与装置
CN115455171B (zh) 文本视频的互检索以及模型训练方法、装置、设备及介质
CN110941964A (zh) 双语语料筛选方法、装置及存储介质
CN111814489A (zh) 口语语义理解方法及系统
CN117475038B (zh) 一种图像生成方法、装置、设备及计算机可读存储介质
CN116681078A (zh) 一种基于强化学习的关键词生成方法
CN113779244B (zh) 文档情感分类方法、装置、存储介质以及电子设备
CN112884019B (zh) 一种基于融合门循环网络模型的图像转语言方法
CN112818658A (zh) 文本对分类模型的训练方法、分类方法、设备及存储介质
CN117521674B (zh) 对抗信息的生成方法、装置、计算机设备和存储介质
CN115269844B (zh) 模型的处理方法、装置、电子设备和存储介质
JP2021136025A (ja) 学習装置、学習方法、学習プログラム、評価装置、評価方法、および評価プログラム

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19856593

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19856593

Country of ref document: EP

Kind code of ref document: A1