WO2020133364A1 - Neural network compression method and apparatus - Google Patents

Neural network compression method and apparatus

Info

Publication number
WO2020133364A1
WO2020133364A1 (PCT/CN2018/125372; CN2018125372W)
Authority
WO
WIPO (PCT)
Prior art keywords
layer
neural network
training
weight
network model
Prior art date
Application number
PCT/CN2018/125372
Other languages
French (fr)
Chinese (zh)
Inventor
朱佳峰
魏巍
卢惠莉
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to PCT/CN2018/125372 priority Critical patent/WO2020133364A1/en
Priority to CN201880099986.9A priority patent/CN113168565A/en
Publication of WO2020133364A1 publication Critical patent/WO2020133364A1/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • This application relates to the field of neural networks, and in particular to a neural network compression method and device.
  • Deep learning technology has been increasingly applied to smart devices around people, such as mobile phones, smart home devices, wearable devices, and vehicle-mounted devices.
  • Deep learning models, that is, neural network models, are typically run on processors of these devices, such as the central processing unit (CPU) or the neural network processor (NPU).
  • Against this background, compression technology for deep learning models came into being.
  • This type of technology can eliminate a large amount of redundancy in a model's weight parameters within a tolerable drop in accuracy, greatly reducing the size of the model.
  • Weight sparsification is a commonly used compression technique. Specifically, it discards a portion of the small-weight connections (setting their weights to 0), thereby eliminating redundancy and speeding up operations.
  • Weight sparsification not only reduces model volume by cutting out redundant weights; more importantly, by excluding some of the smaller weights, the model's redundant, weak-performing branches are cut out, and the remaining well-performing branches can be strengthened through subsequent training to improve the model's final accuracy.
  • The most commonly used method performs weight sparsification based on a weight threshold, setting all weights smaller than the threshold to zero.
  • However, the weight threshold is configured somewhat blindly: it is usually set from human experience, and multiple candidate settings must be trained repeatedly in order to select the one that gives better model accuracy.
  • As a result, the above method takes a long time to achieve model sparsification, and the weight threshold is easily affected by human factors, resulting in unstable model performance. In other words, the above method is not flexible when implementing model compression.
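The prior-art approach described above can be sketched as follows. This is a minimal illustration only; the threshold value is a hypothetical hand-picked constant, and choosing it well is exactly the repeated trial-and-error the application criticizes:

```python
import numpy as np

def prune_fixed_threshold(weights, threshold):
    """Prior-art magnitude pruning: zero every weight below a hand-set threshold."""
    pruned = weights.copy()
    pruned[np.abs(pruned) < threshold] = 0.0
    return pruned

# The threshold 0.04 is picked by hand; a different choice would require
# retraining the model to check the resulting accuracy.
w = np.array([0.8, -0.03, 0.002, -1.2, 0.05])
print(prune_fixed_threshold(w, 0.04))
```

Because the threshold is fixed up front, each candidate value must be validated by a full round of retraining, which is the inflexibility the embodiments below aim to remove.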
  • Embodiments of the present application provide a neural network compression method and device to solve the problem of inflexible model compression in the prior art.
  • In a first aspect, the present application provides a neural network compression method: the initial weights of the i-th layer are clipped according to the initial weight threshold of the i-th layer of the initial neural network model to obtain a clipped neural network model, where i takes any positive integer from 1 to m, and m is the total number of layers of the neural network model. Multiple trainings are then performed on the clipped neural network model. During the t-th training, the weight threshold of the i-th layer is determined according to the weight threshold of the i-th layer of the neural network model obtained from the (t-1)-th training, and the current weights of the i-th layer during the t-th training are clipped according to that threshold; t takes any positive integer from 1 to q, where q is the total number of trainings.
  • In this way, the weight threshold of each clipping can be adjusted adaptively, which means the neural network can be compressed flexibly. This avoids the influence of human factors caused by manually setting the weight threshold, thereby enhancing the stability of the neural network model's performance.
  • In a possible design, before the initial weights of the i-th layer are clipped according to the initial weight threshold of the i-th layer of the initial neural network model, the initial weights of the i-th layer of the initial neural network model are obtained; the mean and standard deviation of the initial weights of the i-th layer are then determined; and finally the initial weight threshold of the i-th layer is determined from that mean and standard deviation.
  • In this way, the initial weight threshold of each layer can be accurately determined, so that the weights of each layer of the initial neural network can be clipped.
  • In a possible design, the initial weight threshold of the i-th layer may conform to the following formula:
  • T_i = α · (μ_i + σ_i)
  • where T_i is the initial weight threshold of the i-th layer of the initial neural network model, μ_i is the mean of the initial weights of the i-th layer, σ_i is the standard deviation of the initial weights of the i-th layer, and α is a set fixed value with α ≥ 0.
  • In this way, the initial weight threshold of each layer can be accurately determined, so that the weights of each layer of the initial neural network can be clipped.
  • In a possible design, the weight threshold of the i-th layer during the t-th training can be determined according to the weight threshold of the i-th layer of the neural network model obtained from the (t-1)-th training, and may meet the following formula:
  • In this way, the weight threshold of each layer can be adaptively obtained during each training process, so that neural network compression can be performed flexibly.
  • In a possible design, the weights of any layer are clipped according to that layer's weight threshold. The specific method may be: weights of the layer that are less than the layer's weight threshold are set to zero, and weights that are greater than or equal to the layer's weight threshold are kept unchanged. In this way, the weight clipping of each layer can be completed.
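As a sketch, the clipping rule of this design, zeroing weights below the layer threshold and keeping the rest, with the comparison on weight magnitudes as in the detailed description, could look like:

```python
import numpy as np

def clip_layer(weights, threshold):
    """Zero out weights whose magnitude is below the layer's threshold.
    The branch itself is kept: the weight is merely set to zero, not removed."""
    return np.where(np.abs(weights) < threshold, 0.0, weights)

w = np.array([0.5, -0.1, 0.02, -0.7])
print(clip_layer(w, 0.1))
```

Note that `np.where` keeps the array shape intact, matching the application's point that clipped branches still exist in the model with zero weights.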
  • the present application also provides a neural network compression device, which has the function of implementing the method of the first aspect described above.
  • The function can be realized by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • In a possible design, the structure of the neural network compression device may include a weight clipping unit and a training unit, and these units may perform the corresponding functions in the above method examples. For details, refer to the detailed description in the method examples, which is not repeated here.
  • the structure of the neural network compression device may include a processor and a memory, and the processor is configured to perform the above-mentioned method.
  • the memory is coupled to the processor, and stores necessary program instructions and data of the neural network compression device.
  • The present application also provides a computer storage medium that stores computer-executable instructions which, when executed by a computer, cause the computer to execute any one of the methods mentioned in the first aspect.
  • the present application also provides a computer program product containing instructions, which when executed on a computer, causes the computer to perform any of the methods mentioned in the first aspect above.
  • the present application further provides a chip coupled to a memory, and used to read and execute program instructions stored in the memory to implement any of the methods mentioned in the first aspect above.
  • FIG. 1 is a schematic diagram of a neural network provided by an embodiment of this application.
  • FIG. 2 is a structural diagram of a computer device provided by an embodiment of the present application.
  • FIG. 3 is a flowchart of a neural network compression method provided by an embodiment of this application.
  • FIG. 4 is a flowchart of an example of a neural network compression method provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a neural network compression device provided by an embodiment of the present application.
  • FIG. 6 is a structural diagram of a neural network compression device provided by an embodiment of the present application.
  • Embodiments of the present application provide a neural network compression method and device to solve the problem of inflexible model compression in the prior art.
  • The method and the device described in this application are based on the same inventive concept. Since the principles by which the method and the device solve the problem are similar, the implementations of the device and the method can be referred to mutually, and repeated descriptions are omitted.
  • A neural network imitates the behavioral characteristics of animal neural networks, processing data with a structure similar to that of brain synapse connections.
  • a neural network consists of a large number of nodes (or neurons) connected to each other.
  • A neural network consists of an input layer, a hidden layer, and an output layer, as shown in FIG. 1.
  • The input layer receives the input data of the neural network.
  • The output layer outputs the output data of the neural network.
  • The hidden layer is composed of many nodes connected between the input layer and the output layer, and performs arithmetic processing on the input data.
  • the hidden layer may be composed of one or more layers.
  • The number of hidden layers in the neural network and the number of their nodes are directly related to the complexity of the problem the neural network actually solves, as well as to the numbers of nodes in the input layer and the output layer.
  • The neural network compression method may be executed by, but is not limited to, a processor.
  • The processor may be a processor in a computer device, a processor in another device (for example, a chip system), or a separate processor.
  • a description will be made by taking a processor in a computer device executing a neural network compression method as an example.
  • FIG. 2 shows a structural diagram of a possible computer device applicable to the neural network compression method provided by the embodiment of the present application.
  • the computer device includes: a processor 210, a memory 220, a communication module 230, an input unit 240, a display unit 250, a power supply 260 and other components.
  • the structure of the computer device shown in FIG. 2 does not constitute a limitation on the computer device.
  • The computer device provided in the embodiments of the present application may include more or fewer components than illustrated, may combine certain components, or may have a different arrangement of components.
  • the communication module 230 may be connected to other devices through a wireless connection or a physical connection to implement data transmission and reception by a computer device.
  • The communication module 230 may include any one or a combination of a radio frequency (RF) circuit, a wireless fidelity (WiFi) module, a communication interface, a Bluetooth module, and the like; this is not limited in this embodiment of the present application.
  • the memory 220 can be used to store program instructions and data.
  • the processor 210 executes program instructions stored in the memory 220 to execute various functional applications and data processing of the computer device.
  • Among them are program instructions that enable the processor 210 to execute the neural network compression method provided by the following embodiments of the present application.
  • the memory 220 may mainly include a program storage area and a data storage area.
  • the storage program area can store the operating system, various application programs, and program instructions;
  • the storage data area can store various data such as neural networks.
  • The memory 220 may include a high-speed random access memory, and may also include a non-volatile memory, such as a magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • the input unit 240 may be used to receive information such as data or operation instructions input by the user.
  • the input unit 240 may include input devices such as a touch panel, function keys, a physical keyboard, a mouse, a camera, and a monitor.
  • the display unit 250 can realize human-computer interaction, and is used to display information input by the user and information provided to the user through the user interface.
  • the display unit 250 may include a display panel 251.
  • The display panel 251 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.
  • The touch panel may cover the display panel 251; when the touch panel detects a touch event on or near it, it transmits the event to the processor 210 to determine the type of the touch event and perform the corresponding operation.
  • the processor 210 is a control center of a computer device, and uses various interfaces and lines to connect the above components.
  • the processor 210 may execute program instructions stored in the memory 220 and call data stored in the memory 220 to complete various functions of the computer device and implement neural network compression provided by the embodiments of the present application method.
  • the processor 210 may include one or more processing units.
  • The processor 210 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may not be integrated into the processor 210.
  • the processing unit may compress the neural network.
  • The processor 210 may be a central processing unit (CPU), a graphics processing unit (GPU), or a combination of a CPU and a GPU.
  • The processor 210 may also be a neural network processing unit (NPU), a tensor processing unit (TPU), or another artificial intelligence (AI) chip that supports neural network processing.
  • the processor 210 may further include a hardware chip.
  • The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), a digital signal processing device (DSP), or a combination thereof.
  • The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • the computer device also includes a power source 260 (such as a battery) for powering various components.
  • the power supply 260 may be logically connected to the processor 210 through a power management system, so as to realize functions such as charging and discharging the computer device through the power management system.
  • the computer device may further include components such as a camera, a sensor, and an audio collector, which will not be repeated here.
  • a neural network compression method provided by an embodiment of the present invention is applicable to the computer device shown in FIG. 2 and the neural network shown in FIG. 1.
  • the method may be executed by a processor in the computer device. Referring to FIG. 3, the specific flow of the method may include:
  • Step 301 The processor clips the initial weights of the i-th layer according to the initial weight threshold of the i-th layer of the initial neural network model to obtain a clipped neural network model, where i takes any positive integer from 1 to m, and m is the total number of layers of the neural network model.
  • Generally, a neural network model has many layers, that is, m is a positive integer greater than 1, and training or processing is usually performed layer by layer.
  • Therefore, the same operation is performed on each layer of the initial neural network model, one layer at a time: each layer's initial weights are clipped according to that layer's initial weight threshold.
  • Specifically, before clipping the initial weights of the i-th layer according to the initial weight threshold of the i-th layer of the initial neural network model, the processor may obtain the initial weights of the i-th layer of the initial neural network model; the mean and standard deviation of the initial weights of the i-th layer are then determined from those initial weights, and the initial weight threshold of the i-th layer is determined from that mean and standard deviation.
  • The initial weight threshold of the i-th layer may meet the following Formula 1:
  • T_i = α · (μ_i + σ_i) (Formula 1)
  • where T_i is the initial weight threshold of the i-th layer of the initial neural network model, μ_i is the mean of the initial weights of the i-th layer, σ_i is the standard deviation of the initial weights of the i-th layer, and α is a set value with α ≥ 0.
  • The mean μ_i of the initial weights of the i-th layer may meet the following Formula 2:
  • μ_i = (1/P_i) · Σ_{n=1..P_i} |ω_in| (Formula 2)
  • where ω_in is the n-th weight of the i-th layer of the initial neural network model, and P_i is the number of weights of the i-th layer of the neural network model, P_i being a positive integer greater than 1.
  • The standard deviation σ_i of the initial weights of the i-th layer may meet the following Formula 3:
  • σ_i = √( (1/P_i) · Σ_{n=1..P_i} ( |ω_in| − μ_i )² ) (Formula 3)
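The per-layer statistics and the initial threshold can be sketched together as below. This assumes the mean and standard deviation are taken over the weight magnitudes and combined as T_i = α(μ_i + σ_i); both the use of magnitudes and the placement of the coefficient α are assumptions, since the formulas are not fully reproduced in this publication:

```python
import numpy as np

def initial_threshold(layer_weights, alpha=0.25):
    """Derive a layer's initial weight threshold from the statistics of
    its weight magnitudes (a sketch of Formulas 1-3, under assumptions)."""
    mags = np.abs(layer_weights)   # |w_in|, n = 1..P_i
    mu = mags.mean()               # Formula 2: mean of the magnitudes
    sigma = mags.std()             # Formula 3: population standard deviation
    return alpha * (mu + sigma)    # Formula 1, with alpha >= 0
```

A larger α clips more aggressively, while α = 0 disables clipping entirely, consistent with the constraint α ≥ 0.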
  • Before acquiring the initial weights of the i-th layer of the initial neural network model, the processor needs to train the neural network to obtain all the weights of the neural network, thereby obtaining the initial neural network model.
  • Training the neural network to obtain all the weights in the neural network may specifically include: building the neural network structure and all the weights in the neural network through data input and neural network model construction.
  • The specific clipping process may be: the processor sets to zero the weights in the i-th layer that are smaller than the initial weight threshold of the i-th layer, and keeps unchanged the weights in the i-th layer that are not smaller than that threshold. It should be noted that after this clipping, only some weights in the i-th layer are set to zero; the corresponding branches are not deleted. That is to say, branches whose weights are zero still exist in the neural network model, only with zero weights. The zero-setting principle (that is, the clipping method) involved in the subsequent process is the same and will not be described again in detail.
  • Step 302 The processor performs multiple trainings on the clipped neural network model. During the t-th training, the weight threshold of the i-th layer is determined according to the weight threshold of the i-th layer of the neural network model obtained from the (t-1)-th training, and the current weights of the i-th layer during the t-th training are clipped according to that threshold; t takes any positive integer from 1 to q, where q is the total number of trainings.
  • After the processor has clipped the weights of all layers of the initial neural network in step 301, it executes step 302.
  • During the t-th training, the processor determines the weight threshold of the i-th layer according to the weight threshold of the i-th layer of the neural network model obtained from the (t-1)-th training, which may meet the following Formula 4:
  • According to the weight threshold of the i-th layer during the t-th training, the processor clips the current weights of the i-th layer during the t-th training, which may meet the following Formula 5:
  • ω_in^t = 0 if |ω_in^t| < T_i^t, otherwise ω_in^t is kept unchanged (Formula 5)
  • where ω_in^t is the n-th weight of the i-th layer during the t-th training and T_i^t is the weight threshold of the i-th layer during the t-th training. That is, the processor sets a weight in the i-th layer to zero when its absolute value is less than the corresponding weight threshold, and otherwise keeps it unchanged.
  • The specific principle is the same as that involved in step 301: the clipping of each layer's weights in the initial neural network model works in a similar way, and the details can be referred to mutually.
  • The above training process is cyclic: the weights used in each training are those obtained after the previous training, and the weight threshold of each layer in each training is derived from that layer's threshold in the previous training.
  • In each training, the weights of all layers of the neural network model are processed, and then the next training is performed, until the training result meets a certain condition and training ends, for example, when the weight thresholds converge.
  • In this way, the weight threshold no longer depends on manual setting and does not need to be repeatedly tried and trained; instead, it can be adaptively adjusted according to the actual clipping situation. Compression of the neural network can thus be implemented flexibly, without being affected by human factors, making the performance of the final neural network model more stable.
  • It should be noted that the first training is the first training of the clipped neural network model. When t is 1, the weight threshold of the i-th layer of the neural network model obtained at the (t-1)-th (that is, the 0th) training is the initial weight threshold used for the i-th layer of the initial neural network model during clipping. That is to say, the weight threshold of the i-th layer during the first training is obtained based on the initial weight threshold corresponding to the i-th layer.
  • With the neural network compression method described above, the processor can adaptively adjust the weight threshold of each clipping, that is, it can flexibly perform neural network compression. This avoids the influence of human factors caused by manually setting the weight threshold, thereby enhancing the performance stability of the neural network model.
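The two-phase procedure of steps 301 and 302 can be sketched as below. The per-layer threshold update (Formula 4) is not reproduced in this publication, so `update_threshold` is a placeholder supplied by the caller, as is the `train_step` that stands in for one round of training:

```python
import numpy as np

def clip(weights, threshold):
    """Formula 5: zero weights whose magnitude is below the threshold."""
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def compress(layers, thresholds, update_threshold, train_step, q):
    """Sketch of steps 301-302: clip each layer once with its initial
    threshold, then run q trainings, re-deriving each layer's threshold
    from its value in the previous training before clipping again."""
    # Step 301: clip every layer of the initial model.
    layers = [clip(w, t) for w, t in zip(layers, thresholds)]
    for _ in range(q):                        # Step 302: q trainings
        layers = train_step(layers)           # weights from the last training
        thresholds = [update_threshold(t, w)  # threshold from last threshold
                      for t, w in zip(thresholds, layers)]
        layers = [clip(w, t) for w, t in zip(layers, thresholds)]
    return layers, thresholds
```

With an identity `train_step` and a constant `update_threshold`, this reduces to plain one-shot magnitude pruning; the application's point is that `update_threshold` adapts the threshold each round instead of fixing it by hand.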
  • the embodiments of the present application also provide an example of a neural network compression method, which is applicable to the computer device shown in FIG. 2 and the neural network shown in FIG. 1.
  • the specific process of the example shown in FIG. 4 may include the following steps:
  • Step 401 The processor obtains a layer of initial weights from the initial neural network model.
  • Step 402 The processor determines the mean and variance of the initial weight of the layer according to the obtained initial weight of the layer.
  • Step 403 The processor determines the initial weight threshold of the layer according to the mean and variance of the initial weight of the layer.
  • Step 404 The processor trims the initial weight of the layer according to the initial weight threshold of the layer.
  • Step 405 The processor updates the initial weights of the layer to the clipped weights of the layer (that is, the processor writes the clipped weights back to the neural network model).
  • Step 406 The processor determines whether the weights of all layers of the initial neural network model have been clipped. If yes, step 407 is executed to enter the neural network model training process, otherwise step 401 is executed.
  • Step 407 The processor obtains the gradient corresponding to the weight of a layer of the current neural network model and the weight threshold of the layer during the last training.
  • Step 408 The processor determines the weight threshold of the layer during the current training according to the gradient corresponding to the weight of the layer and the weight threshold of the layer during the previous training.
  • the weight threshold of the layer during the first training is determined based on the initial threshold of the layer when the weight of the layer of the initial neural network model is trimmed.
  • Step 409 The processor trims the current weight of the layer according to the determined weight threshold of the layer.
  • Step 410 The processor updates the weights of the layer in the neural network model (that is, the processor writes the clipped weights of the layer back to the neural network model).
  • Step 411 The processor judges whether the weights of all layers have been processed in the current training. If yes, step 412 is entered; otherwise, step 407 is repeated.
  • Step 412 The processor completes a neural network model training.
  • Step 413 The processor judges whether the training of the neural network model is finished, if it is, then it is finished, otherwise step 407 is executed.
  • Through the above example, the processor can adaptively adjust the weight threshold of each clipping, that is to say, it can flexibly compress the neural network. This avoids the influence of human factors caused by manually setting the weight threshold, so that the performance stability of the neural network model can be enhanced.
  • Based on the above embodiments, an embodiment of the present application further provides a neural network compression device, which is used to implement the neural network compression method provided by the embodiments shown in FIG. 3 and FIG. 4.
  • The neural network compression device 500 includes a weight clipping unit 501 and a training unit 502, where:
  • The weight clipping unit 501 is configured to clip the initial weights of the i-th layer according to the initial weight threshold of the i-th layer of the initial neural network model to obtain a clipped neural network model, where i takes any positive integer from 1 to m, and m is the total number of layers of the neural network model;
  • The training unit 502 is configured to perform multiple trainings on the clipped neural network model;
  • The weight clipping unit 501 is further configured to, during the t-th training performed by the training unit 502, determine the weight threshold of the i-th layer during the t-th training according to the weight threshold of the i-th layer of the neural network model obtained from the (t-1)-th training, and to clip the current weights of the i-th layer during the t-th training according to that threshold; t takes any positive integer from 1 to q, where q is the total number of trainings.
  • As shown in FIG. 5, the neural network compression device may further include a weight acquisition unit 503 and a threshold determination unit 504, specifically:
  • The weight acquisition unit 503 is configured to obtain the initial weights of the i-th layer of the initial neural network model before the weight clipping unit 501 clips the initial weights of the i-th layer according to the initial weight threshold of the i-th layer;
  • The threshold determination unit 504 is configured to determine the mean and standard deviation of the initial weights of the i-th layer according to those initial weights, and to determine the initial weight threshold of the i-th layer from that mean and standard deviation.
  • the functions of the weight acquisition unit 503 and the threshold determination unit 504 may also be directly implemented by the weight clipping unit 501, which is not limited in this application.
  • In a possible design, the initial weight threshold of the i-th layer may conform to the following formula:
  • T_i = α · (μ_i + σ_i)
  • where T_i is the initial weight threshold of the i-th layer of the initial neural network model, μ_i is the mean of the initial weights of the i-th layer, σ_i is the standard deviation of the initial weights of the i-th layer, and α is a set fixed value with α ≥ 0.
  • In a possible design, the manner in which the weight clipping unit 501 determines the weight threshold of the i-th layer at the t-th training, based on the weight threshold of the i-th layer of the neural network model obtained at the (t-1)-th training, may meet the following formula:
  • In a possible design, the weight clipping unit 501 is specifically configured to: when clipping the weights of any layer according to that layer's weight threshold, set to zero the weights that are less than the layer's weight threshold, and keep unchanged the weights that are greater than or equal to the layer's weight threshold.
  • The neural network compression device provided by the embodiments of the present application can adaptively adjust the weight threshold of each clipping, that is, it can flexibly perform neural network compression, avoiding the influence of human factors caused by manually setting the weight threshold; therefore, the performance stability of the neural network model can be enhanced.
  • the division of the units in the embodiments of the present application is schematic, and is only a division of logical functions, and there may be another division manner in actual implementation.
  • the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or software function unit.
  • the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • The technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application.
  • The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
  • an embodiment of the present application further provides a neural network compression device, which is used to implement the neural network compression method shown in FIG. 3 or FIG. 4.
  • the neural network compression device 600 may include: a processor 601 and a memory 602, where:
  • the processor 601 may be a CPU, GPU, or a combination of CPU and GPU.
  • The processor 601 may also be an NPU, a TPU, or another AI chip that supports neural network processing.
  • the processor 601 may further include a hardware chip.
  • the above hardware chip may be ASIC, PLD, DSP or a combination thereof.
  • The above PLD may be a CPLD, an FPGA, GAL, or any combination thereof. It should be noted that the processor 601 is not limited to the above cases; it may be any processing device capable of implementing neural network operations.
The processor 601 and the memory 602 are connected to each other. Optionally, the processor 601 and the memory 602 are connected to each other through a bus 603. The bus 603 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus can be divided into an address bus, a data bus, and a control bus. For ease of representation, only one thick line is used in FIG. 6, but this does not mean that there is only one bus or only one type of bus.
When the processor 601 is used to implement the neural network compression method provided by the embodiment shown in FIG. 3 of the present application, it may specifically perform the operations in steps 301 and 302 of the embodiment shown in FIG. 3. When the processor 601 is used to implement the neural network compression method provided by the embodiment shown in FIG. 4 of the present application, it may specifically perform the operations in steps 401 to 413 of the embodiment shown in FIG. 4. For the specific descriptions involved, reference may be made to the corresponding method embodiments, which are not repeated here.
The memory 602 is used to store programs and data. The program may include program code, and the program code includes computer operation instructions. The memory 602 may include a random access memory (RAM), and may also include a non-volatile memory, for example, at least one disk memory. The processor 601 executes the program stored in the memory 602 to realize the above functions, thereby implementing the neural network compression method shown in FIG. 3 or FIG. 4.
It should be noted that, when the neural network compression apparatus shown in FIG. 6 is applied to a computer device, the neural network compression apparatus may be embodied as the computer device shown in FIG. 2. In this case, the processor 601 may be the same as the processor 210 shown in FIG. 2, and the memory 602 may be the same as the memory 220 shown in FIG. 2.
In summary, the neural network compression method and apparatus provided by the embodiments of the present application can adaptively adjust the weight threshold for each pruning pass. That is to say, neural network compression can be performed flexibly, which avoids the influence of human factors introduced by manually set weight thresholds and thus enhances the performance stability of the neural network model.
The embodiments of the present application may be provided as methods, systems, or computer program products. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.


Abstract

A neural network compression method and apparatus, used to solve the problem of inflexible model compression in the prior art. The method comprises: pruning the initial weights of the i-th layer of an initial neural network model according to an initial weight threshold of the i-th layer, to obtain a pruned neural network model, where i takes every positive integer from 1 to m and m is the total number of layers of the neural network model; and performing multiple rounds of training on the pruned neural network model, where during the t-th round of training a weight threshold of the i-th layer for the t-th round is determined according to the weight threshold of the i-th layer of the neural network model obtained from the (t-1)-th round, and the current weights of the i-th layer are pruned according to the weight threshold of the i-th layer for the t-th round, where t takes every positive integer from 1 to q and q is the total number of training rounds. The present invention thus adaptively adjusts the weight threshold for each pruning pass and flexibly compresses the neural network.

Description

Neural network compression method and apparatus

Technical Field
This application relates to the field of neural networks, and in particular, to a neural network compression method and apparatus.
Background
With the continuous development of smart chips, deep learning technology is increasingly applied in the smart devices around people, such as mobile phones, smart home devices, wearable devices, and vehicle-mounted devices. However, deep learning models (that is, neural network models) still consume a huge amount of hardware resources, such as storage space, memory, central processing unit (CPU)/network processing unit (NPU) computing resources, and battery life. For example, for embedded devices, even with an NPU dedicated to accelerating deep learning, hardware resources are very limited; it is difficult to carry the service requirements of multiple models, which severely restricts the performance and user experience of smart devices.
To solve the above problems, compression technologies for deep learning models have emerged. Such technologies can eliminate a large amount of redundancy in a model's weight parameters within a tolerable drop in accuracy and greatly reduce the size of the model. Weight sparsification is currently a commonly used compression technology. Specifically, weight sparsification is achieved by discarding a part of the small-weight connections (setting their weights to 0), thereby eliminating redundancy and speeding up operations. Weight sparsification can not only reduce the model size by pruning the model's redundant weights; more importantly, by removing a part of the smaller weights, the model's redundant weak-performance branches can be pruned away, and the remaining branches with superior performance can be strengthened through subsequent training, thereby improving the final accuracy of the model.
In current applications of weight sparsification, the most common method is to zero out, based on a weight threshold, the weights that are smaller than the threshold. In practice, however, the configuration of the weight threshold is rather blind: it usually relies on human experience, and multiple candidate values have to be tried and repeatedly trained in order to select a value that yields good model accuracy.
Obviously, the above method takes a long time to achieve model sparsification, and the weight threshold is easily affected by human factors, resulting in unstable model performance. In other words, the above method is not flexible in implementing model compression.
Summary
Embodiments of the present application provide a neural network compression method and apparatus to solve the problem of inflexible model compression in the prior art.
In a first aspect, the present application provides a neural network compression method: according to an initial weight threshold of the i-th layer of an initial neural network model, the initial weights of the i-th layer are pruned to obtain a pruned neural network model, where i takes every positive integer from 1 to m and m is the total number of layers of the neural network model; multiple rounds of training are then performed on the pruned neural network model, and during the t-th training, the weight threshold of the i-th layer for the t-th training is determined according to the weight threshold of the i-th layer of the neural network model obtained from the (t-1)-th training, and the current weights of the i-th layer are pruned according to the weight threshold of the i-th layer for the t-th training, where t takes every positive integer from 1 to q and q is the total number of trainings.
Through the above method, the weight threshold for each pruning pass can be adjusted adaptively. That is to say, neural network compression can be performed flexibly, which avoids the influence of human factors introduced by manually set weight thresholds and thus enhances the performance stability of the neural network model.
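As an illustration only, the prune-and-retrain scheme described above can be sketched in NumPy as follows. Here `train_one_round` and `next_threshold` are hypothetical stand-ins for the training step and the adaptive threshold update the method defines; they are not names from the patent.

```python
import numpy as np

def prune_layer(weights, threshold):
    """Set layer weights below the layer's threshold to zero.

    The patent compares the weight values themselves (not magnitudes)
    to the per-layer threshold; branches are kept, only zeroed.
    """
    pruned = weights.copy()
    pruned[pruned < threshold] = 0.0
    return pruned

def compress(model_weights, initial_thresholds, q, train_one_round, next_threshold):
    """model_weights: list of per-layer weight arrays (m layers).

    First prunes every layer with its initial threshold, then runs q
    training rounds; in round t the per-layer threshold is derived from
    the round t-1 threshold and model via next_threshold.
    """
    weights = [prune_layer(w, th) for w, th in zip(model_weights, initial_thresholds)]
    thresholds = list(initial_thresholds)
    for t in range(1, q + 1):
        weights = train_one_round(weights)
        thresholds = [next_threshold(th, w) for th, w in zip(thresholds, weights)]
        weights = [prune_layer(w, th) for w, th in zip(weights, thresholds)]
    return weights
```

With an identity training step and a constant threshold, this reduces to plain layer-wise magnitude-style pruning, which makes the control flow easy to check.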
In a possible design, before the initial weights of the i-th layer are pruned according to the initial weight threshold of the i-th layer of the initial neural network model, the initial weights of the i-th layer of the initial neural network model are obtained; then the mean and standard deviation of the initial weights of the i-th layer are determined according to those initial weights; finally, the initial weight threshold of the i-th layer is determined according to the mean and standard deviation of the initial weights of the i-th layer.

In this way, the initial weight threshold of each layer can be determined accurately, so that the weights of each layer of the initial neural network can be pruned.
In a possible design, the initial weight threshold of the i-th layer may conform to the following formula:

T_i = μ_i - λ·σ_i

where T_i is the initial weight threshold of the i-th layer of the initial neural network model, μ_i is the mean of the initial weights of the i-th layer, σ_i is the standard deviation of the initial weights of the i-th layer, and λ is a set value, λ ≥ 0.

Through the above formula, the initial weight threshold of each layer can be determined accurately, so that the weights of each layer of the initial neural network can be pruned.
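A minimal sketch of computing such an initial threshold for one layer from its weight statistics (the function name is illustrative, not from the patent):

```python
import numpy as np

def initial_threshold(layer_weights, lam):
    # T_i = mu_i - lambda * sigma_i over the layer's initial weights
    mu = float(layer_weights.mean())
    sigma = float(layer_weights.std())  # population standard deviation
    return mu - lam * sigma
```

With λ = 0 the threshold is simply the layer's mean weight; larger λ lowers the threshold and prunes less.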
In a possible design, determining the weight threshold of the i-th layer for the t-th training according to the weight threshold of the i-th layer of the neural network model obtained from the (t-1)-th training may conform to the following formula:

[formula image PCTCN2018125372-appb-000001]

where T_i^t is the weight threshold of the i-th layer for the t-th training; T_i^{t-1} is the weight threshold of the i-th layer for the (t-1)-th training; G_i^{t-1} is the mean of the gradients of the non-zero weights in the i-th layer of the neural network model obtained from the (t-1)-th training, that is, G_i^{t-1} = (1/N_i) · Σ_{n=1}^{N_i} g_in^{t-1}; g_in^{t-1} are the gradients of the non-zero weights in the i-th layer of the neural network model obtained from the (t-1)-th training; and N_i is the number of non-zero weights in the i-th layer of the neural network model obtained from the (t-1)-th training, N_i being a positive integer greater than 1.
Through the above method, the weight threshold of each layer can be obtained adaptively during each training pass, so that neural network compression can be performed flexibly.
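The gradient-mean term in the design above is fully determined by its definition; how it is combined with T_i^{t-1} is shown here only as a hypothetical additive update, since the exact combination is given by the patent's formula image rather than by this sketch:

```python
import numpy as np

def mean_nonzero_gradient(weights, grads):
    # Mean gradient over the N_i non-zero weights of layer i after round t-1.
    mask = weights != 0
    return float(grads[mask].mean())

def next_threshold(prev_threshold, weights, grads, eta=1.0):
    # Hypothetical additive combination of T_i^{t-1} and the gradient mean;
    # the patent's actual update rule may differ from this stand-in.
    return prev_threshold + eta * mean_nonzero_gradient(weights, grads)
```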
In a possible design, the weights of any layer are pruned according to the weight threshold of that layer as follows: the weights of the layer that are smaller than the layer's weight threshold are set to zero, and the weights of the layer that are greater than or equal to the layer's weight threshold are kept unchanged. In this way, the weight pruning of each layer can be completed successfully.
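This pruning rule can be written directly; note that it zeroes weights rather than removing connections, so the layer's shape is unchanged (the function name is illustrative):

```python
import numpy as np

def prune_by_threshold(weights, threshold):
    # Weights below the layer threshold are set to 0; weights >= threshold
    # are kept unchanged. The tensor keeps its shape: zeroed connections
    # still exist in the model, merely with weight 0.
    return np.where(weights < threshold, 0.0, weights)
```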
In a second aspect, the present application further provides a neural network compression apparatus that has the function of implementing the method of the first aspect. The function can be realized by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above function.

In a possible design, the structure of the neural network compression apparatus may include a weight pruning unit and a training unit; these units may perform the corresponding functions in the above method examples. For details, see the detailed description in the method examples, which is not repeated here.

In a possible design, the structure of the neural network compression apparatus may include a processor and a memory, and the processor is configured to perform the above-mentioned method. The memory is coupled to the processor and stores the program instructions and data necessary for the neural network compression apparatus.

In a third aspect, the present application further provides a computer storage medium that stores computer-executable instructions which, when invoked by a computer, cause the computer to execute any one of the methods mentioned in the first aspect.

In a fourth aspect, the present application further provides a computer program product containing instructions which, when run on a computer, cause the computer to perform any one of the methods mentioned in the first aspect.

In a fifth aspect, the present application further provides a chip coupled to a memory and configured to read and execute program instructions stored in the memory to implement any one of the methods mentioned in the first aspect.
Brief Description of the Drawings

FIG. 1 is a schematic diagram of a neural network according to an embodiment of this application;

FIG. 2 is a structural diagram of a computer device according to an embodiment of this application;

FIG. 3 is a flowchart of a neural network compression method according to an embodiment of this application;

FIG. 4 is a flowchart of an example of a neural network compression method according to an embodiment of this application;

FIG. 5 is a schematic structural diagram of a neural network compression apparatus according to an embodiment of this application;

FIG. 6 is a structural diagram of a neural network compression apparatus according to an embodiment of this application.
Detailed Description

The application is described in further detail below with reference to the accompanying drawings.
Embodiments of the present application provide a neural network compression method and apparatus to solve the problem of inflexible model compression in the prior art. The method and the apparatus described in this application are based on the same inventive concept; since the principles by which they solve the problem are similar, the implementations of the apparatus and the method can refer to each other, and repeated descriptions are omitted.
The neural network in this application is first explained, to facilitate understanding by those skilled in the art:

A neural network imitates the behavioral characteristics of animal neural networks and processes data with a structure similar to the synaptic connections of the brain. As a mathematical operation model, a neural network consists of a large number of interconnected nodes (or neurons). A neural network is composed of an input layer, a hidden layer, and an output layer, as shown, for example, in FIG. 1. The input layer receives the input data of the neural network; the output layer produces the output data of the neural network; and the hidden layer, composed of many node connections between the input layer and the output layer, performs arithmetic processing on the input data. The hidden layer may consist of one or more layers. The number of hidden layers and nodes in a neural network is directly related to the complexity of the problem the neural network actually solves, as well as to the number of nodes in the input layer and in the output layer.
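As background only, the layer-by-layer forward pass of such a network can be sketched as follows; the ReLU activation is purely illustrative, since the text does not prescribe one:

```python
import numpy as np

def forward(x, layer_weights):
    # x: input-layer activations; layer_weights: one weight matrix per
    # hidden/output layer. Each layer multiplies by its weights and
    # applies a ReLU nonlinearity (illustrative choice).
    a = x
    for W in layer_weights:
        a = np.maximum(W @ a, 0.0)
    return a
```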
In the embodiments of the present application, the neural network compression method may be executed by, but is not limited to, a processor. The processor may be a processor in a computer device, a processor in another device (for example, a chip system), or a stand-alone processor. In the embodiments of the present application, a processor in a computer device executing the neural network compression method is taken as an example for description.

To describe the technical solutions of the embodiments of the present application more clearly, the neural network compression method and apparatus provided by the embodiments are described in detail below with reference to the accompanying drawings.
FIG. 2 shows a structural diagram of a possible computer device to which the neural network compression method provided by the embodiments of the present application is applicable. Referring to FIG. 2, the computer device includes a processor 210, a memory 220, a communication module 230, an input unit 240, a display unit 250, a power supply 260, and other components. Those skilled in the art can understand that the structure shown in FIG. 2 does not constitute a limitation on the computer device; the computer device provided in the embodiments of the present application may include more or fewer components than illustrated, combine certain components, or use a different arrangement of components.
Each component of the computer device is described in detail below with reference to FIG. 2:
The communication module 230 may be connected to other devices through a wireless or physical connection to implement data transmission and reception by the computer device. Optionally, the communication module 230 may include any one or a combination of a radio frequency (RF) circuit, a wireless fidelity (WiFi) module, a communication interface, a Bluetooth module, and the like, which is not limited in the embodiments of the present application.
The memory 220 may be used to store program instructions and data. The processor 210 executes the program instructions stored in the memory 220 to perform the various functional applications and data processing of the computer device. The program instructions include program instructions that enable the processor 210 to execute the neural network compression method provided by the following embodiments of the present application.

Optionally, the memory 220 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, various application programs, program instructions, and the like; the data storage area may store various data such as neural networks. In addition, the memory 220 may include a high-speed random access memory, and may also include a non-volatile memory, such as a magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The input unit 240 may be used to receive information such as data or operation instructions input by a user. Optionally, the input unit 240 may include input devices such as a touch panel, function keys, a physical keyboard, a mouse, a camera, and a monitor.
The display unit 250 enables human-computer interaction and is used to display, through a user interface, information input by the user and information provided to the user. The display unit 250 may include a display panel 251. Optionally, the display panel 251 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.

Further, when the input unit includes a touch panel, the touch panel may cover the display panel 251; when the touch panel detects a touch event on or near it, the event is transmitted to the processor 210 to determine the type of the touch event and perform the corresponding operation.
The processor 210 is the control center of the computer device and connects the above components through various interfaces and lines. The processor 210 executes the program instructions stored in the memory 220 and calls the data stored in the memory 220 to complete the various functions of the computer device and implement the neural network compression method provided by the embodiments of the present application.
Optionally, the processor 210 may include one or more processing units. In one implementation, the processor 210 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 210. In the embodiments of the present application, the processing unit may compress the neural network. Exemplarily, the processor 210 may be a central processing unit (CPU), a graphics processing unit (GPU), or a combination of a CPU and a GPU. The processor 210 may also be an artificial intelligence (AI) chip that supports neural network processing, such as a network processing unit (NPU) or a tensor processing unit (TPU). The processor 210 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), a digital signal processing (DSP) device, or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The computer device also includes a power supply 260 (such as a battery) for powering the components. Optionally, the power supply 260 may be logically connected to the processor 210 through a power management system, so that functions such as charging and discharging of the computer device are implemented through the power management system.
Although not shown, the computer device may further include components such as a camera, a sensor, and an audio collector, which are not described in detail here.
An embodiment of the present invention provides a neural network compression method, which is applicable to the computer device shown in FIG. 2 and the neural network shown in FIG. 1. The method may be executed by the processor in the computer device. Referring to FIG. 3, the specific flow of the method may include:
Step 301: The processor prunes the initial weights of the i-th layer according to the initial weight threshold of the i-th layer of the initial neural network model, to obtain a pruned neural network model, where i takes every positive integer from 1 to m and m is the total number of layers of the neural network model.
As is well known, a neural network model has many layers; that is, m is a positive integer greater than 1. Training a neural network, or processing data based on a neural network model, is usually done layer by layer. For example, in step 301, when the initial neural network model is pruned, the same operation is performed separately on each layer of the initial neural network model: layer by layer, the initial weights of each layer are pruned according to that layer's initial weight threshold.
In an optional implementation, before pruning the initial weights of the i-th layer according to the initial weight threshold of the i-th layer of the initial neural network model, the processor may first obtain the initial weights of the i-th layer of the initial neural network model, then determine the mean and standard deviation of the initial weights of the i-th layer according to those initial weights, and determine the initial weight threshold of the i-th layer according to the mean and standard deviation of the initial weights of the i-th layer.
Exemplarily, the initial weight threshold of the i-th layer may conform to the following Formula 1:

T_i = μ_i - λ·σ_i    (Formula 1)

where, in Formula 1, T_i is the initial weight threshold of the i-th layer of the initial neural network model, μ_i is the mean of the initial weights of the i-th layer, σ_i is the standard deviation of the initial weights of the i-th layer, and λ is a set value, λ ≥ 0.
In an optional implementation, the mean of the initial weights of the i-th layer may conform to the following Formula 2:

μ_i = (1/P_i) · Σ_{n=1}^{P_i} ω_in    (Formula 2)

where, in Formula 2, ω_in are the weights of the i-th layer of the initial neural network model, P_i is the number of weights of the i-th layer of the neural network model, and P_i is a positive integer greater than 1.
In an optional implementation, the standard deviation σ_i of the initial weights of the i-th layer may satisfy the following Formula 3:

σ_i = sqrt( (1/P_i) · Σ_{n=1}^{P_i} (|ω_in| - μ_i)² )        (Formula 3)
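To make Formulas 1 to 3 concrete, the following sketch computes a per-layer initial threshold. It is an illustration only, not the patented implementation; the function name and the use of weight magnitudes (rather than signed weights) are our assumptions, chosen so that the threshold stays positive and consistent with the magnitude-based pruning rule that follows.

```python
import numpy as np

def initial_threshold(weights, lam=0.5):
    """Per-layer initial weight threshold T = mu - lam * sigma (Formula 1).

    `weights` is the flat weight array of one layer. The mean (Formula 2)
    and standard deviation (Formula 3) are taken over the weight
    magnitudes, an assumption made here so the threshold is meaningful
    for magnitude-based pruning.
    """
    mags = np.abs(np.asarray(weights, dtype=np.float64))
    mu = mags.mean()            # Formula 2: mean of weight magnitudes
    sigma = mags.std()          # Formula 3: population standard deviation
    return mu - lam * sigma     # Formula 1, with the set value lam >= 0
```

With lam = 0 the threshold reduces to the mean magnitude, so roughly the smaller half of the weights (by magnitude) would be pruned; increasing lam lowers the threshold and prunes less.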
In an optional implementation, before obtaining the initial weights of the i-th layer of the initial neural network model, the processor needs to train the neural network to obtain all the weights of the neural network, and thereby obtain the initial neural network model. Exemplarily, training the neural network to obtain all the weights in the neural network may specifically be: obtaining the structure of the neural network and all the weights in the neural network through data input and neural network model construction.

In an optional implementation, when the processor prunes the initial weights of the i-th layer according to the initial weight threshold of the i-th layer, the specific process may be that the processor sets to zero those weights of the i-th layer that are smaller than the initial weight threshold of the i-th layer, and keeps unchanged those weights of the i-th layer that are not smaller than the initial weight threshold of the i-th layer. It should be noted that the above pruning process merely sets some weights of the i-th layer to zero; it does not delete the corresponding branches. In other words, the branches whose weights are set to zero still exist in the neural network model; their weights are simply zero. The same zeroing principle (i.e., the pruning method) applies to the weight pruning involved later, so it is not described again in detail.
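The zeroing step described above (keeping pruned branches in place with zero weights, same shape as before) can be sketched as follows. This is our illustration, not the patented implementation, and it uses the absolute-value comparison that Formula 5 later makes explicit.

```python
import numpy as np

def prune_layer(weights, threshold):
    """Zero out weights whose magnitude falls below the layer threshold.

    The pruned branches are not removed from the model; the returned
    array has the same shape as the input, with small weights set to 0.
    """
    w = np.asarray(weights, dtype=np.float64)
    mask = np.abs(w) >= threshold   # True for weights that survive pruning
    return w * mask                 # zeroed entries remain as branches
```

Because the shape is preserved, the pruned weights can be written straight back into the model, exactly as step 405 below does.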
Step 302: The processor trains the pruned neural network model multiple times. During the t-th training, the weight threshold of the i-th layer for the t-th training is determined according to the weight threshold of the i-th layer of the neural network model obtained from the (t-1)-th training, and the current weights of the i-th layer in the t-th training are pruned according to the weight threshold of the i-th layer for the t-th training; t takes every positive integer from 1 to q, where q is the total number of trainings.

Based on step 301, after the processor has pruned the weights of all layers of the initial neural network, the processor executes step 302.

In an optional implementation, the processor determines the weight threshold of the i-th layer for the t-th training according to the weight threshold of the i-th layer of the neural network model obtained from the (t-1)-th training, which may satisfy the following Formula 4:
T_i^t = T_i^(t-1) + ḡ_i^(t-1),  with ḡ_i^(t-1) = (1/N_i) · Σ_{n=1}^{N_i} |g_in^(t-1)|        (Formula 4)

where, in Formula 4, T_i^t is the weight threshold of the i-th layer for the t-th training; T_i^(t-1) is the weight threshold of the i-th layer for the (t-1)-th training; ḡ_i^(t-1) is the mean of the gradients of the non-zero weights in the i-th layer of the neural network model obtained from the (t-1)-th training; g_in^(t-1) is the gradient of the n-th non-zero weight in the i-th layer of the neural network model obtained from the (t-1)-th training; and N_i is the number of non-zero weights in the i-th layer of the neural network model obtained from the (t-1)-th training, N_i being a positive integer greater than 1.
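The symbol definitions above relate the new threshold to the previous threshold and the gradients of the surviving (non-zero) weights. The following sketch is one plausible reading of that update and should be treated as an assumption on our part: the previous threshold is shifted by the mean gradient magnitude of the non-zero weights.

```python
import numpy as np

def update_threshold(prev_threshold, weights, grads):
    """Adaptive per-layer threshold update in the spirit of Formula 4.

    Only gradients belonging to non-zero (surviving) weights contribute.
    Adding their mean magnitude to the previous threshold is our reading
    of the formula, not a verbatim reproduction of the patent's image.
    """
    w = np.asarray(weights, dtype=np.float64)
    g = np.asarray(grads, dtype=np.float64)
    nonzero = w != 0.0
    if not nonzero.any():               # no surviving weights: keep threshold
        return prev_threshold
    g_mean = np.abs(g[nonzero]).mean()  # mean over the N_i non-zero weights
    return prev_threshold + g_mean
```

Note that the gradient of the already-zeroed weight is deliberately excluded from the average; only the N_i surviving weights count, matching the definitions of ḡ and N_i.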
In an optional implementation, in each training pass, when the processor prunes the current weights of the i-th layer in the t-th training according to the weight threshold of the i-th layer for the t-th training, the pruning may specifically satisfy the following Formula 5:

ω_in^t = 0, if |ω_in^t| < T_i^t;  otherwise ω_in^t remains unchanged        (Formula 5)

where, in Formula 5, ω_in^t is the n-th weight of the i-th layer in the t-th training.
It can be clearly seen from Formula 5 that the processor sets a weight of the i-th layer to zero when its absolute value is smaller than the corresponding weight threshold, and otherwise keeps it unchanged. The specific principle is similar to the pruning of each layer's weights of the initial neural network model involved in step 301, and the two descriptions may be referred to each other.

After the weights of each layer have been pruned as described above, when the neural network model is trained (e.g., forward inference and back-propagation updates), weights equal to 0 can be excluded from matrix operations. For example, in a given matrix operation, if the input weight vector is a zero vector, that operation can be skipped directly, thereby achieving computational acceleration.
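As a minimal sketch of this skip optimization (our illustration; production frameworks use dedicated sparse kernels instead), a row-wise matrix-vector product can bypass all-zero weight rows without changing the result:

```python
import numpy as np

def sparse_aware_matvec(weight_matrix, x):
    """Matrix-vector product that skips rows whose weights are all zero.

    A skipped row contributes exactly 0 to the output, so the result
    matches a dense product while avoiding useless multiply-adds.
    """
    W = np.asarray(weight_matrix, dtype=np.float64)
    x = np.asarray(x, dtype=np.float64)
    out = np.zeros(W.shape[0])
    for r, row in enumerate(W):
        if not row.any():       # pruned-away row: skip the dot product
            continue
        out[r] = row @ x
    return out
```

The speedup grows with the fraction of rows that pruning has zeroed out, which is why the thresholded zeroing above translates directly into faster inference.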
It can be understood that the above training process is cyclic: the weights in each training pass are obtained by processing the weights from the previous pass, and the weight threshold of each layer in each pass is derived from that layer's threshold in the previous pass. In each pass, the weights of all layers of the neural network model are processed before the next pass begins, until the training result satisfies a certain condition and training ends. Exemplarily, the condition may be that the weight thresholds converge, and so on. With this training method, during neural network compression the weight threshold no longer depends on manual setting, nor does it require repeated trial training; instead, it can be adjusted adaptively according to the actual pruning situation. This allows the neural network to be compressed flexibly, free from the influence of human factors, and makes the resulting neural network model more stable.

It should be noted that, in step 302, the first training is the first time the pruned neural network model is trained. When t is 1, the weight threshold of the i-th layer of the neural network model obtained from the (t-1)-th (i.e., the 0th) training is the initial weight threshold used for the i-th layer when pruning the initial neural network model. That is, the weight threshold of the i-th layer in the first training is derived from the initial weight threshold corresponding to the i-th layer.

With the neural network compression method provided by the embodiments of this application, the processor can adaptively adjust the weight threshold for each pruning pass; that is, neural network compression can be performed flexibly, and the influence of human factors introduced by manually set weight thresholds can be avoided, thereby enhancing the performance stability of the neural network model.
Based on the above embodiments, an embodiment of this application further provides an example of a neural network compression method, applicable to the computer apparatus shown in FIG. 2 and the neural network shown in FIG. 1. Referring to FIG. 4, the specific flow of this example may include the following steps:

Step 401: The processor obtains the initial weights of one layer from the initial neural network model.

Step 402: The processor determines the mean and standard deviation of the initial weights of that layer according to the obtained initial weights.

Step 403: The processor determines the initial weight threshold of that layer according to the mean and standard deviation of its initial weights.

Step 404: The processor prunes the initial weights of that layer according to the layer's initial weight threshold.

Step 405: The processor updates the initial weights of that layer to the pruned weights (that is, the processor writes the pruned weights back into the neural network model).

Step 406: The processor judges whether the weights of all layers of the initial neural network model have been pruned; if so, it executes step 407 to enter the neural network model training process; otherwise, it executes step 401.
Step 407: The processor obtains the gradients corresponding to the weights of one layer of the current neural network model, together with that layer's weight threshold from the previous training.

Step 408: The processor determines the layer's weight threshold for the current training according to the gradients corresponding to the layer's weights and the layer's weight threshold from the previous training.

It should be noted that the layer's weight threshold for the first training is determined from the initial threshold used for that layer when the weights of that layer of the initial neural network model were pruned.

Step 409: The processor prunes the layer's current weights according to the determined weight threshold of that layer.

Step 410: The processor updates the weights of that layer in the neural network model (that is, the processor writes the pruned weights of that layer back into the neural network model).

Step 411: The processor judges whether the weights of all layers have been processed in the current training pass; if so, it proceeds to step 412; otherwise, it repeats step 407.

Step 412: The processor completes one pass of neural network model training.

Step 413: The processor judges whether the training of the neural network model is finished; if so, the process ends; otherwise, it executes step 407.
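The flow of steps 401 to 413 can be condensed into the following self-contained sketch. It is illustrative only: the helper names, the magnitude-based forms of the initial threshold and the threshold update, and the stubbed gradient function are our assumptions, since the real training step depends on the model and framework.

```python
import numpy as np

def compress(layers, grads_fn, lam=0.5, passes=3):
    """End-to-end sketch of steps 401-413 (illustrative only).

    `layers` is a list of 1-D weight arrays; `grads_fn(layers)` stands in
    for one real training pass and must return per-layer gradient arrays.
    """
    layers = [np.asarray(w, dtype=np.float64) for w in layers]
    # Steps 401-406: per-layer initial thresholds and the first pruning.
    thresholds = []
    for i, w in enumerate(layers):
        mags = np.abs(w)
        t = mags.mean() - lam * mags.std()      # Formula 1 on magnitudes
        thresholds.append(t)
        layers[i] = np.where(mags < t, 0.0, w)  # zero the small weights
    # Steps 407-413: training passes with adaptively updated thresholds.
    for _ in range(passes):
        grads = grads_fn(layers)                # stand-in for one pass
        for i, w in enumerate(layers):
            nz = w != 0.0
            if nz.any():                        # Formula 4: shift threshold
                thresholds[i] += np.abs(np.asarray(grads[i])[nz]).mean()
            layers[i] = np.where(np.abs(w) < thresholds[i], 0.0, w)
    return layers, thresholds
```

In a real system, `grads_fn` would be one epoch of forward inference and back-propagation, and the loop would stop when the thresholds converge rather than after a fixed number of passes.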
Based on the above example, the processor can adaptively adjust the weight threshold for each pruning pass; that is, neural network compression can be performed flexibly, and the influence of human factors introduced by manually set weight thresholds can be avoided, thereby enhancing the performance stability of the neural network model.

Based on the above embodiments, an embodiment of this application further provides a neural network compression apparatus, configured to implement the neural network compression method provided by the embodiments shown in FIG. 3 and FIG. 4. Referring to FIG. 5, the neural network compression apparatus 500 includes a weight pruning unit 501 and a training unit 502, where:
The weight pruning unit 501 is configured to prune the initial weights of the i-th layer according to the initial weight threshold of the i-th layer of the initial neural network model to obtain a pruned neural network model, where i takes every positive integer from 1 to m, and m is the total number of layers of the neural network model.

The training unit 502 is configured to train the pruned neural network model multiple times.

The weight pruning unit 501 is further configured to, during the t-th training performed by the training unit 502, determine the weight threshold of the i-th layer for the t-th training according to the weight threshold of the i-th layer of the neural network model obtained from the (t-1)-th training by the training unit 502, and to prune the current weights of the i-th layer in the t-th training according to the weight threshold of the i-th layer for the t-th training; t takes every positive integer from 1 to q, where q is the total number of trainings.
In an optional implementation, the neural network compression apparatus may further include a weight obtaining unit 503 and a threshold determining unit 504 as shown in FIG. 5. Specifically:

The weight obtaining unit 503 is configured to obtain the initial weights of the i-th layer of the initial neural network model before the weight pruning unit 501 prunes the initial weights of the i-th layer according to the initial weight threshold of the i-th layer; the threshold determining unit 504 is configured to determine the mean and standard deviation of the initial weights of the i-th layer according to those initial weights, and to determine the initial weight threshold of the i-th layer according to that mean and standard deviation.

Exemplarily, the functions of the weight obtaining unit 503 and the threshold determining unit 504 may also be implemented directly by the weight pruning unit 501, which is not limited in this application.
In an optional implementation, the initial weight threshold of the i-th layer may satisfy the following formula:

T_i = μ_i - λ·σ_i

where T_i is the initial weight threshold of the i-th layer of the initial neural network model; μ_i is the mean of the initial weights of the i-th layer; σ_i is the standard deviation of the initial weights of the i-th layer; and λ is a set value, with λ ≥ 0.
In an optional implementation, when the weight pruning unit 501 determines the weight threshold of the i-th layer for the t-th training according to the weight threshold of the i-th layer of the neural network model obtained from the (t-1)-th training, the determination may satisfy the following formula:

T_i^t = T_i^(t-1) + ḡ_i^(t-1),  with ḡ_i^(t-1) = (1/N_i) · Σ_{n=1}^{N_i} |g_in^(t-1)|

where T_i^t is the weight threshold of the i-th layer for the t-th training; T_i^(t-1) is the weight threshold of the i-th layer for the (t-1)-th training; ḡ_i^(t-1) is the mean of the gradients of the non-zero weights in the i-th layer of the neural network model obtained from the (t-1)-th training; g_in^(t-1) is the gradient of the n-th non-zero weight in the i-th layer of the neural network model obtained from the (t-1)-th training; and N_i is the number of non-zero weights in the i-th layer of the neural network model obtained from the (t-1)-th training, N_i being a positive integer greater than 1.
In an optional implementation, when pruning the weights of any layer according to that layer's weight threshold, the weight pruning unit 501 is specifically configured to: set to zero those weights of the layer that are smaller than the layer's weight threshold, and keep unchanged those weights of the layer that are greater than or equal to the layer's weight threshold.

With the neural network compression apparatus provided by the embodiments of this application, the weight threshold for each pruning pass can be adjusted adaptively; that is, neural network compression can be performed flexibly, and the influence of human factors introduced by manually set weight thresholds can be avoided, thereby enhancing the performance stability of the neural network model.

It should be noted that the division of units in the embodiments of this application is schematic and is merely a division by logical function; other division manners are possible in actual implementation. The functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Based on the above embodiments, an embodiment of this application further provides a neural network compression apparatus configured to implement the neural network compression method shown in FIG. 3 or FIG. 4. Referring to FIG. 6, the neural network compression apparatus 600 may include a processor 601 and a memory 602, where:

The processor 601 may be a CPU, a GPU, or a combination of a CPU and a GPU. The processor 601 may also be an AI chip supporting neural network processing, such as an NPU or a TPU. The processor 601 may further include a hardware chip, which may be an ASIC, a PLD, a DSP, or a combination thereof. The PLD may be a CPLD, an FPGA, a GAL, or any combination thereof. It should be noted that the processor 601 is not limited to the cases listed above; the processor 601 may be any processing device capable of implementing neural network operations.

The processor 601 and the memory 602 are connected to each other. Optionally, the processor 601 and the memory 602 are connected to each other through a bus 603; the bus 603 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 6, but this does not mean that there is only one bus or one type of bus.

In one implementation, when the processor 601 is configured to implement the neural network compression method provided by the embodiment shown in FIG. 3 of this application, it may specifically perform the operations in steps 301 and 302 of the above embodiment shown in FIG. 3, as well as the other operations in steps 301 and 302; the specific descriptions involved may be referred to each other and are not repeated here.

In another implementation, when the processor 601 is configured to implement the neural network compression method provided by the embodiment shown in FIG. 4 of this application, it may specifically perform the operations in steps 401 to 413 of the above embodiment shown in FIG. 4; the specific descriptions may be referred to each other and are not repeated here.

The memory 602 is configured to store programs, data, and the like. Specifically, the program may include program code, and the program code includes computer operation instructions. The memory 602 may include a random access memory (RAM), and may also include a non-volatile memory, for example, at least one disk memory. The processor 601 executes the program stored in the memory 602 to implement the above functions, thereby implementing the neural network compression method shown in FIG. 3 or FIG. 4.

It should be noted that, when the neural network compression apparatus shown in FIG. 6 is applied to a computer apparatus, the neural network compression apparatus may be embodied as the computer apparatus shown in FIG. 2. In this case, the processor 601 may be the same as the processor 210 shown in FIG. 2, and the memory 602 may be the same as the memory 220 shown in FIG. 2.
In summary, with the neural network compression method and apparatus provided by the embodiments of this application, the weight threshold for each pruning pass can be adjusted adaptively; that is, neural network compression can be performed flexibly, and the influence of human factors introduced by manually set weight thresholds can be avoided, thereby enhancing the performance stability of the neural network model.

Those skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk memory, CD-ROM, optical memory, and the like) containing computer-usable program code.

This application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Although preferred embodiments of this application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be interpreted as covering the preferred embodiments and all changes and modifications falling within the scope of this application.

Obviously, those skilled in the art can make various changes and variations to the embodiments of this application without departing from the scope of the embodiments of this application. If these modifications and variations of the embodiments of this application fall within the scope of the claims of this application and their equivalent technologies, this application is also intended to cover these changes and variations.

Claims (18)

  1. A neural network compression method, characterized in that it comprises:

    pruning the initial weights of the i-th layer according to the initial weight threshold of the i-th layer of an initial neural network model to obtain a pruned neural network model, wherein i takes every positive integer from 1 to m, and m is the total number of layers of the neural network model;

    training the pruned neural network model multiple times, and during the t-th training, determining the weight threshold of the i-th layer for the t-th training according to the weight threshold of the i-th layer of the neural network model obtained from the (t-1)-th training, and pruning the current weights of the i-th layer in the t-th training according to the weight threshold of the i-th layer for the t-th training; wherein t takes every positive integer from 1 to q, and q is the total number of trainings.
  2. The method according to claim 1, characterized in that, before the initial weights of the i-th layer are pruned according to the initial weight threshold of the i-th layer of the initial neural network model, the method further comprises:

    obtaining the initial weights of the i-th layer of the initial neural network model;

    determining the mean and standard deviation of the initial weights of the i-th layer according to the initial weights of the i-th layer;

    determining the initial weight threshold of the i-th layer according to the mean and standard deviation of the initial weights of the i-th layer.
  3. The method according to claim 1 or 2, characterized in that the initial weight threshold of the i-th layer satisfies the following formula:

    T_i = μ_i - λ·σ_i

    wherein T_i is the initial weight threshold of the i-th layer of the initial neural network model; μ_i is the mean of the initial weights of the i-th layer; σ_i is the standard deviation of the initial weights of the i-th layer; and λ is a set value, with λ ≥ 0.
  4. 如权利要求1-3任一项所述的方法,其特征在于,根据第t-1次训练得到的神经网络模型第i层的权值阈值确定第t次训练时第i层的权值阈值,符合以下公式:The method according to any one of claims 1 to 3, wherein the weight threshold of the i-th layer during the t-th training is determined according to the weight threshold of the i-th layer of the neural network model obtained by the t-1th training , In line with the following formula:
    T_i^(t) = T_i^(t-1) + ḡ_i^(t-1),  with  ḡ_i^(t-1) = (1/N_i)·Σ_{j=1..N_i} g_{i,j}^(t-1)
    where T_i^(t) is the weight threshold of the i-th layer in the t-th round of training; T_i^(t-1) is the weight threshold of the i-th layer in the (t-1)-th round of training; ḡ_i^(t-1) is the mean of the gradients of the non-zero weights in the i-th layer of the neural network model obtained in the (t-1)-th round of training; g_{i,j}^(t-1) is the gradient of the j-th non-zero weight in the i-th layer of the neural network model obtained in the (t-1)-th round of training; and N_i is the number of non-zero weights in the i-th layer of the neural network model obtained in the (t-1)-th round of training, N_i being a positive integer greater than 1.
  5. The method according to any one of claims 1 to 4, wherein pruning the weights of any layer according to the weight threshold of that layer comprises:
    setting to zero those weights of the layer that are smaller than the weight threshold of the layer, and keeping unchanged those weights of the layer that are greater than or equal to the weight threshold of the layer.
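The pruning rule of claim 5 — zero out sub-threshold weights, keep the rest — is a simple mask. Comparing on weight magnitude (rather than signed value) is an assumption of this sketch:

```python
def prune_layer(weights, threshold):
    # Claim 5: weights below the layer threshold are set to zero; weights at
    # or above it are kept unchanged. Comparison on magnitude is assumed.
    return [w if abs(w) >= threshold else 0.0 for w in weights]
```

A negative weight with a large magnitude survives the cut unchanged, while a small positive one is zeroed.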
  6. A neural network compression apparatus, comprising:
    a weight pruning unit, configured to prune initial weights of an i-th layer of an initial neural network model according to an initial weight threshold of the i-th layer to obtain a pruned neural network model, wherein i takes every positive integer from 1 to m, and m is the total number of layers of the neural network model; and
    a training unit, configured to perform multiple rounds of training on the pruned neural network model;
    the weight pruning unit being further configured to, during the t-th round of training performed by the training unit, determine a weight threshold of the i-th layer for the t-th round according to the weight threshold of the i-th layer of the neural network model obtained by the training unit in the (t-1)-th round, and prune the current weights of the i-th layer in the t-th round according to the weight threshold of the i-th layer for the t-th round; t takes every positive integer from 1 to q, and q is the total number of rounds of training.
  7. The apparatus according to claim 6, further comprising:
    a weight obtaining unit, configured to obtain the initial weights of the i-th layer of the initial neural network model before the weight pruning unit prunes the initial weights of the i-th layer according to the initial weight threshold of the i-th layer; and
    a threshold determining unit, configured to determine a mean and a standard deviation of the initial weights of the i-th layer according to the initial weights of the i-th layer, and to determine the initial weight threshold of the i-th layer according to the mean and the standard deviation of the initial weights of the i-th layer.
  8. The apparatus according to claim 6 or 7, wherein the initial weight threshold of the i-th layer satisfies the following formula:
    T_i = μ_i − λ·σ_i
    where T_i is the initial weight threshold of the i-th layer of the initial neural network model; μ_i is the mean of the initial weights of the i-th layer; σ_i is the standard deviation of the initial weights of the i-th layer; and λ is a preset value, λ ≥ 0.
  9. The apparatus according to any one of claims 6 to 8, wherein the weight pruning unit determines the weight threshold of the i-th layer for the t-th round of training according to the weight threshold of the i-th layer of the neural network model obtained in the (t-1)-th round of training in accordance with the following formula:
    T_i^(t) = T_i^(t-1) + ḡ_i^(t-1),  with  ḡ_i^(t-1) = (1/N_i)·Σ_{j=1..N_i} g_{i,j}^(t-1)
    where T_i^(t) is the weight threshold of the i-th layer in the t-th round of training; T_i^(t-1) is the weight threshold of the i-th layer in the (t-1)-th round of training; ḡ_i^(t-1) is the mean of the gradients of the non-zero weights in the i-th layer of the neural network model obtained in the (t-1)-th round of training; g_{i,j}^(t-1) is the gradient of the j-th non-zero weight in the i-th layer of the neural network model obtained in the (t-1)-th round of training; and N_i is the number of non-zero weights in the i-th layer of the neural network model obtained in the (t-1)-th round of training, N_i being a positive integer greater than 1.
  10. The apparatus according to any one of claims 6 to 9, wherein the weight pruning unit, when pruning the weights of any layer according to the weight threshold of that layer, is specifically configured to:
    set to zero those weights of the layer that are smaller than the weight threshold of the layer, and keep unchanged those weights of the layer that are greater than or equal to the weight threshold of the layer.
  11. A neural network compression apparatus, comprising:
    a memory, configured to store program instructions; and
    a processor, coupled to the memory and configured to call the program instructions in the memory to perform the following operations:
    pruning initial weights of an i-th layer of an initial neural network model according to an initial weight threshold of the i-th layer to obtain a pruned neural network model, wherein i takes every positive integer from 1 to m, and m is the total number of layers of the neural network model; and
    performing multiple rounds of training on the pruned neural network model, wherein, during the t-th round of training, a weight threshold of the i-th layer for the t-th round is determined according to the weight threshold of the i-th layer of the neural network model obtained in the (t-1)-th round of training, and the current weights of the i-th layer in the t-th round are pruned according to the weight threshold of the i-th layer for the t-th round; t takes every positive integer from 1 to q, and q is the total number of rounds of training.
  12. The apparatus according to claim 11, wherein the processor is further configured to:
    obtain the initial weights of the i-th layer of the initial neural network model before pruning the initial weights of the i-th layer according to the initial weight threshold of the i-th layer;
    determine a mean and a standard deviation of the initial weights of the i-th layer according to the initial weights of the i-th layer; and
    determine the initial weight threshold of the i-th layer according to the mean and the standard deviation of the initial weights of the i-th layer.
  13. The apparatus according to claim 11 or 12, wherein the initial weight threshold of the i-th layer satisfies the following formula:
    T_i = μ_i − λ·σ_i
    where T_i is the initial weight threshold of the i-th layer of the initial neural network model; μ_i is the mean of the initial weights of the i-th layer; σ_i is the standard deviation of the initial weights of the i-th layer; and λ is a preset value, λ ≥ 0.
  14. The apparatus according to any one of claims 11 to 13, wherein the processor determines the weight threshold of the i-th layer for the t-th round of training according to the weight threshold of the i-th layer of the neural network model obtained in the (t-1)-th round of training in accordance with the following formula:
    T_i^(t) = T_i^(t-1) + ḡ_i^(t-1),  with  ḡ_i^(t-1) = (1/N_i)·Σ_{j=1..N_i} g_{i,j}^(t-1)
    where T_i^(t) is the weight threshold of the i-th layer in the t-th round of training; T_i^(t-1) is the weight threshold of the i-th layer in the (t-1)-th round of training; ḡ_i^(t-1) is the mean of the gradients of the non-zero weights in the i-th layer of the neural network model obtained in the (t-1)-th round of training; g_{i,j}^(t-1) is the gradient of the j-th non-zero weight in the i-th layer of the neural network model obtained in the (t-1)-th round of training; and N_i is the number of non-zero weights in the i-th layer of the neural network model obtained in the (t-1)-th round of training, N_i being a positive integer greater than 1.
  15. The apparatus according to any one of claims 11 to 14, wherein the processor, when pruning the weights of any layer according to the weight threshold of that layer, is specifically configured to:
    set to zero those weights of the layer that are smaller than the weight threshold of the layer, and keep unchanged those weights of the layer that are greater than or equal to the weight threshold of the layer.
  16. A computer program product comprising instructions which, when the computer program product is run on a computer, cause the computer to perform the method according to any one of claims 1 to 5.
  17. A computer storage medium, wherein the computer storage medium stores a computer program which, when executed by a computer, causes the computer to perform the method according to any one of claims 1 to 5.
  18. A chip, wherein the chip is coupled to a memory and is configured to read and execute program instructions stored in the memory to implement the method according to any one of claims 1 to 5.
PCT/CN2018/125372 2018-12-29 2018-12-29 Neural network compression method and apparatus WO2020133364A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2018/125372 WO2020133364A1 (en) 2018-12-29 2018-12-29 Neural network compression method and apparatus
CN201880099986.9A CN113168565A (en) 2018-12-29 2018-12-29 Neural network compression method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/125372 WO2020133364A1 (en) 2018-12-29 2018-12-29 Neural network compression method and apparatus

Publications (1)

Publication Number Publication Date
WO2020133364A1 true WO2020133364A1 (en) 2020-07-02

Family

ID=71126278

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/125372 WO2020133364A1 (en) 2018-12-29 2018-12-29 Neural network compression method and apparatus

Country Status (2)

Country Link
CN (1) CN113168565A (en)
WO (1) WO2020133364A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040199482A1 (en) * 2002-04-15 2004-10-07 Wilson Scott B. Systems and methods for automatic and incremental learning of patient states from biomedical signals
CN105184362A (en) * 2015-08-21 2015-12-23 中国科学院自动化研究所 Depth convolution neural network acceleration and compression method based on parameter quantification
CN107634943A (en) * 2017-09-08 2018-01-26 中国地质大学(武汉) A kind of weights brief wireless sense network data compression method, equipment and storage device
CN108038546A (en) * 2017-12-29 2018-05-15 百度在线网络技术(北京)有限公司 Method and apparatus for compressing neutral net
CN108229681A (en) * 2017-12-28 2018-06-29 郑州云海信息技术有限公司 A kind of neural network model compression method, system, device and readable storage medium storing program for executing

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180107926A1 (en) * 2016-10-19 2018-04-19 Samsung Electronics Co., Ltd. Method and apparatus for neural network quantization
CN108229644A (en) * 2016-12-15 2018-06-29 上海寒武纪信息科技有限公司 The device of compression/de-compression neural network model, device and method


Also Published As

Publication number Publication date
CN113168565A (en) 2021-07-23


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18944533

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18944533

Country of ref document: EP

Kind code of ref document: A1