WO2020133492A1 - Neural network compression method and apparatus - Google Patents

Neural network compression method and apparatus

Info

Publication number
WO2020133492A1
WO2020133492A1 (PCT/CN2018/125812)
Authority
WO
WIPO (PCT)
Prior art keywords
zero
weights
training
weight
group
Prior art date
Application number
PCT/CN2018/125812
Other languages
French (fr)
Chinese (zh)
Inventor
朱佳峰
刘刚毅
卢惠莉
高伟
芮祥麟
杨鋆源
夏军
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to CN201880099983.5A (published as CN113168554B)
Priority to PCT/CN2018/125812
Publication of WO2020133492A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/08: Learning methods

Definitions

  • This application relates to the field of neural networks, and in particular to a neural network compression method and device.
  • deep learning technology is developing rapidly in the industry, and various industries are applying deep learning technology in their respective fields.
  • a deep learning model (that is, a neural network model) is usually over-parameterized and has obvious redundancy, which leads to wasted computation and storage.
  • the industry has currently proposed a variety of compression methods, such as various model sparsification methods; these methods use pruning, quantization, and similar techniques to set the weakly expressive weights in the model's weight matrices to zero, so as to simplify model calculation and storage.
  • the value of each weight in a deep learning model is learned automatically from the training set; sparsification during training is random, and the weights cannot be sparsified in a targeted way, so subsequent processing devices can only rely on a randomly sparsified deep learning model for data processing, which cannot adapt well to the processing device's capability and cannot achieve a better processing effect.
  • the embodiments of the present application provide a neural network compression method and device to solve the problem that the prior art cannot adapt well to a processing device's capability and cannot achieve a better processing effect.
  • the present application provides a neural network compression method: the sparse unit length is determined according to the processing capability information of the processing device; then, when the current training is performed on the neural network model, the j-th group of weights obtained after the previous training is adjusted according to the j-th group of weights referenced in the previous training, to obtain the j-th group of weights referenced in the current training. The sparse unit length is the data length of one operation when the processing device performs a matrix operation; the number of weights included in the j-th group is the sparse unit length; j takes any positive integer from 1 to m, where m is the total number of groups obtained after all weights of the neural network model are grouped by the sparse unit length;
  • in this way, when performing neural network compression, the sparse unit length can be determined based on the capability information of the processing device, and the weights grouped by the sparse unit length can be processed according to the capabilities of the processing device; the neural network model is thereby adapted to different processing devices, so that subsequent processing devices can achieve better processing results.
  • the sparse unit length is determined according to the processing capability information of the processing device; the specific method may be: determine the register length in the processing device, or the maximum data length that the instruction set in the processing device processes at a time, and use that register length or maximum instruction-set data length as the sparse unit length.
  • the sparse unit length can be accurately determined to adapt to the processing capability of the processing device.
  • the neural network compression device may further determine the bit width of the calculation unit in the processing device, and use the determined bit width of the calculation unit as the sparse unit length.
  • the computing unit may be, but is not limited to, a GPU, an NPU, or the like.
  • the sparse unit length can be accurately determined to adapt to the processing capability of the processing device.
  • all weights of the initial neural network model are trimmed before the first training of the neural network.
  • trimming the neural network first can save some processing in the subsequent training passes and improve computation speed.
  • the j-th group of weights obtained after the previous training is adjusted according to the j-th group of weights referenced in the previous training, which may specifically include five cases (enumerated in full in the device description below);
  • for example, when the j-th group of weights referenced in the previous training is not all zero, and the proportion of non-zero values in the j-th group of weights obtained after the previous training, relative to the group's total number of weights, is not less than the set proportion threshold, the j-th group of weights obtained after the previous training is kept unchanged.
  • in this way, the weights obtained after the previous training can be adjusted according to the actual situation, so that the zero values of the resulting neural network model are distributed more regularly, with as many zeros as possible lying consecutively within one group of weights; when the neural network model is subsequently used for data processing, this reduces the time for accessing data and improves computation speed.
  • the zero-setting weight threshold may be determined based on the initial weight threshold; for example, the zero-setting weight threshold may be a set multiple of the initial weight threshold, where the set multiple is greater than 1. In this way, the threshold better matches the value range of the current weights in the subsequent judgment process.
  • determining whether the j-th group of weights referenced in the previous training is all zero may be done as follows: determine whether the zero-setting flag corresponding to the j-th group of weights in the zero-setting flag data structure is zero; when the flag is zero, the j-th group of weights referenced in the previous training is determined to be all zero; when the flag is a non-zero value, the group is determined to be not all zero.
  • after the j-th group of weights obtained after the previous training is set entirely to zero, the zero-setting flag corresponding to the j-th group in the current zero-setting flag data structure is also updated to zero; or, after the j-th group of weights obtained after the previous training is kept unchanged, the corresponding zero-setting flag in the current flag data structure is updated to a non-zero value.
  • in this way, the zero-setting flags in the zero-setting flag data structure can be updated in real time, so that during weight adjustment it can be accurately judged whether the j-th group of weights referenced in the previous training is all zero.
  • the present application provides a data processing method: obtain the weights of the target neural network model, and perform the following processing based on those weights: in the p-th processing, determine whether the q-th group of weights is all zero; if so, generate and save the first operation result according to the matrix operation type, or according to the matrix operation type and the matrix data to be processed; otherwise, generate and save the second operation result according to the q-th group of weights, the matrix data to be processed, and the matrix operation type;
  • the target neural network model is the final neural network model obtained by grouping the weights of the neural network model by the sparse unit length and then training the model; the sparse unit length is determined based on the processing capability information of the processing device, and is the data length of one operation when performing a matrix operation; the number of weights included in the q-th group is the sparse unit length; q takes any positive integer from 1 to f, where f is the total number of groups after the weights of the target neural network model are grouped by the sparse unit length;
  • because the final neural network model is obtained by training after the weights have been grouped, subsequent data processing with the final model can, according to the characteristics of matrix operations, greatly reduce the amount of data access and calculation, which increases operation speed.
  • a specific method for judging whether the q-th group of weights is all zero may be: obtain the zero-setting flag data structure corresponding to the weights of the target neural network model, and judge whether the zero-setting flag corresponding to the q-th group in that structure is zero; specifically, when the flag corresponding to the q-th group is zero, the q-th group of weights is all zero; when it is not zero, the q-th group of weights is not all zero.
  • when the q-th group of weights is all zero, the data processing device generates the first operation result according to the matrix operation type, or according to the matrix operation type and the matrix data to be processed: when the matrix operation type is matrix multiplication, the data processing device directly obtains zero as the first operation result; when the matrix operation type is matrix addition, the data processing device takes the matrix data to be processed as the first operation result. This reduces the amount of data access and calculation, which increases operation speed.
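  • As a non-authoritative illustration of this fast path, the following minimal Python sketch returns the first operation result for an all-zero weight group; the function name and the use of NumPy arrays are assumptions for illustration, not part of the patent:

      import numpy as np

      def first_operation_result(op_type, data_block):
          """Fast path for a weight group whose zero-setting flag is 0."""
          if op_type == "multiply":
              # zero weights times anything is zero, so the weight group
              # never needs to be loaded from memory
              return np.zeros_like(data_block)
          if op_type == "add":
              # zero weights plus the data leaves the data unchanged
              return data_block
          raise ValueError("unsupported matrix operation type: " + op_type)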
  • the present application also provides a neural network compression device, which has the function of implementing the method of the first aspect described above.
  • the function can be realized by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • the structure of the neural network compression device may include a determination unit, a weight adjustment unit, and a training unit, and these units may perform the corresponding functions in the method examples of the first aspect described above; for details, see the detailed description in the method examples of the first aspect, which is not repeated here.
  • the structure of the neural network compression device may include a processor and a memory, and the processor is configured to perform the method mentioned in the first aspect above.
  • the memory is coupled to the processor, and stores necessary program instructions and data of the neural network compression device.
  • the present application further provides a data processing device having the function of implementing the method of the second aspect.
  • the function can be realized by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • the structure of the data processing device may include an acquisition unit and a processing unit, and these units may perform the corresponding functions in the method examples of the second aspect described above; for details, see the detailed description in the method examples of the second aspect, which is not repeated here.
  • the structure of the data processing apparatus may include a processor and a memory, and the processor is configured to perform the method mentioned in the second aspect above.
  • the memory is coupled to the processor, and stores necessary program instructions and data of the data processing device.
  • the present application also provides a computer storage medium that stores computer-executable instructions which, when run on a computer, cause the computer to execute any one of the methods mentioned in the first aspect or the second aspect.
  • the present application also provides a computer program product containing instructions, which when executed on a computer, causes the computer to perform any of the methods mentioned in the first aspect or the second aspect.
  • the present application further provides a chip, coupled to a memory, for reading and executing program instructions stored in the memory to implement any of the methods mentioned in the first aspect or the second aspect.
  • FIG. 1 is a schematic diagram of a neural network provided by an embodiment of this application.
  • FIG. 2 is a structural diagram of a terminal device provided by an embodiment of the present application.
  • FIG. 3 is a flowchart of a neural network compression method provided by an embodiment of this application.
  • FIG. 4 is a schematic diagram of a zero-setting flag data structure and a weight matrix provided by an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of a weight adjustment provided by an embodiment of the present application.
  • FIG. 6 is a flowchart of a data processing method provided by an embodiment of this application.
  • FIG. 7 is an example diagram of a data processing process provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a neural network compression device provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a data processing device according to an embodiment of the present application.
  • FIG. 10 is a structural diagram of a neural network compression device provided by an embodiment of the present application.
  • FIG. 11 is a structural diagram of a data processing apparatus according to an embodiment of the present application.
  • the embodiments of the present application provide a neural network compression method and device to solve the problem that the prior art cannot adapt well to a processing device's capability and cannot achieve a better processing effect.
  • the method and the device described in this application are based on the same inventive concept; since their principles for solving the problem are similar, the implementations of the device and the method can refer to each other, and repeated descriptions are omitted.
  • a neural network consists of a large number of nodes (or neurons) connected to each other.
  • the neural network consists of an input layer, a hidden layer, and an output layer, as shown in FIG. 1.
  • the input layer carries the input data of the neural network;
  • the output layer carries the output data of the neural network;
  • the hidden layer is composed of many nodes connected between the input layer and the output layer, and performs arithmetic processing on the input data.
  • the hidden layer may consist of one or more layers; the number of hidden layers and the number of nodes in the neural network are directly related to the complexity of the problem the neural network actually solves, and to the numbers of nodes in the input layer and the output layer.
  • the platform that performs neural network compression in the embodiments of the present application may be referred to as a neural network compression device.
  • the neural network compression device may be, but is not limited to, a terminal device such as a personal computer (PC), a server, a cloud service platform, or the like.
  • the platform on which a neural network model is deployed may be referred to as a data processing device.
  • the data processing device may be, but is not limited to, a terminal device such as a mobile phone, a tablet computer, or a PC, and may also be a server or the like.
  • FIG. 2 shows a possible terminal device applicable to the neural network compression method or the data processing method provided by the embodiments of the present application.
  • the terminal device includes: a processor 210, a memory 220, a communication module 230, an input unit 240, a display unit 250, a power supply 260, and other components.
  • the terminal device provided in the embodiments of the present application may include more or fewer components than shown, may combine some components, or may have a different arrangement of components.
  • the communication module 230 may be connected to other devices through a wireless connection or a physical connection to implement data transmission and reception of terminal devices.
  • the communication module 230 may include any one or a combination of a radio frequency (RF) circuit, a wireless fidelity (WiFi) module, a communication interface, a Bluetooth module, etc.; this is not limited in this embodiment of the present application.
  • the memory 220 can be used to store program instructions and data.
  • the processor 210 executes program instructions stored in the memory 220 to execute various functional applications and data processing of the terminal device.
  • among the program instructions are instructions that enable the processor 210 to execute the neural network compression method or the data processing method provided by the following embodiments of the present application.
  • the memory 220 may mainly include a program storage area and a data storage area.
  • the storage program area can store the operating system, various application programs, and program instructions;
  • the storage data area can store various data such as neural networks.
  • the memory 220 may include a high-speed random access memory, and may also include a non-volatile memory, such as a magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • the input unit 240 may be used to receive information such as data or operation instructions input by the user.
  • the input unit 240 may include input devices such as a touch panel, function keys, a physical keyboard, a mouse, a camera, and a monitor.
  • the display unit 250 can realize human-computer interaction, and is used to display information input by the user and information provided to the user through the user interface.
  • the display unit 250 may include a display panel 251.
  • the display panel 251 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like.
  • the touch panel can cover the display panel 251; when the touch panel detects a touch event on or near it, the event is transmitted to the processor 210 to determine the type of touch event and perform the corresponding operation.
  • the processor 210 is a control center of a computer device, and uses various interfaces and lines to connect the above components.
  • the processor 210 may execute the program instructions stored in the memory 220 and call the data stored in the memory 220 to complete various functions of the computer device and implement the neural network compression provided by the embodiments of the present application Method or data processing method.
  • the processor 210 may include one or more processing units. Specifically, the processor 210 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, etc., and the modem processor mainly handles wireless communication. It can be understood that the foregoing modem processor may not be integrated into the processor 210.
  • the processing unit may compress the neural network or process the data.
  • the processor 210 may be a central processing unit (CPU), a graphics processor (Graphics Processing Unit, GPU), or a combination of CPU and GPU.
  • the processor 210 may also be a network processing unit (NPU), a tensor processing unit (TPU), or another artificial intelligence (AI) chip that supports neural network processing.
  • the processor 210 may further include a hardware chip.
  • the hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), a digital signal processor (DSP), or a combination thereof.
  • the PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • the terminal device also includes a power supply 260 (such as a battery) for powering various components.
  • the power supply 260 may be logically connected to the processor 210 through a power management system, so as to realize functions such as charging and discharging the terminal device through the power management system.
  • the terminal device may further include components such as a camera, a sensor, and an audio collector, which are not repeated here.
  • the foregoing terminal device is only an example of a device to which the neural network compression method or data processing method provided in the embodiments of the present application is applicable. It should be understood that the neural network compression method or data processing method provided in the embodiments of the present application may also be applied to other devices than the above terminal devices, which is not limited in this application.
  • a neural network compression method provided by an embodiment of the present invention can be applied to the terminal device shown in FIG. 2 or other devices (such as a server, etc.).
  • the following takes a neural network compression device as the execution subject to illustrate the neural network compression method provided by the present application.
  • the specific flow of the method may include:
  • Step 301: The neural network compression device determines the sparse unit length according to the processing capability information of the processing device, where the sparse unit length is the data length of one operation when the processing device performs a matrix operation.
  • the processing device is a device for processing the data to be processed after the neural network compression device finally obtains the neural network model. It should be noted that the processing device may be applied to the data processing device involved in this application.
  • the training of the neural network model is directed at one processing device, so the processing capability information of that processing device can be pre-configured in the neural network compression device; after obtaining the capability information for the processing device, the neural network compression device performs the subsequent process directly according to it.
  • the capability information of the processing device may be indicated by the capability of the processing device to process data.
  • the capability information of the processing device may be understood as the capability information of a processor or computing chip included in the processing device, where the processor or computing chip may be, but is not limited to, a central processing unit (CPU), a graphics processing unit (GPU), a network processing unit (NPU), etc.
  • the processing device may also be a processor or a computing chip directly.
  • the capability information of the processing device may be embodied as the data length of one operation when the processing device performs a matrix operation. Based on this, the neural network compression device determines the sparse unit length according to the processing capability information of the processing device. The specific method may be: the neural network compression device determines the register length in the processing device, or the maximum data length that the instruction set in the processing device processes at a time, and uses that register length or maximum instruction-set data length as the sparse unit length.
  • the neural network compression device may further determine the bit width of the calculation unit in the processing device, and use the determined bit width of the calculation unit as the sparse unit length .
  • the calculation unit may be a GPU, NPU, or the like.
  • the neural network compression device may further determine the maximum data length supported by one of, or a combination of, the registers, caches, instruction sets, and computing units in the processing device, and use that maximum supported data length as the sparse unit length.
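  • The following minimal Python sketch shows one way such capability information could be mapped to a sparse unit length; the dictionary fields, helper name, and example values are assumptions for illustration only:

      def sparse_unit_length(capability):
          """Number of weights the processing device handles in one matrix operation."""
          widest_bits = max(
              capability.get("register_bits", 0),
              capability.get("instruction_max_bits", 0),
              capability.get("compute_unit_bits", 0),
          )
          weight_bits = capability.get("weight_bits", 32)  # e.g. 32-bit weights
          return widest_bits // weight_bits

      # a 128-bit register holding 32-bit weights gives a sparse unit length
      # of 4, matching the example in FIG. 4
      print(sparse_unit_length({"register_bits": 128}))  # -> 4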
  • in this way, the neural network model can be trained specifically for different hardware devices, better matching the processing capabilities of those devices and achieving better results.
  • Step 302: When performing the current training on the neural network model, the neural network compression device adjusts the j-th group of weights obtained after the previous training according to the j-th group of weights referenced in the previous training, to obtain the j-th group of weights referenced in the current training; the number of weights included in the j-th group is the sparse unit length; j takes any positive integer from 1 to m, where m is the total number of groups obtained after all weights of the neural network model are grouped by the sparse unit length.
  • each time the neural network compression device performs training, it takes consecutive weights in groups of the sparse unit length for the training process; in other words, the neural network compression device groups the weights according to the sparse unit length.
  • the weights of the neural network model are obtained first: the neural network compression device may directly obtain the weight data, or it may obtain the model file of the neural network model and parse the model file to obtain the weight data.
  • before the first training, the neural network compression device may trim the weights of the initial neural network model according to the initial weight threshold of the initial neural network model.
  • the specific method for the neural network compression device to trim the weights of the initial neural network model may be: the neural network compression device separately obtains the weights of each layer of the initial neural network model, then trims each layer's weights according to that layer's initial weight threshold, until the weights of all layers have been trimmed.
  • the above process can be called a sparsification process.
  • the above process can use a variety of commonly used matrix sparsification methods, such as the pruning method mentioned in the paper "Learning both Weights and Connections for Efficient Neural Networks", the quantization method mentioned in the paper "Ternary Weights", or other methods; this is not specifically limited in this application.
  • the specific process may be: the neural network compression device sets the weights in each layer that are less than that layer's initial weight threshold to zero, and keeps the weights in each layer that are not less than that layer's initial weight threshold unchanged.
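  • A minimal Python sketch of this per-layer trimming step, assuming each layer's weights are held in a NumPy array; comparing weight magnitudes |w| against the threshold is an assumption, since the text says only "less than the initial weight threshold":

      import numpy as np

      def trim_layer(weights, init_threshold):
          """Zero out the weakly expressive weights of one layer."""
          trimmed = weights.copy()
          trimmed[np.abs(trimmed) < init_threshold] = 0.0
          return trimmed

      def trim_model(layer_weights, layer_thresholds):
          """Trim every layer with its own initial weight threshold."""
          return [trim_layer(w, t) for w, t in zip(layer_weights, layer_thresholds)]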
  • before obtaining the weights of each layer of the initial neural network model, the neural network compression device needs to train the neural network to obtain the weights of the neural network, thereby obtaining the initial neural network model.
  • training the neural network to obtain its weights may specifically be: through data input and neural network model construction, the structure of the neural network and the weights in the neural network are obtained.
  • the neural network may be trained through commonly used deep learning frameworks, such as TensorFlow, Caffe, MXNet, PyTorch, and so on.
  • the neural network compression device adjusts the j-th group of weights obtained after the previous training according to the j-th group of weights referenced in the previous training, which may specifically include the following five cases:
  • the set proportion threshold may be 30%, and may also be another value, which is not limited in this application.
  • for example, when the j-th group of weights referenced in the previous training is not all zero, and the proportion of non-zero values in the j-th group of weights obtained after the previous training, relative to the group's total number of weights, is not less than the set proportion threshold, the neural network compression device keeps the j-th group of weights obtained after the previous training unchanged.
  • in this way, the distribution of zero values in the weight matrix of the final neural network model can be made more regular, for example with consecutive zero values concentrated within single groups of weights as far as possible, so that when the neural network model is subsequently applied to data processing, the regular zero distribution greatly reduces the number of memory accesses and the amount of calculation, which in turn increases computation speed.
  • the j-th group of weights referenced in the previous training can be understood as the j-th group of weights that needed training last time; the j-th group of weights obtained by adjusting the result of the previous training is the group that needs to be trained this time, that is, the group of weights referenced in the current training.
  • the jth group of weights referenced in the first training may be the jth group of weights of the initial neural network model.
  • the zero-setting weight threshold may be determined based on the initial weight threshold.
  • the zero-setting weight threshold may be a set multiple of the initial weight threshold, where the set multiple is greater than 1; for example, when the initial weight threshold is 1, the zero-setting threshold may be 1.05.
  • the neural network compression device maintains a zero-setting flag data structure, in which each zero-setting flag corresponds to one group of weights (where each group of weights can be called a weight matrix).
  • the zero-setting flag data structure and the weight matrix can be represented as shown in the schematic diagram in FIG. 4.
  • each run of consecutive weights of sparse unit length in the weight matrix corresponds to 1 bit in the zero-setting flag data structure.
  • in FIG. 4 the sparse unit length is 4, so every 4 consecutive weights correspond to one zero-setting flag.
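  • A sketch of this flag structure using the FIG. 4 parameters (sparse unit length 4, one flag per group of 4 consecutive weights, 0 meaning the whole group is zero); storing the flags as a plain Python list rather than packed bits is a simplification:

      import numpy as np

      SPARSE_UNIT = 4  # the FIG. 4 example

      def build_zero_flags(weight_matrix):
          """One zero-setting flag per group of SPARSE_UNIT consecutive weights."""
          flat = weight_matrix.ravel()
          return [
              0 if not flat[i:i + SPARSE_UNIT].any() else 1
              for i in range(0, flat.size, SPARSE_UNIT)
          ]

      w = np.array([[0.0, 0.0, 0.0, 0.0],
                    [0.3, 0.0, -0.7, 0.0]])
      print(build_zero_flags(w))  # -> [0, 1]: the first group is all zero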
  • the specific method may be: the neural network compression device determines whether the zero-setting flag corresponding to the j-th group of weights in the zero-setting flag data structure is zero; when the flag is zero, it is determined that the j-th group of weights referenced in the previous training is all zero; when the flag is a non-zero value, it is determined that the j-th group of weights referenced in the previous training is not all zero. Taking FIG. 4 as an example:
  • the first zero-setting flag in the data structure is 0, which means that the group of weights corresponding to that flag is all 0;
  • looking at the first 4 weights of the first row of the weight matrix in FIG. 4 (that is, the first group of weights, or the first weight matrix), it can be seen that the corresponding group of weights is indeed all 0.
  • after the neural network compression device sets the entire j-th group of weights obtained after the previous training to zero, or sets all of its non-zero values to zero, it updates the zero-setting flag corresponding to the j-th group of weights in the current zero-setting flag data structure to zero. Similarly, in an optional implementation manner, after keeping the j-th group of weights obtained after the previous training unchanged, the neural network compression device updates the zero-setting flag corresponding to the j-th group in the current flag data structure to a non-zero value (that is, 1).
  • in this way, the zero-setting flags in the zero-setting flag data structure can be updated in real time, so that the weights can be adjusted more accurately during the training process, and the subsequent processing device can accurately rely on the weights of the neural network model when performing data processing.
  • the above five situations may actually be a cyclic process.
  • the neural network compression device first determines whether the j-th group of weights referenced by the previous training is all zero, and then performs the subsequent process according to the judgment result and the above five cases; thereby, new weights for all groups of the neural network model are obtained, and the neural network compression device subsequently trains on the new weights.
  • a schematic diagram of a specific weight adjustment process may be shown in FIG. 5.
  • when the weights of the neural network model are grouped according to the sparse unit length, there may be multiple cases, as illustrated by the sketch after this paragraph. In one case, the weights of the neural network model are grouped together uniformly; during grouping, the number of remaining weights in the last group may be less than the sparse unit length, and even then the group is processed in the same way as the other groups of weights (whose number equals the sparse unit length). In another case, the weight matrix composed of the weights of the neural network model is divided by rows (or columns), and the weights of each row (or column) are grouped separately; when each row (or column) is grouped by the sparse unit length, the number of weights in the last group of each row (or column) may likewise be less than the sparse unit length, and for the same reason that last group is processed in the same way as the other groups (whose number equals the sparse unit length).
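  • The sketch below illustrates the row-wise grouping case, including the short final group in each row; the helper name is hypothetical:

      import numpy as np

      def group_rowwise(weight_matrix, unit):
          """Group each row's weights into chunks of `unit` consecutive weights.
          The last chunk of a row may hold fewer than `unit` weights; it is
          still treated as an ordinary group."""
          groups = []
          for row in weight_matrix:
              for start in range(0, row.size, unit):
                  groups.append(row[start:start + unit])
          return groups

      m = np.arange(10, dtype=float).reshape(2, 5)
      for g in group_rowwise(m, 4):
          print(g)  # per row: one full group of 4 and a short final group of 1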
  • Step 303: The neural network compression device performs the current training on the neural network model according to the obtained groups of weights referenced by the current training.
  • through step 302, all group weights of the neural network model can be obtained, so that step 303 can be performed.
  • the method for the neural network compression device to perform step 303 may refer to a commonly used neural network training method, which is not specifically described in this application.
  • in the above method, the sparse unit length is determined based on the capability information of the processing device, and the weights grouped by the sparse unit length are processed during the training process according to the capabilities of the processing device; the neural network model can thus be adapted to the capabilities of different processing devices, so that the subsequent processing device can achieve a better processing effect.
  • the final neural network model obtained through the embodiment shown in FIG. 3 may be applied to a data processing device, so that the data processing device performs data processing based on the finally obtained neural network model.
  • an embodiment of the present application also provides a data processing method, which is implemented based on the final neural network model obtained in the embodiment shown in FIG. 3.
  • the data processing method provided by the present application is explained by taking a data processing device as the execution subject as an example.
  • the specific flow of the method may include the following steps:
  • Step 601: The data processing device obtains the weights of the target neural network model, where the target neural network model is the final neural network model obtained by grouping the weights of a neural network model by the sparse unit length and then training the model; the sparse unit length is determined based on the processing capability information of the processing device and is the data length of one operation when performing a matrix operation.
  • the processing device here is the data processing device, and for the specific method of determining the sparse unit length based on the processing capability information of the processing device, reference may be made to the related description in the embodiment shown in FIG. 3, which is not repeated here.
  • Step 602: Perform the following processing based on the weights of the target neural network model: in the p-th processing, determine whether the q-th group of weights is all zero; if so, generate and save the first operation result according to the matrix operation type, or according to the matrix operation type and the matrix data to be processed; otherwise, generate and save the second operation result according to the q-th group of weights, the matrix data to be processed, and the matrix operation type.
  • the number of weights included in the q-th group is the sparse unit length; q takes any positive integer from 1 to f, where f is the total number of groups after the weights of the target neural network model are grouped by the sparse unit length; p likewise takes any positive integer from 1 to f.
  • when the data processing device determines whether the q-th group of weights is all zero, it first obtains the zero-setting flag data structure corresponding to the weights of the target neural network model, and then determines whether the zero-setting flag corresponding to the q-th group of weights in that data structure is zero. Specifically, when the flag corresponding to the q-th group in the flag data structure is zero, the data processing device determines that the q-th group of weights is all zero; when the flag is not zero, the data processing device determines that the q-th group of weights is not all zero. For example, as shown in FIG. 4, when the flag corresponding to the q-th group of weights is the first zero-setting flag, then since that flag is 0, it is determined that the q-th group of weights is all zero.
  • since the target neural network model is adapted to the data processing device, information about the target neural network model (such as the zero-setting flag data structure) has been pre-configured in the data processing device.
  • when the q-th group of weights is all zero, the data processing device generates the first operation result according to the matrix operation type, or according to the matrix operation type and the matrix data to be processed: when the matrix operation type is matrix multiplication, the data processing device directly obtains zero as the first operation result; when the matrix operation type is matrix addition, the data processing device takes the matrix data to be processed as the first operation result.
  • when the q-th group of weights is not all zero, the data processing device generates the second operation result according to the q-th group of weights, the matrix data to be processed, and the matrix operation type. A specific method is: the data processing device loads the q-th group of weights and the matrix data to be processed into a register, and then performs the corresponding matrix operation on them according to the matrix operation type to generate the second operation result.
  • the above processing is a cyclic process: it is performed for each group of weights until the weights of all groups have been traversed, after which the final processing result can be generated.
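  • Putting steps 601 and 602 together, a hedged Python sketch of this per-group loop might look as follows; elementwise array operations stand in for the device's register-level matrix operations, which is a simplification:

      import numpy as np

      def process_groups(zero_flags, weight_groups, data_groups, op_type):
          """Traverse all f groups; groups whose flag is 0 never touch weight memory."""
          results = []
          for flag, w, x in zip(zero_flags, weight_groups, data_groups):
              if flag == 0:
                  # the q-th group is all zero: take the first operation result
                  out = np.zeros_like(x) if op_type == "multiply" else x.copy()
              else:
                  # load the group and the data (plain array ops stand in for
                  # register loads) and perform the corresponding operation
                  out = w * x if op_type == "multiply" else w + x
              results.append(out)  # save the per-group operation result
          return results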
  • a specific data processing process may be shown in the schematic diagram in FIG. 7.
  • because the weights of the neural network model are grouped and the final neural network model is trained after the grouping, the subsequent application of the final neural network model for data processing can greatly reduce the amount of data access and calculation, thereby improving the speed of operation.
  • the embodiments of the present application further provide a neural network compression device, which is used to implement the neural network compression method provided in the embodiment shown in FIG. 3.
  • the neural network compression device 800 includes a determination unit 801, a weight adjustment unit 802, and a training unit 803, where:
  • the determining unit 801 is used to determine the sparse unit length according to the processing capability information of the processing device, and the sparse unit length is the data length of one operation when the processing device performs matrix operation;
  • the weight adjustment unit 802 is configured to, when the current training is performed on the neural network model, adjust the j-th group of weights obtained after the previous training according to the j-th group of weights referenced in the previous training, to obtain the j-th group of weights referenced in the current training;
  • the number of weights included in the j-th group is the sparse unit length; j takes any positive integer from 1 to m, where m is the total number of groups obtained after all weights of the neural network model are grouped by the sparse unit length;
  • the training unit 803 is configured to perform the current training of the neural network model according to the groups of weights referenced by the current training, obtained by the weight adjustment unit.
  • the determining unit 801 determines the register length in the processing device, or the maximum data length that the instruction set in the processing device processes at a time; the register length or the maximum data length processed by the instruction set at a time is used as the sparse unit length.
  • the neural network compression device may further include a weight trimming unit, configured to trim the weights of the initial neural network model according to the initial weight threshold of the initial neural network model before the training unit first trains the neural network.
  • when the weight adjustment unit 802 adjusts the j-th group of weights obtained after the previous training according to the j-th group of weights referenced in the previous training, the following cases may arise (a code sketch follows this list):
  • when the j-th group of weights referenced in the previous training is all zero, and the weights of the j-th group obtained after the previous training are all less than the zero-setting weight threshold, set the entire j-th group of weights obtained after the previous training to zero; or
  • when the j-th group of weights referenced in the previous training is not all zero, the proportion of the number of non-zero values in the j-th group of weights obtained after the previous training, relative to the group's total number of weights, is less than the set proportion threshold, and the non-zero values in the j-th group obtained after the previous training are all less than the zero-setting weight threshold, set all the non-zero values in the j-th group of weights to zero; or
  • when the j-th group of weights referenced in the previous training is not all zero, the proportion of the number of non-zero values in the j-th group of weights obtained after the previous training, relative to the group's total number of weights, is less than the set proportion threshold, but the non-zero values in the j-th group obtained after the previous training are not all less than the zero-setting weight threshold, keep the j-th group of weights obtained after the previous training unchanged; or
  • when the j-th group of weights referenced in the previous training is not all zero, and the proportion of the number of non-zero values in the j-th group of weights obtained after the previous training, relative to the group's total number of weights, is not less than the set proportion threshold, keep the j-th group of weights obtained after the previous training unchanged.
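  • A hedged Python sketch of this adjustment rule is given below, using the example values from above (zero-setting weight threshold 1.05, proportion threshold 30%); comparing magnitudes |w| is an assumption, and the behaviour of a previously all-zero group whose retrained weights are not all below the threshold (the remaining fifth case, not spelled out in the list) is assumed here to be "keep unchanged":

      import numpy as np

      def adjust_group(prev_ref_all_zero, group,
                       zero_threshold=1.05, ratio_threshold=0.30):
          """Adjust one group of weights after a training pass (sketch)."""
          adjusted = group.copy()
          nonzero = adjusted != 0
          if prev_ref_all_zero:
              if np.all(np.abs(adjusted) < zero_threshold):
                  adjusted[:] = 0.0          # zero the whole group
              # otherwise: keep the group unchanged (assumed fifth case)
          elif nonzero.sum() / adjusted.size < ratio_threshold:
              if np.all(np.abs(adjusted[nonzero]) < zero_threshold):
                  adjusted[nonzero] = 0.0    # zero the scattered non-zero values
              # else: keep the group unchanged
          # else: non-zero proportion >= threshold, keep the group unchanged
          return adjusted

  • After each adjustment, the group's zero-setting flag would be updated to zero or a non-zero value, as described next.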
  • when determining whether the j-th group of weights referenced in the previous training is all zero, the weight adjustment unit 802 is specifically configured to determine whether the zero-setting flag corresponding to the j-th group of weights in the zero-setting flag data structure is zero; when the flag is zero, it is determined that the j-th group of weights referenced in the previous training is all zero; when the flag is a non-zero value, it is determined that the j-th group of weights referenced in the previous training is not all zero.
  • the weight adjustment unit 802 is further configured to, after setting the entire j-th group of weights obtained after the previous training to zero, or after setting all of its non-zero values to zero, update the zero-setting flag corresponding to the j-th group of weights in the current zero-setting flag data structure to zero; or, after keeping the j-th group of weights obtained after the previous training unchanged, update the zero-setting flag corresponding to the j-th group in the current flag data structure to a non-zero value.
  • with the above device, the sparse unit length can be determined based on the capability information of the processing device, and the weights grouped by the sparse unit length can be processed during the training process;
  • the neural network model can thus be adapted to the capabilities of different processing devices, so that the subsequent processing device can achieve a better processing effect.
  • the embodiments of the present application further provide a data processing apparatus, which is used to implement the data processing method provided in the embodiment shown in FIG. 6.
  • the data processing apparatus 900 includes an acquiring unit 901 and a processing unit 902, where:
  • the obtaining unit 901 is used to obtain the weight of the target neural network model.
  • the target neural network model is the final neural network model obtained by grouping the weights of a neural network model by the sparse unit length and then training the model;
  • the processing unit 902 is configured to perform the following processing based on the weights of the target neural network model: in the p-th processing, determine whether the q-th group of weights is all zero; if so, generate and save the first operation result according to the matrix operation type, or according to the matrix operation type and the matrix data to be processed; otherwise, generate and save the second operation result according to the q-th group of weights, the matrix data to be processed, and the matrix operation type. The sparse unit length is determined based on the processing capability information of the processing device and is the data length of one operation when performing a matrix operation; the number of weights included in the q-th group is the sparse unit length; q takes any positive integer from 1 to f, where f is the total number of groups after the weights of the target neural network model are grouped by the sparse unit length; p likewise takes any positive integer from 1 to f.
  • when determining whether the q-th group of weights is all zero, the processing unit 902 is specifically configured to: obtain the zero-setting flag data structure corresponding to the weights of the target neural network model, and determine whether the zero-setting flag corresponding to the q-th group of weights in that data structure is zero.
  • because the final neural network model is obtained by training after the weights of the neural network model have been grouped, the subsequent application of the final neural network model for data processing can, according to the characteristics of matrix operations, greatly reduce the amount of data access and calculation, thereby improving the speed of operation.
  • the division of the units in the embodiments of the present application is schematic, and is only a division of logical functions, and there may be other division manners in actual implementation.
  • the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or software function unit.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions that enable a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media that can store program code.
  • the embodiments of the present application further provide a neural network compression device, which is used to implement the neural network compression method shown in FIG. 3.
  • the neural network compression device 1000 includes: a processor 1001 and a memory 1002, where:
  • the processor 1001 may be a CPU, GPU, or a combination of CPU and GPU.
  • the processor 1001 may also be an AI chip that supports neural network processing such as NPU, TPU, and so on.
  • the processor 1001 may further include a hardware chip.
  • the above hardware chip may be ASIC, PLD, DSP or a combination thereof.
  • the above PLD may be CPLD, FPGA, GAL or any combination thereof. It should be noted that the processor 1001 is not limited to the above enumerated cases, and the processor 1001 may be any processing device capable of implementing the neural network compression method shown in FIG. 3 described above.
  • the processor 1001 and the memory 1002 are connected to each other.
  • the processor 1001 and the memory 1002 are connected to each other through a bus 1003;
  • the bus 1003 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • the bus can be divided into an address bus, a data bus, and a control bus. For ease of representation, only a thick line is used in FIG. 10, but it does not mean that there is only one bus or one type of bus.
  • when the processor 1001 is used to implement the neural network compression method provided by the embodiments of the present application, it performs the following operations:
  • determining the sparse unit length according to the processing capability information of the processing device; when performing the current training on the neural network model, adjusting the j-th group of weights obtained after the previous training according to the j-th group of weights referenced in the previous training, to obtain the j-th group of weights referenced in the current training; and performing the current training according to the obtained groups of weights;
  • the number of weights included in the j-th group is the sparse unit length;
  • j takes any positive integer from 1 to m, where m is the total number of groups obtained after all weights of the neural network model are grouped by the sparse unit length;
  • the processor 1001 may also perform other operations; for details, reference may be made to the specific descriptions involved in step 301, step 302, and step 303 in the embodiment shown in FIG. 3 above, which are not repeated here.
  • the memory 1002 is used to store programs and data.
  • the program may include program code, and the program code includes instructions for computer operation.
  • the memory 1002 may include a random access memory (RAM), and may also include a non-volatile memory, for example at least one magnetic disk memory.
  • the processor 1001 executes the program stored in the memory 1002 to realize the above-mentioned functions, thereby implementing the neural network compression method shown in FIG. 3.
  • when the neural network compression device shown in FIG. 10 is applied to a terminal device, the neural network compression device may be embodied as the terminal device shown in FIG. 2.
  • the processor 1001 may be the same as the processor 210 shown in FIG. 2
  • the memory 1002 may be the same as the memory 220 shown in FIG. 2.
  • an embodiment of the present application further provides a data processing apparatus, which is used to implement the data processing method shown in FIG. 6.
  • the data processing device 1100 includes a processor 1101 and a memory 1102, where:
  • the processor 1101 may be a CPU, GPU, or a combination of CPU and GPU.
  • the processor 1101 may also be an AI chip that supports neural network processing such as NPU, TPU, and so on.
  • the processor 1101 may further include a hardware chip.
  • the above hardware chip may be ASIC, PLD, DSP or a combination thereof.
  • the above PLD may be CPLD, FPGA, GAL or any combination thereof. It should be noted that the processor 1101 is not limited to the above-mentioned cases, and the processor 1101 may be any processing device capable of implementing neural network inference operation.
  • the processor 1101 and the memory 1102 are connected to each other.
  • the processor 1101 and the memory 1102 are connected to each other through a bus 1103;
  • the bus 1103 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • the bus can be divided into an address bus, a data bus, and a control bus. For ease of representation, only a thick line is used in FIG. 11, but it does not mean that there is only one bus or one type of bus.
  • when the processor 1101 is used to implement the data processing method provided by the embodiments of the present application, it may perform the following operations:
  • obtaining the weights of the target neural network model, where the target neural network model is the final neural network model obtained by grouping the weights of a neural network model by the sparse unit length and then training the model; the sparse unit length is determined based on the processing capability information of the processing device and is the data length of one operation when performing a matrix operation;
  • the following processing is performed based on the weights of the target neural network model: in the pth processing, it is determined whether the qth group of weights are all zero, and if so, it is generated according to the matrix operation type or according to the matrix operation type and the matrix data to be processed Save the first operation result, otherwise generate and save the second operation result according to the qth group weights, the matrix data to be processed and the matrix operation type;
  • the number of weights included in the qth group of weights is the length of the sparse unit; the q takes any positive integer from 1 to f, and f is the weight of the target neural network model according to the sparse The total number of groups after unit length grouping; p takes any positive integer from 1 to f.
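As an illustration of the group-wise processing just described, the following Python sketch shows how whole groups of zero weights might be skipped during a dot product. It is a minimal sketch under stated assumptions, not the embodiments' implementation: the names `SPARSE_UNIT` and `sparse_dot` are hypothetical, and a sparse unit length of 4 elements is assumed.

```python
import numpy as np

SPARSE_UNIT = 4  # assumed sparse unit length, in weight elements

def sparse_dot(weights, x):
    """Dot product that skips whole groups of zero weights.

    When the q-th group of weights is all zero, its contribution is known
    from the operation type alone (the "first operation result" is zero for
    multiplication), so neither the data access nor the multiply-accumulate
    is performed.
    """
    acc = 0.0
    for start in range(0, len(weights), SPARSE_UNIT):
        w = weights[start:start + SPARSE_UNIT]
        if not w.any():                     # all-zero group: skip entirely
            continue
        acc += float(np.dot(w, x[start:start + SPARSE_UNIT]))
    return acc

w = np.array([0, 0, 0, 0, 1.0, 2.0, 0, 0], dtype=np.float32)
x = np.arange(8, dtype=np.float32)
print(sparse_dot(w, x))  # only the second group is loaded and multiplied -> 14.0
```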
  • the processor 1101 may also perform other operations. For details, reference may be made to the specific descriptions involved in step 601 and step 602 in the embodiment shown in FIG. 6 above, and details are not repeated here.
  • the memory 1102 is used to store programs and data.
  • the program may include program code, and the program code includes instructions for computer operation.
  • the memory 1102 may include random access memory (RAM), and may also include non-volatile memory, for example, at least one disk memory.
  • the processor 1101 executes the program stored in the memory 1102 to realize the above functions, thereby implementing the data processing method shown in FIG. 6.
  • when the data processing apparatus shown in FIG. 11 is applied to a terminal device, the data processing apparatus may be embodied as the terminal device shown in FIG. 2.
  • the processor 1101 may be the same as the processor 210 shown in FIG. 2, and the memory 1102 may be the same as the memory 220 shown in FIG. 2.
  • the embodiments of the present application may be provided as methods, systems, or computer program products. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer usable program code.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, where the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operating steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A neural network compression method and apparatus, used to solve the problem in the prior art that it is not possible to effectively adapt to the capability of a processing device and achieve a better processing effect. The method comprises: determining a sparse unit length according to processing capability information of a processing device; when performing a current round of training on a neural network model, according to a jth set of weights referenced in a previous round of training, adjusting the jth set of weights obtained after the previous round of training, and obtaining a jth set of weights referenced in the current round of training; and performing the current round of training on the neural network model according to the obtained sets of weights referenced in the current round of training. The sparse unit length is the data length of one operation when the processing device performs matrix operations, the number of weights included in the jth set of weights is the sparse unit length, j is any positive integer from 1 to m, and m is the total number of sets of weights obtained after grouping all the weights of the neural network model according to the sparse unit length.

Description

Neural network compression method and apparatus
Technical Field
This application relates to the field of neural networks, and in particular, to a neural network compression method and apparatus.
Background
Deep learning technology is currently in full swing in the industry, and all kinds of industries are applying it in their respective fields. As is well known, running a deep learning model (that is, a neural network model) involves a large number of floating-point matrix operations. Neural networks are usually over-parameterized, so deep learning models contain obvious redundancy, which wastes computation and storage. To simplify the computation and storage of models, the industry has proposed a variety of compression methods, such as model sparsification methods, which set weights with weak expressive power in the model weight matrix to zero through pruning, quantization, and the like, so as to simplify model computation and storage.
At present, when a deep learning model is sparsified, the value of each weight in the model is learned automatically from the training set, and the sparsification performed during training is random; the weights cannot be sparsified in a targeted manner. As a result, a subsequent processing device can only rely on a randomly sparsified deep learning model for data processing, which cannot adapt well to the capability of the processing device and cannot achieve a good processing effect.
Summary
Embodiments of this application provide a neural network compression method and apparatus, to solve the prior-art problem that the capability of a processing device cannot be well adapted to and a good processing effect cannot be achieved.
According to a first aspect, this application provides a neural network compression method: a sparse unit length is determined according to processing capability information of a processing device; then, when a current round of training is performed on a neural network model, the j-th group of weights obtained after the previous round of training is adjusted according to the j-th group of weights referenced in the previous round of training, to obtain the j-th group of weights referenced in the current round of training. The sparse unit length is the data length of one operation when the processing device performs matrix operations, and the number of weights included in the j-th group of weights is the sparse unit length; j takes any positive integer from 1 to m, where m is the total number of weight groups obtained after all weights of the neural network model are grouped by the sparse unit length.
The current round of training is performed on the neural network model according to the obtained groups of weights referenced in the current round of training.
With this method, during neural network compression, the sparse unit length can be determined based on the capability information of the processing device, and the weights grouped by that length are processed during training; the neural network model can thus be adapted to processing devices with different capabilities, so that a subsequent processing device achieves a better processing effect.
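As a non-normative sketch of how the rounds described in this aspect could be organized, the following Python outline groups all weights by the sparse unit length and, in each round, adjusts every group against the group referenced in the previous round before training. All names (`compress`, `train_one_round`, `adjust_group`) are placeholders, not part of the embodiments; the per-group adjustment rules are detailed later in the description.

```python
def compress(model_weights, sparse_unit, num_rounds, train_one_round, adjust_group):
    """Round-over-round loop sketched from the first aspect (names are placeholders).

    model_weights:   all weights of the neural network model, as a flat array.
    sparse_unit:     sparse unit length determined from the processing device.
    train_one_round: trains the model once and returns the weights it produced.
    adjust_group:    per-group adjustment rule (detailed later in the description).
    """
    m = -(-len(model_weights) // sparse_unit)   # total number of weight groups
    referenced = model_weights                  # weights referenced by round 1
    for _ in range(num_rounds):
        trained = train_one_round(referenced)   # weights obtained after this round
        for j in range(m):                      # adjust group by group
            s = slice(j * sparse_unit, (j + 1) * sparse_unit)
            trained[s] = adjust_group(referenced[s], trained[s])
        referenced = trained                    # referenced by the next round
    return referenced
```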
In a possible design, the sparse unit length is determined according to the processing capability information of the processing device as follows: the length of a register in the processing device, or the maximum data length processed at a time by the instruction set of the processing device, is determined and used as the sparse unit length.
With this method, the sparse unit length can be determined accurately to match the processing capability of the processing device.
In a possible design, the neural network compression apparatus may further determine the bit width of a computing unit in the processing device and use the determined bit width as the sparse unit length. The computing unit may be, but is not limited to, a GPU, an NPU, or the like.
With this method, the sparse unit length can likewise be determined accurately to match the processing capability of the processing device.
In a possible design, before the neural network is trained for the first time, all weights of the initial neural network model are clipped according to an initial weight threshold of the initial neural network model.
By clipping the neural network once in advance, some processing in subsequent training can be saved and the operation speed improved.
In a possible design, adjusting the j-th group of weights obtained after the previous round of training according to the j-th group of weights referenced in the previous round of training may include the following five cases:
Case 1: when the j-th group of weights referenced in the previous round of training are all zero and the j-th group of weights obtained after the previous round of training are all less than a zero-setting weight threshold, all of the j-th group of weights obtained after the previous round of training are set to zero;
Case 2: when the j-th group of weights referenced in the previous round of training are all zero and the j-th group of weights obtained after the previous round of training are not all less than the zero-setting weight threshold, the j-th group of weights obtained after the previous round of training is kept unchanged;
Case 3: when the j-th group of weights referenced in the previous round of training are not all zero, the proportion of non-zero values in the total number of the j-th group of weights obtained after the previous round of training is less than a set proportion threshold, and the non-zero values in that group are all less than the zero-setting weight threshold, the non-zero weights in that group are all set to zero;
Case 4: when the j-th group of weights referenced in the previous round of training are not all zero, the proportion of non-zero values in the total number of the j-th group of weights obtained after the previous round of training is less than the set proportion threshold, and the non-zero values are not all less than the zero-setting weight threshold, the j-th group of weights obtained after the previous round of training is kept unchanged;
Case 5: when the j-th group of weights referenced in the previous round of training are not all zero and the proportion of non-zero values in the total number of the j-th group of weights obtained after the previous round of training is not less than the set proportion threshold, the j-th group of weights obtained after the previous round of training is kept unchanged.
With this method, the weights obtained after the previous round of training can be adjusted according to the actual situation, so that the zero values in the weights of the resulting neural network model are distributed more regularly, with as many zeros as possible lying consecutively within a single group. When the neural network model is subsequently used for data processing, this reduces data access time and improves operation speed.
In a possible design, the zero-setting weight threshold may be determined based on the initial weight threshold; for example, the zero-setting weight threshold may be a set multiple of the initial weight threshold, where the set multiple is greater than 1. This makes the subsequent judgments better fit the current value range of the weights.
In a possible design, whether the j-th group of weights referenced in the previous round of training are all zero may be determined as follows: determine whether the zero-setting flag corresponding to the j-th group of weights in a zero-setting flag data structure is zero; when the zero-setting flag is zero, it is determined that the j-th group of weights referenced in the previous round of training are all zero; when the zero-setting flag is a non-zero value, it is determined that they are not all zero.
With this method, whether the j-th group of weights referenced in the previous round of training are all zero can be determined accurately, so that subsequent processing can proceed according to the judgment result.
In a possible design, after all of the j-th group of weights obtained after the previous round of training are set to zero, or after the non-zero weights are set to zero, the zero-setting flag corresponding to the j-th group of weights in the current zero-setting flag data structure is further updated to zero; or, after the j-th group of weights obtained after the previous round of training is kept unchanged, that zero-setting flag is further updated to a non-zero value.
With this method, the zero-setting flags in the zero-setting flag data structure can be updated in real time, so that whether the j-th group of weights referenced in the previous round of training are all zero can be judged more accurately during weight adjustment.
According to a second aspect, this application provides a data processing method: the weights of a target neural network model are obtained, and the following processing is performed based on them. In the p-th processing, it is determined whether the q-th group of weights are all zero; if so, a first operation result is generated and saved according to the matrix operation type, or according to the matrix operation type and the matrix data to be processed; otherwise, a second operation result is generated and saved according to the q-th group of weights, the matrix data to be processed, and the matrix operation type. The target neural network model is the final neural network model obtained through training after all weights of a neural network model are grouped by a sparse unit length; the sparse unit length is determined based on processing capability information of a processing device and is the data length of one operation when performing matrix operations. The number of weights included in the q-th group of weights is the sparse unit length; q takes any positive integer from 1 to f, where f is the total number of groups obtained after all weights of the target neural network model are grouped by the sparse unit length; p takes any positive integer from 1 to f.
With this method, because the final neural network model is obtained through training after its weights are grouped by a sparse unit length derived from the processing capability information of the processing device, the characteristics of matrix operations can be exploited when the final neural network model is subsequently used for data processing, greatly reducing data access and computation and thereby improving operation speed.
In a possible design, whether the q-th group of weights are all zero may be determined as follows: the zero-setting flag data structure corresponding to the weights of the target neural network model is obtained, and it is judged whether the zero-setting flag corresponding to the q-th group of weights in that data structure is zero. Specifically, when the zero-setting flag corresponding to the q-th group of weights is zero, it is determined that the q-th group of weights are all zero; when it is not zero, it is determined that they are not all zero.
With this method, whether the q-th group of weights are all zero can be determined accurately, so that when they are, the matrix operation result can be generated directly, reducing data access and computation and thereby improving operation speed.
In a possible design, when the q-th group of weights are all zero and the data processing apparatus generates the first operation result according to the matrix operation type, or according to the matrix operation type and the matrix data to be processed: when the matrix operation type is matrix multiplication, the data processing apparatus directly obtains the first operation result as zero; when the matrix operation type is matrix addition, the data processing apparatus takes the matrix data to be processed as the first operation result. This reduces data access and computation and thereby improves operation speed.
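Purely as an illustration of these two shortcuts, the first-operation-result generation might be sketched as follows; the helper name `first_result` and the string operation types are assumptions, not part of the embodiments.

```python
import numpy as np

def first_result(op_type, data):
    """Result of a matrix operation when the weight group is known to be all zero.

    For multiplication the result is zero without touching the data; for
    addition the result is simply the data to be processed.
    """
    if op_type == "mul":
        return np.zeros_like(data)   # 0 * data == 0, no access to data needed
    if op_type == "add":
        return data                  # 0 + data == data, no arithmetic needed
    raise ValueError("unsupported matrix operation type")
```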
According to a third aspect, this application further provides a neural network compression apparatus that has the function of implementing the method of the first aspect. The function may be implemented by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the function.
In a possible design, the structure of the neural network compression apparatus may include a determining unit, a weight adjustment unit, and a training unit, which may perform the corresponding functions in the method example of the first aspect; for details, refer to the detailed description in that method example, which is not repeated here.
In a possible design, the structure of the neural network compression apparatus may include a processor and a memory, where the processor is configured to perform the method mentioned in the first aspect. The memory is coupled to the processor and stores the program instructions and data necessary for the neural network compression apparatus.
According to a fourth aspect, this application further provides a data processing apparatus that has the function of implementing the method of the second aspect. The function may be implemented by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the function.
In a possible design, the structure of the data processing apparatus may include an obtaining unit and a processing unit, which may perform the corresponding functions in the method example of the second aspect; for details, refer to the detailed description in that method example, which is not repeated here.
In a possible design, the structure of the data processing apparatus may include a processor and a memory, where the processor is configured to perform the method mentioned in the second aspect. The memory is coupled to the processor and stores the program instructions and data necessary for the data processing apparatus.
According to a fifth aspect, this application further provides a computer storage medium storing computer-executable instructions that, when invoked by a computer, cause the computer to perform any of the methods mentioned in the first aspect or the second aspect.
According to a sixth aspect, this application further provides a computer program product containing instructions that, when run on a computer, cause the computer to perform any of the methods mentioned in the first aspect or the second aspect.
According to a seventh aspect, this application further provides a chip coupled to a memory and configured to read and execute the program instructions stored in the memory, to implement any of the methods mentioned in the first aspect or the second aspect.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of a neural network according to an embodiment of this application;
FIG. 2 is a structural diagram of a terminal device according to an embodiment of this application;
FIG. 3 is a flowchart of a neural network compression method according to an embodiment of this application;
FIG. 4 is a schematic diagram of a zero-setting flag data structure and a weight matrix according to an embodiment of this application;
FIG. 5 is a schematic flowchart of weight adjustment according to an embodiment of this application;
FIG. 6 is a flowchart of a data processing method according to an embodiment of this application;
FIG. 7 is an example diagram of a data processing process according to an embodiment of this application;
FIG. 8 is a schematic structural diagram of a neural network compression apparatus according to an embodiment of this application;
FIG. 9 is a schematic structural diagram of a data processing apparatus according to an embodiment of this application;
FIG. 10 is a structural diagram of a neural network compression apparatus according to an embodiment of this application;
FIG. 11 is a structural diagram of a data processing apparatus according to an embodiment of this application.
Detailed Description
This application is described in further detail below with reference to the accompanying drawings.
Embodiments of this application provide a neural network compression method and apparatus, to solve the prior-art problem that the capability of a processing device cannot be well adapted to and a good processing effect cannot be achieved. The method and the apparatus in this application are based on the same inventive concept; because the principles by which they solve the problem are similar, the implementations of the apparatus and the method may refer to each other, and repeated parts are not described again.
In the following, some terms used in this application are explained to facilitate understanding by a person skilled in the art.
As is well known, neural networks imitate the behavioral characteristics of animal neural networks and process data with a structure similar to the synaptic connections of the brain. As a mathematical operation model, a neural network consists of a large number of interconnected nodes (also called neurons) and is composed of an input layer, a hidden layer, and an output layer, as shown, for example, in FIG. 1. The input layer carries the input data of the neural network, the output layer produces its output data, and the hidden layer, formed by the many connected nodes between the input layer and the output layer, performs arithmetic processing on the input data. The hidden layer may consist of one or more layers. The numbers of hidden layers and nodes in a neural network are directly related to the complexity of the problem the network actually solves and to the numbers of input-layer and output-layer nodes.
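For readers who prefer a concrete example, the following minimal sketch computes the output of a network with one hidden layer; the layer sizes and the ReLU activation are arbitrary illustrative choices, not part of the embodiments.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8)            # input layer: 8 nodes
w1 = rng.standard_normal((16, 8))     # weights from input layer to hidden layer
w2 = rng.standard_normal((4, 16))     # weights from hidden layer to output layer

hidden = np.maximum(w1 @ x, 0.0)      # hidden layer with ReLU activation
output = w2 @ hidden                  # output layer: 4 nodes
print(output.shape)                   # (4,)
```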
Normally, a neural network model whose performance has stabilized after extensive training is widely deployed on data processing devices, enabling applications of neural network models in various fields. Because training a neural network is a complex process, the platform on which a neural network model is trained and the platform on which it is deployed are generally separate. In the embodiments of this application, because the neural network is compressed during training, the platform for training the neural network model may be called a neural network compression apparatus. For example, the neural network compression apparatus may be, but is not limited to, a terminal device such as a personal computer (PC), a server, a cloud service platform, or the like. The platform on which the neural network model is deployed may be called a data processing apparatus; for example, the data processing apparatus may be, but is not limited to, a terminal device such as a mobile phone, a tablet computer, or a PC, or a server, or the like.
To describe the technical solutions of the embodiments of this application more clearly, the neural network compression method and apparatus and the data processing method and apparatus provided by the embodiments of this application are described in detail below with reference to the accompanying drawings.
When the device performing the neural network compression method provided by the embodiments of this application is a terminal device, and when the device performing the data processing method provided by the embodiments of this application is a terminal device, the neural network compression apparatus or the data processing apparatus may be applied to the terminal device. For example, FIG. 2 shows a possible terminal device to which the neural network compression method or the data processing method provided by the embodiments of this application is applicable. The terminal device includes components such as a processor 210, a memory 220, a communication module 230, an input unit 240, a display unit 250, and a power supply 260. A person skilled in the art can understand that the structure of the terminal device shown in FIG. 2 does not constitute a limitation; the terminal device provided by the embodiments of this application may include more or fewer components than shown, combine some components, or arrange the components differently.
Each component of the terminal device is described below with reference to FIG. 2:
The communication module 230 may be connected to other devices through a wireless or physical connection to implement data transmission and reception of the terminal device. Optionally, the communication module 230 may include any one or a combination of a radio frequency (RF) circuit, a wireless fidelity (WiFi) module, a communication interface, a Bluetooth module, and the like, which is not limited in the embodiments of this application.
The memory 220 may be used to store program instructions and data. The processor 210 runs the program instructions stored in the memory 220 to execute the various functional applications and data processing of the terminal device. The program instructions include instructions that enable the processor 210 to execute the neural network compression method or the data processing method provided by the following embodiments of this application.
Optionally, the memory 220 may mainly include a program storage area and a data storage area. The program storage area may store the operating system, various application programs, program instructions, and the like; the data storage area may store various data such as neural networks. In addition, the memory 220 may include a high-speed random access memory, and may further include a non-volatile memory such as a magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
The input unit 240 may be used to receive information such as data or operation instructions input by a user. Optionally, the input unit 240 may include input devices such as a touch panel, function keys, a physical keyboard, a mouse, a camera, and a monitor.
The display unit 250 implements human-computer interaction and is used to display, through a user interface, information input by the user, information provided to the user, and the like. The display unit 250 may include a display panel 251. Optionally, the display panel 251 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like.
Further, when the input unit includes a touch panel, the touch panel may cover the display panel 251; when the touch panel detects a touch event on or near it, it transmits the event to the processor 210 to determine the type of the touch event and perform the corresponding operation.
The processor 210 is the control center of the computer apparatus and connects the foregoing components through various interfaces and lines. The processor 210 may execute the program instructions stored in the memory 220 and invoke the data stored in the memory 220 to complete the various functions of the computer apparatus and implement the neural network compression method or the data processing method provided by the embodiments of this application.
Optionally, the processor 210 may include one or more processing units. Specifically, the processor 210 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may alternatively not be integrated into the processor 210. In the embodiments of this application, the processing unit may compress a neural network or process data. For example, the processor 210 may be a central processing unit (CPU), a graphics processing unit (GPU), or a combination of a CPU and a GPU. The processor 210 may also be a network processor unit (NPU), a tensor processing unit (TPU), or another artificial intelligence (AI) chip that supports neural network processing. The processor 210 may further include a hardware chip, which may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), a digital signal processing (DSP) device, or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The terminal device further includes a power supply 260 (such as a battery) for supplying power to the components. Optionally, the power supply 260 may be logically connected to the processor 210 through a power management system, so that functions such as charging and discharging of the terminal device are implemented through the power management system.
Although not shown, the terminal device may further include components such as a camera, a sensor, and an audio collector, which are not described here.
It should be noted that the foregoing terminal device is merely an example of a device to which the neural network compression method or the data processing method provided by the embodiments of this application is applicable. It should be understood that these methods may also be applied to devices other than the foregoing terminal device, which is not limited in this application.
The neural network compression method provided by an embodiment of the present invention is applicable to the terminal device shown in FIG. 2 and also to other devices (such as a server). Referring to FIG. 3, the neural network compression method provided by this application is described with a neural network compression apparatus as the execution body; the specific procedure of the method may include:
Step 301: The neural network compression apparatus determines a sparse unit length according to the processing capability information of a processing device, where the sparse unit length is the data length of one operation when the processing device performs matrix operations.
The processing device is the device that, after the neural network compression apparatus finally obtains the neural network model, applies that finally obtained model to process the data to be processed. It should be noted that the processing device may be applied to the data processing apparatus involved in this application.
Normally, a neural network model is trained for one particular processing device, so the processing capability information of that device may be preconfigured in the neural network compression apparatus. In this way, when the neural network compression apparatus obtains, for the processing device, a neural network model that the device can apply, the subsequent procedure is performed directly according to the device's capability information.
In an optional implementation, the capability information of the processing device may be indicated by the device's data processing capability. In one implementation, the capability information of the processing device may be understood as the capability information of a processor or computing chip included in the device, where the processor or computing chip may be, but is not limited to, a central processing unit (CPU), a graphics processing unit (GPU), a network processor unit (NPU), or the like. In another implementation, the processing device may itself be a processor or a computing chip.
For example, the capability information of the processing device may be embodied as the data length of one operation when the device performs matrix operations. On this basis:
In an optional implementation, the neural network compression apparatus determines the sparse unit length according to the processing capability information of the processing device as follows: the apparatus determines the length of a register in the processing device, or the maximum data length processed at a time by the instruction set of the processing device, and uses that length as the sparse unit length.
In another optional implementation, the neural network compression apparatus may determine the bit width of a computing unit in the processing device and use the determined bit width as the sparse unit length. Optionally, the computing unit may be a GPU, an NPU, or the like.
In yet another optional implementation, the neural network compression apparatus may determine the maximum data length that can be supported by one of, or a combination of, the register, the cache, the instruction set, and the bit width of the computing unit in the processing device, and use that maximum supported data length as the sparse unit length.
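The embodiments do not prescribe an API for querying these capabilities, but as a sketch the choice might be coded as follows. All parameter names are hypothetical, and reducing a combination of capabilities to their bottleneck (the minimum) is only one possible reading of the text above.

```python
def sparse_unit_length(element_bits=32, register_bits=None,
                       simd_max_bits=None, compute_unit_bits=None):
    """Choose a sparse unit length (in weight elements) from capability info.

    Each keyword is a capability of the processing device, in bits. When
    several are given, the jointly supported length is taken as their
    bottleneck (the minimum) -- one possible interpretation of "combined".
    """
    caps = [b for b in (register_bits, simd_max_bits, compute_unit_bits)
            if b is not None]
    if not caps:
        raise ValueError("no processing capability information provided")
    return min(caps) // element_bits

# A device with 128-bit registers and 32-bit weights -> groups of 4 weights.
print(sparse_unit_length(register_bits=128))  # 4
```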
Through step 301, the neural network model can subsequently be trained in a targeted manner for different hardware devices, which better matches the processing capability of the hardware and achieves better results.
Step 302: When performing the current round of training on the neural network model, the neural network compression apparatus adjusts the j-th group of weights obtained after the previous round of training according to the j-th group of weights referenced in the previous round of training, to obtain the j-th group of weights referenced in the current round of training. The number of weights included in the j-th group of weights is the sparse unit length; j takes any positive integer from 1 to m, where m is the total number of weight groups obtained after all weights of the neural network model are grouped by the sparse unit length.
In an optional implementation, each time the neural network compression apparatus performs training, it takes a group of consecutive weights of the sparse unit length for the training procedure; this can be understood as the apparatus grouping all weights by the sparse unit length. Optionally, each time training is performed, the apparatus may first obtain all weights of the neural network model: it may obtain the weight data directly, or obtain the model file of the neural network model and parse it to obtain the weight data.
In an optional implementation, before training the neural network for the first time, the neural network compression apparatus may first clip all weights of the initial neural network model according to the initial weight threshold of the initial neural network model.
For example, the specific method by which the neural network compression apparatus clips all weights of the initial neural network model may be: the apparatus obtains the weights of each layer of the initial neural network model, and then clips the weights of each layer according to that layer's initial weight threshold, until the weights of all layers have been clipped. This process may be called sparsification. Specifically, it may use any of various common matrix sparsification methods, for example the pruning method described in the paper "Learning both Weights and Connections for Efficient Neural Networks", the quantization method described in the paper "Ternary weight networks", or other methods, which is not specifically limited in this application.
In an optional implementation, when the neural network compression apparatus clips the weights of each layer according to that layer's initial weight threshold, the specific process may be: the apparatus sets to zero the weights in each layer that are less than the layer's initial weight threshold, and keeps unchanged the weights that are not less than that threshold.
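A minimal sketch of this initial clipping follows, assuming the weights are stored as one numpy array per layer and that the comparison is on weight magnitude (the embodiments do not specify signed versus absolute comparison); the function name is hypothetical.

```python
import numpy as np

def clip_initial_weights(layers, thresholds):
    """Set to zero every weight whose magnitude is below its layer's threshold.

    layers:     list of numpy arrays, one per layer of the initial model.
    thresholds: list of per-layer initial weight thresholds.
    """
    for w, t in zip(layers, thresholds):
        w[np.abs(w) < t] = 0.0  # weights below the layer threshold are zeroed
    return layers
```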
In an optional implementation, before obtaining the weights of each layer of the initial neural network model, the neural network compression apparatus needs to train the neural network to obtain all of its weights and thereby obtain the initial neural network model. For example, training the neural network to obtain all of its weights may specifically be: through data input and neural network model construction, the structure of the neural network and all of its weights are obtained. For example, the neural network may be trained with a common deep learning framework such as TensorFlow, Caffe, MXNet, or PyTorch.
In an optional implementation, the neural network compression apparatus adjusts the j-th group of weights obtained after the previous round of training according to the j-th group of weights referenced in the previous round of training, which may include the following five cases:
Case a1: when the j-th group of weights referenced in the previous round of training are all zero and the j-th group of weights obtained after the previous round of training are all less than the zero-setting weight threshold, the neural network compression apparatus sets all of the j-th group of weights obtained after the previous round of training to zero.
Case a2: when the j-th group of weights referenced in the previous round of training are all zero and the j-th group of weights obtained after the previous round of training are not all less than the zero-setting weight threshold, the neural network compression apparatus keeps the j-th group of weights obtained after the previous round of training unchanged.
Case a3: when the j-th group of weights referenced in the previous round of training are not all zero, the proportion of non-zero values in the total number of the j-th group of weights obtained after the previous round of training is less than the set proportion threshold, and those non-zero values are all less than the zero-setting weight threshold, the neural network compression apparatus sets all the non-zero weights in the j-th group of weights obtained after the previous round of training to zero.
For example, the set proportion threshold may be 30% or another value, which is not limited in this application.
Case a4: when the j-th group of weights referenced in the previous round of training are not all zero, the proportion of non-zero values in the total number of the j-th group of weights obtained after the previous round of training is less than the set proportion threshold, and those non-zero values are not all less than the zero-setting weight threshold, the neural network compression apparatus keeps the j-th group of weights obtained after the previous round of training unchanged.
Case a5: when the j-th group of weights referenced in the previous round of training are not all zero and the proportion of non-zero values in the total number of the j-th group of weights obtained after the previous round of training is not less than the set proportion threshold, the neural network compression apparatus keeps the j-th group of weights obtained after the previous round of training unchanged.
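Purely for illustration, cases a1 to a5 might be implemented as in the following sketch. The function name `adjust_group`, the magnitude comparisons against the zero-setting weight threshold, and the 30% default for the set proportion threshold (taken from the example above) are assumptions rather than a normative implementation.

```python
import numpy as np

def adjust_group(ref, trained, zero_thr, ratio_thr=0.3):
    """Apply cases a1-a5 to one group of weights (a sketch, not normative).

    ref:       the j-th group of weights referenced in the previous round.
    trained:   the j-th group of weights obtained after the previous round.
    zero_thr:  zero-setting weight threshold.
    ratio_thr: set proportion threshold (30% is the example given above).
    Returns the j-th group of weights to reference in the current round.
    """
    out = trained.copy()
    if not ref.any():                                 # group was all zero
        if np.all(np.abs(out) < zero_thr):            # case a1: zero the group
            out[:] = 0.0
        # case a2: otherwise keep the group unchanged
    else:
        nz = out != 0
        if nz.sum() / out.size < ratio_thr:           # few non-zeros remain
            if np.all(np.abs(out[nz]) < zero_thr):    # case a3: zero them
                out[nz] = 0.0
            # case a4: otherwise keep the group unchanged
        # case a5: otherwise keep the group unchanged
    return out
```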
With this method, the zero values in the weight matrix of the resulting final neural network model are distributed more regularly; for example, consecutive zero values are concentrated in whole groups as far as possible. When the neural network model is subsequently used for data processing, this regular distribution of zeros greatly reduces the number of memory accesses and the amount of computation, thereby improving operation speed.
It should be noted that the j-th group of weights referenced in the previous round of training can be understood as the j-th group of weights to be trained in the previous round; the j-th group of weights obtained by adjusting the group obtained after the previous round is the group to be trained in the current round, that is, the group referenced in the current round of training. It should be understood that the j-th group of weights referenced in the first round of training may be the j-th group of weights of the initial neural network model.
In an optional implementation, the zero-setting weight threshold may be determined based on the initial weight threshold. For example, the zero-setting weight threshold may be a set multiple of the initial weight threshold, where the set multiple is greater than 1; for example, when the initial weight threshold is 1, the zero-setting threshold may be 1.05.
In an optional implementation, the neural network compression apparatus maintains a zero-setting flag data structure, with a group of weights corresponding to each zero-setting flag in the data structure (each group of weights may be called a weight matrix). When all the weights in a group are 0, the zero-setting flag corresponding to the group is 0; when at least one weight in the group is not 0, the corresponding zero-setting flag is a non-zero value (for example, 1). A zero-setting flag data structure and a weight matrix may be represented as in the schematic diagram of FIG. 4: each run of consecutive weights of the sparse unit length in the weight matrix corresponds to 1 bit in the zero-setting flag data structure. In the example shown in FIG. 4, the sparse unit length is 4, so every 4 consecutive weights correspond to one zero-setting flag.
In an optional implementation, based on the foregoing zero-setting flag data structure, the neural network compression apparatus determines whether the j-th group of weights referenced in the previous round of training are all zero as follows: the apparatus determines whether the zero-setting flag corresponding to the j-th group of weights in the zero-setting flag data structure is zero; when the flag is zero, it determines that the j-th group of weights referenced in the previous round of training are all zero; when the flag is a non-zero value, it determines that they are not all zero. Taking FIG. 4 as an example, the first zero-setting flag in the zero-setting flag data structure is 0, which indicates that the corresponding group of weights are all 0; indeed, the first 4 weights of the first row of the weight matrix in FIG. 4 (that is, the first group of weights, or the first weight matrix) are all 0.
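As a sketch of the data structure of FIG. 4, the following code keeps one flag per group of the sparse unit length (here a byte stands in for the single bit described above) and uses it to answer the all-zero question; the function names are hypothetical.

```python
import numpy as np

SPARSE_UNIT = 4  # the example sparse unit length used in FIG. 4

def build_zero_flags(weights):
    """One flag per group: 0 if the group is all zero, 1 otherwise."""
    groups = weights.reshape(-1, SPARSE_UNIT)
    return (groups != 0).any(axis=1).astype(np.uint8)

def group_is_all_zero(flags, j):
    """True when the j-th group of weights referenced previously are all zero."""
    return flags[j] == 0

w = np.array([0, 0, 0, 0, 0.3, 0, 0, 0], dtype=np.float32)
flags = build_zero_flags(w)  # -> [0, 1]
print(group_is_all_zero(flags, 0), group_is_all_zero(flags, 1))  # True False
```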
在一种可选的实施方式中,所述神经网络压缩装置在将所述上一次训练后得到的第j组权重全部置零之后,或者在将所述非零值的权重均置零之后,将当前的置零标记数据结构中第j组权重对应的置零标记更新为零。同理,在一种可选的实施方式中,所述神经网络压缩装置在保持所述上一次训练后得到的第j组权重不变之后,将当前的置零标记数据结构中第j组权重对应的置零标记更新为非零值(即为1)。通过上述方法,可以实时更新所述置零标记数据结构中的置零标记,以使更准确地在训练过程中对权重进行调整,以及后续处理设备在基于神经网络模型处理数据时可以准确地基于权重进行数据处理。In an optional implementation manner, after the neural network compression device resets all the jth group weights obtained after the previous training to zero, or after all non-zero weights are set to zero, Update the zero-setting mark corresponding to the j-th group of weights in the current zero-setting mark data structure to zero. Similarly, in an optional implementation manner, after maintaining the jth group weight obtained after the last training unchanged, the neural network compression device changes the jth group weight in the current zero-marking data structure The corresponding zero mark is updated to a non-zero value (that is, 1). Through the above method, the zero-setting mark in the zero-setting mark data structure can be updated in real time, so that the weights can be adjusted more accurately during the training process, and the subsequent processing device can accurately base on the data processing based on the neural network model Weights are used for data processing.
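Continuing the sketch above, flag lookup and maintenance might look as follows (again an assumed representation, not an encoding defined by this application):

```python
def group_is_all_zero(flags, j):
    # j is the 1-based group index used in the description
    return flags[j - 1] == 0

def update_flag(flags, j, all_zero):
    # Called after group j is zeroed out (all_zero=True) or kept (all_zero=False)
    flags[j - 1] = 0 if all_zero else 1
```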
The above five cases can in fact form a single loop: the neural network compression apparatus first determines whether the j-th group of weights referenced by the previous training are all zero, and then carries out the subsequent flow according to the judgment result and the five cases above, thereby obtaining new weights for all groups of the neural network model for the apparatus to train next. For illustration, a schematic diagram of one specific weight adjustment flow is shown in FIG. 5.
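A compact sketch of this adjustment loop for one group is given below, reusing update_flag from the earlier sketch. Comparing the absolute value of a weight against the zero-setting weight threshold is an assumption (the description only says "less than the zero-setting weight threshold"), and the threshold parameters are illustrative:

```python
import numpy as np

def adjust_group(prev_ref, trained, zero_thresh, ratio_thresh, flags, j):
    """Apply the five cases to group j.

    prev_ref : group j weights referenced by the previous training (NumPy array)
    trained  : group j weights obtained after the previous training (NumPy array)
    """
    if np.all(prev_ref == 0):
        if np.all(np.abs(trained) < zero_thresh):        # case 1: zero the whole group
            trained[:] = 0
            update_flag(flags, j, all_zero=True)
        else:                                            # case 2: keep unchanged
            update_flag(flags, j, all_zero=False)
    else:
        nz = trained != 0
        ratio = np.count_nonzero(nz) / trained.size
        if ratio < ratio_thresh and np.all(np.abs(trained[nz]) < zero_thresh):
            trained[nz] = 0                              # case 3: zero the non-zero values
            update_flag(flags, j, all_zero=True)
        else:                                            # cases 4 and 5: keep unchanged
            update_flag(flags, j, all_zero=False)
    return trained
```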
It should be noted that, in the process of obtaining m, there are several possibilities when grouping all weights of the neural network model by the sparsification unit length. In one case, all weights of the model are grouped together evenly; during grouping, the number of weights remaining for the last group may be smaller than the sparsification unit length, but even then the last group is processed in the same way as the other groups (whose number of weights equals the sparsification unit length). In another case, in the weight matrix formed by all weights of the neural network model, the weights of each row (or column) are grouped separately; when each row (or column) is grouped by the sparsification unit length, the number of weights in the last group of each row (or column) may likewise be smaller than the sparsification unit length, and that last group is again processed in the same way as the other groups (whose number of weights equals the sparsification unit length).
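A minimal sketch of the row-wise (or column-wise) variant follows, assuming the weights are held in a 2-D array; how a given framework actually stores its weights is outside the scope of this description:

```python
def group_rows(weight_matrix, unit_length):
    """Yield (row_index, start_column, group) for every group of every row.
    The last group of a row may be shorter than unit_length but is treated
    the same as the full-length groups."""
    for r, row in enumerate(weight_matrix):
        for start in range(0, len(row), unit_length):
            yield r, start, row[start:start + unit_length]
```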
Step 303: The neural network compression apparatus performs the current training on the neural network model according to the obtained groups of weights referenced by the current training.
All groups of weights of the neural network model can be obtained through step 302 above, after which step 303 can be performed.
In an optional implementation, the method by which the neural network compression apparatus performs step 303 may follow a commonly used neural network training method, which is not described in detail in this application.
With the neural network compression method provided in the embodiments of this application, when compressing a neural network, the sparsification unit length can be determined based on the capability information of the processing device, and the weights grouped by the sparsification unit length are processed during training. The neural network model can thus be adapted to the capabilities of different processing devices, so that the subsequent processing device achieves a better processing effect.
The final neural network model obtained through the embodiment shown in FIG. 3 can be applied in a data processing device, so that the data processing apparatus performs data processing based on the finally obtained neural network model. On this basis, an embodiment of this application further provides a data processing method, implemented based on the final neural network model obtained in the embodiment shown in FIG. 3. As shown in FIG. 6, the data processing method provided by this application is described with a data processing apparatus as the execution subject; the specific flow of the method may include the following steps:
Step 601: The data processing apparatus obtains the weights of a target neural network model, where the target neural network model is the final neural network model obtained by grouping the weights of a neural network model by the sparsification unit length and then training; the sparsification unit length is determined based on the processing capability information of the processing device and is the data length of one operation when performing a matrix operation.
For the method of generating the target neural network model, refer to the specific process in the embodiment shown in FIG. 3, which is not repeated here.
Likewise, the processing device here is the data processing apparatus itself; for the specific method of determining the sparsification unit length based on the processing capability information of the processing device, also refer to the related method in the embodiment shown in FIG. 3, which is not repeated here.
Step 602: Perform the following processing based on the weights of the target neural network model: in the p-th processing pass, determine whether the q-th group of weights are all zero; if so, generate and save a first operation result according to the matrix operation type, or according to the matrix operation type and the matrix data to be processed; otherwise, generate and save a second operation result according to the q-th group of weights, the matrix data to be processed, and the matrix operation type.
Here, the number of weights included in the q-th group of weights is the sparsification unit length; q takes every positive integer from 1 to f, where f is the total number of groups obtained after all weights of the target neural network model are grouped by the sparsification unit length; and p takes every positive integer from 1 to f.
It should be noted that the grouping in the process of obtaining f is similar to the grouping in the process of obtaining m in the embodiment shown in FIG. 3; the specific descriptions apply to each other and are not repeated in detail here.
In an optional implementation, when determining whether the q-th group of weights are all zero, the data processing apparatus first obtains the zero-flag data structure corresponding to the weights of the target neural network model, and then determines whether the zero flag corresponding to the q-th group of weights in that structure is zero. Specifically, when the zero flag corresponding to the weights of the q-th group in the zero-flag data structure is zero, the data processing apparatus determines that the weights of the q-th group are all zero; when that zero flag is not zero, the data processing apparatus determines that the weights of the q-th group are not all zero. For example, referring to FIG. 4, when the zero flag obtained for the q-th group of weights is the first zero flag, since that flag is 0, it is determined that the q-th group of weights are all zero.
Since the target neural network model is adapted to the data processing apparatus, the information related to the target neural network model (for example, the zero-flag data structure) has been pre-configured in the data processing apparatus. For the zero-flag data structure and the related description of the zero flags, refer to the corresponding descriptions in the embodiment shown in FIG. 3, which are not repeated here.
In one example, when the q-th group of weights are all zero, the data processing apparatus generates the first operation result according to the matrix operation type, or according to the matrix operation type and the matrix data to be processed, as follows: when the matrix operation type is matrix multiplication, the data processing apparatus directly obtains the first operation result as zero; when the matrix operation type is matrix addition, the data processing apparatus takes the matrix data to be processed as the first operation result.
In another example, when the q-th group of weights are not all zero, the data processing apparatus generates the second operation result according to the q-th group of weights, the matrix data to be processed, and the matrix operation type as follows: the data processing apparatus loads the q-th group of weights and the matrix data to be processed into a register, and then performs the corresponding matrix operation on them according to the matrix operation type to generate the second operation result.
After all weights of the target neural network model have been traversed through the above process, the final processing result can be generated.
It is evident from the above processing that when a group of weights are all zero, the currently most time-consuming matrix operation is skipped, thereby achieving acceleration.
It should be noted that the above processing is a loop: it is performed once for each group of weights until the weights of all groups have been traversed. For illustration, a specific data processing flow is shown in the schematic diagram of FIG. 7.
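A minimal sketch of this skip-on-zero loop is given below, reusing the flag layout from the earlier sketches. Representing matrix multiplication as a per-group dot product and matrix addition as an element-wise add are assumptions made for illustration; the description leaves the exact operation shapes to the implementation:

```python
import numpy as np

def process(weights, flags, x_groups, op, unit_length):
    """weights: flat weight vector; flags: one zero flag per group;
    x_groups: the matrix data to be processed, pre-split per group;
    op: 'mul' for matrix multiplication, 'add' for matrix addition."""
    results = []
    for q in range(len(flags)):
        x = np.asarray(x_groups[q])
        if flags[q] == 0:
            # Group all zero: skip the matrix operation entirely
            results.append(0.0 if op == 'mul' else x)
        else:
            # Load the group and compute as usual
            w = np.asarray(weights[q * unit_length:(q + 1) * unit_length])
            results.append(np.dot(w, x) if op == 'mul' else w + x)
    return results
```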
With the data processing method provided in the embodiments of this application, the final neural network model is obtained by grouping the weights of a neural network model by the sparsification unit length derived from the processing capability information of the data processing apparatus (that is, the processing device) and then training. By exploiting the characteristics of matrix operations, subsequent data processing with this final neural network model can greatly reduce the amount of data access and computation, thereby increasing the operation speed.
Based on the above embodiments, an embodiment of this application further provides a neural network compression apparatus for implementing the neural network compression method provided in the embodiment shown in FIG. 3. Referring to FIG. 8, the neural network compression apparatus 800 includes a determining unit 801, a weight adjustment unit 802, and a training unit 803, where:
The determining unit 801 is configured to determine the sparsification unit length according to the processing capability information of a processing device, where the sparsification unit length is the data length of one operation when the processing device performs a matrix operation. The weight adjustment unit 802 is configured to, when the current training is performed on the neural network model, adjust the j-th group of weights obtained after the previous training according to the j-th group of weights referenced by the previous training, to obtain the j-th group of weights referenced by the current training, where the number of weights included in the j-th group of weights is the sparsification unit length, j takes every positive integer from 1 to m, and m is the total number of weight groups obtained after all weights of the neural network model are grouped by the sparsification unit length. The training unit 803 is configured to perform the current training on the neural network model according to the groups of weights referenced by the current training, obtained by the weight adjustment unit.
In an optional implementation, when determining the sparsification unit length according to the processing capability information of the processing device, the determining unit 801 determines the length of a register in the processing device or the maximum data length processed at one time by an instruction set of the processing device, and takes the register length or that maximum data length as the sparsification unit length.
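For instance, on a processing device with 128-bit SIMD registers holding 32-bit floating-point weights, the sparsification unit length would be 4. A hypothetical helper (the parameters are assumptions, not an interface defined by this application) might read:

```python
def sparsification_unit_length(register_bits, weight_bits=32):
    # e.g. a 128-bit register with float32 weights -> 4 weights per operation
    return register_bits // weight_bits
```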
In an optional implementation, the neural network compression apparatus may further include a weight pruning unit, configured to prune all weights of the initial neural network model according to the initial weight threshold of the initial neural network model before the training unit performs the first training on the neural network.
In an optional implementation, when the weight adjustment unit 802 adjusts the j-th group of weights obtained after the previous training according to the j-th group of weights referenced by the previous training, the following cases apply:
when the j-th group of weights referenced by the previous training are all zero and the j-th group of weights obtained after the previous training are all less than the zero-setting weight threshold, setting all of the j-th group of weights obtained after the previous training to zero; or
when the j-th group of weights referenced by the previous training are all zero and the j-th group of weights obtained after the previous training are not all less than the zero-setting weight threshold, keeping the j-th group of weights obtained after the previous training unchanged; or
when the j-th group of weights referenced by the previous training are not all zero, the proportion of the number of non-zero values in the j-th group of weights obtained after the previous training to the total number of weights in that group is less than the set proportion threshold, and the non-zero values in that group are all less than the zero-setting weight threshold, setting the non-zero-valued weights in the j-th group of weights obtained after the previous training to zero; or
when the j-th group of weights referenced by the previous training are not all zero, the proportion of the number of non-zero values in the j-th group of weights obtained after the previous training to the total number of weights in that group is less than the set proportion threshold, and the non-zero values in that group are not all less than the zero-setting weight threshold, keeping the j-th group of weights obtained after the previous training unchanged; or
when the j-th group of weights referenced by the previous training are not all zero, and the proportion of the number of non-zero values in the j-th group of weights obtained after the previous training to the total number of weights in that group is not less than the set proportion threshold, keeping the j-th group of weights obtained after the previous training unchanged.
In an optional implementation, when determining whether the j-th group of weights referenced by the previous training are all zero, the weight adjustment unit 802 is specifically configured to: determine whether the zero flag corresponding to the j-th group of weights in the zero-flag data structure is zero; when the zero flag is zero, determine that the j-th group of weights referenced by the previous training are all zero; and when the zero flag is a non-zero value, determine that the j-th group of weights referenced by the previous training are not all zero.
In an optional implementation, the weight adjustment unit 802 is further configured to update the zero flag corresponding to the j-th group of weights in the current zero-flag data structure to zero after setting all of the j-th group of weights obtained after the previous training to zero, or after setting all of the non-zero-valued weights to zero; or, the weight adjustment unit 802 is further configured to update the zero flag corresponding to the j-th group of weights in the current zero-flag data structure to a non-zero value after keeping the j-th group of weights obtained after the previous training unchanged.
With the neural network compression apparatus provided in the embodiments of this application, when compressing a neural network, the sparsification unit length can be determined based on the capability information of the processing device, and the weights grouped by the sparsification unit length are processed during training, so that the neural network model can be adapted to the capabilities of different processing devices and the subsequent processing device achieves a better processing effect.
Based on the above embodiments, an embodiment of this application further provides a data processing apparatus for implementing the data processing method provided in the embodiment shown in FIG. 6. Referring to FIG. 9, the data processing apparatus 900 includes an obtaining unit 901 and a processing unit 902, where:
The obtaining unit 901 is configured to obtain the weights of a target neural network model, where the target neural network model is the final neural network model obtained by grouping the weights of a neural network model by the sparsification unit length and then training. The processing unit 902 is configured to perform the following processing based on the weights of the target neural network model: in the p-th processing pass, determine whether the q-th group of weights are all zero; if so, generate and save a first operation result according to the matrix operation type, or according to the matrix operation type and the matrix data to be processed; otherwise, generate and save a second operation result according to the q-th group of weights, the matrix data to be processed, and the matrix operation type. The sparsification unit length is determined based on the processing capability information of a processing device and is the data length of one operation when performing a matrix operation; the number of weights included in the q-th group of weights is the sparsification unit length; q takes every positive integer from 1 to f, where f is the total number of groups obtained after all weights of the target neural network model are grouped by the sparsification unit length; and p takes every positive integer from 1 to f.
In an optional implementation, when determining whether the q-th group of weights are all zero, the processing unit 902 is specifically configured to: obtain the zero-flag data structure corresponding to the weights of the target neural network model, and determine whether the zero flag corresponding to the q-th group of weights in that structure is zero.
With the data processing apparatus provided in the embodiments of this application, the final neural network model is obtained by grouping the weights of a neural network model by the sparsification unit length derived from the processing capability information of the data processing apparatus (that is, the processing device) and then training. By exploiting the characteristics of matrix operations, subsequent data processing with this final neural network model can greatly reduce the amount of data access and computation, thereby increasing the operation speed.
It should be noted that the division into units in the embodiments of this application is schematic and is merely a division by logical function; other divisions are possible in actual implementation. The functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Based on the above embodiments, an embodiment of this application further provides a neural network compression apparatus for implementing the neural network compression method shown in FIG. 3. Referring to FIG. 10, the neural network compression apparatus 1000 includes a processor 1001 and a memory 1002, where:
The processor 1001 may be a CPU, a GPU, or a combination of a CPU and a GPU. The processor 1001 may also be an AI chip that supports neural network processing, such as an NPU or a TPU. The processor 1001 may further include a hardware chip, which may be an ASIC, a PLD, a DSP, or a combination thereof; the PLD may be a CPLD, an FPGA, a GAL, or any combination thereof. It should be noted that the processor 1001 is not limited to the cases listed above; the processor 1001 may be any processing device capable of implementing the neural network compression method shown in FIG. 3.
The processor 1001 and the memory 1002 are connected to each other. Optionally, the processor 1001 and the memory 1002 are connected to each other through a bus 1003; the bus 1003 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 10, but this does not mean that there is only one bus or only one type of bus.
When used to implement the neural network compression method provided in the embodiments of this application, the processor 1001 performs the following operations:
determining the sparsification unit length according to the processing capability information of a processing device, where the sparsification unit length is the data length of one operation when the processing device performs a matrix operation;
when the current training is performed on the neural network model, adjusting the j-th group of weights obtained after the previous training according to the j-th group of weights referenced by the previous training, to obtain the j-th group of weights referenced by the current training, where the number of weights included in the j-th group of weights is the sparsification unit length, j takes every positive integer from 1 to m, and m is the total number of weight groups obtained after all weights of the neural network model are grouped by the sparsification unit length; and
performing the current training on the neural network model according to the obtained groups of weights referenced by the current training.
In an optional implementation, the processor 1001 may further perform other operations; for details, refer to the specific descriptions of step 301, step 302, and step 303 in the embodiment shown in FIG. 3, which are not repeated here.
The memory 1002 is configured to store programs, data, and the like. Specifically, a program may include program code comprising computer operation instructions. The memory 1002 may include a random access memory (RAM), and may also include a non-volatile memory, for example at least one disk memory. The processor 1001 executes the program stored in the memory 1002 to realize the above functions, thereby implementing the neural network compression method shown in FIG. 3.
It should be noted that when the neural network compression apparatus shown in FIG. 10 is applied to a terminal device, the neural network compression apparatus may be embodied as the terminal device shown in FIG. 2. In that case, the processor 1001 may be the same as the processor 210 shown in FIG. 2, and the memory 1002 may be the same as the memory 220 shown in FIG. 2.
Based on the above embodiments, an embodiment of this application further provides a data processing apparatus for implementing the data processing method shown in FIG. 6. Referring to FIG. 11, the data processing apparatus 1100 includes a processor 1101 and a memory 1102, where:
The processor 1101 may be a CPU, a GPU, or a combination of a CPU and a GPU. The processor 1101 may also be an AI chip that supports neural network processing, such as an NPU or a TPU. The processor 1101 may further include a hardware chip, which may be an ASIC, a PLD, a DSP, or a combination thereof; the PLD may be a CPLD, an FPGA, a GAL, or any combination thereof. It should be noted that the processor 1101 is not limited to the cases listed above; the processor 1101 may be any processing device capable of performing neural network inference operations.
The processor 1101 and the memory 1102 are connected to each other. Optionally, the processor 1101 and the memory 1102 are connected to each other through a bus 1103; the bus 1103 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 11, but this does not mean that there is only one bus or only one type of bus.
When used to implement the data processing method provided in the embodiments of this application, the processor 1101 may perform the following operations:
obtaining the weights of a target neural network model, where the target neural network model is the final neural network model obtained by grouping the weights of a neural network model by the sparsification unit length and then training, and the sparsification unit length is determined based on the processing capability information of a processing device and is the data length of one operation when performing a matrix operation;
performing the following processing based on the weights of the target neural network model: in the p-th processing pass, determining whether the q-th group of weights are all zero; if so, generating and saving a first operation result according to the matrix operation type, or according to the matrix operation type and the matrix data to be processed; otherwise, generating and saving a second operation result according to the q-th group of weights, the matrix data to be processed, and the matrix operation type;
where the number of weights included in the q-th group of weights is the sparsification unit length; q takes every positive integer from 1 to f, where f is the total number of groups obtained after all weights of the target neural network model are grouped by the sparsification unit length; and p takes every positive integer from 1 to f.
In an optional implementation, the processor 1101 may further perform other operations; for details, refer to the specific descriptions of step 601 and step 602 in the embodiment shown in FIG. 6, which are not repeated here.
The memory 1102 is configured to store programs, data, and the like. Specifically, a program may include program code comprising computer operation instructions. The memory 1102 may include a random access memory (RAM), and may also include a non-volatile memory, for example at least one disk memory. The processor 1101 executes the program stored in the memory 1102 to realize the above functions, thereby implementing the data processing method shown in FIG. 6.
It should be noted that when the data processing apparatus shown in FIG. 11 is applied to a terminal device, the data processing apparatus may be embodied as the terminal device shown in FIG. 2. In that case, the processor 1101 may be the same as the processor 210 shown in FIG. 2, and the memory 1102 may be the same as the memory 220 shown in FIG. 2.
Those skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
This application is described with reference to the flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, may be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or the other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, where the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are performed on the computer or the other programmable device to produce computer-implemented processing, and the instructions executed on the computer or the other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of this application have been described, those skilled in the art may make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be construed as covering the preferred embodiments and all changes and modifications falling within the scope of this application.
Obviously, those skilled in the art may make various changes and variations to the embodiments of this application without departing from the scope of the embodiments of this application. If these modifications and variations of the embodiments of this application fall within the scope of the claims of this application and their equivalent technologies, this application is also intended to cover these changes and variations.

Claims (27)

  1. A neural network compression method, characterized in that the method comprises:
    determining a sparsification unit length according to processing capability information of a processing device, wherein the sparsification unit length is a data length of one operation when the processing device performs a matrix operation;
    when performing a current training on a neural network model, adjusting a j-th group of weights obtained after a previous training according to a j-th group of weights referenced by the previous training, to obtain a j-th group of weights referenced by the current training, wherein the number of weights comprised in the j-th group of weights is the sparsification unit length, j takes every positive integer from 1 to m, and m is a total number of weight groups obtained after all weights of the neural network model are grouped by the sparsification unit length; and
    performing the current training on the neural network model according to the obtained groups of weights referenced by the current training.
  2. The method according to claim 1, wherein determining the sparsification unit length according to the processing capability information of the processing device comprises:
    determining a length of a register in the processing device or a maximum data length processed at one time by an instruction set of the processing device; and
    taking the length of the register or the maximum data length processed at one time by the instruction set as the sparsification unit length.
  3. The method according to claim 1 or 2, wherein before a first training of the neural network, the method further comprises:
    pruning all weights of an initial neural network model according to an initial weight threshold of the initial neural network model.
  4. The method according to any one of claims 1 to 3, wherein adjusting the j-th group of weights obtained after the previous training according to the j-th group of weights referenced by the previous training comprises:
    when the j-th group of weights referenced by the previous training are all zero and the j-th group of weights obtained after the previous training are all less than a zero-setting weight threshold, setting all of the j-th group of weights obtained after the previous training to zero; or
    when the j-th group of weights referenced by the previous training are all zero and the j-th group of weights obtained after the previous training are not all less than the zero-setting weight threshold, keeping the j-th group of weights obtained after the previous training unchanged; or
    when the j-th group of weights referenced by the previous training are not all zero, a proportion of a number of non-zero values in the j-th group of weights obtained after the previous training to a total number of weights in that group is less than a set proportion threshold, and the non-zero values in that group are all less than the zero-setting weight threshold, setting the non-zero-valued weights in the j-th group of weights obtained after the previous training to zero; or
    when the j-th group of weights referenced by the previous training are not all zero, the proportion of the number of non-zero values in the j-th group of weights obtained after the previous training to the total number of weights in that group is less than the set proportion threshold, and the non-zero values in that group are not all less than the zero-setting weight threshold, keeping the j-th group of weights obtained after the previous training unchanged; or
    when the j-th group of weights referenced by the previous training are not all zero and the proportion of the number of non-zero values in the j-th group of weights obtained after the previous training to the total number of weights in that group is not less than the set proportion threshold, keeping the j-th group of weights obtained after the previous training unchanged.
  5. The method according to claim 4, wherein determining whether the j-th group of weights referenced by the previous training are all zero comprises:
    determining whether a zero flag corresponding to the j-th group of weights in a zero-flag data structure is zero;
    when the zero flag is zero, determining that the j-th group of weights referenced by the previous training are all zero; and
    when the zero flag is a non-zero value, determining that the j-th group of weights referenced by the previous training are not all zero.
  6. The method according to claim 4 or 5, wherein after setting all of the j-th group of weights obtained after the previous training to zero, or after setting all of the non-zero-valued weights to zero, the method further comprises:
    updating the zero flag corresponding to the j-th group of weights in a current zero-flag data structure to zero; or
    wherein after keeping the j-th group of weights obtained after the previous training unchanged, the method further comprises:
    updating the zero flag corresponding to the j-th group of weights in the current zero-flag data structure to a non-zero value.
  7. A data processing method, characterized in that the method comprises:
    obtaining weights of a target neural network model, wherein the target neural network model is a final neural network model obtained by grouping weights of a neural network model by a sparsification unit length and then training, the sparsification unit length is determined based on processing capability information of a processing device, and the sparsification unit length is a data length of one operation when performing a matrix operation; and
    performing the following processing based on the weights of the target neural network model:
    in a p-th processing pass, determining whether a q-th group of weights are all zero; if so, generating and saving a first operation result according to a matrix operation type, or according to the matrix operation type and matrix data to be processed; otherwise, generating and saving a second operation result according to the q-th group of weights, the matrix data to be processed, and the matrix operation type;
    wherein the number of weights comprised in the q-th group of weights is the sparsification unit length; q takes every positive integer from 1 to f, and f is a total number of groups obtained after all weights of the target neural network model are grouped by the sparsification unit length; and p takes every positive integer from 1 to f.
  8. The method according to claim 7, wherein determining whether the q-th group of weights are all zero comprises:
    obtaining a zero-flag data structure corresponding to the weights of the target neural network model; and
    determining whether a zero flag corresponding to the q-th group of weights in the zero-flag data structure is zero.
  9. 一种神经网络压缩装置,其特征在于,包括:A neural network compression device, characterized in that it includes:
    确定单元,用于根据处理设备的处理能力信息,确定稀疏化单位长度,所述稀疏化单位长度为所述处理设备进行矩阵运算时一次运算的数据长度;A determining unit, configured to determine the sparse unit length according to the processing capability information of the processing device, where the sparse unit length is the data length of one operation when the processing device performs matrix operation;
    权重调整单元,用于在对神经网络模型进行当前次训练时,根据上一次训练参照的第j组权重,对上一次训练后得到的第j组权重进行调整,得到当前次训练参照的第j组权重;其中,第j组权重包括的权重个数为所述稀疏化单位长度;所述j取遍1至m中的任意一个正整数,所述m为对所述神经网络模型的所有权重按照所述稀疏化单位长度分组后得到的权重总组数;The weight adjustment unit is used to adjust the weight of the jth group obtained after the last training according to the weight of the jth group referenced in the previous training when the current training of the neural network model is performed, to obtain the jth group of the current training reference Group weight; wherein, the number of weights included in the jth group weight is the length of the sparse unit; the j takes any positive integer from 1 to m, where m is the weight of ownership of the neural network model The total number of weight groups obtained after grouping according to the sparse unit length;
    训练单元,用于根据权重调整单元得到的当前次训练参照的各组权重,对所述神经网络模型进行当前次训练。The training unit is configured to perform the current training on the neural network model according to each group of weights referenced by the current adjustment training unit obtained by the weight adjustment unit.
  10. 如权利要求9所述的装置,其特征在于,所述确定单元,在根据处理设备的处理能力信息确定稀疏化单位长度时,具体用于:The apparatus according to claim 9, wherein the determining unit, when determining the sparse unit length according to the processing capability information of the processing device, is specifically used to:
    确定所述处理设备中寄存器的长度或者所述处理设备中指令集一次处理的最大数据长度;Determine the length of the register in the processing device or the maximum length of data processed by the instruction set in the processing device at one time;
    将所述寄存器的长度或者所述指令集一次处理的最大数据长度作为所述稀疏化单位长度。The length of the register or the maximum data length processed by the instruction set at a time is used as the sparse unit length.
  11. 如权利要求9或10所述的装置,其特征在于,还包括:The device according to claim 9 or 10, further comprising:
    权重剪裁单元,用于在所述训练单元对所述神经网络进行首次训练之前,根据初始神经网络模型的初始权重阈值,对所述初始神经网络模型的所有权重进行剪裁。The weight trimming unit is configured to trim the weight of the initial neural network model according to the initial weight threshold of the initial neural network model before the training unit performs the first training on the neural network.
  12. 如权利要求9-11任一项所述的装置,其特征在于,所述权重调整单元,在根据上一次训练参照的第j组权重,对上一次训练后得到的第j组权重进行调整时,具体用于:The apparatus according to any one of claims 9-11, wherein the weight adjustment unit adjusts the jth group weight obtained after the last training based on the jth group weight referred to in the previous training , Specifically for:
    在所述上一次训练参照的第j组权重全部为零、且所述上一次训练后得到的第j组权重全部小于置零权重阈值时,将所述上一次训练后得到的第j组权重全部置零;或者When the jth group weights referred to in the previous training are all zero, and the jth group weights obtained after the last training are all less than the zero-setting weight threshold, the jth group weights obtained after the last training Zero all; or
    在所述上一次训练参照的第j组权重全部为零、且所述上一次训练后得到的第j组权重不全部都小于置零权重阈值时,保持所述上一次训练后得到的第j组权重不变;或者When the weights of the jth group referenced in the previous training are all zero, and not all the weights of the jth group obtained after the last training are less than the zero-setting weight threshold, the jth group obtained after the last training is maintained The group weight is unchanged; or
    在所述上一次训练参照的第j组权重不全部为零、且所述上一次训练后得到的第j组权重中非零值的个数在所述上一次训练后得到的第j组权重的总个数中所占比重小于设定比重阈值、且所述上一次训练后得到的第j组权重中的非零值均小于置零权重阈值时,将所述上一次训练后得到的第j组权重中的所述非零值的权重均置零;或者The jth group weights referenced in the previous training are not all zero, and the number of non-zero values in the jth group weights obtained after the last training is the jth group weights obtained after the last training When the proportion of the total number of is less than the set weight threshold, and the non-zero values in the jth group of weights obtained after the last training are all less than the zero-setting weight threshold, the The weights of the non-zero values in the group j weights are all set to zero; or
    在所述上一次训练参照的第j组权重不全部为零、且所述上一次训练后得到的第j组权重中非零值的个数在所述上一次训练后得到的第j组权重的总个数中所占比重小于设定比重阈值、且所述上一次训练后得到的第j组权重中的非零值不均小于置零权重阈值时,保持所述上一次训练后得到的第j组权重不变;或者The jth group weights referenced in the previous training are not all zero, and the number of non-zero values in the jth group weights obtained after the last training is the jth group weights obtained after the last training When the proportion of the total number is less than the set weight threshold, and the non-zero values in the jth group of weights obtained after the last training are not all less than the zero-setting weight threshold, keep the value obtained after the last training Group j weights remain unchanged; or
    在所述上一次训练参照的第j组权重不全部为零、且所述上一次训练后得到的第j组权重中非零值的个数在所述上一次训练后得到的第j组权重的总个数中所占比重不小于设定比重阈值时,保持所述上一次训练后得到的第j组权重不变。The jth group weights referenced in the previous training are not all zero, and the number of non-zero values in the jth group weights obtained after the last training is the jth group weights obtained after the last training When the proportion of the total number is not less than the set proportion threshold, keep the jth group weight obtained after the previous training unchanged.
  13. 如权利要求12所述的装置,其特征在于,所述权重调整单元,在判断所述上一次训练参照的第j组权重是否全部为零时,具体用于:The apparatus according to claim 12, wherein the weight adjustment unit is specifically used to determine whether the weights of the jth group referred to in the previous training are all zero:
    确定置零标记数据结构中第j组权重对应的置零标记是否为零;Determine whether the zero-setting mark corresponding to the j-th group of weights in the zero-setting mark data structure is zero;
    当所述置零标记为零时,判定所述上一次训练参照的第j组权重全部为零;When the zero-setting flag is zero, it is determined that the weight of the jth group referenced in the previous training is all zero;
    当所述置零标记为非零值时,判定所述上一次训练参照的第j组权重不全为零。When the zero-setting flag is a non-zero value, it is determined that the j-th group of weights referred to in the previous training is not all zero.
14. The apparatus according to claim 12 or 13, wherein the weight adjustment unit is further configured to:
    after setting all of the j-th group of weights obtained after the previous training to zero, or after setting all of the non-zero-valued weights to zero, update the zero-setting flag corresponding to the j-th group of weights in the current zero-setting flag data structure to zero; or
    the weight adjustment unit is further configured to:
    after keeping the j-th group of weights obtained after the previous training unchanged, update the zero-setting flag corresponding to the j-th group of weights in the current zero-setting flag data structure to a non-zero value.
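For illustration (the class name and the byte-per-group representation are assumptions, not claim language), claims 13 and 14 amount to keeping one flag per weight group and reading or updating it around the adjustment step:

```python
import numpy as np

class ZeroFlags:
    """One flag per weight group: 0 means the group is entirely zero,
    a non-zero value means it contains at least one non-zero weight."""
    def __init__(self, num_groups: int):
        self.flags = np.ones(num_groups, dtype=np.uint8)

    def group_was_all_zero(self, j: int) -> bool:
        # Claim 13: the referenced group is all zero iff its flag is zero.
        return self.flags[j] == 0

    def mark_zeroed(self, j: int) -> None:
        # Claim 14: after zeroing a group, its flag is updated to zero.
        self.flags[j] = 0

    def mark_kept(self, j: int) -> None:
        # Claim 14: after keeping a group unchanged, its flag is non-zero.
        self.flags[j] = 1
```

Because one flag covers a whole sparse-unit-length group, the structure costs only one flag per hardware operation's worth of weights.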
15. A data processing apparatus, comprising:
    an obtaining unit, configured to obtain weights of a target neural network model, where the target neural network model is a final neural network model obtained through training after the weights of a neural network model are grouped based on a sparse unit length; the sparse unit length is determined based on processing capability information of a processing device, and is the data length of one operation when a matrix operation is performed; and
    a processing unit, configured to perform the following processing based on the weights of the target neural network model: in the p-th round of processing, determining whether the q-th group of weights are all zero; if so, generating and saving a first operation result according to a matrix operation type, or according to the matrix operation type and matrix data to be processed; otherwise, generating and saving a second operation result according to the q-th group of weights, the matrix data to be processed, and the matrix operation type;
    where the number of weights included in the q-th group of weights is the sparse unit length; q takes every positive integer from 1 to f, where f is the total number of groups obtained after all weights of the target neural network model are grouped according to the sparse unit length; and p takes any positive integer from 1 to f.
16. The apparatus according to claim 15, wherein, when determining whether the q-th group of weights are all zero, the processing unit is specifically configured to:
    obtain the zero-setting flag data structure corresponding to the weights of the target neural network model; and
    determine whether the zero-setting flag corresponding to the q-th group of weights in the zero-setting flag data structure is zero.
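For illustration, claims 15 and 16 describe the inference-side payoff: when a group's flag marks it as all zero, the result of the operation is known from the operation type alone (for multiply-accumulate, a zero contribution), so the weights are never loaded or multiplied. A minimal sketch, assuming the matrix operation is a dot product; the names are hypothetical:

```python
import numpy as np

def process(weights: np.ndarray, flags: np.ndarray,
            data: np.ndarray, unit: int) -> float:
    """Sparse dot product over groups of `unit` weights.
    flags[q] == 0 means the q-th weight group is entirely zero."""
    acc = 0.0
    for q in range(len(flags)):
        if flags[q] == 0:
            # "First operation result": for multiply-accumulate, an
            # all-zero group contributes 0, so skip the weight load
            # and the multiplication entirely.
            continue
        lo, hi = q * unit, (q + 1) * unit
        # "Second operation result": ordinary multiply-accumulate.
        acc += float(weights[lo:hi] @ data[lo:hi])
    return acc
```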
17. A neural network compression apparatus, comprising:
    a memory, configured to store program instructions; and
    a processor, coupled to the memory and configured to call the program instructions in the memory to perform the following operations:
    determining a sparse unit length according to processing capability information of a processing device, where the sparse unit length is the data length of one operation when the processing device performs a matrix operation;
    when performing the current training on a neural network model, adjusting the j-th group of weights obtained after the previous training according to the j-th group of weights referenced in the previous training, to obtain the j-th group of weights referenced in the current training, where the number of weights included in the j-th group of weights is the sparse unit length; j takes every positive integer from 1 to m, where m is the total number of weight groups obtained after all weights of the neural network model are grouped according to the sparse unit length; and
    performing the current training on the neural network model according to the obtained groups of weights referenced in the current training.
18. The apparatus according to claim 17, wherein, when determining the sparse unit length according to the processing capability information of the processing device, the processor is specifically configured to:
    determine the length of a register in the processing device, or the maximum data length processed at one time by an instruction set in the processing device; and
    use the length of the register, or the maximum data length processed at one time by the instruction set, as the sparse unit length.
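For illustration, the register length in claim 18 translates into a group size as follows; the register widths used are examples of common hardware, not values taken from the claims:

```python
def sparse_unit_length(register_bits: int, element_bits: int = 32) -> int:
    """Number of weights that fit in one register / one SIMD operation."""
    return register_bits // element_bits

# Illustrative hardware examples (assumptions, not claim values):
# a 128-bit NEON register holds 4 float32 weights -> group size 4
# a 256-bit AVX2 register holds 8 float32 weights -> group size 8
assert sparse_unit_length(128) == 4
assert sparse_unit_length(256) == 8
```

Matching the group size to the register width is what lets a single zero-setting flag stand in for exactly one skippable load-and-multiply on the target device.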
19. The apparatus according to claim 17 or 18, wherein the processor is further configured to:
    before the first training of the neural network, clip all weights of an initial neural network model according to an initial weight threshold of the initial neural network model.
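For illustration, claim 19's initial clipping is a one-shot magnitude prune before the first training pass; a minimal sketch, assuming (as in the earlier sketch) that the comparison is on absolute values:

```python
import numpy as np

def clip_initial_weights(w: np.ndarray, init_threshold: float) -> np.ndarray:
    """Zero every initial weight whose magnitude falls below the
    initial weight threshold, before the first training pass."""
    w = w.copy()
    w[np.abs(w) < init_threshold] = 0.0
    return w
```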
20. The apparatus according to any one of claims 17 to 19, wherein, when adjusting the j-th group of weights obtained after the previous training according to the j-th group of weights referenced in the previous training, the processor is specifically configured to:
    when the j-th group of weights referenced in the previous training are all zero and the j-th group of weights obtained after the previous training are all less than a zero-setting weight threshold, set all of the j-th group of weights obtained after the previous training to zero; or
    when the j-th group of weights referenced in the previous training are all zero and the j-th group of weights obtained after the previous training are not all less than the zero-setting weight threshold, keep the j-th group of weights obtained after the previous training unchanged; or
    when the j-th group of weights referenced in the previous training are not all zero, the proportion of non-zero values in the total number of the j-th group of weights obtained after the previous training is less than a set proportion threshold, and the non-zero values in the j-th group of weights obtained after the previous training are all less than the zero-setting weight threshold, set all of the non-zero-valued weights in the j-th group of weights obtained after the previous training to zero; or
    when the j-th group of weights referenced in the previous training are not all zero, the proportion of non-zero values in the total number of the j-th group of weights obtained after the previous training is less than the set proportion threshold, and the non-zero values in the j-th group of weights obtained after the previous training are not all less than the zero-setting weight threshold, keep the j-th group of weights obtained after the previous training unchanged; or
    when the j-th group of weights referenced in the previous training are not all zero and the proportion of non-zero values in the total number of the j-th group of weights obtained after the previous training is not less than the set proportion threshold, keep the j-th group of weights obtained after the previous training unchanged.
21. The apparatus according to claim 20, wherein, when determining whether the j-th group of weights referenced in the previous training are all zero, the processor is specifically configured to:
    determine whether the zero-setting flag corresponding to the j-th group of weights in a zero-setting flag data structure is zero;
    when the zero-setting flag is zero, determine that the j-th group of weights referenced in the previous training are all zero; and
    when the zero-setting flag is a non-zero value, determine that the j-th group of weights referenced in the previous training are not all zero.
22. The apparatus according to claim 20 or 21, wherein the processor is further configured to:
    after setting all of the j-th group of weights obtained after the previous training to zero, or after setting all of the non-zero-valued weights to zero, update the zero-setting flag corresponding to the j-th group of weights in the current zero-setting flag data structure to zero; or
    the processor is further configured to:
    after keeping the j-th group of weights obtained after the previous training unchanged, update the zero-setting flag corresponding to the j-th group of weights in the current zero-setting flag data structure to a non-zero value.
23. A data processing apparatus, comprising:
    a memory, configured to store program instructions; and
    a processor, coupled to the memory and configured to call the program instructions in the memory to perform the following operations:
    obtaining weights of a target neural network model, where the target neural network model is a final neural network model obtained through training after the weights of a neural network model are grouped based on a sparse unit length; the sparse unit length is determined based on processing capability information of a processing device, and is the data length of one operation when a matrix operation is performed; and
    performing the following processing based on the weights of the target neural network model: in the p-th round of processing, determining whether the q-th group of weights are all zero; if so, generating and saving a first operation result according to a matrix operation type, or according to the matrix operation type and matrix data to be processed; otherwise, generating and saving a second operation result according to the q-th group of weights, the matrix data to be processed, and the matrix operation type;
    where the number of weights included in the q-th group of weights is the sparse unit length; q takes every positive integer from 1 to f, where f is the total number of groups obtained after all weights of the target neural network model are grouped according to the sparse unit length; and p takes any positive integer from 1 to f.
24. The apparatus according to claim 23, wherein, when determining whether the q-th group of weights are all zero, the processor is specifically configured to:
    obtain the zero-setting flag data structure corresponding to the weights of the target neural network model; and
    determine whether the zero-setting flag corresponding to the q-th group of weights in the zero-setting flag data structure is zero.
25. A computer program product comprising instructions, wherein, when the computer program product is run on a computer, the computer is caused to execute the method according to any one of claims 1 to 8.
26. A computer storage medium, wherein the computer storage medium stores a computer program, and when the computer program is executed by a computer, the computer is caused to execute the method according to any one of claims 1 to 8.
27. A chip, wherein the chip is coupled to a memory and is configured to read and execute program instructions stored in the memory, to implement the method according to any one of claims 1 to 8.
PCT/CN2018/125812 2018-12-29 2018-12-29 Neural network compression method and apparatus WO2020133492A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201880099983.5A CN113168554B (en) 2018-12-29 2018-12-29 Neural network compression method and device
PCT/CN2018/125812 WO2020133492A1 (en) 2018-12-29 2018-12-29 Neural network compression method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/125812 WO2020133492A1 (en) 2018-12-29 2018-12-29 Neural network compression method and apparatus

Publications (1)

Publication Number Publication Date
WO2020133492A1

Family

ID=71127997

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/125812 WO2020133492A1 (en) 2018-12-29 2018-12-29 Neural network compression method and apparatus

Country Status (2)

Country Link
CN (1) CN113168554B (en)
WO (1) WO2020133492A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116383666B (en) * 2023-05-23 2024-04-19 重庆大学 Power data prediction method and device and electronic equipment


Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US8700552B2 (en) * 2011-11-28 2014-04-15 Microsoft Corporation Exploiting sparseness in training deep neural networks
WO2018107414A1 (en) * 2016-12-15 2018-06-21 上海寒武纪信息科技有限公司 Apparatus, equipment and method for compressing/decompressing neural network model
CN107909147A * 2017-11-16 2018-04-13 深圳市华尊科技股份有限公司 Data processing method and device

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
CN107229967A (en) * 2016-08-22 2017-10-03 北京深鉴智能科技有限公司 Hardware accelerator and method for implementing sparsified GRU neural networks based on an FPGA
CN107239825A (en) * 2016-08-22 2017-10-10 北京深鉴智能科技有限公司 Deep neural network compression method considering load balancing
CN107239824A (en) * 2016-12-05 2017-10-10 北京深鉴智能科技有限公司 Apparatus and method for implementing a sparse convolutional neural network accelerator
CN107688850A (en) * 2017-08-08 2018-02-13 北京深鉴科技有限公司 Deep neural network compression method

Cited By (4)

Publication number Priority date Publication date Assignee Title
EP4191478A1 (en) * 2021-12-02 2023-06-07 Beijing Baidu Netcom Science Technology Co., Ltd. Method and apparatus for compressing neural network model
US11861498B2 (en) 2021-12-02 2024-01-02 Beijing Baidu Netcom Science Technology Co., Ltd. Method and apparatus for compressing neural network model
CN114580630A (en) * 2022-03-01 2022-06-03 厦门大学 Neural network model training method and graph classification method for AI chip design
CN114580630B (en) * 2022-03-01 2024-05-31 厦门大学 Neural network model training method and graph classification method for AI chip design

Also Published As

Publication number Publication date
CN113168554A (en) 2021-07-23
CN113168554B (en) 2023-11-28

Similar Documents

Publication Publication Date Title
US11651259B2 (en) Neural architecture search for convolutional neural networks
US20190087713A1 (en) Compression of sparse deep convolutional network weights
CN106575377B (en) Classifier updates on common features
US9600762B2 (en) Defining dynamics of multiple neurons
CN110399487B (en) Text classification method and device, electronic equipment and storage medium
CN109886422A (en) Model configuration method, device, electronic equipment and read/write memory medium
US20210312295A1 (en) Information processing method, information processing device, and information processing program
WO2020133492A1 (en) Neural network compression method and apparatus
EP3685266A1 (en) Power state control of a mobile device
CN101833691A (en) Realizing method of least square support vector machine serial structure based on EPGA (Filed Programmable Gate Array)
KR20220009682A (en) Method and system for distributed machine learning
US20150278683A1 (en) Plastic synapse management
CN112269875B (en) Text classification method, device, electronic equipment and storage medium
CN113742069A (en) Capacity prediction method and device based on artificial intelligence and storage medium
CN112700006A (en) Network architecture searching method, device, electronic equipment and medium
US20220335293A1 (en) Method of optimizing neural network model that is pre-trained, method of providing a graphical user interface related to optimizing neural network model, and neural network model processing system performing the same
CN112766462A (en) Data processing method, device and computer readable storage medium
WO2020133364A1 (en) Neural network compression method and apparatus
Bao et al. Multi-grained Pruning Method of Convolutional Neural Network
EP4283522A1 (en) Spiking neural network circuit and spiking neural network-based calculation method
US20240020510A1 (en) System and method for execution of inference models across multiple data processing systems
US20230351165A1 (en) Method for operating neural network
US20230014656A1 (en) Power efficient register files for deep neural network (dnn) accelerator
US20240144649A1 (en) Image classification method, electronic device and storage medium
CN113162780B (en) Real-time network congestion analysis method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 18945108; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 18945108; Country of ref document: EP; Kind code of ref document: A1