WO2020133492A1 - Neural network compression method and apparatus - Google Patents

Neural network compression method and apparatus

Info

Publication number
WO2020133492A1
WO2020133492A1 (PCT/CN2018/125812)
Authority
WO
WIPO (PCT)
Prior art keywords
zero
weights
training
weight
group
Prior art date
Application number
PCT/CN2018/125812
Other languages
French (fr)
Chinese (zh)
Inventor
朱佳峰
刘刚毅
卢惠莉
高伟
芮祥麟
杨鋆源
夏军
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to CN201880099983.5A (published as CN113168554B)
Priority to PCT/CN2018/125812
Publication of WO2020133492A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/08: Learning methods

Definitions

  • This application relates to the field of neural networks, and in particular to a neural network compression method and device.
  • deep learning technology is developing rapidly in the industry, and various industries are applying deep learning technology in their respective fields.
  • a deep learning model (that is, a neural network model) is usually over-parameterized and has obvious redundancy, which leads to wasted computation and storage.
  • the industry has currently proposed a variety of compression methods, such as various model sparsification methods; these methods use pruning, quantization, and similar techniques to set the weakly expressive weights in the model's weight matrices to zero, so as to simplify model calculation and storage.
  • the value of each weight in a deep learning model is learned automatically from the training set; sparsification during training is random, and the weights cannot be sparsified in a targeted way, so subsequent processing devices can only rely on a randomly sparsified deep learning model for data processing, which cannot adapt well to the processing device's capability and cannot achieve a better processing effect.
  • the embodiments of the present application provide a neural network compression method and device to solve the problem that the prior art cannot adapt well to a processing device's capability and cannot achieve a better processing effect.
  • the present application provides a neural network compression method: the sparse unit length is determined according to the processing capability information of the processing device; then, when the current training is performed on the neural network model, the j-th group of weights obtained after the previous training is adjusted according to the j-th group of weights referenced in the previous training, to obtain the j-th group of weights referenced in the current training. The sparse unit length is the data length of one operation when the processing device performs a matrix operation; the number of weights included in the j-th group is the sparse unit length; j takes any positive integer from 1 to m, where m is the total number of groups obtained after all weights of the neural network model are grouped by the sparse unit length;
  • in this way, when performing neural network compression, the sparse unit length can be determined based on the capability information of the processing device, and the weights grouped by the sparse unit length can be processed according to the capabilities of the processing device; the neural network model is thereby adapted to different processing devices, so that subsequent processing devices can achieve better processing results.
  • the sparse unit length is determined according to the processing capability information of the processing device; the specific method may be: determine the register length in the processing device, or the maximum data length that the instruction set in the processing device processes at a time, and use that register length or maximum instruction-set data length as the sparse unit length.
  • the sparse unit length can be accurately determined to adapt to the processing capability of the processing device.
  • the neural network compression device may further determine the bit width of the calculation unit in the processing device, and use the determined bit width of the calculation unit as the sparse unit length.
  • the computing unit may be, but is not limited to, a GPU, an NPU, or the like.
  • the sparse unit length can be accurately determined to adapt to the processing capability of the processing device.
  • all weights of the initial neural network model are trimmed before the first training of the neural network.
  • trimming the neural network first can save some processing in the subsequent training passes and improve computation speed.
  • the j-th group of weights obtained after the previous training is adjusted according to the j-th group of weights referenced in the previous training, which may specifically include five cases (enumerated in full in the device description below);
  • for example, when the j-th group of weights referenced in the previous training is not all zero, and the proportion of non-zero values in the j-th group of weights obtained after the previous training, relative to the group's total number of weights, is not less than the set proportion threshold, the j-th group of weights obtained after the previous training is kept unchanged.
  • in this way, the weights obtained after the previous training can be adjusted according to the actual situation, so that the zero values of the resulting neural network model are distributed more regularly, with as many zeros as possible lying consecutively within one group of weights; when the neural network model is subsequently used for data processing, this reduces the time for accessing data and improves computation speed.
  • the zero-setting weight threshold may be determined based on the initial weight threshold; for example, the zero-setting weight threshold may be a set multiple of the initial weight threshold, where the set multiple is greater than 1. In this way, the threshold better matches the value range of the current weights in the subsequent judgment process.
  • determining whether the j-th group of weights referenced in the previous training is all zero may be done as follows: determine whether the zero-setting flag corresponding to the j-th group of weights in the zero-setting flag data structure is zero; when the flag is zero, the j-th group of weights referenced in the previous training is determined to be all zero; when the flag is a non-zero value, the group is determined to be not all zero.
  • after the j-th group of weights obtained after the previous training is set entirely to zero, the zero-setting flag corresponding to the j-th group in the current zero-setting flag data structure is also updated to zero; or, after the j-th group of weights obtained after the previous training is kept unchanged, the corresponding zero-setting flag in the current flag data structure is updated to a non-zero value.
  • in this way, the zero-setting flags in the zero-setting flag data structure can be updated in real time, so that during weight adjustment it can be accurately judged whether the j-th group of weights referenced in the previous training is all zero.
  • the present application provides a data processing method: obtain the weights of the target neural network model, and perform the following processing based on those weights: in the p-th processing, determine whether the q-th group of weights is all zero; if so, generate and save the first operation result according to the matrix operation type, or according to the matrix operation type and the matrix data to be processed; otherwise, generate and save the second operation result according to the q-th group of weights, the matrix data to be processed, and the matrix operation type;
  • the target neural network model is the final neural network model obtained by grouping the weights of the neural network model by the sparse unit length and then training the model; the sparse unit length is determined based on the processing capability information of the processing device, and is the data length of one operation when performing a matrix operation; the number of weights included in the q-th group is the sparse unit length; q takes any positive integer from 1 to f, where f is the total number of groups after the weights of the target neural network model are grouped by the sparse unit length;
  • because the final neural network model is obtained by training after the weights have been grouped, subsequent data processing with the final model can, according to the characteristics of matrix operations, greatly reduce the amount of data access and calculation, which increases operation speed.
  • a specific method for judging whether the q-th group of weights is all zero may be: obtain the zero-setting flag data structure corresponding to the weights of the target neural network model, and judge whether the zero-setting flag corresponding to the q-th group in that structure is zero; specifically, when the flag corresponding to the q-th group is zero, the q-th group of weights is all zero; when it is not zero, the q-th group of weights is not all zero.
  • when the q-th group of weights is all zero, the data processing device generates the first operation result according to the matrix operation type, or according to the matrix operation type and the matrix data to be processed: when the matrix operation type is matrix multiplication, the data processing device directly obtains zero as the first operation result; when the matrix operation type is matrix addition, the data processing device takes the matrix data to be processed as the first operation result. This reduces the amount of data access and calculation, which increases operation speed.
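  • As a non-authoritative illustration of this fast path, the following minimal Python sketch returns the first operation result for an all-zero weight group; the function name and the use of NumPy arrays are assumptions for illustration, not part of the patent:

      import numpy as np

      def first_operation_result(op_type, data_block):
          """Fast path for a weight group whose zero-setting flag is 0."""
          if op_type == "multiply":
              # zero weights times anything is zero, so the weight group
              # never needs to be loaded from memory
              return np.zeros_like(data_block)
          if op_type == "add":
              # zero weights plus the data leaves the data unchanged
              return data_block
          raise ValueError("unsupported matrix operation type: " + op_type)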
  • the present application also provides a neural network compression device, which has the function of implementing the method of the first aspect described above.
  • the function can be realized by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • the structure of the neural network compression device may include a determination unit, a weight adjustment unit, and a training unit, and these units may perform the corresponding functions in the method examples of the first aspect described above; for details, see the detailed description in the method examples of the first aspect, which is not repeated here.
  • the structure of the neural network compression device may include a processor and a memory, and the processor is configured to perform the method mentioned in the first aspect above.
  • the memory is coupled to the processor, and stores necessary program instructions and data of the neural network compression device.
  • the present application further provides a data processing device having the function of implementing the method of the second aspect.
  • the function can be realized by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • the structure of the data processing device may include an acquisition unit and a processing unit, and these units may perform the corresponding functions in the method examples of the second aspect described above; for details, see the detailed description in the method examples of the second aspect, which is not repeated here.
  • the structure of the data processing apparatus may include a processor and a memory, and the processor is configured to perform the method mentioned in the second aspect above.
  • the memory is coupled to the processor, and stores necessary program instructions and data of the data processing device.
  • the present application also provides a computer storage medium that stores computer-executable instructions which, when run on a computer, cause the computer to execute any one of the methods mentioned in the first aspect or the second aspect.
  • the present application also provides a computer program product containing instructions, which when executed on a computer, causes the computer to perform any of the methods mentioned in the first aspect or the second aspect.
  • the present application further provides a chip, coupled to a memory, for reading and executing program instructions stored in the memory to implement any of the methods mentioned in the first aspect or the second aspect.
  • FIG. 1 is a schematic diagram of a neural network provided by an embodiment of this application.
  • FIG. 2 is a structural diagram of a terminal device provided by an embodiment of the present application.
  • FIG. 3 is a flowchart of a neural network compression method provided by an embodiment of this application.
  • FIG. 4 is a schematic diagram of a zero-setting flag data structure and a weight matrix provided by an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of a weight adjustment provided by an embodiment of the present application.
  • FIG. 6 is a flowchart of a data processing method provided by an embodiment of this application.
  • FIG. 7 is an example diagram of a data processing process provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a neural network compression device provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of a data processing device according to an embodiment of the present application.
  • FIG. 10 is a structural diagram of a neural network compression device provided by an embodiment of the present application.
  • FIG. 11 is a structural diagram of a data processing apparatus according to an embodiment of the present application.
  • the embodiments of the present application provide a neural network compression method and device to solve the problem that the prior art cannot adapt well to a processing device's capability and cannot achieve a better processing effect.
  • the method and the device described in this application are based on the same inventive concept; since their principles for solving the problem are similar, the implementations of the device and the method can refer to each other, and repeated descriptions are omitted.
  • a neural network consists of a large number of nodes (or neurons) connected to each other.
  • the neural network consists of an input layer, a hidden layer, and an output layer, as shown in FIG. 1.
  • the input layer carries the input data of the neural network;
  • the output layer carries the output data of the neural network;
  • the hidden layer is composed of many nodes connected between the input layer and the output layer, and performs arithmetic processing on the input data.
  • the hidden layer may consist of one or more layers; the number of hidden layers and the number of nodes in the neural network are directly related to the complexity of the problem the neural network actually solves, and to the numbers of nodes in the input layer and the output layer.
  • the platform that performs neural network compression in the embodiments of the present application may be referred to as a neural network compression device.
  • the neural network compression device may be, but is not limited to, a terminal device such as a personal computer (PC), a server, a cloud service platform, or the like.
  • the platform on which a neural network model is deployed may be referred to as a data processing device.
  • the data processing device may be, but is not limited to, a terminal device such as a mobile phone, a tablet computer, or a PC, and may also be a server or the like.
  • FIG. 2 shows a possible terminal device applicable to the neural network compression method or the data processing method provided by the embodiments of the present application.
  • the terminal device includes: a processor 210, a memory 220, a communication module 230, an input unit 240, a display unit 250, a power supply 260, and other components.
  • the terminal device provided in the embodiments of the present application may include more or fewer components than shown, may combine some components, or may have a different arrangement of components.
  • the communication module 230 may be connected to other devices through a wireless connection or a physical connection to implement data transmission and reception of terminal devices.
  • the communication module 230 may include any one or a combination of a radio frequency (RF) circuit, a wireless fidelity (WiFi) module, a communication interface, a Bluetooth module, etc.; this is not limited in this embodiment of the present application.
  • the memory 220 can be used to store program instructions and data.
  • the processor 210 executes program instructions stored in the memory 220 to execute various functional applications and data processing of the terminal device.
  • among the program instructions are instructions that enable the processor 210 to execute the neural network compression method or the data processing method provided by the following embodiments of the present application.
  • the memory 220 may mainly include a program storage area and a data storage area.
  • the storage program area can store the operating system, various application programs, and program instructions;
  • the storage data area can store various data such as neural networks.
  • the memory 220 may include a high-speed random access memory, and may also include a non-volatile memory, such as a magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • the input unit 240 may be used to receive information such as data or operation instructions input by the user.
  • the input unit 240 may include input devices such as a touch panel, function keys, a physical keyboard, a mouse, a camera, and a monitor.
  • the display unit 250 can realize human-computer interaction, and is used to display information input by the user and information provided to the user through the user interface.
  • the display unit 250 may include a display panel 251.
  • the display panel 251 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like.
  • the touch panel can cover the display panel 251; when the touch panel detects a touch event on or near it, the event is transmitted to the processor 210 to determine the type of touch event and perform the corresponding operation.
  • the processor 210 is a control center of a computer device, and uses various interfaces and lines to connect the above components.
  • the processor 210 may execute the program instructions stored in the memory 220 and call the data stored in the memory 220 to complete various functions of the computer device and implement the neural network compression provided by the embodiments of the present application Method or data processing method.
  • the processor 210 may include one or more processing units. Specifically, the processor 210 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, etc., and the modem processor mainly handles wireless communication. It can be understood that the foregoing modem processor may not be integrated into the processor 210.
  • the processing unit may compress the neural network or process the data.
  • the processor 210 may be a central processing unit (CPU), a graphics processor (Graphics Processing Unit, GPU), or a combination of CPU and GPU.
  • the processor 210 may also be a network processing unit (NPU), a tensor processing unit (TPU), or another artificial intelligence (AI) chip that supports neural network processing.
  • the processor 210 may further include a hardware chip.
  • the hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), a digital signal processor (DSP), or a combination thereof.
  • the PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • the terminal device also includes a power supply 260 (such as a battery) for powering various components.
  • the power supply 260 may be logically connected to the processor 210 through a power management system, so as to realize functions such as charging and discharging the terminal device through the power management system.
  • the terminal device may further include components such as a camera, a sensor, and an audio collector, which are not repeated here.
  • the foregoing terminal device is only an example of a device to which the neural network compression method or data processing method provided in the embodiments of the present application is applicable. It should be understood that the neural network compression method or data processing method provided in the embodiments of the present application may also be applied to other devices than the above terminal devices, which is not limited in this application.
  • a neural network compression method provided by an embodiment of the present invention can be applied to the terminal device shown in FIG. 2 or other devices (such as a server, etc.).
  • the following takes a neural network compression device as the execution subject to illustrate the neural network compression method provided by the present application.
  • the specific flow of the method may include:
  • Step 301: The neural network compression device determines the sparse unit length according to the processing capability information of the processing device, where the sparse unit length is the data length of one operation when the processing device performs a matrix operation.
  • the processing device is a device for processing the data to be processed after the neural network compression device finally obtains the neural network model. It should be noted that the processing device may be applied to the data processing device involved in this application.
  • the training of the neural network model is directed at one processing device, so the processing capability information of that processing device can be pre-configured in the neural network compression device; after obtaining the capability information for the processing device, the neural network compression device performs the subsequent process directly according to it.
  • the capability information of the processing device may be indicated by the capability of the processing device to process data.
  • the capability information of the processing device may be understood as the capability information of a processor or computing chip included in the processing device, where the processor or computing chip may be, but is not limited to, a central processing unit (CPU), a graphics processing unit (GPU), a network processing unit (NPU), etc.
  • the processing device may also be a processor or a computing chip directly.
  • the capability information of the processing device may be embodied as the data length of one operation when the processing device performs a matrix operation. Based on this, the neural network compression device determines the sparse unit length according to the processing capability information of the processing device. The specific method may be: the neural network compression device determines the register length in the processing device, or the maximum data length that the instruction set in the processing device processes at a time, and uses that register length or maximum instruction-set data length as the sparse unit length.
  • the neural network compression device may further determine the bit width of the calculation unit in the processing device, and use the determined bit width of the calculation unit as the sparse unit length .
  • the calculation unit may be a GPU, NPU, or the like.
  • the neural network compression device may further determine the maximum data length supported by one of, or a combination of, the registers, caches, instruction sets, and computing units in the processing device, and use that maximum supported data length as the sparse unit length.
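  • The following minimal Python sketch shows one way such capability information could be mapped to a sparse unit length; the dictionary fields, helper name, and example values are assumptions for illustration only:

      def sparse_unit_length(capability):
          """Number of weights the processing device handles in one matrix operation."""
          widest_bits = max(
              capability.get("register_bits", 0),
              capability.get("instruction_max_bits", 0),
              capability.get("compute_unit_bits", 0),
          )
          weight_bits = capability.get("weight_bits", 32)  # e.g. 32-bit weights
          return widest_bits // weight_bits

      # a 128-bit register holding 32-bit weights gives a sparse unit length
      # of 4, matching the example in FIG. 4
      print(sparse_unit_length({"register_bits": 128}))  # -> 4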
  • in this way, the neural network model can be trained specifically for different hardware devices, better matching the processing capabilities of those devices and achieving better results.
  • Step 302: When performing the current training on the neural network model, the neural network compression device adjusts the j-th group of weights obtained after the previous training according to the j-th group of weights referenced in the previous training, to obtain the j-th group of weights referenced in the current training; the number of weights included in the j-th group is the sparse unit length; j takes any positive integer from 1 to m, where m is the total number of groups obtained after all weights of the neural network model are grouped by the sparse unit length.
  • each time the neural network compression device performs training, it takes consecutive weights in groups of the sparse unit length for the training process; in other words, the neural network compression device groups the weights according to the sparse unit length.
  • the weights of the neural network model are obtained first: the neural network compression device may directly obtain the weight data, or it may obtain the model file of the neural network model and parse the model file to obtain the weight data.
  • before the first training, the neural network compression device may trim the weights of the initial neural network model according to the initial weight threshold of the initial neural network model.
  • the specific method for the neural network compression device to trim the weights of the initial neural network model may be: the neural network compression device separately obtains the weights of each layer of the initial neural network model, then trims each layer's weights according to that layer's initial weight threshold, until the weights of all layers have been trimmed.
  • the above process can be called a sparsification process.
  • the above process can use a variety of commonly used matrix sparsification methods, such as the pruning method mentioned in the paper "Learning both Weights and Connections for Efficient Neural Networks", the quantization method mentioned in the paper "Ternary Weights", or other methods; this is not specifically limited in this application.
  • the specific process may be: the neural network compression device sets the weights in each layer that are less than that layer's initial weight threshold to zero, and keeps the weights in each layer that are not less than that layer's initial weight threshold unchanged.
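  • A minimal Python sketch of this per-layer trimming step, assuming each layer's weights are held in a NumPy array; comparing weight magnitudes |w| against the threshold is an assumption, since the text says only "less than the initial weight threshold":

      import numpy as np

      def trim_layer(weights, init_threshold):
          """Zero out the weakly expressive weights of one layer."""
          trimmed = weights.copy()
          trimmed[np.abs(trimmed) < init_threshold] = 0.0
          return trimmed

      def trim_model(layer_weights, layer_thresholds):
          """Trim every layer with its own initial weight threshold."""
          return [trim_layer(w, t) for w, t in zip(layer_weights, layer_thresholds)]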
  • before obtaining the weights of each layer of the initial neural network model, the neural network compression device needs to train the neural network to obtain the weights of the neural network, thereby obtaining the initial neural network model.
  • training the neural network to obtain its weights may specifically be: through data input and neural network model construction, the structure of the neural network and the weights in the neural network are obtained.
  • the neural network may be trained through commonly used deep learning frameworks, such as TensorFlow, Caffe, MXNet, PyTorch, and so on.
  • the neural network compression device adjusts the j-th group of weights obtained after the previous training according to the j-th group of weights referenced in the previous training, which may specifically include the following five cases:
  • the set proportion threshold may be 30%, and may also be another value, which is not limited in this application.
  • for example, when the j-th group of weights referenced in the previous training is not all zero, and the proportion of non-zero values in the j-th group of weights obtained after the previous training, relative to the group's total number of weights, is not less than the set proportion threshold, the neural network compression device keeps the j-th group of weights obtained after the previous training unchanged.
  • in this way, the distribution of zero values in the weight matrix of the final neural network model can be made more regular, for example with consecutive zero values concentrated within single groups of weights as far as possible, so that when the neural network model is subsequently applied to data processing, the regular zero distribution greatly reduces the number of memory accesses and the amount of calculation, which in turn increases computation speed.
  • the j-th group of weights referenced in the previous training can be understood as the j-th group of weights that needed training last time; the j-th group of weights obtained by adjusting the result of the previous training is the group that needs to be trained this time, that is, the group of weights referenced in the current training.
  • the jth group of weights referenced in the first training may be the jth group of weights of the initial neural network model.
  • the zero-setting weight threshold may be determined based on the initial weight threshold.
  • the zero-setting weight threshold may be a set multiple of the initial weight threshold, where the set multiple is greater than 1; for example, when the initial weight threshold is 1, the zero-setting threshold may be 1.05.
  • the neural network compression device maintains a zero-setting flag data structure, in which each zero-setting flag corresponds to one group of weights (where each group of weights can be called a weight matrix).
  • the zero-setting flag data structure and the weight matrix can be represented as shown in the schematic diagram in FIG. 4.
  • each run of consecutive weights of sparse unit length in the weight matrix corresponds to 1 bit in the zero-setting flag data structure.
  • in FIG. 4 the sparse unit length is 4, so every 4 consecutive weights correspond to one zero-setting flag.
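  • A sketch of this flag structure using the FIG. 4 parameters (sparse unit length 4, one flag per group of 4 consecutive weights, 0 meaning the whole group is zero); storing the flags as a plain Python list rather than packed bits is a simplification:

      import numpy as np

      SPARSE_UNIT = 4  # the FIG. 4 example

      def build_zero_flags(weight_matrix):
          """One zero-setting flag per group of SPARSE_UNIT consecutive weights."""
          flat = weight_matrix.ravel()
          return [
              0 if not flat[i:i + SPARSE_UNIT].any() else 1
              for i in range(0, flat.size, SPARSE_UNIT)
          ]

      w = np.array([[0.0, 0.0, 0.0, 0.0],
                    [0.3, 0.0, -0.7, 0.0]])
      print(build_zero_flags(w))  # -> [0, 1]: the first group is all zero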
  • the specific method may be: the neural network compression device determines whether the zero-setting flag corresponding to the j-th group of weights in the zero-setting flag data structure is zero; when the flag is zero, it is determined that the j-th group of weights referenced in the previous training is all zero; when the flag is a non-zero value, it is determined that the j-th group of weights referenced in the previous training is not all zero. Taking FIG. 4 as an example:
  • the first zero-setting flag in the data structure is 0, which means that the group of weights corresponding to that flag is all 0;
  • looking at the first 4 weights of the first row of the weight matrix in FIG. 4 (that is, the first group of weights, or the first weight matrix), it can be seen that the corresponding group of weights is indeed all 0.
  • after the neural network compression device sets the entire j-th group of weights obtained after the previous training to zero, or sets all of its non-zero values to zero, it updates the zero-setting flag corresponding to the j-th group of weights in the current zero-setting flag data structure to zero. Similarly, in an optional implementation manner, after keeping the j-th group of weights obtained after the previous training unchanged, the neural network compression device updates the zero-setting flag corresponding to the j-th group in the current flag data structure to a non-zero value (that is, 1).
  • in this way, the zero-setting flags in the zero-setting flag data structure can be updated in real time, so that the weights can be adjusted more accurately during the training process, and the subsequent processing device can accurately rely on the weights of the neural network model when performing data processing.
  • the above five situations may actually be a cyclic process.
  • the neural network compression device first determines whether the j-th group of weights referenced by the previous training is all zero, and then performs the subsequent process according to the judgment result and the above five cases; thereby, new weights for all groups of the neural network model are obtained, and the neural network compression device subsequently trains on the new weights.
  • a schematic diagram of a specific weight adjustment process may be shown in FIG. 5.
  • when the weights of the neural network model are grouped according to the sparse unit length, there may be multiple cases, as illustrated by the sketch after this paragraph. In one case, the weights of the neural network model are grouped together uniformly; during grouping, the number of remaining weights in the last group may be less than the sparse unit length, and even then the group is processed in the same way as the other groups of weights (whose number equals the sparse unit length). In another case, the weight matrix composed of the weights of the neural network model is divided by rows (or columns), and the weights of each row (or column) are grouped separately; when each row (or column) is grouped by the sparse unit length, the number of weights in the last group of each row (or column) may likewise be less than the sparse unit length, and for the same reason that last group is processed in the same way as the other groups (whose number equals the sparse unit length).
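  • The sketch below illustrates the row-wise grouping case, including the short final group in each row; the helper name is hypothetical:

      import numpy as np

      def group_rowwise(weight_matrix, unit):
          """Group each row's weights into chunks of `unit` consecutive weights.
          The last chunk of a row may hold fewer than `unit` weights; it is
          still treated as an ordinary group."""
          groups = []
          for row in weight_matrix:
              for start in range(0, row.size, unit):
                  groups.append(row[start:start + unit])
          return groups

      m = np.arange(10, dtype=float).reshape(2, 5)
      for g in group_rowwise(m, 4):
          print(g)  # per row: one full group of 4 and a short final group of 1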
  • Step 303: The neural network compression device performs the current training on the neural network model according to the obtained groups of weights referenced by the current training.
  • through step 302, all group weights of the neural network model can be obtained, so that step 303 can be performed.
  • the method for the neural network compression device to perform step 303 may refer to a commonly used neural network training method, which is not specifically described in this application.
  • in the above method, the sparse unit length is determined based on the capability information of the processing device, and the weights grouped by the sparse unit length are processed during the training process according to the capabilities of the processing device; the neural network model can thus be adapted to the capabilities of different processing devices, so that the subsequent processing device can achieve a better processing effect.
  • the final neural network model obtained through the embodiment shown in FIG. 3 may be applied to a data processing device, so that the data processing device performs data processing based on the finally obtained neural network model.
  • an embodiment of the present application also provides a data processing method, which is implemented based on the final neural network model obtained in the embodiment shown in FIG. 3.
  • the data processing method provided by the present application is explained by taking a data processing device as the execution subject as an example.
  • the specific flow of the method may include the following steps:
  • Step 601: The data processing device obtains the weights of the target neural network model, where the target neural network model is the final neural network model obtained by grouping the weights of a neural network model by the sparse unit length and then training the model; the sparse unit length is determined based on the processing capability information of the processing device and is the data length of one operation when performing a matrix operation.
  • the processing device here is the data processing device, and for the specific method of determining the sparse unit length based on the processing capability information of the processing device, reference may be made to the related description in the embodiment shown in FIG. 3, which is not repeated here.
  • Step 602: Perform the following processing based on the weights of the target neural network model: in the p-th processing, determine whether the q-th group of weights is all zero; if so, generate and save the first operation result according to the matrix operation type, or according to the matrix operation type and the matrix data to be processed; otherwise, generate and save the second operation result according to the q-th group of weights, the matrix data to be processed, and the matrix operation type.
  • the number of weights included in the q-th group is the sparse unit length; q takes any positive integer from 1 to f, where f is the total number of groups after the weights of the target neural network model are grouped by the sparse unit length; p likewise takes any positive integer from 1 to f.
  • when the data processing device determines whether the q-th group of weights is all zero, it first obtains the zero-setting flag data structure corresponding to the weights of the target neural network model, and then determines whether the zero-setting flag corresponding to the q-th group of weights in that data structure is zero. Specifically, when the flag corresponding to the q-th group in the flag data structure is zero, the data processing device determines that the q-th group of weights is all zero; when the flag is not zero, the data processing device determines that the q-th group of weights is not all zero. For example, as shown in FIG. 4, when the flag corresponding to the q-th group of weights is the first zero-setting flag, then since that flag is 0, it is determined that the q-th group of weights is all zero.
  • since the target neural network model is adapted to the data processing device, information about the target neural network model (such as the zero-setting flag data structure) has been pre-configured in the data processing device.
  • when the q-th group of weights is all zero, the data processing device generates the first operation result according to the matrix operation type, or according to the matrix operation type and the matrix data to be processed: when the matrix operation type is matrix multiplication, the data processing device directly obtains zero as the first operation result; when the matrix operation type is matrix addition, the data processing device takes the matrix data to be processed as the first operation result.
  • when the q-th group of weights is not all zero, the data processing device generates the second operation result according to the q-th group of weights, the matrix data to be processed, and the matrix operation type. A specific method is: the data processing device loads the q-th group of weights and the matrix data to be processed into a register, and then performs the corresponding matrix operation on them according to the matrix operation type to generate the second operation result.
  • the above processing is a cyclic process: it is performed for each group of weights until the weights of all groups have been traversed, after which the final processing result can be generated.
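  • Putting steps 601 and 602 together, a hedged Python sketch of this per-group loop might look as follows; elementwise array operations stand in for the device's register-level matrix operations, which is a simplification:

      import numpy as np

      def process_groups(zero_flags, weight_groups, data_groups, op_type):
          """Traverse all f groups; groups whose flag is 0 never touch weight memory."""
          results = []
          for flag, w, x in zip(zero_flags, weight_groups, data_groups):
              if flag == 0:
                  # the q-th group is all zero: take the first operation result
                  out = np.zeros_like(x) if op_type == "multiply" else x.copy()
              else:
                  # load the group and the data (plain array ops stand in for
                  # register loads) and perform the corresponding operation
                  out = w * x if op_type == "multiply" else w + x
              results.append(out)  # save the per-group operation result
          return results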
  • a specific data processing process may be shown in the schematic diagram in FIG. 7.
  • because the weights of the neural network model are grouped and the final neural network model is trained after the grouping, the subsequent application of the final neural network model for data processing can greatly reduce the amount of data access and calculation, thereby improving the speed of operation.
  • the embodiments of the present application further provide a neural network compression device, which is used to implement the neural network compression method provided in the embodiment shown in FIG. 3.
  • the neural network compression device 800 includes a determination unit 801, a weight adjustment unit 802, and a training unit 803, where:
  • the determining unit 801 is used to determine the sparse unit length according to the processing capability information of the processing device, and the sparse unit length is the data length of one operation when the processing device performs matrix operation;
  • the weight adjustment unit 802 is configured to, when the current training is performed on the neural network model, adjust the j-th group of weights obtained after the previous training according to the j-th group of weights referenced in the previous training, to obtain the j-th group of weights referenced in the current training;
  • the number of weights included in the j-th group is the sparse unit length; j takes any positive integer from 1 to m, where m is the total number of groups obtained after all weights of the neural network model are grouped by the sparse unit length;
  • the training unit 803 is configured to perform the current training of the neural network model according to the groups of weights referenced by the current training, obtained by the weight adjustment unit.
  • the determining unit 801 determines the register length in the processing device, or the maximum data length that the instruction set in the processing device processes at a time; the register length or the maximum data length processed by the instruction set at a time is used as the sparse unit length.
  • the neural network compression device may further include a weight trimming unit, configured to trim the weights of the initial neural network model according to the initial weight threshold of the initial neural network model before the training unit first trains the neural network.
  • when the weight adjustment unit 802 adjusts the j-th group of weights obtained after the previous training according to the j-th group of weights referenced in the previous training, the following cases may arise (a code sketch follows this list):
  • when the j-th group of weights referenced in the previous training is all zero, and the weights of the j-th group obtained after the previous training are all less than the zero-setting weight threshold, set the entire j-th group of weights obtained after the previous training to zero; or
  • when the j-th group of weights referenced in the previous training is not all zero, the proportion of the number of non-zero values in the j-th group of weights obtained after the previous training, relative to the group's total number of weights, is less than the set proportion threshold, and the non-zero values in the j-th group obtained after the previous training are all less than the zero-setting weight threshold, set all the non-zero values in the j-th group of weights to zero; or
  • when the j-th group of weights referenced in the previous training is not all zero, the proportion of the number of non-zero values in the j-th group of weights obtained after the previous training, relative to the group's total number of weights, is less than the set proportion threshold, but the non-zero values in the j-th group obtained after the previous training are not all less than the zero-setting weight threshold, keep the j-th group of weights obtained after the previous training unchanged; or
  • when the j-th group of weights referenced in the previous training is not all zero, and the proportion of the number of non-zero values in the j-th group of weights obtained after the previous training, relative to the group's total number of weights, is not less than the set proportion threshold, keep the j-th group of weights obtained after the previous training unchanged.
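  • A hedged Python sketch of this adjustment rule is given below, using the example values from above (zero-setting weight threshold 1.05, proportion threshold 30%); comparing magnitudes |w| is an assumption, and the behaviour of a previously all-zero group whose retrained weights are not all below the threshold (the remaining fifth case, not spelled out in the list) is assumed here to be "keep unchanged":

      import numpy as np

      def adjust_group(prev_ref_all_zero, group,
                       zero_threshold=1.05, ratio_threshold=0.30):
          """Adjust one group of weights after a training pass (sketch)."""
          adjusted = group.copy()
          nonzero = adjusted != 0
          if prev_ref_all_zero:
              if np.all(np.abs(adjusted) < zero_threshold):
                  adjusted[:] = 0.0          # zero the whole group
              # otherwise: keep the group unchanged (assumed fifth case)
          elif nonzero.sum() / adjusted.size < ratio_threshold:
              if np.all(np.abs(adjusted[nonzero]) < zero_threshold):
                  adjusted[nonzero] = 0.0    # zero the scattered non-zero values
              # else: keep the group unchanged
          # else: non-zero proportion >= threshold, keep the group unchanged
          return adjusted

  • After each adjustment, the group's zero-setting flag would be updated to zero or a non-zero value, as described next.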
  • when determining whether the j-th group of weights referenced in the previous training is all zero, the weight adjustment unit 802 is specifically configured to determine whether the zero-setting flag corresponding to the j-th group of weights in the zero-setting flag data structure is zero; when the flag is zero, it is determined that the j-th group of weights referenced in the previous training is all zero; when the flag is a non-zero value, it is determined that the j-th group of weights referenced in the previous training is not all zero.
  • the weight adjustment unit 802 is further configured to, after setting the entire j-th group of weights obtained after the previous training to zero, or after setting all of its non-zero values to zero, update the zero-setting flag corresponding to the j-th group of weights in the current zero-setting flag data structure to zero; or, after keeping the j-th group of weights obtained after the previous training unchanged, update the zero-setting flag corresponding to the j-th group in the current flag data structure to a non-zero value.
  • with the above device, the sparse unit length can be determined based on the capability information of the processing device, and the weights grouped by the sparse unit length can be processed during the training process;
  • the neural network model can thus be adapted to the capabilities of different processing devices, so that the subsequent processing device can achieve a better processing effect.
  • the embodiments of the present application further provide a data processing apparatus, which is used to implement the data processing method provided in the embodiment shown in FIG. 6.
  • the data processing apparatus 900 includes an acquiring unit 901 and a processing unit 902, where:
  • the obtaining unit 901 is used to obtain the weight of the target neural network model.
  • the target neural network model is the final neural network model obtained by grouping the weights of a neural network model by the sparse unit length and then training the model;
  • the processing unit 902 is configured to perform the following processing based on the weights of the target neural network model: in the p-th processing, determine whether the q-th group of weights is all zero; if so, generate and save the first operation result according to the matrix operation type, or according to the matrix operation type and the matrix data to be processed; otherwise, generate and save the second operation result according to the q-th group of weights, the matrix data to be processed, and the matrix operation type. The sparse unit length is determined based on the processing capability information of the processing device and is the data length of one operation when performing a matrix operation; the number of weights included in the q-th group is the sparse unit length; q takes any positive integer from 1 to f, where f is the total number of groups after the weights of the target neural network model are grouped by the sparse unit length; p likewise takes any positive integer from 1 to f.
  • when determining whether the q-th group of weights is all zero, the processing unit 902 is specifically configured to: obtain the zero-setting flag data structure corresponding to the weights of the target neural network model, and determine whether the zero-setting flag corresponding to the q-th group of weights in that data structure is zero.
  • because the final neural network model is obtained by training after the weights of the neural network model have been grouped, the subsequent application of the final neural network model for data processing can, according to the characteristics of matrix operations, greatly reduce the amount of data access and calculation, thereby improving the speed of operation.
  • the division of the units in the embodiments of the present application is schematic, and is only a division of logical functions, and there may be other division manners in actual implementation.
  • the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or software function unit.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions that enable a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media that can store program code.
  • the embodiments of the present application further provide a neural network compression device, which is used to implement the neural network compression method shown in FIG. 3.
  • the neural network compression device 1000 includes: a processor 1001 and a memory 1002, where:
  • the processor 1001 may be a CPU, GPU, or a combination of CPU and GPU.
  • the processor 1001 may also be an AI chip that supports neural network processing such as NPU, TPU, and so on.
  • the processor 1001 may further include a hardware chip.
  • the above hardware chip may be ASIC, PLD, DSP or a combination thereof.
  • the above PLD may be CPLD, FPGA, GAL or any combination thereof. It should be noted that the processor 1001 is not limited to the above enumerated cases, and the processor 1001 may be any processing device capable of implementing the neural network compression method shown in FIG. 3 described above.
  • the processor 1001 and the memory 1002 are connected to each other.
  • the processor 1001 and the memory 1002 are connected to each other through a bus 1003;
  • the bus 1003 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • the bus can be divided into an address bus, a data bus, and a control bus. For ease of representation, only a thick line is used in FIG. 10, but it does not mean that there is only one bus or one type of bus.
  • when the processor 1001 is used to implement the neural network compression method provided by the embodiments of the present application, it performs the following operations:
  • determining the sparse unit length according to the processing capability information of the processing device; when performing the current training on the neural network model, adjusting the j-th group of weights obtained after the previous training according to the j-th group of weights referenced in the previous training, to obtain the j-th group of weights referenced in the current training; and performing the current training according to the obtained groups of weights;
  • the number of weights included in the j-th group is the sparse unit length;
  • j takes any positive integer from 1 to m, where m is the total number of groups obtained after all weights of the neural network model are grouped by the sparse unit length;
  • the processor 1001 may also perform other operations; for details, reference may be made to the specific descriptions involved in step 301, step 302, and step 303 in the embodiment shown in FIG. 3 above, which are not repeated here.
  • the memory 1002 is used to store programs and data.
  • the program may include program code, and the program code includes instructions for computer operation.
  • the memory 1002 may include a random access memory (RAM), and may also include a non-volatile memory, for example at least one magnetic disk memory.
  • the processor 1001 executes the program stored in the memory 1002 to realize the above-mentioned functions, thereby implementing the neural network compression method shown in FIG. 3.
  • when the neural network compression device shown in FIG. 10 is applied to a terminal device, the neural network compression device may be embodied as the terminal device shown in FIG. 2.
  • the processor 1001 may be the same as the processor 210 shown in FIG. 2
  • the memory 1002 may be the same as the memory 220 shown in FIG. 2.
  • an embodiment of the present application further provides a data processing apparatus, which is used to implement the data processing method shown in FIG. 6.
  • the data processing device 1100 includes a processor 1101 and a memory 1102, where:
  • the processor 1101 may be a CPU, GPU, or a combination of CPU and GPU.
  • the processor 1101 may also be an AI chip that supports neural network processing such as NPU, TPU, and so on.
  • the processor 1101 may further include a hardware chip.
  • the above hardware chip may be ASIC, PLD, DSP or a combination thereof.
  • the above PLD may be CPLD, FPGA, GAL or any combination thereof. It should be noted that the processor 1101 is not limited to the above-mentioned cases, and the processor 1101 may be any processing device capable of implementing neural network inference operation.
  • the processor 1101 and the memory 1102 are connected to each other.
  • the processor 1101 and the memory 1102 are connected to each other through a bus 1103;
  • the bus 1103 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • the bus can be divided into an address bus, a data bus, and a control bus. For ease of representation, only a thick line is used in FIG. 11, but it does not mean that there is only one bus or one type of bus.
  • when the processor 1101 is used to implement the data processing method provided by the embodiments of the present application, it may perform the following operations:
  • obtaining the weights of the target neural network model, where the target neural network model is the final neural network model obtained by grouping the weights of a neural network model by the sparse unit length and then training the model; the sparse unit length is determined based on the processing capability information of the processing device and is the data length of one operation when performing a matrix operation;
  • the following processing is performed based on the weights of the target neural network model: in the pth processing, it is determined whether the qth group of weights are all zero, and if so, it is generated according to the matrix operation type or according to the matrix operation type and the matrix data to be processed Save the first operation result, otherwise generate and save the second operation result according to the qth group weights, the matrix data to be processed and the matrix operation type;
  • the number of weights included in the qth group of weights is the length of the sparse unit; the q takes any positive integer from 1 to f, and f is the weight of the target neural network model according to the sparse The total number of groups after unit length grouping; p takes any positive integer from 1 to f.
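As an illustration of the group-wise processing just described, the following Python sketch shows how whole groups of zero weights might be skipped during a dot product. It is a minimal sketch under stated assumptions, not the embodiments' implementation: the names `SPARSE_UNIT` and `sparse_dot` are hypothetical, and a sparse unit length of 4 elements is assumed.

```python
import numpy as np

SPARSE_UNIT = 4  # assumed sparse unit length, in weight elements

def sparse_dot(weights, x):
    """Dot product that skips whole groups of zero weights.

    When the q-th group of weights is all zero, its contribution is known
    from the operation type alone (the "first operation result" is zero for
    multiplication), so neither the data access nor the multiply-accumulate
    is performed.
    """
    acc = 0.0
    for start in range(0, len(weights), SPARSE_UNIT):
        w = weights[start:start + SPARSE_UNIT]
        if not w.any():                     # all-zero group: skip entirely
            continue
        acc += float(np.dot(w, x[start:start + SPARSE_UNIT]))
    return acc

w = np.array([0, 0, 0, 0, 1.0, 2.0, 0, 0], dtype=np.float32)
x = np.arange(8, dtype=np.float32)
print(sparse_dot(w, x))  # only the second group is loaded and multiplied -> 14.0
```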
  • the processor 1101 may also perform other operations. For details, reference may be made to the specific descriptions involved in step 601 and step 602 in the embodiment shown in FIG. 6 above, and details are not repeated here.
  • the memory 1102 is used to store programs and data.
  • the program may include program code, and the program code includes instructions for computer operation.
  • the memory 1102 may include random access memory (RAM), and may also include non-volatile memory, for example, at least one disk memory.
  • the processor 1101 executes the program stored in the memory 1102 to realize the above functions, thereby implementing the data processing method shown in FIG. 6.
  • when the data processing apparatus shown in FIG. 11 is applied to a terminal device, the data processing apparatus may be embodied as the terminal device shown in FIG. 2.
  • the processor 1101 may be the same as the processor 210 shown in FIG. 2, and the memory 1102 may be the same as the memory 220 shown in FIG. 2.
  • the embodiments of the present application may be provided as methods, systems, or computer program products. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer usable program code.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, where the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operating steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A neural network compression method and apparatus, used to solve the problem in the prior art that it is not possible to effectively adapt to the capability of a processing device and achieve a better processing effect. The method comprises: determining a sparse unit length according to processing capability information of a processing device; when performing a current round of training on a neural network model, according to a jth set of weights referenced in a previous round of training, adjusting the jth set of weights obtained after the previous round of training, and obtaining a jth set of weights referenced in the current round of training; and performing the current round of training on the neural network model according to the obtained sets of weights referenced in the current round of training. The sparse unit length is the data length of one operation when the processing device performs matrix operations, the number of weights included in the jth set of weights is the sparse unit length, j is any positive integer from 1 to m, and m is the total number of sets of weights obtained after grouping all the weights of the neural network model according to the sparse unit length.

Description

Neural network compression method and apparatus
Technical Field
This application relates to the field of neural networks, and in particular, to a neural network compression method and apparatus.
Background
Deep learning technology is currently in full swing in the industry, and all kinds of industries are applying it in their respective fields. As is well known, running a deep learning model (that is, a neural network model) involves a large number of floating-point matrix operations. Neural networks are usually over-parameterized, so deep learning models contain obvious redundancy, which wastes computation and storage. To simplify the computation and storage of models, the industry has proposed a variety of compression methods, such as model sparsification methods, which set weights with weak expressive power in the model weight matrix to zero through pruning, quantization, and the like, so as to simplify model computation and storage.
At present, when a deep learning model is sparsified, the value of each weight in the model is learned automatically from the training set, and the sparsification performed during training is random; the weights cannot be sparsified in a targeted manner. As a result, a subsequent processing device can only rely on a randomly sparsified deep learning model for data processing, which cannot adapt well to the capability of the processing device and cannot achieve a good processing effect.
Summary
Embodiments of this application provide a neural network compression method and apparatus, to solve the prior-art problem that the capability of a processing device cannot be well adapted to and a good processing effect cannot be achieved.
According to a first aspect, this application provides a neural network compression method: a sparse unit length is determined according to processing capability information of a processing device; then, when a current round of training is performed on a neural network model, the j-th group of weights obtained after the previous round of training is adjusted according to the j-th group of weights referenced in the previous round of training, to obtain the j-th group of weights referenced in the current round of training. The sparse unit length is the data length of one operation when the processing device performs matrix operations, and the number of weights included in the j-th group of weights is the sparse unit length; j takes any positive integer from 1 to m, where m is the total number of weight groups obtained after all weights of the neural network model are grouped by the sparse unit length.
The current round of training is performed on the neural network model according to the obtained groups of weights referenced in the current round of training.
With this method, during neural network compression, the sparse unit length can be determined based on the capability information of the processing device, and the weights grouped by that length are processed during training; the neural network model can thus be adapted to processing devices with different capabilities, so that a subsequent processing device achieves a better processing effect.
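As a non-normative sketch of how the rounds described in this aspect could be organized, the following Python outline groups all weights by the sparse unit length and, in each round, adjusts every group against the group referenced in the previous round before training. All names (`compress`, `train_one_round`, `adjust_group`) are placeholders, not part of the embodiments; the per-group adjustment rules are detailed later in the description.

```python
def compress(model_weights, sparse_unit, num_rounds, train_one_round, adjust_group):
    """Round-over-round loop sketched from the first aspect (names are placeholders).

    model_weights:   all weights of the neural network model, as a flat array.
    sparse_unit:     sparse unit length determined from the processing device.
    train_one_round: trains the model once and returns the weights it produced.
    adjust_group:    per-group adjustment rule (detailed later in the description).
    """
    m = -(-len(model_weights) // sparse_unit)   # total number of weight groups
    referenced = model_weights                  # weights referenced by round 1
    for _ in range(num_rounds):
        trained = train_one_round(referenced)   # weights obtained after this round
        for j in range(m):                      # adjust group by group
            s = slice(j * sparse_unit, (j + 1) * sparse_unit)
            trained[s] = adjust_group(referenced[s], trained[s])
        referenced = trained                    # referenced by the next round
    return referenced
```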
In a possible design, the sparse unit length is determined according to the processing capability information of the processing device as follows: the length of a register in the processing device, or the maximum data length processed at a time by the instruction set of the processing device, is determined and used as the sparse unit length.
With this method, the sparse unit length can be determined accurately to match the processing capability of the processing device.
In a possible design, the neural network compression apparatus may further determine the bit width of a computing unit in the processing device and use the determined bit width as the sparse unit length. The computing unit may be, but is not limited to, a GPU, an NPU, or the like.
With this method, the sparse unit length can likewise be determined accurately to match the processing capability of the processing device.
In a possible design, before the neural network is trained for the first time, all weights of the initial neural network model are clipped according to an initial weight threshold of the initial neural network model.
By clipping the neural network once in advance, some processing in subsequent training can be saved and the operation speed improved.
In a possible design, adjusting the j-th group of weights obtained after the previous round of training according to the j-th group of weights referenced in the previous round of training may include the following five cases:
Case 1: when the j-th group of weights referenced in the previous round of training are all zero and the j-th group of weights obtained after the previous round of training are all less than a zero-setting weight threshold, all of the j-th group of weights obtained after the previous round of training are set to zero;
Case 2: when the j-th group of weights referenced in the previous round of training are all zero and the j-th group of weights obtained after the previous round of training are not all less than the zero-setting weight threshold, the j-th group of weights obtained after the previous round of training is kept unchanged;
Case 3: when the j-th group of weights referenced in the previous round of training are not all zero, the proportion of non-zero values in the total number of the j-th group of weights obtained after the previous round of training is less than a set proportion threshold, and the non-zero values in that group are all less than the zero-setting weight threshold, the non-zero weights in that group are all set to zero;
Case 4: when the j-th group of weights referenced in the previous round of training are not all zero, the proportion of non-zero values in the total number of the j-th group of weights obtained after the previous round of training is less than the set proportion threshold, and the non-zero values are not all less than the zero-setting weight threshold, the j-th group of weights obtained after the previous round of training is kept unchanged;
Case 5: when the j-th group of weights referenced in the previous round of training are not all zero and the proportion of non-zero values in the total number of the j-th group of weights obtained after the previous round of training is not less than the set proportion threshold, the j-th group of weights obtained after the previous round of training is kept unchanged.
With this method, the weights obtained after the previous round of training can be adjusted according to the actual situation, so that the zero values in the weights of the resulting neural network model are distributed more regularly, with as many zeros as possible lying consecutively within a single group. When the neural network model is subsequently used for data processing, this reduces data access time and improves operation speed.
In a possible design, the zero-setting weight threshold may be determined based on the initial weight threshold; for example, the zero-setting weight threshold may be a set multiple of the initial weight threshold, where the set multiple is greater than 1. This makes the subsequent judgments better fit the current value range of the weights.
In a possible design, whether the j-th group of weights referenced in the previous round of training are all zero may be determined as follows: determine whether the zero-setting flag corresponding to the j-th group of weights in a zero-setting flag data structure is zero; when the zero-setting flag is zero, it is determined that the j-th group of weights referenced in the previous round of training are all zero; when the zero-setting flag is a non-zero value, it is determined that they are not all zero.
With this method, whether the j-th group of weights referenced in the previous round of training are all zero can be determined accurately, so that subsequent processing can proceed according to the judgment result.
In a possible design, after all of the j-th group of weights obtained after the previous round of training are set to zero, or after the non-zero weights are set to zero, the zero-setting flag corresponding to the j-th group of weights in the current zero-setting flag data structure is further updated to zero; or, after the j-th group of weights obtained after the previous round of training is kept unchanged, that zero-setting flag is further updated to a non-zero value.
With this method, the zero-setting flags in the zero-setting flag data structure can be updated in real time, so that whether the j-th group of weights referenced in the previous round of training are all zero can be judged more accurately during weight adjustment.
According to a second aspect, this application provides a data processing method: the weights of a target neural network model are obtained, and the following processing is performed based on them. In the p-th processing, it is determined whether the q-th group of weights are all zero; if so, a first operation result is generated and saved according to the matrix operation type, or according to the matrix operation type and the matrix data to be processed; otherwise, a second operation result is generated and saved according to the q-th group of weights, the matrix data to be processed, and the matrix operation type. The target neural network model is the final neural network model obtained through training after all weights of a neural network model are grouped by a sparse unit length; the sparse unit length is determined based on processing capability information of a processing device and is the data length of one operation when performing matrix operations. The number of weights included in the q-th group of weights is the sparse unit length; q takes any positive integer from 1 to f, where f is the total number of groups obtained after all weights of the target neural network model are grouped by the sparse unit length; p takes any positive integer from 1 to f.
With this method, because the final neural network model is obtained through training after its weights are grouped by a sparse unit length derived from the processing capability information of the processing device, the characteristics of matrix operations can be exploited when the final neural network model is subsequently used for data processing, greatly reducing data access and computation and thereby improving operation speed.
In a possible design, whether the q-th group of weights are all zero may be determined as follows: the zero-setting flag data structure corresponding to the weights of the target neural network model is obtained, and it is judged whether the zero-setting flag corresponding to the q-th group of weights in that data structure is zero. Specifically, when the zero-setting flag corresponding to the q-th group of weights is zero, it is determined that the q-th group of weights are all zero; when it is not zero, it is determined that they are not all zero.
With this method, whether the q-th group of weights are all zero can be determined accurately, so that when they are, the matrix operation result can be generated directly, reducing data access and computation and thereby improving operation speed.
In a possible design, when the q-th group of weights are all zero and the data processing apparatus generates the first operation result according to the matrix operation type, or according to the matrix operation type and the matrix data to be processed: when the matrix operation type is matrix multiplication, the data processing apparatus directly obtains the first operation result as zero; when the matrix operation type is matrix addition, the data processing apparatus takes the matrix data to be processed as the first operation result. This reduces data access and computation and thereby improves operation speed.
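Purely as an illustration of these two shortcuts, the first-operation-result generation might be sketched as follows; the helper name `first_result` and the string operation types are assumptions, not part of the embodiments.

```python
import numpy as np

def first_result(op_type, data):
    """Result of a matrix operation when the weight group is known to be all zero.

    For multiplication the result is zero without touching the data; for
    addition the result is simply the data to be processed.
    """
    if op_type == "mul":
        return np.zeros_like(data)   # 0 * data == 0, no access to data needed
    if op_type == "add":
        return data                  # 0 + data == data, no arithmetic needed
    raise ValueError("unsupported matrix operation type")
```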
According to a third aspect, this application further provides a neural network compression apparatus that has the function of implementing the method of the first aspect. The function may be implemented by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the function.
In a possible design, the structure of the neural network compression apparatus may include a determining unit, a weight adjustment unit, and a training unit, which may perform the corresponding functions in the method example of the first aspect; for details, refer to the detailed description in that method example, which is not repeated here.
In a possible design, the structure of the neural network compression apparatus may include a processor and a memory, where the processor is configured to perform the method mentioned in the first aspect. The memory is coupled to the processor and stores the program instructions and data necessary for the neural network compression apparatus.
According to a fourth aspect, this application further provides a data processing apparatus that has the function of implementing the method of the second aspect. The function may be implemented by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the function.
In a possible design, the structure of the data processing apparatus may include an obtaining unit and a processing unit, which may perform the corresponding functions in the method example of the second aspect; for details, refer to the detailed description in that method example, which is not repeated here.
In a possible design, the structure of the data processing apparatus may include a processor and a memory, where the processor is configured to perform the method mentioned in the second aspect. The memory is coupled to the processor and stores the program instructions and data necessary for the data processing apparatus.
According to a fifth aspect, this application further provides a computer storage medium storing computer-executable instructions that, when invoked by a computer, cause the computer to perform any of the methods mentioned in the first aspect or the second aspect.
According to a sixth aspect, this application further provides a computer program product containing instructions that, when run on a computer, cause the computer to perform any of the methods mentioned in the first aspect or the second aspect.
According to a seventh aspect, this application further provides a chip coupled to a memory and configured to read and execute the program instructions stored in the memory, to implement any of the methods mentioned in the first aspect or the second aspect.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of a neural network according to an embodiment of this application;
FIG. 2 is a structural diagram of a terminal device according to an embodiment of this application;
FIG. 3 is a flowchart of a neural network compression method according to an embodiment of this application;
FIG. 4 is a schematic diagram of a zero-setting flag data structure and a weight matrix according to an embodiment of this application;
FIG. 5 is a schematic flowchart of weight adjustment according to an embodiment of this application;
FIG. 6 is a flowchart of a data processing method according to an embodiment of this application;
FIG. 7 is an example diagram of a data processing process according to an embodiment of this application;
FIG. 8 is a schematic structural diagram of a neural network compression apparatus according to an embodiment of this application;
FIG. 9 is a schematic structural diagram of a data processing apparatus according to an embodiment of this application;
FIG. 10 is a structural diagram of a neural network compression apparatus according to an embodiment of this application;
FIG. 11 is a structural diagram of a data processing apparatus according to an embodiment of this application.
Detailed Description
This application is described in further detail below with reference to the accompanying drawings.
Embodiments of this application provide a neural network compression method and apparatus, to solve the prior-art problem that the capability of a processing device cannot be well adapted to and a good processing effect cannot be achieved. The method and the apparatus in this application are based on the same inventive concept; because the principles by which they solve the problem are similar, the implementations of the apparatus and the method may refer to each other, and repeated parts are not described again.
In the following, some terms used in this application are explained to facilitate understanding by a person skilled in the art.
As is well known, neural networks imitate the behavioral characteristics of animal neural networks and process data with a structure similar to the synaptic connections of the brain. As a mathematical operation model, a neural network consists of a large number of interconnected nodes (also called neurons) and is composed of an input layer, a hidden layer, and an output layer, as shown, for example, in FIG. 1. The input layer carries the input data of the neural network, the output layer produces its output data, and the hidden layer, formed by the many connected nodes between the input layer and the output layer, performs arithmetic processing on the input data. The hidden layer may consist of one or more layers. The numbers of hidden layers and nodes in a neural network are directly related to the complexity of the problem the network actually solves and to the numbers of input-layer and output-layer nodes.
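For readers who prefer a concrete example, the following minimal sketch computes the output of a network with one hidden layer; the layer sizes and the ReLU activation are arbitrary illustrative choices, not part of the embodiments.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8)            # input layer: 8 nodes
w1 = rng.standard_normal((16, 8))     # weights from input layer to hidden layer
w2 = rng.standard_normal((4, 16))     # weights from hidden layer to output layer

hidden = np.maximum(w1 @ x, 0.0)      # hidden layer with ReLU activation
output = w2 @ hidden                  # output layer: 4 nodes
print(output.shape)                   # (4,)
```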
Normally, a neural network model whose performance has stabilized after extensive training is widely deployed on data processing devices, enabling applications of neural network models in various fields. Because training a neural network is a complex process, the platform on which a neural network model is trained and the platform on which it is deployed are generally separate. In the embodiments of this application, because the neural network is compressed during training, the platform for training the neural network model may be called a neural network compression apparatus. For example, the neural network compression apparatus may be, but is not limited to, a terminal device such as a personal computer (PC), a server, a cloud service platform, or the like. The platform on which the neural network model is deployed may be called a data processing apparatus; for example, the data processing apparatus may be, but is not limited to, a terminal device such as a mobile phone, a tablet computer, or a PC, or a server, or the like.
To describe the technical solutions of the embodiments of this application more clearly, the neural network compression method and apparatus and the data processing method and apparatus provided by the embodiments of this application are described in detail below with reference to the accompanying drawings.
When the device performing the neural network compression method provided by the embodiments of this application is a terminal device, and when the device performing the data processing method provided by the embodiments of this application is a terminal device, the neural network compression apparatus or the data processing apparatus may be applied to the terminal device. For example, FIG. 2 shows a possible terminal device to which the neural network compression method or the data processing method provided by the embodiments of this application is applicable. The terminal device includes components such as a processor 210, a memory 220, a communication module 230, an input unit 240, a display unit 250, and a power supply 260. A person skilled in the art can understand that the structure of the terminal device shown in FIG. 2 does not constitute a limitation; the terminal device provided by the embodiments of this application may include more or fewer components than shown, combine some components, or arrange the components differently.
Each component of the terminal device is described below with reference to FIG. 2:
The communication module 230 may be connected to other devices through a wireless or physical connection to implement data transmission and reception of the terminal device. Optionally, the communication module 230 may include any one or a combination of a radio frequency (RF) circuit, a wireless fidelity (WiFi) module, a communication interface, a Bluetooth module, and the like, which is not limited in the embodiments of this application.
The memory 220 may be used to store program instructions and data. The processor 210 runs the program instructions stored in the memory 220 to execute the various functional applications and data processing of the terminal device. The program instructions include instructions that enable the processor 210 to execute the neural network compression method or the data processing method provided by the following embodiments of this application.
Optionally, the memory 220 may mainly include a program storage area and a data storage area. The program storage area may store the operating system, various application programs, program instructions, and the like; the data storage area may store various data such as neural networks. In addition, the memory 220 may include a high-speed random access memory, and may further include a non-volatile memory such as a magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
The input unit 240 may be used to receive information such as data or operation instructions input by a user. Optionally, the input unit 240 may include input devices such as a touch panel, function keys, a physical keyboard, a mouse, a camera, and a monitor.
The display unit 250 implements human-computer interaction and is used to display, through a user interface, information input by the user, information provided to the user, and the like. The display unit 250 may include a display panel 251. Optionally, the display panel 251 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like.
Further, when the input unit includes a touch panel, the touch panel may cover the display panel 251; when the touch panel detects a touch event on or near it, it transmits the event to the processor 210 to determine the type of the touch event and perform the corresponding operation.
The processor 210 is the control center of the computer apparatus and connects the foregoing components through various interfaces and lines. The processor 210 may execute the program instructions stored in the memory 220 and invoke the data stored in the memory 220 to complete the various functions of the computer apparatus and implement the neural network compression method or the data processing method provided by the embodiments of this application.
Optionally, the processor 210 may include one or more processing units. Specifically, the processor 210 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may alternatively not be integrated into the processor 210. In the embodiments of this application, the processing unit may compress a neural network or process data. For example, the processor 210 may be a central processing unit (CPU), a graphics processing unit (GPU), or a combination of a CPU and a GPU. The processor 210 may also be a network processor unit (NPU), a tensor processing unit (TPU), or another artificial intelligence (AI) chip that supports neural network processing. The processor 210 may further include a hardware chip, which may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), a digital signal processing (DSP) device, or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The terminal device further includes a power supply 260 (such as a battery) for supplying power to the components. Optionally, the power supply 260 may be logically connected to the processor 210 through a power management system, so that functions such as charging and discharging of the terminal device are implemented through the power management system.
Although not shown, the terminal device may further include components such as a camera, a sensor, and an audio collector, which are not described here.
It should be noted that the foregoing terminal device is merely an example of a device to which the neural network compression method or the data processing method provided by the embodiments of this application is applicable. It should be understood that these methods may also be applied to devices other than the foregoing terminal device, which is not limited in this application.
The neural network compression method provided by an embodiment of the present invention is applicable to the terminal device shown in FIG. 2 and also to other devices (such as a server). Referring to FIG. 3, the neural network compression method provided by this application is described with a neural network compression apparatus as the execution body; the specific procedure of the method may include:
Step 301: The neural network compression apparatus determines a sparse unit length according to the processing capability information of a processing device, where the sparse unit length is the data length of one operation when the processing device performs matrix operations.
The processing device is the device that, after the neural network compression apparatus finally obtains the neural network model, applies that finally obtained model to process the data to be processed. It should be noted that the processing device may be applied to the data processing apparatus involved in this application.
Normally, a neural network model is trained for one particular processing device, so the processing capability information of that device may be preconfigured in the neural network compression apparatus. In this way, when the neural network compression apparatus obtains, for the processing device, a neural network model that the device can apply, the subsequent procedure is performed directly according to the device's capability information.
In an optional implementation, the capability information of the processing device may be indicated by the device's data processing capability. In one implementation, the capability information of the processing device may be understood as the capability information of a processor or computing chip included in the device, where the processor or computing chip may be, but is not limited to, a central processing unit (CPU), a graphics processing unit (GPU), a network processor unit (NPU), or the like. In another implementation, the processing device may itself be a processor or a computing chip.
For example, the capability information of the processing device may be embodied as the data length of one operation when the device performs matrix operations. On this basis:
In an optional implementation, the neural network compression apparatus determines the sparse unit length according to the processing capability information of the processing device as follows: the apparatus determines the length of a register in the processing device, or the maximum data length processed at a time by the instruction set of the processing device, and uses that length as the sparse unit length.
In another optional implementation, the neural network compression apparatus may determine the bit width of a computing unit in the processing device and use the determined bit width as the sparse unit length. Optionally, the computing unit may be a GPU, an NPU, or the like.
In yet another optional implementation, the neural network compression apparatus may determine the maximum data length that can be supported by one of, or a combination of, the register, the cache, the instruction set, and the bit width of the computing unit in the processing device, and use that maximum supported data length as the sparse unit length.
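The embodiments do not prescribe an API for querying these capabilities, but as a sketch the choice might be coded as follows. All parameter names are hypothetical, and reducing a combination of capabilities to their bottleneck (the minimum) is only one possible reading of the text above.

```python
def sparse_unit_length(element_bits=32, register_bits=None,
                       simd_max_bits=None, compute_unit_bits=None):
    """Choose a sparse unit length (in weight elements) from capability info.

    Each keyword is a capability of the processing device, in bits. When
    several are given, the jointly supported length is taken as their
    bottleneck (the minimum) -- one possible interpretation of "combined".
    """
    caps = [b for b in (register_bits, simd_max_bits, compute_unit_bits)
            if b is not None]
    if not caps:
        raise ValueError("no processing capability information provided")
    return min(caps) // element_bits

# A device with 128-bit registers and 32-bit weights -> groups of 4 weights.
print(sparse_unit_length(register_bits=128))  # 4
```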
Through step 301, the neural network model can subsequently be trained in a targeted manner for different hardware devices, which better matches the processing capability of the hardware and achieves better results.
Step 302: When performing the current round of training on the neural network model, the neural network compression apparatus adjusts the j-th group of weights obtained after the previous round of training according to the j-th group of weights referenced in the previous round of training, to obtain the j-th group of weights referenced in the current round of training. The number of weights included in the j-th group of weights is the sparse unit length; j takes any positive integer from 1 to m, where m is the total number of weight groups obtained after all weights of the neural network model are grouped by the sparse unit length.
In an optional implementation, each time the neural network compression apparatus performs training, it takes a group of consecutive weights of the sparse unit length for the training procedure; this can be understood as the apparatus grouping all weights by the sparse unit length. Optionally, each time training is performed, the apparatus may first obtain all weights of the neural network model: it may obtain the weight data directly, or obtain the model file of the neural network model and parse it to obtain the weight data.
In an optional implementation, before training the neural network for the first time, the neural network compression apparatus may first clip all weights of the initial neural network model according to the initial weight threshold of the initial neural network model.
For example, the specific method by which the neural network compression apparatus clips all weights of the initial neural network model may be: the apparatus obtains the weights of each layer of the initial neural network model, and then clips the weights of each layer according to that layer's initial weight threshold, until the weights of all layers have been clipped. This process may be called sparsification. Specifically, it may use any of various common matrix sparsification methods, for example the pruning method described in the paper "Learning both Weights and Connections for Efficient Neural Networks", the quantization method described in the paper "Ternary weight networks", or other methods, which is not specifically limited in this application.
In an optional implementation, when the neural network compression apparatus clips the weights of each layer according to that layer's initial weight threshold, the specific process may be: the apparatus sets to zero the weights in each layer that are less than the layer's initial weight threshold, and keeps unchanged the weights that are not less than that threshold.
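A minimal sketch of this initial clipping follows, assuming the weights are stored as one numpy array per layer and that the comparison is on weight magnitude (the embodiments do not specify signed versus absolute comparison); the function name is hypothetical.

```python
import numpy as np

def clip_initial_weights(layers, thresholds):
    """Set to zero every weight whose magnitude is below its layer's threshold.

    layers:     list of numpy arrays, one per layer of the initial model.
    thresholds: list of per-layer initial weight thresholds.
    """
    for w, t in zip(layers, thresholds):
        w[np.abs(w) < t] = 0.0  # weights below the layer threshold are zeroed
    return layers
```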
In an optional implementation, before obtaining the weights of each layer of the initial neural network model, the neural network compression apparatus needs to train the neural network to obtain all of its weights and thereby obtain the initial neural network model. For example, training the neural network to obtain all of its weights may specifically be: through data input and neural network model construction, the structure of the neural network and all of its weights are obtained. For example, the neural network may be trained with a common deep learning framework such as TensorFlow, Caffe, MXNet, or PyTorch.
In an optional implementation, the neural network compression apparatus adjusts the j-th group of weights obtained after the previous round of training according to the j-th group of weights referenced in the previous round of training, which may include the following five cases:
Case a1: when the j-th group of weights referenced in the previous round of training are all zero and the j-th group of weights obtained after the previous round of training are all less than the zero-setting weight threshold, the neural network compression apparatus sets all of the j-th group of weights obtained after the previous round of training to zero.
Case a2: when the j-th group of weights referenced in the previous round of training are all zero and the j-th group of weights obtained after the previous round of training are not all less than the zero-setting weight threshold, the neural network compression apparatus keeps the j-th group of weights obtained after the previous round of training unchanged.
Case a3: when the j-th group of weights referenced in the previous round of training are not all zero, the proportion of non-zero values in the total number of the j-th group of weights obtained after the previous round of training is less than the set proportion threshold, and those non-zero values are all less than the zero-setting weight threshold, the neural network compression apparatus sets all the non-zero weights in the j-th group of weights obtained after the previous round of training to zero.
For example, the set proportion threshold may be 30% or another value, which is not limited in this application.
Case a4: when the j-th group of weights referenced in the previous round of training are not all zero, the proportion of non-zero values in the total number of the j-th group of weights obtained after the previous round of training is less than the set proportion threshold, and those non-zero values are not all less than the zero-setting weight threshold, the neural network compression apparatus keeps the j-th group of weights obtained after the previous round of training unchanged.
Case a5: when the j-th group of weights referenced in the previous round of training are not all zero and the proportion of non-zero values in the total number of the j-th group of weights obtained after the previous round of training is not less than the set proportion threshold, the neural network compression apparatus keeps the j-th group of weights obtained after the previous round of training unchanged.
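Purely for illustration, cases a1 to a5 might be implemented as in the following sketch. The function name `adjust_group`, the magnitude comparisons against the zero-setting weight threshold, and the 30% default for the set proportion threshold (taken from the example above) are assumptions rather than a normative implementation.

```python
import numpy as np

def adjust_group(ref, trained, zero_thr, ratio_thr=0.3):
    """Apply cases a1-a5 to one group of weights (a sketch, not normative).

    ref:       the j-th group of weights referenced in the previous round.
    trained:   the j-th group of weights obtained after the previous round.
    zero_thr:  zero-setting weight threshold.
    ratio_thr: set proportion threshold (30% is the example given above).
    Returns the j-th group of weights to reference in the current round.
    """
    out = trained.copy()
    if not ref.any():                                 # group was all zero
        if np.all(np.abs(out) < zero_thr):            # case a1: zero the group
            out[:] = 0.0
        # case a2: otherwise keep the group unchanged
    else:
        nz = out != 0
        if nz.sum() / out.size < ratio_thr:           # few non-zeros remain
            if np.all(np.abs(out[nz]) < zero_thr):    # case a3: zero them
                out[nz] = 0.0
            # case a4: otherwise keep the group unchanged
        # case a5: otherwise keep the group unchanged
    return out
```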
With this method, the zero values in the weight matrix of the resulting final neural network model are distributed more regularly; for example, consecutive zero values are concentrated in whole groups as far as possible. When the neural network model is subsequently used for data processing, this regular distribution of zeros greatly reduces the number of memory accesses and the amount of computation, thereby improving operation speed.
It should be noted that the j-th group of weights referenced in the previous round of training can be understood as the j-th group of weights to be trained in the previous round; the j-th group of weights obtained by adjusting the group obtained after the previous round is the group to be trained in the current round, that is, the group referenced in the current round of training. It should be understood that the j-th group of weights referenced in the first round of training may be the j-th group of weights of the initial neural network model.
In an optional implementation, the zero-setting weight threshold may be determined based on the initial weight threshold. For example, the zero-setting weight threshold may be a set multiple of the initial weight threshold, where the set multiple is greater than 1; for example, when the initial weight threshold is 1, the zero-setting threshold may be 1.05.
In an optional implementation, the neural network compression apparatus maintains a zero-setting flag data structure, with a group of weights corresponding to each zero-setting flag in the data structure (each group of weights may be called a weight matrix). When all the weights in a group are 0, the zero-setting flag corresponding to the group is 0; when at least one weight in the group is not 0, the corresponding zero-setting flag is a non-zero value (for example, 1). A zero-setting flag data structure and a weight matrix may be represented as in the schematic diagram of FIG. 4: each run of consecutive weights of the sparse unit length in the weight matrix corresponds to 1 bit in the zero-setting flag data structure. In the example shown in FIG. 4, the sparse unit length is 4, so every 4 consecutive weights correspond to one zero-setting flag.
In an optional implementation, based on the foregoing zero-setting flag data structure, the neural network compression apparatus determines whether the j-th group of weights referenced in the previous round of training are all zero as follows: the apparatus determines whether the zero-setting flag corresponding to the j-th group of weights in the zero-setting flag data structure is zero; when the flag is zero, it determines that the j-th group of weights referenced in the previous round of training are all zero; when the flag is a non-zero value, it determines that they are not all zero. Taking FIG. 4 as an example, the first zero-setting flag in the zero-setting flag data structure is 0, which indicates that the corresponding group of weights are all 0; indeed, the first 4 weights of the first row of the weight matrix in FIG. 4 (that is, the first group of weights, or the first weight matrix) are all 0.
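As a sketch of the data structure of FIG. 4, the following code keeps one flag per group of the sparse unit length (here a byte stands in for the single bit described above) and uses it to answer the all-zero question; the function names are hypothetical.

```python
import numpy as np

SPARSE_UNIT = 4  # the example sparse unit length used in FIG. 4

def build_zero_flags(weights):
    """One flag per group: 0 if the group is all zero, 1 otherwise."""
    groups = weights.reshape(-1, SPARSE_UNIT)
    return (groups != 0).any(axis=1).astype(np.uint8)

def group_is_all_zero(flags, j):
    """True when the j-th group of weights referenced previously are all zero."""
    return flags[j] == 0

w = np.array([0, 0, 0, 0, 0.3, 0, 0, 0], dtype=np.float32)
flags = build_zero_flags(w)  # -> [0, 1]
print(group_is_all_zero(flags, 0), group_is_all_zero(flags, 1))  # True False
```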
在一种可选的实施方式中,所述神经网络压缩装置在将所述上一次训练后得到的第j组权重全部置零之后,或者在将所述非零值的权重均置零之后,将当前的置零标记数据结构中第j组权重对应的置零标记更新为零。同理,在一种可选的实施方式中,所述神经网络压缩装置在保持所述上一次训练后得到的第j组权重不变之后,将当前的置零标记数据结构中第j组权重对应的置零标记更新为非零值(即为1)。通过上述方法,可以实时更新所述置零标记数据结构中的置零标记,以使更准确地在训练过程中对权重进行调整,以及后续处理设备在基于神经网络模型处理数据时可以准确地基于权重进行数据处理。In an optional implementation manner, after the neural network compression device resets all the jth group weights obtained after the previous training to zero, or after all non-zero weights are set to zero, Update the zero-setting mark corresponding to the j-th group of weights in the current zero-setting mark data structure to zero. Similarly, in an optional implementation manner, after maintaining the jth group weight obtained after the last training unchanged, the neural network compression device changes the jth group weight in the current zero-marking data structure The corresponding zero mark is updated to a non-zero value (that is, 1). Through the above method, the zero-setting mark in the zero-setting mark data structure can be updated in real time, so that the weights can be adjusted more accurately during the training process, and the subsequent processing device can accurately base on the data processing based on the neural network model Weights are used for data processing.
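Continuing the sketch above, flag lookup and maintenance might look as follows (again an assumed representation, not an encoding defined by this application):

```python
def group_is_all_zero(flags, j):
    # j is the 1-based group index used in the description
    return flags[j - 1] == 0

def update_flag(flags, j, all_zero):
    # Called after group j is zeroed out (all_zero=True) or kept (all_zero=False)
    flags[j - 1] = 0 if all_zero else 1
```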
The above five cases can in fact form a single loop: the neural network compression apparatus first determines whether the j-th group of weights referenced by the previous training are all zero, and then carries out the subsequent flow according to the judgment result and the five cases above, thereby obtaining new weights for all groups of the neural network model for the apparatus to train next. For illustration, a schematic diagram of one specific weight adjustment flow is shown in FIG. 5.
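A compact sketch of this adjustment loop for one group is given below, reusing update_flag from the earlier sketch. Comparing the absolute value of a weight against the zero-setting weight threshold is an assumption (the description only says "less than the zero-setting weight threshold"), and the threshold parameters are illustrative:

```python
import numpy as np

def adjust_group(prev_ref, trained, zero_thresh, ratio_thresh, flags, j):
    """Apply the five cases to group j.

    prev_ref : group j weights referenced by the previous training (NumPy array)
    trained  : group j weights obtained after the previous training (NumPy array)
    """
    if np.all(prev_ref == 0):
        if np.all(np.abs(trained) < zero_thresh):        # case 1: zero the whole group
            trained[:] = 0
            update_flag(flags, j, all_zero=True)
        else:                                            # case 2: keep unchanged
            update_flag(flags, j, all_zero=False)
    else:
        nz = trained != 0
        ratio = np.count_nonzero(nz) / trained.size
        if ratio < ratio_thresh and np.all(np.abs(trained[nz]) < zero_thresh):
            trained[nz] = 0                              # case 3: zero the non-zero values
            update_flag(flags, j, all_zero=True)
        else:                                            # cases 4 and 5: keep unchanged
            update_flag(flags, j, all_zero=False)
    return trained
```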
It should be noted that, in the process of obtaining m, there are several possibilities when grouping all weights of the neural network model by the sparsification unit length. In one case, all weights of the model are grouped together evenly; during grouping, the number of weights remaining for the last group may be smaller than the sparsification unit length, but even then the last group is processed in the same way as the other groups (whose number of weights equals the sparsification unit length). In another case, in the weight matrix formed by all weights of the neural network model, the weights of each row (or column) are grouped separately; when each row (or column) is grouped by the sparsification unit length, the number of weights in the last group of each row (or column) may likewise be smaller than the sparsification unit length, and that last group is again processed in the same way as the other groups (whose number of weights equals the sparsification unit length).
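A minimal sketch of the row-wise (or column-wise) variant follows, assuming the weights are held in a 2-D array; how a given framework actually stores its weights is outside the scope of this description:

```python
def group_rows(weight_matrix, unit_length):
    """Yield (row_index, start_column, group) for every group of every row.
    The last group of a row may be shorter than unit_length but is treated
    the same as the full-length groups."""
    for r, row in enumerate(weight_matrix):
        for start in range(0, len(row), unit_length):
            yield r, start, row[start:start + unit_length]
```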
Step 303: The neural network compression apparatus performs the current training on the neural network model according to the obtained groups of weights referenced by the current training.
All groups of weights of the neural network model can be obtained through step 302 above, after which step 303 can be performed.
In an optional implementation, the method by which the neural network compression apparatus performs step 303 may follow a commonly used neural network training method, which is not described in detail in this application.
With the neural network compression method provided in the embodiments of this application, when compressing a neural network, the sparsification unit length can be determined based on the capability information of the processing device, and the weights grouped by the sparsification unit length are processed during training. The neural network model can thus be adapted to the capabilities of different processing devices, so that the subsequent processing device achieves a better processing effect.
The final neural network model obtained through the embodiment shown in FIG. 3 can be applied in a data processing device, so that the data processing apparatus performs data processing based on the finally obtained neural network model. On this basis, an embodiment of this application further provides a data processing method, implemented based on the final neural network model obtained in the embodiment shown in FIG. 3. As shown in FIG. 6, the data processing method provided by this application is described with a data processing apparatus as the execution subject; the specific flow of the method may include the following steps:
Step 601: The data processing apparatus obtains the weights of a target neural network model, where the target neural network model is the final neural network model obtained by grouping the weights of a neural network model by the sparsification unit length and then training; the sparsification unit length is determined based on the processing capability information of the processing device and is the data length of one operation when performing a matrix operation.
For the method of generating the target neural network model, refer to the specific process in the embodiment shown in FIG. 3, which is not repeated here.
Likewise, the processing device here is the data processing apparatus itself; for the specific method of determining the sparsification unit length based on the processing capability information of the processing device, also refer to the related method in the embodiment shown in FIG. 3, which is not repeated here.
Step 602: Perform the following processing based on the weights of the target neural network model: in the p-th processing pass, determine whether the q-th group of weights are all zero; if so, generate and save a first operation result according to the matrix operation type, or according to the matrix operation type and the matrix data to be processed; otherwise, generate and save a second operation result according to the q-th group of weights, the matrix data to be processed, and the matrix operation type.
Here, the number of weights included in the q-th group of weights is the sparsification unit length; q takes every positive integer from 1 to f, where f is the total number of groups obtained after all weights of the target neural network model are grouped by the sparsification unit length; and p takes every positive integer from 1 to f.
It should be noted that the grouping in the process of obtaining f is similar to the grouping in the process of obtaining m in the embodiment shown in FIG. 3; the specific descriptions apply to each other and are not repeated in detail here.
In an optional implementation, when determining whether the q-th group of weights are all zero, the data processing apparatus first obtains the zero-flag data structure corresponding to the weights of the target neural network model, and then determines whether the zero flag corresponding to the q-th group of weights in that structure is zero. Specifically, when the zero flag corresponding to the weights of the q-th group in the zero-flag data structure is zero, the data processing apparatus determines that the weights of the q-th group are all zero; when that zero flag is not zero, the data processing apparatus determines that the weights of the q-th group are not all zero. For example, referring to FIG. 4, when the zero flag obtained for the q-th group of weights is the first zero flag, since that flag is 0, it is determined that the q-th group of weights are all zero.
Since the target neural network model is adapted to the data processing apparatus, the information related to the target neural network model (for example, the zero-flag data structure) has been pre-configured in the data processing apparatus. For the zero-flag data structure and the related description of the zero flags, refer to the corresponding descriptions in the embodiment shown in FIG. 3, which are not repeated here.
In one example, when the q-th group of weights are all zero, the data processing apparatus generates the first operation result according to the matrix operation type, or according to the matrix operation type and the matrix data to be processed, as follows: when the matrix operation type is matrix multiplication, the data processing apparatus directly obtains the first operation result as zero; when the matrix operation type is matrix addition, the data processing apparatus takes the matrix data to be processed as the first operation result.
In another example, when the q-th group of weights are not all zero, the data processing apparatus generates the second operation result according to the q-th group of weights, the matrix data to be processed, and the matrix operation type as follows: the data processing apparatus loads the q-th group of weights and the matrix data to be processed into a register, and then performs the corresponding matrix operation on them according to the matrix operation type to generate the second operation result.
After all weights of the target neural network model have been traversed through the above process, the final processing result can be generated.
It is evident from the above processing that when a group of weights are all zero, the currently most time-consuming matrix operation is skipped, thereby achieving acceleration.
It should be noted that the above processing is a loop: it is performed once for each group of weights until the weights of all groups have been traversed. For illustration, a specific data processing flow is shown in the schematic diagram of FIG. 7.
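A minimal sketch of this skip-on-zero loop is given below, reusing the flag layout from the earlier sketches. Representing matrix multiplication as a per-group dot product and matrix addition as an element-wise add are assumptions made for illustration; the description leaves the exact operation shapes to the implementation:

```python
import numpy as np

def process(weights, flags, x_groups, op, unit_length):
    """weights: flat weight vector; flags: one zero flag per group;
    x_groups: the matrix data to be processed, pre-split per group;
    op: 'mul' for matrix multiplication, 'add' for matrix addition."""
    results = []
    for q in range(len(flags)):
        x = np.asarray(x_groups[q])
        if flags[q] == 0:
            # Group all zero: skip the matrix operation entirely
            results.append(0.0 if op == 'mul' else x)
        else:
            # Load the group and compute as usual
            w = np.asarray(weights[q * unit_length:(q + 1) * unit_length])
            results.append(np.dot(w, x) if op == 'mul' else w + x)
    return results
```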
With the data processing method provided in the embodiments of this application, the final neural network model is obtained by grouping the weights of a neural network model by the sparsification unit length derived from the processing capability information of the data processing apparatus (that is, the processing device) and then training. By exploiting the characteristics of matrix operations, subsequent data processing with this final neural network model can greatly reduce the amount of data access and computation, thereby increasing the operation speed.
Based on the above embodiments, an embodiment of this application further provides a neural network compression apparatus for implementing the neural network compression method provided in the embodiment shown in FIG. 3. Referring to FIG. 8, the neural network compression apparatus 800 includes a determining unit 801, a weight adjustment unit 802, and a training unit 803, where:
The determining unit 801 is configured to determine the sparsification unit length according to the processing capability information of a processing device, where the sparsification unit length is the data length of one operation when the processing device performs a matrix operation. The weight adjustment unit 802 is configured to, when the current training is performed on the neural network model, adjust the j-th group of weights obtained after the previous training according to the j-th group of weights referenced by the previous training, to obtain the j-th group of weights referenced by the current training, where the number of weights included in the j-th group of weights is the sparsification unit length, j takes every positive integer from 1 to m, and m is the total number of weight groups obtained after all weights of the neural network model are grouped by the sparsification unit length. The training unit 803 is configured to perform the current training on the neural network model according to the groups of weights referenced by the current training, obtained by the weight adjustment unit.
In an optional implementation, when determining the sparsification unit length according to the processing capability information of the processing device, the determining unit 801 determines the length of a register in the processing device or the maximum data length processed at one time by an instruction set of the processing device, and takes the register length or that maximum data length as the sparsification unit length.
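For instance, on a processing device with 128-bit SIMD registers holding 32-bit floating-point weights, the sparsification unit length would be 4. A hypothetical helper (the parameters are assumptions, not an interface defined by this application) might read:

```python
def sparsification_unit_length(register_bits, weight_bits=32):
    # e.g. a 128-bit register with float32 weights -> 4 weights per operation
    return register_bits // weight_bits
```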
In an optional implementation, the neural network compression apparatus may further include a weight pruning unit, configured to prune all weights of the initial neural network model according to the initial weight threshold of the initial neural network model before the training unit performs the first training on the neural network.
In an optional implementation, when the weight adjustment unit 802 adjusts the j-th group of weights obtained after the previous training according to the j-th group of weights referenced by the previous training, the following cases apply:
when the j-th group of weights referenced by the previous training are all zero and the j-th group of weights obtained after the previous training are all less than the zero-setting weight threshold, setting all of the j-th group of weights obtained after the previous training to zero; or
when the j-th group of weights referenced by the previous training are all zero and the j-th group of weights obtained after the previous training are not all less than the zero-setting weight threshold, keeping the j-th group of weights obtained after the previous training unchanged; or
when the j-th group of weights referenced by the previous training are not all zero, the proportion of the number of non-zero values in the j-th group of weights obtained after the previous training to the total number of weights in that group is less than the set proportion threshold, and the non-zero values in that group are all less than the zero-setting weight threshold, setting the non-zero-valued weights in the j-th group of weights obtained after the previous training to zero; or
when the j-th group of weights referenced by the previous training are not all zero, the proportion of the number of non-zero values in the j-th group of weights obtained after the previous training to the total number of weights in that group is less than the set proportion threshold, and the non-zero values in that group are not all less than the zero-setting weight threshold, keeping the j-th group of weights obtained after the previous training unchanged; or
when the j-th group of weights referenced by the previous training are not all zero, and the proportion of the number of non-zero values in the j-th group of weights obtained after the previous training to the total number of weights in that group is not less than the set proportion threshold, keeping the j-th group of weights obtained after the previous training unchanged.
In an optional implementation, when determining whether the j-th group of weights referenced by the previous training are all zero, the weight adjustment unit 802 is specifically configured to: determine whether the zero flag corresponding to the j-th group of weights in the zero-flag data structure is zero; when the zero flag is zero, determine that the j-th group of weights referenced by the previous training are all zero; and when the zero flag is a non-zero value, determine that the j-th group of weights referenced by the previous training are not all zero.
In an optional implementation, the weight adjustment unit 802 is further configured to update the zero flag corresponding to the j-th group of weights in the current zero-flag data structure to zero after setting all of the j-th group of weights obtained after the previous training to zero, or after setting all of the non-zero-valued weights to zero; or, the weight adjustment unit 802 is further configured to update the zero flag corresponding to the j-th group of weights in the current zero-flag data structure to a non-zero value after keeping the j-th group of weights obtained after the previous training unchanged.
With the neural network compression apparatus provided in the embodiments of this application, when compressing a neural network, the sparsification unit length can be determined based on the capability information of the processing device, and the weights grouped by the sparsification unit length are processed during training, so that the neural network model can be adapted to the capabilities of different processing devices and the subsequent processing device achieves a better processing effect.
Based on the above embodiments, an embodiment of this application further provides a data processing apparatus for implementing the data processing method provided in the embodiment shown in FIG. 6. Referring to FIG. 9, the data processing apparatus 900 includes an obtaining unit 901 and a processing unit 902, where:
The obtaining unit 901 is configured to obtain the weights of a target neural network model, where the target neural network model is the final neural network model obtained by grouping the weights of a neural network model by the sparsification unit length and then training. The processing unit 902 is configured to perform the following processing based on the weights of the target neural network model: in the p-th processing pass, determine whether the q-th group of weights are all zero; if so, generate and save a first operation result according to the matrix operation type, or according to the matrix operation type and the matrix data to be processed; otherwise, generate and save a second operation result according to the q-th group of weights, the matrix data to be processed, and the matrix operation type. The sparsification unit length is determined based on the processing capability information of a processing device and is the data length of one operation when performing a matrix operation; the number of weights included in the q-th group of weights is the sparsification unit length; q takes every positive integer from 1 to f, where f is the total number of groups obtained after all weights of the target neural network model are grouped by the sparsification unit length; and p takes every positive integer from 1 to f.
In an optional implementation, when determining whether the q-th group of weights are all zero, the processing unit 902 is specifically configured to: obtain the zero-flag data structure corresponding to the weights of the target neural network model, and determine whether the zero flag corresponding to the q-th group of weights in that structure is zero.
With the data processing apparatus provided in the embodiments of this application, the final neural network model is obtained by grouping the weights of a neural network model by the sparsification unit length derived from the processing capability information of the data processing apparatus (that is, the processing device) and then training. By exploiting the characteristics of matrix operations, subsequent data processing with this final neural network model can greatly reduce the amount of data access and computation, thereby increasing the operation speed.
It should be noted that the division into units in the embodiments of this application is schematic and is merely a division by logical function; other divisions are possible in actual implementation. The functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Based on the above embodiments, an embodiment of this application further provides a neural network compression apparatus for implementing the neural network compression method shown in FIG. 3. Referring to FIG. 10, the neural network compression apparatus 1000 includes a processor 1001 and a memory 1002, where:
The processor 1001 may be a CPU, a GPU, or a combination of a CPU and a GPU. The processor 1001 may also be an AI chip that supports neural network processing, such as an NPU or a TPU. The processor 1001 may further include a hardware chip, which may be an ASIC, a PLD, a DSP, or a combination thereof; the PLD may be a CPLD, an FPGA, a GAL, or any combination thereof. It should be noted that the processor 1001 is not limited to the cases listed above; the processor 1001 may be any processing device capable of implementing the neural network compression method shown in FIG. 3.
The processor 1001 and the memory 1002 are connected to each other. Optionally, the processor 1001 and the memory 1002 are connected to each other through a bus 1003; the bus 1003 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 10, but this does not mean that there is only one bus or only one type of bus.
When used to implement the neural network compression method provided in the embodiments of this application, the processor 1001 performs the following operations:
determining the sparsification unit length according to the processing capability information of a processing device, where the sparsification unit length is the data length of one operation when the processing device performs a matrix operation;
when the current training is performed on the neural network model, adjusting the j-th group of weights obtained after the previous training according to the j-th group of weights referenced by the previous training, to obtain the j-th group of weights referenced by the current training, where the number of weights included in the j-th group of weights is the sparsification unit length, j takes every positive integer from 1 to m, and m is the total number of weight groups obtained after all weights of the neural network model are grouped by the sparsification unit length; and
performing the current training on the neural network model according to the obtained groups of weights referenced by the current training.
In an optional implementation, the processor 1001 may further perform other operations; for details, refer to the specific descriptions of step 301, step 302, and step 303 in the embodiment shown in FIG. 3, which are not repeated here.
The memory 1002 is configured to store programs, data, and the like. Specifically, a program may include program code comprising computer operation instructions. The memory 1002 may include a random access memory (RAM), and may also include a non-volatile memory, for example at least one disk memory. The processor 1001 executes the program stored in the memory 1002 to realize the above functions, thereby implementing the neural network compression method shown in FIG. 3.
It should be noted that when the neural network compression apparatus shown in FIG. 10 is applied to a terminal device, the neural network compression apparatus may be embodied as the terminal device shown in FIG. 2. In that case, the processor 1001 may be the same as the processor 210 shown in FIG. 2, and the memory 1002 may be the same as the memory 220 shown in FIG. 2.
Based on the above embodiments, an embodiment of this application further provides a data processing apparatus for implementing the data processing method shown in FIG. 6. Referring to FIG. 11, the data processing apparatus 1100 includes a processor 1101 and a memory 1102, where:
The processor 1101 may be a CPU, a GPU, or a combination of a CPU and a GPU. The processor 1101 may also be an AI chip that supports neural network processing, such as an NPU or a TPU. The processor 1101 may further include a hardware chip, which may be an ASIC, a PLD, a DSP, or a combination thereof; the PLD may be a CPLD, an FPGA, a GAL, or any combination thereof. It should be noted that the processor 1101 is not limited to the cases listed above; the processor 1101 may be any processing device capable of performing neural network inference operations.
The processor 1101 and the memory 1102 are connected to each other. Optionally, the processor 1101 and the memory 1102 are connected to each other through a bus 1103; the bus 1103 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 11, but this does not mean that there is only one bus or only one type of bus.
When used to implement the data processing method provided in the embodiments of this application, the processor 1101 may perform the following operations:
obtaining the weights of a target neural network model, where the target neural network model is the final neural network model obtained by grouping the weights of a neural network model by the sparsification unit length and then training, and the sparsification unit length is determined based on the processing capability information of a processing device and is the data length of one operation when performing a matrix operation;
performing the following processing based on the weights of the target neural network model: in the p-th processing pass, determining whether the q-th group of weights are all zero; if so, generating and saving a first operation result according to the matrix operation type, or according to the matrix operation type and the matrix data to be processed; otherwise, generating and saving a second operation result according to the q-th group of weights, the matrix data to be processed, and the matrix operation type;
where the number of weights included in the q-th group of weights is the sparsification unit length; q takes every positive integer from 1 to f, where f is the total number of groups obtained after all weights of the target neural network model are grouped by the sparsification unit length; and p takes every positive integer from 1 to f.
In an optional implementation, the processor 1101 may further perform other operations; for details, refer to the specific descriptions of step 601 and step 602 in the embodiment shown in FIG. 6, which are not repeated here.
The memory 1102 is configured to store programs, data, and the like. Specifically, a program may include program code comprising computer operation instructions. The memory 1102 may include a random access memory (RAM), and may also include a non-volatile memory, for example at least one disk memory. The processor 1101 executes the program stored in the memory 1102 to realize the above functions, thereby implementing the data processing method shown in FIG. 6.
It should be noted that when the data processing apparatus shown in FIG. 11 is applied to a terminal device, the data processing apparatus may be embodied as the terminal device shown in FIG. 2. In that case, the processor 1101 may be the same as the processor 210 shown in FIG. 2, and the memory 1102 may be the same as the memory 220 shown in FIG. 2.
Those skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.
This application is described with reference to the flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, may be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or the other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus, where the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are performed on the computer or the other programmable device to produce computer-implemented processing, and the instructions executed on the computer or the other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of this application have been described, those skilled in the art may make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be construed as covering the preferred embodiments and all changes and modifications falling within the scope of this application.
Obviously, those skilled in the art may make various changes and variations to the embodiments of this application without departing from the scope of the embodiments of this application. If these modifications and variations of the embodiments of this application fall within the scope of the claims of this application and their equivalent technologies, this application is also intended to cover these changes and variations.

Claims (27)

  1. A neural network compression method, characterized in that the method comprises:
    determining a sparsification unit length according to processing capability information of a processing device, wherein the sparsification unit length is a data length of one operation when the processing device performs a matrix operation;
    when performing a current training on a neural network model, adjusting a j-th group of weights obtained after a previous training according to a j-th group of weights referenced by the previous training, to obtain a j-th group of weights referenced by the current training, wherein the number of weights comprised in the j-th group of weights is the sparsification unit length, j takes every positive integer from 1 to m, and m is a total number of weight groups obtained after all weights of the neural network model are grouped by the sparsification unit length; and
    performing the current training on the neural network model according to the obtained groups of weights referenced by the current training.
  2. The method according to claim 1, wherein determining the sparsification unit length according to the processing capability information of the processing device comprises:
    determining a length of a register in the processing device or a maximum data length processed at one time by an instruction set of the processing device; and
    taking the length of the register or the maximum data length processed at one time by the instruction set as the sparsification unit length.
  3. The method according to claim 1 or 2, wherein before a first training of the neural network, the method further comprises:
    pruning all weights of an initial neural network model according to an initial weight threshold of the initial neural network model.
  4. The method according to any one of claims 1 to 3, wherein adjusting the j-th group of weights obtained after the previous training according to the j-th group of weights referenced by the previous training comprises:
    when the j-th group of weights referenced by the previous training are all zero and the j-th group of weights obtained after the previous training are all less than a zero-setting weight threshold, setting all of the j-th group of weights obtained after the previous training to zero; or
    when the j-th group of weights referenced by the previous training are all zero and the j-th group of weights obtained after the previous training are not all less than the zero-setting weight threshold, keeping the j-th group of weights obtained after the previous training unchanged; or
    when the j-th group of weights referenced by the previous training are not all zero, a proportion of a number of non-zero values in the j-th group of weights obtained after the previous training to a total number of weights in that group is less than a set proportion threshold, and the non-zero values in that group are all less than the zero-setting weight threshold, setting the non-zero-valued weights in the j-th group of weights obtained after the previous training to zero; or
    when the j-th group of weights referenced by the previous training are not all zero, the proportion of the number of non-zero values in the j-th group of weights obtained after the previous training to the total number of weights in that group is less than the set proportion threshold, and the non-zero values in that group are not all less than the zero-setting weight threshold, keeping the j-th group of weights obtained after the previous training unchanged; or
    when the j-th group of weights referenced by the previous training are not all zero and the proportion of the number of non-zero values in the j-th group of weights obtained after the previous training to the total number of weights in that group is not less than the set proportion threshold, keeping the j-th group of weights obtained after the previous training unchanged.
  5. The method according to claim 4, wherein determining whether the j-th group of weights referenced by the previous training are all zero comprises:
    determining whether a zero flag corresponding to the j-th group of weights in a zero-flag data structure is zero;
    when the zero flag is zero, determining that the j-th group of weights referenced by the previous training are all zero; and
    when the zero flag is a non-zero value, determining that the j-th group of weights referenced by the previous training are not all zero.
  6. The method according to claim 4 or 5, wherein after setting all of the j-th group of weights obtained after the previous training to zero, or after setting all of the non-zero-valued weights to zero, the method further comprises:
    updating the zero flag corresponding to the j-th group of weights in a current zero-flag data structure to zero; or
    wherein after keeping the j-th group of weights obtained after the previous training unchanged, the method further comprises:
    updating the zero flag corresponding to the j-th group of weights in the current zero-flag data structure to a non-zero value.
  7. A data processing method, characterized in that the method comprises:
    obtaining weights of a target neural network model, wherein the target neural network model is a final neural network model obtained by grouping weights of a neural network model by a sparsification unit length and then training, the sparsification unit length is determined based on processing capability information of a processing device, and the sparsification unit length is a data length of one operation when performing a matrix operation; and
    performing the following processing based on the weights of the target neural network model:
    in a p-th processing pass, determining whether a q-th group of weights are all zero; if so, generating and saving a first operation result according to a matrix operation type, or according to the matrix operation type and matrix data to be processed; otherwise, generating and saving a second operation result according to the q-th group of weights, the matrix data to be processed, and the matrix operation type;
    wherein the number of weights comprised in the q-th group of weights is the sparsification unit length; q takes every positive integer from 1 to f, and f is a total number of groups obtained after all weights of the target neural network model are grouped by the sparsification unit length; and p takes every positive integer from 1 to f.
  8. The method according to claim 7, wherein determining whether the q-th group of weights are all zero comprises:
    obtaining a zero-flag data structure corresponding to the weights of the target neural network model; and
    determining whether a zero flag corresponding to the q-th group of weights in the zero-flag data structure is zero.
  9. 一种神经网络压缩装置,其特征在于,包括:A neural network compression device, characterized in that it includes:
    确定单元,用于根据处理设备的处理能力信息,确定稀疏化单位长度,所述稀疏化单位长度为所述处理设备进行矩阵运算时一次运算的数据长度;A determining unit, configured to determine the sparse unit length according to the processing capability information of the processing device, where the sparse unit length is the data length of one operation when the processing device performs matrix operation;
    权重调整单元,用于在对神经网络模型进行当前次训练时,根据上一次训练参照的第j组权重,对上一次训练后得到的第j组权重进行调整,得到当前次训练参照的第j组权重;其中,第j组权重包括的权重个数为所述稀疏化单位长度;所述j取遍1至m中的任意一个正整数,所述m为对所述神经网络模型的所有权重按照所述稀疏化单位长度分组后得到的权重总组数;The weight adjustment unit is used to adjust the weight of the jth group obtained after the last training according to the weight of the jth group referenced in the previous training when the current training of the neural network model is performed, to obtain the jth group of the current training reference Group weight; wherein, the number of weights included in the jth group weight is the length of the sparse unit; the j takes any positive integer from 1 to m, where m is the weight of ownership of the neural network model The total number of weight groups obtained after grouping according to the sparse unit length;
    训练单元,用于根据权重调整单元得到的当前次训练参照的各组权重,对所述神经网络模型进行当前次训练。The training unit is configured to perform the current training on the neural network model according to each group of weights referenced by the current adjustment training unit obtained by the weight adjustment unit.
  10. 如权利要求9所述的装置,其特征在于,所述确定单元,在根据处理设备的处理能力信息确定稀疏化单位长度时,具体用于:The apparatus according to claim 9, wherein the determining unit, when determining the sparse unit length according to the processing capability information of the processing device, is specifically used to:
    确定所述处理设备中寄存器的长度或者所述处理设备中指令集一次处理的最大数据长度;Determine the length of the register in the processing device or the maximum length of data processed by the instruction set in the processing device at one time;
    将所述寄存器的长度或者所述指令集一次处理的最大数据长度作为所述稀疏化单位长度。The length of the register or the maximum data length processed by the instruction set at a time is used as the sparse unit length.
  11. 如权利要求9或10所述的装置,其特征在于,还包括:The device according to claim 9 or 10, further comprising:
    权重剪裁单元,用于在所述训练单元对所述神经网络进行首次训练之前,根据初始神经网络模型的初始权重阈值,对所述初始神经网络模型的所有权重进行剪裁。The weight trimming unit is configured to trim the weight of the initial neural network model according to the initial weight threshold of the initial neural network model before the training unit performs the first training on the neural network.
  12. 如权利要求9-11任一项所述的装置,其特征在于,所述权重调整单元,在根据上一次训练参照的第j组权重,对上一次训练后得到的第j组权重进行调整时,具体用于:The apparatus according to any one of claims 9-11, wherein the weight adjustment unit adjusts the jth group weight obtained after the last training based on the jth group weight referred to in the previous training , Specifically for:
    在所述上一次训练参照的第j组权重全部为零、且所述上一次训练后得到的第j组权重全部小于置零权重阈值时,将所述上一次训练后得到的第j组权重全部置零;或者When the jth group weights referred to in the previous training are all zero, and the jth group weights obtained after the last training are all less than the zero-setting weight threshold, the jth group weights obtained after the last training Zero all; or
    在所述上一次训练参照的第j组权重全部为零、且所述上一次训练后得到的第j组权重不全部都小于置零权重阈值时,保持所述上一次训练后得到的第j组权重不变;或者When the weights of the jth group referenced in the previous training are all zero, and not all the weights of the jth group obtained after the last training are less than the zero-setting weight threshold, the jth group obtained after the last training is maintained The group weight is unchanged; or
    在所述上一次训练参照的第j组权重不全部为零、且所述上一次训练后得到的第j组权重中非零值的个数在所述上一次训练后得到的第j组权重的总个数中所占比重小于设定比重阈值、且所述上一次训练后得到的第j组权重中的非零值均小于置零权重阈值时,将所述上一次训练后得到的第j组权重中的所述非零值的权重均置零;或者The jth group weights referenced in the previous training are not all zero, and the number of non-zero values in the jth group weights obtained after the last training is the jth group weights obtained after the last training When the proportion of the total number of is less than the set weight threshold, and the non-zero values in the jth group of weights obtained after the last training are all less than the zero-setting weight threshold, the The weights of the non-zero values in the group j weights are all set to zero; or
    在所述上一次训练参照的第j组权重不全部为零、且所述上一次训练后得到的第j组权重中非零值的个数在所述上一次训练后得到的第j组权重的总个数中所占比重小于设定比重阈值、且所述上一次训练后得到的第j组权重中的非零值不均小于置零权重阈值时,保持所述上一次训练后得到的第j组权重不变;或者The jth group weights referenced in the previous training are not all zero, and the number of non-zero values in the jth group weights obtained after the last training is the jth group weights obtained after the last training When the proportion of the total number is less than the set weight threshold, and the non-zero values in the jth group of weights obtained after the last training are not all less than the zero-setting weight threshold, keep the value obtained after the last training Group j weights remain unchanged; or
    在所述上一次训练参照的第j组权重不全部为零、且所述上一次训练后得到的第j组权重中非零值的个数在所述上一次训练后得到的第j组权重的总个数中所占比重不小于设定比重阈值时,保持所述上一次训练后得到的第j组权重不变。The jth group weights referenced in the previous training are not all zero, and the number of non-zero values in the jth group weights obtained after the last training is the jth group weights obtained after the last training When the proportion of the total number is not less than the set proportion threshold, keep the jth group weight obtained after the previous training unchanged.
  13. 如权利要求12所述的装置,其特征在于,所述权重调整单元,在判断所述上一次训练参照的第j组权重是否全部为零时,具体用于:The apparatus according to claim 12, wherein the weight adjustment unit is specifically used to determine whether the weights of the jth group referred to in the previous training are all zero:
    确定置零标记数据结构中第j组权重对应的置零标记是否为零;Determine whether the zero-setting mark corresponding to the j-th group of weights in the zero-setting mark data structure is zero;
    当所述置零标记为零时,判定所述上一次训练参照的第j组权重全部为零;When the zero-setting flag is zero, it is determined that the weight of the jth group referenced in the previous training is all zero;
    当所述置零标记为非零值时,判定所述上一次训练参照的第j组权重不全为零。When the zero-setting flag is a non-zero value, it is determined that the j-th group of weights referred to in the previous training is not all zero.
14. The apparatus according to claim 12 or 13, wherein the weight adjustment unit is further configured to:
    after setting all of the j-th group of weights obtained after the previous training to zero, or after setting all of the non-zero-valued weights to zero, update the zero-setting flag corresponding to the j-th group of weights in the current zero-setting flag data structure to zero; or
    the weight adjustment unit is further configured to:
    after keeping the j-th group of weights obtained after the previous training unchanged, update the zero-setting flag corresponding to the j-th group of weights in the current zero-setting flag data structure to a non-zero value.
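For illustration (the class name and the byte-per-group representation are assumptions, not claim language), claims 13 and 14 amount to keeping one flag per weight group and reading or updating it around the adjustment step:

```python
import numpy as np

class ZeroFlags:
    """One flag per weight group: 0 means the group is entirely zero,
    a non-zero value means it contains at least one non-zero weight."""
    def __init__(self, num_groups: int):
        self.flags = np.ones(num_groups, dtype=np.uint8)

    def group_was_all_zero(self, j: int) -> bool:
        # Claim 13: the referenced group is all zero iff its flag is zero.
        return self.flags[j] == 0

    def mark_zeroed(self, j: int) -> None:
        # Claim 14: after zeroing a group, its flag is updated to zero.
        self.flags[j] = 0

    def mark_kept(self, j: int) -> None:
        # Claim 14: after keeping a group unchanged, its flag is non-zero.
        self.flags[j] = 1
```

Because one flag covers a whole sparse-unit-length group, the structure costs only one flag per hardware operation's worth of weights.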
15. A data processing apparatus, comprising:
    an obtaining unit, configured to obtain weights of a target neural network model, where the target neural network model is a final neural network model obtained through training after the weights of a neural network model are grouped based on a sparse unit length; the sparse unit length is determined based on processing capability information of a processing device, and is the data length of one operation when a matrix operation is performed; and
    a processing unit, configured to perform the following processing based on the weights of the target neural network model: in the p-th round of processing, determining whether the q-th group of weights are all zero; if so, generating and saving a first operation result according to a matrix operation type, or according to the matrix operation type and matrix data to be processed; otherwise, generating and saving a second operation result according to the q-th group of weights, the matrix data to be processed, and the matrix operation type;
    where the number of weights included in the q-th group of weights is the sparse unit length; q takes every positive integer from 1 to f, where f is the total number of groups obtained after all weights of the target neural network model are grouped according to the sparse unit length; and p takes any positive integer from 1 to f.
16. The apparatus according to claim 15, wherein, when determining whether the q-th group of weights are all zero, the processing unit is specifically configured to:
    obtain the zero-setting flag data structure corresponding to the weights of the target neural network model; and
    determine whether the zero-setting flag corresponding to the q-th group of weights in the zero-setting flag data structure is zero.
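For illustration, claims 15 and 16 describe the inference-side payoff: when a group's flag marks it as all zero, the result of the operation is known from the operation type alone (for multiply-accumulate, a zero contribution), so the weights are never loaded or multiplied. A minimal sketch, assuming the matrix operation is a dot product; the names are hypothetical:

```python
import numpy as np

def process(weights: np.ndarray, flags: np.ndarray,
            data: np.ndarray, unit: int) -> float:
    """Sparse dot product over groups of `unit` weights.
    flags[q] == 0 means the q-th weight group is entirely zero."""
    acc = 0.0
    for q in range(len(flags)):
        if flags[q] == 0:
            # "First operation result": for multiply-accumulate, an
            # all-zero group contributes 0, so skip the weight load
            # and the multiplication entirely.
            continue
        lo, hi = q * unit, (q + 1) * unit
        # "Second operation result": ordinary multiply-accumulate.
        acc += float(weights[lo:hi] @ data[lo:hi])
    return acc
```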
17. A neural network compression apparatus, comprising:
    a memory, configured to store program instructions; and
    a processor, coupled to the memory and configured to call the program instructions in the memory to perform the following operations:
    determining a sparse unit length according to processing capability information of a processing device, where the sparse unit length is the data length of one operation when the processing device performs a matrix operation;
    when performing the current training on a neural network model, adjusting the j-th group of weights obtained after the previous training according to the j-th group of weights referenced in the previous training, to obtain the j-th group of weights referenced in the current training, where the number of weights included in the j-th group of weights is the sparse unit length; j takes every positive integer from 1 to m, where m is the total number of weight groups obtained after all weights of the neural network model are grouped according to the sparse unit length; and
    performing the current training on the neural network model according to the obtained groups of weights referenced in the current training.
18. The apparatus according to claim 17, wherein, when determining the sparse unit length according to the processing capability information of the processing device, the processor is specifically configured to:
    determine the length of a register in the processing device, or the maximum data length processed at one time by an instruction set in the processing device; and
    use the length of the register, or the maximum data length processed at one time by the instruction set, as the sparse unit length.
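For illustration, the register length in claim 18 translates into a group size as follows; the register widths used are examples of common hardware, not values taken from the claims:

```python
def sparse_unit_length(register_bits: int, element_bits: int = 32) -> int:
    """Number of weights that fit in one register / one SIMD operation."""
    return register_bits // element_bits

# Illustrative hardware examples (assumptions, not claim values):
# a 128-bit NEON register holds 4 float32 weights -> group size 4
# a 256-bit AVX2 register holds 8 float32 weights -> group size 8
assert sparse_unit_length(128) == 4
assert sparse_unit_length(256) == 8
```

Matching the group size to the register width is what lets a single zero-setting flag stand in for exactly one skippable load-and-multiply on the target device.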
19. The apparatus according to claim 17 or 18, wherein the processor is further configured to:
    before the first training of the neural network, clip all weights of an initial neural network model according to an initial weight threshold of the initial neural network model.
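For illustration, claim 19's initial clipping is a one-shot magnitude prune before the first training pass; a minimal sketch, assuming (as in the earlier sketch) that the comparison is on absolute values:

```python
import numpy as np

def clip_initial_weights(w: np.ndarray, init_threshold: float) -> np.ndarray:
    """Zero every initial weight whose magnitude falls below the
    initial weight threshold, before the first training pass."""
    w = w.copy()
    w[np.abs(w) < init_threshold] = 0.0
    return w
```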
20. The apparatus according to any one of claims 17 to 19, wherein, when adjusting the j-th group of weights obtained after the previous training according to the j-th group of weights referenced in the previous training, the processor is specifically configured to:
    when the j-th group of weights referenced in the previous training are all zero and the j-th group of weights obtained after the previous training are all less than a zero-setting weight threshold, set all of the j-th group of weights obtained after the previous training to zero; or
    when the j-th group of weights referenced in the previous training are all zero and the j-th group of weights obtained after the previous training are not all less than the zero-setting weight threshold, keep the j-th group of weights obtained after the previous training unchanged; or
    when the j-th group of weights referenced in the previous training are not all zero, the proportion of non-zero values in the total number of the j-th group of weights obtained after the previous training is less than a set proportion threshold, and the non-zero values in the j-th group of weights obtained after the previous training are all less than the zero-setting weight threshold, set all of the non-zero-valued weights in the j-th group of weights obtained after the previous training to zero; or
    when the j-th group of weights referenced in the previous training are not all zero, the proportion of non-zero values in the total number of the j-th group of weights obtained after the previous training is less than the set proportion threshold, and the non-zero values in the j-th group of weights obtained after the previous training are not all less than the zero-setting weight threshold, keep the j-th group of weights obtained after the previous training unchanged; or
    when the j-th group of weights referenced in the previous training are not all zero and the proportion of non-zero values in the total number of the j-th group of weights obtained after the previous training is not less than the set proportion threshold, keep the j-th group of weights obtained after the previous training unchanged.
21. The apparatus according to claim 20, wherein, when determining whether the j-th group of weights referenced in the previous training are all zero, the processor is specifically configured to:
    determine whether the zero-setting flag corresponding to the j-th group of weights in a zero-setting flag data structure is zero;
    when the zero-setting flag is zero, determine that the j-th group of weights referenced in the previous training are all zero; and
    when the zero-setting flag is a non-zero value, determine that the j-th group of weights referenced in the previous training are not all zero.
22. The apparatus according to claim 20 or 21, wherein the processor is further configured to:
    after setting all of the j-th group of weights obtained after the previous training to zero, or after setting all of the non-zero-valued weights to zero, update the zero-setting flag corresponding to the j-th group of weights in the current zero-setting flag data structure to zero; or
    the processor is further configured to:
    after keeping the j-th group of weights obtained after the previous training unchanged, update the zero-setting flag corresponding to the j-th group of weights in the current zero-setting flag data structure to a non-zero value.
23. A data processing apparatus, comprising:
    a memory, configured to store program instructions; and
    a processor, coupled to the memory and configured to call the program instructions in the memory to perform the following operations:
    obtaining weights of a target neural network model, where the target neural network model is a final neural network model obtained through training after the weights of a neural network model are grouped based on a sparse unit length; the sparse unit length is determined based on processing capability information of a processing device, and is the data length of one operation when a matrix operation is performed; and
    performing the following processing based on the weights of the target neural network model: in the p-th round of processing, determining whether the q-th group of weights are all zero; if so, generating and saving a first operation result according to a matrix operation type, or according to the matrix operation type and matrix data to be processed; otherwise, generating and saving a second operation result according to the q-th group of weights, the matrix data to be processed, and the matrix operation type;
    where the number of weights included in the q-th group of weights is the sparse unit length; q takes every positive integer from 1 to f, where f is the total number of groups obtained after all weights of the target neural network model are grouped according to the sparse unit length; and p takes any positive integer from 1 to f.
24. The apparatus according to claim 23, wherein, when determining whether the q-th group of weights are all zero, the processor is specifically configured to:
    obtain the zero-setting flag data structure corresponding to the weights of the target neural network model; and
    determine whether the zero-setting flag corresponding to the q-th group of weights in the zero-setting flag data structure is zero.
25. A computer program product comprising instructions, wherein, when the computer program product is run on a computer, the computer is caused to execute the method according to any one of claims 1 to 8.
26. A computer storage medium, wherein the computer storage medium stores a computer program, and when the computer program is executed by a computer, the computer is caused to execute the method according to any one of claims 1 to 8.
27. A chip, wherein the chip is coupled to a memory and is configured to read and execute program instructions stored in the memory, to implement the method according to any one of claims 1 to 8.
PCT/CN2018/125812 2018-12-29 2018-12-29 Neural network compression method and apparatus WO2020133492A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201880099983.5A CN113168554B (en) 2018-12-29 2018-12-29 Neural network compression method and device
PCT/CN2018/125812 WO2020133492A1 (en) 2018-12-29 2018-12-29 Neural network compression method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/125812 WO2020133492A1 (en) 2018-12-29 2018-12-29 Neural network compression method and apparatus

Publications (1)

Publication Number Publication Date
WO2020133492A1

Family

ID=71127997

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/125812 WO2020133492A1 (en) 2018-12-29 2018-12-29 Neural network compression method and apparatus

Country Status (2)

Country Link
CN (1) CN113168554B (en)
WO (1) WO2020133492A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116383666B (en) * 2023-05-23 2024-04-19 重庆大学 Power data prediction method and device and electronic equipment


Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US8700552B2 (en) * 2011-11-28 2014-04-15 Microsoft Corporation Exploiting sparseness in training deep neural networks
WO2018107414A1 (en) * 2016-12-15 2018-06-21 上海寒武纪信息科技有限公司 Apparatus, equipment and method for compressing/decompressing neural network model
CN107909147A * 2017-11-16 2018-04-13 深圳市华尊科技股份有限公司 Data processing method and device

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
CN107229967A (en) * 2016-08-22 2017-10-03 北京深鉴智能科技有限公司 Hardware accelerator and method for implementing sparsified GRU neural networks based on an FPGA
CN107239825A (en) * 2016-08-22 2017-10-10 北京深鉴智能科技有限公司 Deep neural network compression method considering load balancing
CN107239824A (en) * 2016-12-05 2017-10-10 北京深鉴智能科技有限公司 Apparatus and method for implementing a sparse convolutional neural network accelerator
CN107688850A (en) * 2017-08-08 2018-02-13 北京深鉴科技有限公司 Deep neural network compression method

Cited By (4)

Publication number Priority date Publication date Assignee Title
EP4191478A1 (en) * 2021-12-02 2023-06-07 Beijing Baidu Netcom Science Technology Co., Ltd. Method and apparatus for compressing neural network model
US11861498B2 (en) 2021-12-02 2024-01-02 Beijing Baidu Netcom Science Technology Co., Ltd. Method and apparatus for compressing neural network model
CN114580630A (en) * 2022-03-01 2022-06-03 厦门大学 Neural network model training method and graph classification method for AI chip design
CN114580630B (en) * 2022-03-01 2024-05-31 厦门大学 Neural network model training method and graph classification method for AI chip design

Also Published As

Publication number Publication date
CN113168554A (en) 2021-07-23
CN113168554B (en) 2023-11-28

Similar Documents

Publication Publication Date Title
US11651259B2 (en) Neural architecture search for convolutional neural networks
US20190087713A1 (en) Compression of sparse deep convolutional network weights
CN106575377B (en) Classifier updates on common features
US9600762B2 (en) Defining dynamics of multiple neurons
CN110399487B (en) Text classification method and device, electronic equipment and storage medium
CN109886422A (en) Model configuration method, device, electronic equipment and read/write memory medium
US20210312295A1 (en) Information processing method, information processing device, and information processing program
WO2020133492A1 (en) Neural network compression method and apparatus
EP3685266A1 (en) Power state control of a mobile device
CN101833691A (en) Realizing method of least square support vector machine serial structure based on EPGA (Filed Programmable Gate Array)
KR20220009682A (en) Method and system for distributed machine learning
US20150278683A1 (en) Plastic synapse management
CN112269875B (en) Text classification method, device, electronic equipment and storage medium
CN113742069A (en) Capacity prediction method and device based on artificial intelligence and storage medium
CN112700006A (en) Network architecture searching method, device, electronic equipment and medium
US20220335293A1 (en) Method of optimizing neural network model that is pre-trained, method of providing a graphical user interface related to optimizing neural network model, and neural network model processing system performing the same
CN112766462A (en) Data processing method, device and computer readable storage medium
WO2020133364A1 (en) Neural network compression method and apparatus
Bao et al. Multi-grained Pruning Method of Convolutional Neural Network
EP4283522A1 (en) Spiking neural network circuit and spiking neural network-based calculation method
US20240020510A1 (en) System and method for execution of inference models across multiple data processing systems
US20230351165A1 (en) Method for operating neural network
US20230014656A1 (en) Power efficient register files for deep neural network (dnn) accelerator
US20240144649A1 (en) Image classification method, electronic device and storage medium
CN113162780B (en) Real-time network congestion analysis method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 18945108; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 18945108; Country of ref document: EP; Kind code of ref document: A1