WO2020172829A1 - Neural network model processing method and apparatus - Google Patents

Neural network model processing method and apparatus

Info

Publication number
WO2020172829A1
Authority
WO
WIPO (PCT)
Prior art keywords
layer
neural network
network model
operation layer
low
Prior art date
Application number
PCT/CN2019/076374
Other languages
English (en)
French (fr)
Inventor
隋志成
周力
赵磊
刘默翰
俞清华
蒋洪睿
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to US17/434,563 priority Critical patent/US20220121936A1/en
Priority to PCT/CN2019/076374 priority patent/WO2020172829A1/zh
Priority to EP19917294.1A priority patent/EP3907662A4/en
Priority to CN201980031862.1A priority patent/CN112189205A/zh
Publication of WO2020172829A1 publication Critical patent/WO2020172829A1/zh

Classifications

    • G  PHYSICS
    • G06  COMPUTING; CALCULATING OR COUNTING
    • G06N  COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00  Computing arrangements based on biological models
    • G06N 3/02  Neural networks
    • G06N 3/04  Architecture, e.g. interconnection topology
    • G06N 3/045  Combinations of networks
    • G06N 3/063  Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N 3/08  Learning methods

Definitions

  • This application relates to the field of neural network technology, and in particular to a neural network model processing method and device.
  • Artificial intelligence (AI) applications that use neural network (NN) models are increasingly common.
  • The parameters of a neural network model are usually on the order of millions, tens of millions, or hundreds of millions, which places high demands on the storage and computing capabilities of the terminal devices that run such AI applications and limits the use of neural network models on terminal devices.
  • In the prior art, the neural network model is usually compressed by reducing its parameters.
  • Although the neural network model can be compressed to a certain extent in this way, reducing the parameters lowers the accuracy and effectiveness of the compressed model.
  • This application provides a neural network model processing method and device for compressing a neural network model without reducing its accuracy and effectiveness.
  • In a first aspect, this application provides a neural network model processing method, which can be applied to a server or a terminal device.
  • The method includes: training to obtain a first low-bit neural network model, where the first low-bit neural network model includes at least two operation layers, the at least two operation layers include a first operation layer and a second operation layer, each of the at least two operation layers includes at least one operation, the values of the parameters and/or data used by the at least one operation are represented by N bits, and N is a positive integer less than 8.
  • The first low-bit neural network model is compressed to obtain a second low-bit neural network model.
  • The second low-bit neural network model includes at least one operation layer, and the at least one operation layer includes a third operation layer.
  • The third operation layer is equivalent to the first operation layer and the second operation layer, and the operation layers other than the third operation layer in the at least one operation layer are the same as the operation layers other than the first operation layer and the second operation layer in the at least two operation layers.
  • With this method, the first low-bit neural network model obtained by training can be compressed, and the third operation layer included in the compressed second low-bit neural network model is equivalent to the first operation layer and the second operation layer in the first low-bit neural network model before compression.
  • In other words, the neural network model is compressed by reducing its operation layers. Because the operation layers of the model before and after compression are equivalent, the neural network model can be compressed without reducing its accuracy and effectiveness.
  • In a possible design, compressing the first low-bit neural network model to obtain the second low-bit neural network model includes: searching for the first operation layer and the second operation layer in the at least two operation layers; merging the first operation layer and the second operation layer to obtain the third operation layer, where the input of the first operation layer is the same as the input of the third operation layer, the output of the first operation layer is the input of the second operation layer, and the output of the second operation layer is the same as the output of the third operation layer; and constructing the second low-bit neural network model according to the third operation layer and the operation layers other than the first operation layer and the second operation layer in the at least two operation layers.
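  • As a minimal sketch of this search-merge-construct flow (assuming a simple list-of-layers representation and caller-supplied merge rules, neither of which is prescribed by this application):

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class OpLayer:
    name: str
    ops: List[Callable] = field(default_factory=list)  # operations of this layer

def compress(layers: List[OpLayer],
             mergeable: Callable[[OpLayer, OpLayer], bool],
             merge: Callable[[OpLayer, OpLayer], OpLayer]) -> List[OpLayer]:
    """Search adjacent mergeable pairs, replace each pair with an equivalent
    layer (the "third operation layer"), and keep all other layers unchanged."""
    out: List[OpLayer] = []
    i = 0
    while i < len(layers):
        if i + 1 < len(layers) and mergeable(layers[i], layers[i + 1]):
            out.append(merge(layers[i], layers[i + 1]))
            i += 2
        else:
            out.append(layers[i])
            i += 1
    return out
```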
  • In a possible design, the first operation layer includes at least one first operation, and the second operation layer includes at least one second operation.
  • Merging the first operation layer and the second operation layer to obtain the third operation layer includes: combining the at least one first operation and the at least one second operation according to a preset rule to obtain at least one third operation; and constructing the third operation layer according to the at least one third operation, where the third operation layer includes the at least one third operation.
  • In another possible design, the first operation layer includes at least one first operation, and the second operation layer includes at least one second operation.
  • Merging the first operation layer and the second operation layer to obtain the third operation layer includes: constructing the third operation layer according to the at least one first operation and the at least one second operation, where the third operation layer includes the at least one first operation and the at least one second operation.
  • In a possible design, after the second low-bit neural network model is obtained, it is stored or sent.
  • In a second aspect, this application provides a neural network model processing method that can be applied to a terminal device.
  • The method includes: the terminal device obtains a second low-bit neural network model; and the terminal device updates a first low-bit neural network model to the second low-bit neural network model.
  • The first low-bit neural network model includes at least two operation layers, the at least two operation layers include a first operation layer and a second operation layer, each of the at least two operation layers includes at least one operation, and the values of the parameters and/or data of the at least one operation are represented by N bits, where N is a positive integer less than 8. The second low-bit neural network model includes at least one operation layer, the at least one operation layer includes a third operation layer that is equivalent to the first operation layer and the second operation layer, and the operation layers other than the third operation layer in the at least one operation layer are the same as the operation layers other than the first operation layer and the second operation layer in the at least two operation layers.
  • In a possible design, the second low-bit neural network model is a neural network model constructed according to the third operation layer and the operation layers other than the first operation layer and the second operation layer in the at least two operation layers.
  • The third operation layer is the operation layer obtained by merging the first operation layer and the second operation layer: the input of the first operation layer is the same as the input of the third operation layer, the output of the first operation layer is the input of the second operation layer, and the output of the second operation layer is the same as the output of the third operation layer.
  • In a possible design, the first operation layer includes at least one first operation, the second operation layer includes at least one second operation, and the third operation layer includes at least one third operation.
  • The at least one third operation is an operation obtained by combining the at least one first operation and the at least one second operation according to a preset rule.
  • In another possible design, the first operation layer includes at least one first operation, the second operation layer includes at least one second operation, and the third operation layer includes the at least one first operation and the at least one second operation.
  • In a possible design, the terminal device obtains the second low-bit neural network model in one of the following ways:
  • the terminal device receives the second low-bit neural network model from the server; or
  • the terminal device obtains the second low-bit neural network model locally.
  • In a third aspect, the present application provides a neural network model processing method, which can be applied to a neural network model processing system.
  • The method includes: a server trains to obtain a first low-bit neural network model, where the first low-bit neural network model includes at least two operation layers, the at least two operation layers include a first operation layer and a second operation layer, each of the at least two operation layers includes at least one operation, the values of the parameters and/or data used by the at least one operation are represented by N bits, and N is a positive integer less than 8.
  • The server compresses the first low-bit neural network model to obtain a second low-bit neural network model, where the second low-bit neural network model includes at least one operation layer, the at least one operation layer includes a third operation layer, the third operation layer is equivalent to the first operation layer and the second operation layer, and the operation layers other than the third operation layer in the at least one operation layer are the same as the operation layers other than the first operation layer and the second operation layer in the at least two operation layers.
  • The server sends the second low-bit neural network model to the terminal device, and the terminal device uses the second low-bit neural network model to update the locally stored first low-bit neural network model.
  • In a fourth aspect, the present application provides a neural network model processing device, which includes a processing unit.
  • The processing unit is configured to train to obtain a first low-bit neural network model, where the first low-bit neural network model includes at least two operation layers, the at least two operation layers include a first operation layer and a second operation layer, each of the at least two operation layers includes at least one operation, the values of the parameters and/or data used by the at least one operation are represented by N bits, and N is a positive integer less than 8.
  • The processing unit is further configured to compress the first low-bit neural network model to obtain a second low-bit neural network model, where the second low-bit neural network model includes at least one operation layer, the at least one operation layer includes a third operation layer, the third operation layer is equivalent to the first operation layer and the second operation layer, and the operation layers other than the third operation layer in the at least one operation layer are the same as the operation layers other than the first operation layer and the second operation layer in the at least two operation layers.
  • In a possible design, the processing unit is specifically configured to: search for the first operation layer and the second operation layer in the at least two operation layers; merge the first operation layer and the second operation layer to obtain the third operation layer, where the input of the first operation layer is the same as the input of the third operation layer, the output of the first operation layer is the input of the second operation layer, and the output of the second operation layer is the same as the output of the third operation layer; and construct the second low-bit neural network model according to the third operation layer and the operation layers other than the first operation layer and the second operation layer in the at least two operation layers.
  • In a possible design, the first operation layer includes at least one first operation, and the second operation layer includes at least one second operation.
  • The processing unit is specifically configured to: combine the at least one first operation and the at least one second operation according to a preset rule to obtain at least one third operation; and construct the third operation layer according to the at least one third operation, where the third operation layer includes the at least one third operation.
  • In another possible design, the first operation layer includes at least one first operation, and the second operation layer includes at least one second operation.
  • The processing unit is specifically configured to: construct the third operation layer based on the at least one first operation and the at least one second operation, where the third operation layer includes the at least one first operation and the at least one second operation.
  • In a possible design, the device further includes a storage unit, and the storage unit is used to store the second low-bit neural network model; or the device further includes a transceiver unit, and the transceiver unit is configured to send the second low-bit neural network model.
  • In a fifth aspect, the present application provides a neural network model processing device, which includes an acquiring unit and a processing unit.
  • The acquiring unit is used to obtain a second low-bit neural network model, and the processing unit is used to update a first low-bit neural network model to the second low-bit neural network model.
  • The first low-bit neural network model includes at least two operation layers, the at least two operation layers include a first operation layer and a second operation layer, each of the at least two operation layers includes at least one operation, and the values of the parameters and/or data of the at least one operation are represented by N bits, where N is a positive integer less than 8. The second low-bit neural network model includes at least one operation layer, the at least one operation layer includes a third operation layer that is equivalent to the first operation layer and the second operation layer, and the operation layers other than the third operation layer in the at least one operation layer are the same as the operation layers other than the first operation layer and the second operation layer in the at least two operation layers.
  • In a possible design, the second low-bit neural network model is a neural network model constructed according to the third operation layer and the operation layers other than the first operation layer and the second operation layer in the at least two operation layers.
  • The third operation layer is the operation layer obtained by merging the first operation layer and the second operation layer: the input of the first operation layer is the same as the input of the third operation layer, the output of the first operation layer is the input of the second operation layer, and the output of the second operation layer is the same as the output of the third operation layer.
  • In a possible design, the first operation layer includes at least one first operation, the second operation layer includes at least one second operation, and the third operation layer includes at least one third operation.
  • The at least one third operation is an operation obtained by combining the at least one first operation and the at least one second operation according to a preset rule.
  • In another possible design, the first operation layer includes at least one first operation, the second operation layer includes at least one second operation, and the third operation layer includes the at least one first operation and the at least one second operation.
  • In a possible design, the device further includes a transceiver unit, and the transceiver unit is configured to receive the second low-bit neural network model from a server; or the processing unit is further configured to obtain the second low-bit neural network model locally.
  • In a sixth aspect, this application provides a neural network model processing system, including a server and a terminal device.
  • The server trains to obtain a first low-bit neural network model, where the first low-bit neural network model includes at least two operation layers, the at least two operation layers include a first operation layer and a second operation layer, each of the at least two operation layers includes at least one operation, the values of the parameters and/or data used by the at least one operation are represented by N bits, and N is a positive integer less than 8. The server compresses the first low-bit neural network model to obtain a second low-bit neural network model.
  • The second low-bit neural network model includes at least one operation layer, the at least one operation layer includes a third operation layer, the third operation layer is equivalent to the first operation layer and the second operation layer, and the operation layers other than the third operation layer in the at least one operation layer are the same as the operation layers other than the first operation layer and the second operation layer in the at least two operation layers.
  • The server sends the second low-bit neural network model to the terminal device, and the terminal device uses the second low-bit neural network model to update the locally stored first low-bit neural network model.
  • In a seventh aspect, an embodiment of the present application also provides a neural network model processing device, which has the function of realizing the method in the first aspect.
  • the function can be realized by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above-mentioned functions.
  • the structure of the neural network model processing device may include a processor and a memory, and the processor is configured to execute the method mentioned in the first aspect.
  • the memory is coupled with the processor, and stores the necessary program instructions and data of the neural network model processing device.
  • In an eighth aspect, an embodiment of the present application also provides a neural network model processing device, which has the function of realizing the method in the second aspect.
  • the function can be realized by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above-mentioned functions.
  • the structure of the neural network model processing device may include a processor and a memory, and the processor is configured to execute the method mentioned in the second aspect.
  • the memory is coupled with the processor, and stores the necessary program instructions and data of the neural network model processing device.
  • In a ninth aspect, the present application provides a neural network model processing system, including the neural network model processing device described in the seventh aspect and the neural network model processing device described in the eighth aspect.
  • In a tenth aspect, an embodiment of the present application also provides a computer storage medium that stores computer-executable instructions. When called by a computer, the computer-executable instructions cause the computer to execute the method provided by the foregoing first aspect or any design of the first aspect, or the method provided by the foregoing second aspect or any design of the second aspect.
  • In an eleventh aspect, an embodiment of the present application also provides a computer program product.
  • The computer program product stores instructions that, when run on a computer, cause the computer to execute the method provided by the first aspect or any design of the first aspect.
  • In a twelfth aspect, an embodiment of the present application also provides a chip, which is coupled with a memory and is used to read and execute program instructions stored in the memory to implement any of the methods mentioned in the first, second, or third aspect.
  • FIG. 1a is a schematic diagram of a neural network model provided by an embodiment of this application.
  • FIG. 1b is a schematic diagram of another neural network model provided by an embodiment of the application.
  • FIG. 2 is a schematic structural diagram of a computer device applicable to the embodiments of this application;
  • FIG. 3 is a flowchart of a neural network model processing method provided by an embodiment of this application.
  • FIG. 4a is a flowchart of another neural network model processing method provided by an embodiment of the application.
  • FIG. 4b is a schematic diagram of another neural network model provided by an embodiment of the application.
  • FIG. 4c is a schematic diagram of another neural network model provided by an embodiment of this application.
  • FIG. 4d is a schematic diagram of another neural network model provided by an embodiment of this application.
  • FIG. 5 is a flowchart of another neural network model processing method provided by an embodiment of the application.
  • FIG. 6 is a schematic structural diagram of a neural network model processing device provided by an embodiment of the application.
  • FIG. 7 is a schematic structural diagram of another neural network model processing device provided by an embodiment of the application.
  • FIG. 8 is a schematic structural diagram of another neural network model processing device provided by an embodiment of the application.
  • FIG. 9 is a schematic structural diagram of another neural network model processing device provided by an embodiment of the application.
  • The embodiments of this application provide a neural network model processing method and device.
  • The neural network model processing method and device provided in the embodiments of this application can be applied to, but are not limited to, automatic speech recognition (ASR), natural language processing (NLP), optical character recognition (OCR), or image processing.
  • AI applications using neural network models are increasing year by year, and these AI applications need to be deployed on various terminal devices.
  • Because the parameters of neural network models are usually on the order of millions, tens of millions, or hundreds of millions, high storage and computing capabilities are required of the terminal devices that run these AI applications, which limits the use of neural network models on terminal devices.
  • The prior art reduces the parameters of the neural network model to compress it, but reducing the model parameters inevitably affects the accuracy of the model; after compression by this method, the accuracy and effectiveness of the neural network model are therefore reduced.
  • Using the neural network model processing method provided in the embodiments of the present application, the neural network model can be compressed without reducing its accuracy and effectiveness.
  • The method and the device are based on the same inventive concept. Because the principles by which the method and the device solve the problem are similar, the implementations of the device and the method can refer to each other, and repeated descriptions are omitted.
  • A neural network imitates the behavioral characteristics of animal neural networks and processes data in a manner similar to the structure of synaptic connections in the brain.
  • A neural network model is composed of a large number of interconnected nodes (also called neurons).
  • the neural network model can be composed of an input layer, a hidden layer, and an output layer, as shown in Figure 1a.
  • The input layer receives the input data of the neural network model, and the output layer produces the output data of the neural network model.
  • The hidden layer is composed of many connected nodes between the input layer and the output layer and is used to process the input data; it can consist of one layer or multiple layers.
  • Each operation layer has an input and an output, and the output of the previous operation layer is the input of the next operation layer.
  • Each operation layer can include at least one operation, and the operation is used to process the input of the operation layer. When a parameter is input to the operation layer, the operation layer can first store the parameter and then read it out to perform the corresponding operation when needed.
  • AI applications refer to applications in the AI field; the AI applications involved in the embodiments of this application mainly refer to AI applications that use a neural network model.
  • Neural network models with stable performance, obtained after extensive training, are widely used in AI applications, and these AI applications are deployed on various terminal devices to apply neural network models in various fields. Because training a neural network is a complicated process, the platform on which the neural network model is trained and the platform on which it is deployed can usually be separated.
  • the neural network model processing method provided in the embodiments of this application can be implemented on a neural network model training platform, or can be implemented on a neural network model deployment platform, which is not limited in this application.
  • The neural network model training platform may include, but is not limited to, a computer device, a server, or a cloud service platform.
  • The computer device may include, for example, a personal computer (PC), a desktop computer, a tablet computer, a vehicle-mounted computer, and the like.
  • The platform on which the neural network model is deployed may include, but is not limited to, a terminal device, a computer device, or a server.
  • the terminal device may be a device that provides voice and/or data connectivity to the user, for example, it may include a handheld device with a wireless connection function, or a processing device connected to a wireless modem.
  • The terminal equipment may include user equipment (UE), a wireless terminal device, a mobile terminal device, a subscriber unit, an access point (AP), a remote terminal (remote terminal), an access terminal (access terminal), a user terminal (user terminal), a user agent (user agent), or a user device (user device).
  • For example, it may include a mobile phone (or "cellular" phone), a computer with a mobile terminal, a portable, pocket-sized, handheld, computer-built-in, or vehicle-mounted mobile device, a smart wearable device, and so on.
  • For example, the terminal device may be a personal communication service (PCS) phone, a session initiation protocol (SIP) phone, a wireless local loop (WLL) station, or a personal digital assistant (PDA).
  • The terminal device may also be a restricted device, such as a device with low power consumption, limited storage capability, or limited computing capability, for example, an information sensing device such as a barcode reader, a radio frequency identification (RFID) device, a sensor, a global positioning system (GPS) receiver, or a laser scanner.
  • FIG. 2 shows a schematic structural diagram of a possible computer device applicable to an embodiment of the present application.
  • the computer device includes: a processor 210, a memory 220, a communication module 230, an input unit 240, a display unit 250, a power supply 260 and other components.
  • The structure of the computer device shown in FIG. 2 does not constitute a limitation on the computer device; the computer device provided in the embodiment of the present application may include more or fewer components than shown in FIG. 2, combine certain components, or use a different arrangement of components.
  • the communication module 230 may be connected to other devices through a wireless connection or a physical connection to realize data transmission and reception of the computer device.
  • The communication module 230 may include any one or a combination of a radio frequency (RF) circuit, a wireless fidelity (WiFi) module, a communication interface, or a Bluetooth module, which is not limited in this embodiment of the application.
  • the memory 220 can be used to store program instructions and data.
  • the processor 210 executes various functional applications and data processing of the computer device by running the program instructions stored in the memory 220.
  • the program instructions include program instructions that enable the processor 210 to execute the neural network model processing method provided in the following embodiments of the present application.
  • the memory 220 may mainly include a program storage area and a data storage area.
  • the storage program area can store operating systems, various application programs, and program instructions;
  • the storage data area can store various data such as neural networks.
  • The memory 220 may include a high-speed random access memory, and may also include a non-volatile memory, such as a magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • the input unit 240 may be used to receive information such as data or operation instructions input by the user.
  • the input unit 240 may include input devices such as a touch panel, function keys, a physical keyboard, a mouse, a camera, and a monitor.
  • the display unit 250 can implement human-computer interaction, and is used to display information input by the user, information provided to the user, and other content through a user interface.
  • the display unit 250 may include a display panel 251.
  • The display panel 251 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.
  • The touch panel can cover the display panel 251; when the touch panel detects a touch event on or near it, the event is transmitted to the processor 210 to determine its type so that the corresponding operation can be performed.
  • the processor 210 is the control center of the computer device, and uses various interfaces and lines to connect the above components.
  • the processor 210 may execute the program instructions stored in the memory 220 and call the data stored in the memory 220 to complete various functions of the computer device and implement the methods provided in the embodiments of the present application.
  • the processor 210 may include one or more processing units.
  • The processor 210 can integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, and application programs, and the modem processor mainly handles wireless communication. It can be understood that the foregoing modem processor may not be integrated into the processor 210.
  • the processing unit may compress the neural network model.
  • the processor 210 may be a central processing unit (central processing unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), or a combination of a CPU and a GPU.
  • the processor 210 may also be an artificial intelligence (AI) chip that supports neural network processing, such as a network processor unit (NPU), a tensor processing unit (TPU), or the like.
  • the processor 210 may further include a hardware chip.
  • The aforementioned hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), a digital signal processing (DSP) device, or a combination thereof.
  • The above-mentioned PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
  • The computer device also includes a power source 260 (such as a battery) for powering the various components.
  • The power source 260 may be logically connected to the processor 210 through a power management system, so that functions such as charging and discharging can be managed through the power management system.
  • the computer device may also include components such as a camera, a sensor, and an audio collector, which will not be repeated here.
  • The foregoing computer device is only an example of a device to which the method provided in the embodiments of the present application is applicable. It should be understood that the method may also be applied to devices other than the foregoing computer device, for example, terminal devices, servers, or cloud servers, which is not limited in this application.
  • the neural network model processing method provided in the embodiments of the present application may be applicable to the computer device shown in FIG. 2 and may also be applicable to other devices (such as servers or terminal devices).
  • The neural network model processing method provided by the present application is described below by taking a neural network model processing device as the execution subject. The specific process of the method may include the following steps:
  • Step 101: The neural network model processing device trains to obtain a first low-bit neural network model.
  • The first low-bit neural network model includes at least two operation layers, the at least two operation layers include a first operation layer and a second operation layer, each of the at least two operation layers includes at least one operation, and the values of the parameters and/or data used by the at least one operation are represented by N bits, where N is a positive integer less than 8.
  • A low-bit neural network model refers to a neural network model in which the values of the parameters and/or data of its operations are represented by fewer than 8 bits; for example, it may be a binary neural network model or a ternary neural network model.
  • The first operation layer and the second operation layer may be operation layers that are predefined as mergeable. In actual applications, whether the first operation layer and the second operation layer can be merged may be defined based on a preset rule.
  • For example, the preset rule may be that the first operation layer and the second operation layer are adjacent operation layers and that the operations in the first operation layer and the second operation layer are linear operations.
  • For example, a directly connected sequence of a batch normalization layer (BN), a scale layer (scale layer), and a low-bit activation layer (binary activation layer, BinAct) can be predefined as mergeable.
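  • One hypothetical way to encode such predefined mergeable sequences is a small pattern table over layer-type names (the names below are illustrative, not tied to any specific framework):

```python
# Adjacent layers whose types match a pattern are candidates for merging.
MERGEABLE_PATTERNS = [
    ("BatchNorm", "Scale", "BinAct"),  # BN -> scale -> low-bit activation
    ("BatchNorm", "Scale"),
    ("Scale", "BinAct"),
]

def find_merge_candidates(layer_types, patterns=MERGEABLE_PATTERNS):
    """Yield (start_index, pattern) for every contiguous match in the model."""
    for pattern in patterns:
        m = len(pattern)
        for i in range(len(layer_types) - m + 1):
            if tuple(layer_types[i:i + m]) == pattern:
                yield i, pattern
```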
  • Step 102: The neural network model processing device compresses the first low-bit neural network model to obtain a second low-bit neural network model.
  • The second low-bit neural network model includes at least one operation layer, the at least one operation layer includes a third operation layer, the third operation layer is equivalent to the first operation layer and the second operation layer, and the operation layers other than the third operation layer in the at least one operation layer are the same as the operation layers other than the first operation layer and the second operation layer in the at least two operation layers.
  • In the embodiments of this application, the low-bit neural network model obtained by training is compressed; for example, the binary neural network model obtained by training is compressed.
  • This method not only reduces the operation layers of the neural network model; because the parameters and/or data in a low-bit neural network model are represented with fewer than 8 bits, whereas the parameters and/or data in a floating-point neural network model are represented with 8 or more bits, it can also greatly reduce the storage space occupied by the compressed model.
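  • A back-of-the-envelope comparison illustrates the saving (the parameter count is an assumed example, not a figure from this application):

```python
params = 10_000_000                    # hypothetical layer with 10M parameters
float32_bytes = params * 32 // 8       # 40 MB at 32 bits per value
binary_bytes = params * 1 // 8         # 1.25 MB at 1 bit per value
print(float32_bytes // binary_bytes)   # 32x smaller, before any layer merging
```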
  • The neural network model processing device may adopt, but is not limited to, the following method to compress the first low-bit neural network model to obtain the second low-bit neural network model:
  • Step 1021: The neural network model processing device searches for the first operation layer and the second operation layer among the at least two operation layers included in the first low-bit neural network model.
  • Step 1022: The neural network model processing device merges the first operation layer and the second operation layer to obtain the third operation layer.
  • The input of the first operation layer is the same as the input of the third operation layer, the output of the first operation layer is the input of the second operation layer, and the output of the second operation layer is the same as the output of the third operation layer.
  • That the output of the first operation layer is the input of the second operation layer can be understood to mean that the first operation layer and the second operation layer are adjacent operation layers, such as operation layer 1 and operation layer 2, operation layer 2 and operation layer 3, or operation layer 3 and operation layer 4 in Figure 1b.
  • Step 1023: The neural network model processing device constructs the second low-bit neural network model according to the third operation layer and the operation layers other than the first operation layer and the second operation layer in the at least two operation layers.
  • The first operation layer may include at least one first operation, and the second operation layer may include at least one second operation; the following two manners may be adopted to merge the first operation layer and the second operation layer to obtain the third operation layer.
  • In the first implementation manner, the neural network model processing device merges the at least one first operation with the at least one second operation according to a preset rule to obtain at least one third operation, and constructs the third operation layer according to the at least one third operation; in this case, the third operation layer includes the at least one third operation.
  • In other words, when the first operation layer and the second operation layer are merged into the third operation layer, the at least one first operation included in the first operation layer and the at least one second operation included in the second operation layer are merged into at least one third operation according to a predefined rule, and the at least one third operation is equivalent to the at least one first operation together with the at least one second operation.
  • The preset rule for merging the first operation and the second operation depends on the first operation and the second operation themselves; it can be understood that different preset rules are used for merging different operations. Examples are given below, so the details are not repeated here.
  • In the second implementation manner, the neural network model processing device constructs the third operation layer according to the at least one first operation and the at least one second operation; in this case, the third operation layer includes the at least one first operation and the at least one second operation.
  • In other words, only the first operation layer and the second operation layer are merged, and the operations included in them are not merged. Although this fails to reduce the number of operations during model processing, after the first operation layer and the second operation layer are merged, the parameters passed from the first operation layer to the second operation layer no longer need to be stored, which saves storage space.
  • After compressing the first low-bit neural network model to obtain the second low-bit neural network model, the neural network model processing device may further store or send the second low-bit neural network model; for example, the second low-bit neural network model may be sent to a terminal device.
  • In the embodiments of this application, when neural network model compression is performed, the operation layers of the neural network model are compressed equivalently; that is, the neural network model is compressed by reducing its operation layers.
  • Because the compressed model is equivalent to the model before compression, this ensures that the neural network model is compressed without reducing its accuracy and effectiveness.
  • Referring to the flowchart in FIG. 4a, the method includes the following steps:
  • Step 201: The neural network model processing device trains to obtain the first low-bit neural network model shown in FIG. 4b.
  • The first low-bit neural network model includes a first operation layer, a second operation layer, a third operation layer, a fourth operation layer, and a fifth operation layer, among which the first operation layer, the second operation layer, and the third operation layer are predefined as mergeable operation layers.
  • The first operation layer includes the first operation: y = (x - mu) / sqrt(delta + epsilon), where x is the input of the first operation layer.
  • The second operation layer includes the second operation: z = alpha * y + beta, where y is the input of the second operation layer.
  • The third operation layer includes the third operation: output 1 when the input of the third operation layer is greater than or equal to zero, and output -1 when the input of the third operation layer is less than zero.
  • The second operation layer and the third operation layer are adjacent operation layers.
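  • As a sketch of the first merge manner applied to this example (assuming the three operations chain as z = sign(alpha * (x - mu) / sqrt(delta + epsilon) + beta), which matches the operations above): letting a = alpha / sqrt(delta + epsilon) and b = beta - a * mu gives z = sign(a * x + b), so the three operations collapse into a single threshold comparison of x against -b/a, with the comparison direction set by the sign of a:

```python
import numpy as np

def make_merged_op(mu, delta, epsilon, alpha, beta):
    """Return f(x) equivalent to sign(alpha * (x - mu)/sqrt(delta + epsilon) + beta),
    taking sign(0) as +1, as in the third operation above (alpha assumed nonzero)."""
    a = alpha / np.sqrt(delta + epsilon)  # combined linear coefficient
    b = beta - a * mu                     # combined offset
    t = -b / a                            # decision threshold
    if a > 0:
        return lambda x: np.where(x >= t, 1, -1)
    return lambda x: np.where(x <= t, 1, -1)  # a < 0 flips the comparison

# Sanity check against the unmerged three-operation pipeline:
mu, delta, epsilon, alpha, beta = 0.3, 0.8, 1e-5, 1.7, -0.2
x = np.linspace(-2.0, 2.0, 9)
unmerged = np.where(alpha * (x - mu) / np.sqrt(delta + epsilon) + beta >= 0, 1, -1)
assert np.array_equal(make_merged_op(mu, delta, epsilon, alpha, beta)(x), unmerged)
```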
  • Step 202: The neural network model processing device compresses the first low-bit neural network model shown in FIG. 4b to obtain the second low-bit neural network model shown in FIG. 4c.
  • The second low-bit neural network model includes the kth operation layer, the fourth operation layer, and the fifth operation layer, where the kth operation layer is equivalent to the first operation layer, the second operation layer, and the third operation layer shown in FIG. 4b.
  • It should be noted that Figures 4a-4c only take the merging of three operation layers as an example. In actual applications, more operation layers can be merged, and if a mergeable operation-layer structure repeats within the model structure, each repetition can be merged.
  • Taking Figure 4d as an example, the first operation layer, the second operation layer, and the third operation layer in Figure 4d are predefined mergeable operation layers. In actual applications, it is possible to merge only the mergeable structure on the left side of Figure 4d (1st operation layer - 2nd operation layer - 3rd operation layer), or only the mergeable structure on the right side (1st operation layer - 2nd operation layer - 3rd operation layer), or, of course, both mergeable structures.
  • In this way, the operations performed can be reduced. Because the model after compression is completely equivalent to the model before compression, operating efficiency can be improved without reducing the accuracy and effectiveness of the neural network model.
  • The foregoing takes the first implementation of merging operation layers as an example; the second implementation is illustrated below with reference to FIG. 4b.
  • In the second implementation of merging operation layers, only the first operation layer, the second operation layer, and the third operation layer are merged, and the operations included in them are not merged: the merged layer still includes the first operation, the second operation, and the third operation.
  • In this case, the intermediate output parameter y no longer needs to be stored between operations.
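  • A sketch of this second manner (names follow the example above): the three operations are kept as-is but executed inside one merged layer, so the intermediate results y and z are consumed immediately instead of being written out between layers:

```python
import math

def merged_layer_forward(x, mu, delta, epsilon, alpha, beta):
    y = (x - mu) / math.sqrt(delta + epsilon)  # 1st operation; used immediately
    z = alpha * y + beta                       # 2nd operation; used immediately
    return 1 if z >= 0 else -1                 # 3rd operation (binary activation)
```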
  • The final neural network model obtained through the embodiment shown in FIG. 3 can be applied to another neural network model processing device, so that that device performs processing based on the finally obtained neural network model.
  • Based on this, an embodiment of the present application also provides another neural network model processing method, which is implemented based on the final neural network model obtained in the embodiment shown in FIG. 3.
  • This other neural network model processing method provided by this application is described by taking a terminal device as the execution subject. The specific process of the method may include the following steps:
  • Step 301: The terminal device obtains the second low-bit neural network model.
  • The terminal device may obtain the second low-bit neural network model in either of the following ways.
  • Method 1: The terminal device receives the second low-bit neural network model from a server. The server in Method 1 may be an example of a neural network model processing device to which the method in FIG. 3 is applicable.
  • Method 2: The terminal device obtains the second low-bit neural network model locally. If Method 2 is adopted, the second low-bit neural network model may be obtained by the terminal device itself through compression using the method shown in FIG. 3.
  • Step 302: The terminal device updates the first low-bit neural network model to the second low-bit neural network model; here, "update" can be understood as replacement.
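  • For illustration only (the file name and byte-level serialization are assumptions, not from this application), the replacement on the terminal device could be an atomic file swap:

```python
import os, tempfile

def update_model(new_model_bytes: bytes, model_path: str = "model.bin") -> None:
    """Write the received second low-bit model next to the stored first model,
    then atomically replace the old file with the new one."""
    directory = os.path.dirname(os.path.abspath(model_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as f:
        f.write(new_model_bytes)
    os.replace(tmp_path, model_path)  # atomic swap of the stored model
```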
  • The compressed second low-bit neural network model can then be used on the terminal device. Because the compressed second low-bit neural network model has fewer operation layers, and correspondingly fewer operations, than the first low-bit neural network model before compression, the operating efficiency of the terminal device can be improved without reducing the accuracy and effectiveness of the neural network model.
  • the processing method shown in FIG. 5 may continue to be executed.
  • an embodiment of the present application also provides a neural network model processing device, which is used to implement the neural network model processing method provided by the embodiment shown in FIG. 3.
  • the neural network model processing device 600 includes a processing unit 601, wherein:
  • The processing unit 601 is configured to train to obtain a first low-bit neural network model, where the first low-bit neural network model includes at least two operation layers, the at least two operation layers include a first operation layer and a second operation layer, each of the at least two operation layers includes at least one operation, and the values of the parameters and/or data used by the at least one operation are represented by N bits, where N is a positive integer less than 8.
  • The processing unit 601 is further configured to compress the first low-bit neural network model to obtain a second low-bit neural network model, where the second low-bit neural network model includes at least one operation layer, the at least one operation layer includes a third operation layer, the third operation layer is equivalent to the first operation layer and the second operation layer, and the operation layers other than the third operation layer in the at least one operation layer are the same as the operation layers other than the first operation layer and the second operation layer in the at least two operation layers.
  • In a possible design, the first operation layer and the second operation layer may be operation layers predefined as mergeable.
  • In a possible design, the processing unit 601 is specifically configured to: search for the first operation layer and the second operation layer in the at least two operation layers; merge the first operation layer and the second operation layer to obtain the third operation layer, where the input of the first operation layer is the same as the input of the third operation layer, the output of the first operation layer is the input of the second operation layer, and the output of the second operation layer is the same as the output of the third operation layer; and construct the second low-bit neural network model according to the third operation layer and the operation layers other than the first operation layer and the second operation layer in the at least two operation layers.
  • In a possible design, the first operation layer includes at least one first operation, and the second operation layer includes at least one second operation.
  • The processing unit 601 is specifically configured to: combine the at least one first operation and the at least one second operation according to a preset rule to obtain at least one third operation, and construct the third operation layer according to the at least one third operation, where the third operation layer includes the at least one third operation.
  • In another possible design, the first operation layer includes at least one first operation, and the second operation layer includes at least one second operation.
  • The processing unit 601 is specifically configured to: construct the third operation layer based on the at least one first operation and the at least one second operation, where the third operation layer includes the at least one first operation and the at least one second operation.
  • In a possible design, the device further includes a storage unit 602, and the storage unit 602 is configured to store the second low-bit neural network model; or the device further includes a transceiver unit 603, and the transceiver unit 603 is configured to send the second low-bit neural network model.
  • the embodiment of the present application also provides another neural network model processing device for implementing the processing method provided by the embodiment shown in FIG. 5.
  • the neural network model processing device 700 includes an acquiring unit 701 and a processing unit 702, wherein:
  • the obtaining unit 701 is configured to obtain a second low-bit neural network model
  • the processing unit 702 is configured to update the first low-bit neural network model to the second low-bit neural network model
  • The first low-bit neural network model includes at least two operation layers, the at least two operation layers include a first operation layer and a second operation layer, each of the at least two operation layers includes at least one operation, and the values of the parameters and/or data of the at least one operation are represented by N bits, where N is a positive integer less than 8. The second low-bit neural network model includes at least one operation layer, the at least one operation layer includes a third operation layer that is equivalent to the first operation layer and the second operation layer, and the operation layers other than the third operation layer in the at least one operation layer are the same as the operation layers other than the first operation layer and the second operation layer in the at least two operation layers.
  • In a possible design, the second low-bit neural network model is a neural network model constructed according to the third operation layer and the operation layers other than the first operation layer and the second operation layer in the at least two operation layers; the third operation layer is the operation layer obtained by merging the first operation layer and the second operation layer, the input of the first operation layer is the same as the input of the third operation layer, the output of the first operation layer is the input of the second operation layer, and the output of the second operation layer is the same as the output of the third operation layer.
  • In a possible design, the first operation layer includes at least one first operation, the second operation layer includes at least one second operation, and the third operation layer includes at least one third operation.
  • The at least one third operation is an operation obtained by combining the at least one first operation with the at least one second operation according to a preset rule.
  • In another possible design, the first operation layer includes at least one first operation, the second operation layer includes at least one second operation, and the third operation layer includes the at least one first operation and the at least one second operation.
  • In a possible design, the device further includes a transceiver unit 703, and the transceiver unit 703 is configured to receive the second low-bit neural network model from a server; or the processing unit 702 is further configured to obtain the second low-bit neural network model locally.
  • the division of units in the embodiments of the present application is illustrative, and is only a logical function division, and there may be other division methods in actual implementation.
  • the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium.
  • Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that enable a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute all or part of the steps of the method described in each embodiment of the present application.
  • The aforementioned storage media include various media that can store program code, such as a USB flash drive, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
  • Based on the foregoing embodiments, the embodiments of this application further provide a neural network model processing device, configured to implement the neural network model processing method shown in FIG. 3. The neural network model processing device 800 includes a processor 801 and a memory 802, where:
  • The processor 801 may be a CPU, a GPU, or a combination of a CPU and a GPU. The processor 801 may also be an AI chip that supports neural network processing, such as an NPU or a TPU. The processor 801 may further include a hardware chip, which may be an ASIC, a PLD, a DSP, or a combination thereof; the PLD may be a CPLD, an FPGA, a GAL, or any combination thereof. It should be noted that the processor 801 is not limited to the cases listed above; the processor 801 may be any processing device capable of implementing the neural network model processing method shown in FIG. 3.
  • The processor 801 and the memory 802 are connected to each other, optionally through a bus 803. The bus 803 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one thick line is used to represent the bus in FIG. 8, but this does not mean that there is only one bus or one type of bus.
  • When the processor 801 is configured to implement the neural network model processing method provided in FIG. 3 of the embodiment of this application, it performs the following operations:
  • A first low-bit neural network model is obtained through training, where the first low-bit neural network model includes at least two operation layers, the at least two operation layers include a first operation layer and a second operation layer, each of the at least two operation layers includes at least one operation, values of parameters and/or data used for the at least one operation are represented by N bits, and N is a positive integer less than 8.
  • The first low-bit neural network model is compressed to obtain a second low-bit neural network model, where the second low-bit neural network model includes at least one operation layer, the at least one operation layer includes a third operation layer, the third operation layer is equivalent to the first operation layer and the second operation layer, and the operation layers in the at least one operation layer other than the third operation layer are the same as the operation layers in the at least two operation layers other than the first operation layer and the second operation layer.
  • The first operation layer and the second operation layer may be operation layers that are predefined as mergeable.
  • The processor 801 may also perform other operations. For details, refer to the specific descriptions of step 101 and step 102 in the embodiment shown in FIG. 3, which are not repeated here.
  • The memory 802 is configured to store programs and data. Specifically, a program may include program code, and the program code includes computer operation instructions. The memory 802 may include a random access memory (RAM) and may also include a non-volatile memory, such as at least one disk memory. The processor 801 executes the program stored in the memory 802 to realize the above functions, thereby implementing the method shown in FIG. 3.
  • When the neural network model processing device shown in FIG. 8 is applied to a computer device, it may be embodied as the computer device shown in FIG. 2. In this case, the processor 801 may be the same as the processor 210 shown in FIG. 2, and the memory 802 may be the same as the memory 220 shown in FIG. 2.
  • Based on the foregoing embodiments, the embodiments of this application further provide another neural network model processing device, configured to implement the method shown in FIG. 5. The neural network model processing device 900 includes a processor 901 and a memory 902, where:
  • The processor 901 may be a CPU, a GPU, or a combination of a CPU and a GPU. The processor 901 may also be an AI chip that supports neural network processing, such as an NPU or a TPU. The processor 901 may further include a hardware chip, which may be an ASIC, a PLD, a DSP, or a combination thereof; the PLD may be a CPLD, an FPGA, a GAL, or any combination thereof. It should be noted that the processor 901 is not limited to the cases listed above; the processor 901 may be any processing device capable of updating a neural network model.
  • The processor 901 and the memory 902 are connected to each other, optionally through a bus 903. The bus 903 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of presentation, only one thick line is used to represent the bus in FIG. 9, but this does not mean that there is only one bus or one type of bus.
  • When the processor 901 is configured to implement the method provided in the embodiment of this application, it may perform the following operations: obtaining a second low-bit neural network model, and updating a first low-bit neural network model to the second low-bit neural network model.
  • The first low-bit neural network model includes at least two operation layers, the at least two operation layers include a first operation layer and a second operation layer, each of the at least two operation layers includes at least one operation, values of parameters and/or data used for the at least one operation are represented by N bits, and N is a positive integer less than 8. The second low-bit neural network model includes at least one operation layer, the at least one operation layer includes a third operation layer, the third operation layer is equivalent to the first operation layer and the second operation layer, and the operation layers in the at least one operation layer other than the third operation layer are the same as the operation layers in the at least two operation layers other than the first operation layer and the second operation layer.
  • The processor 901 may also perform other operations. For details, refer to the specific descriptions of step 301 and step 302 in the embodiment shown in FIG. 5, which are not repeated here.
  • The memory 902 is configured to store programs and data. Specifically, a program may include program code, and the program code includes computer operation instructions. The memory 902 may include a random access memory (RAM) and may also include a non-volatile memory, such as at least one disk memory. The processor 901 executes the program stored in the memory 902 to realize the above functions, thereby implementing the processing method shown in FIG. 5.
  • When the neural network model processing device shown in FIG. 9 is applied to a computer device, it may be embodied as the computer device shown in FIG. 2. In this case, the processor 901 may be the same as the processor 210 shown in FIG. 2, and the memory 902 may be the same as the memory 220 shown in FIG. 2.
  • Based on the same concept as the foregoing method embodiments, the embodiments of this application further provide a computer-readable storage medium storing instructions that, when invoked and executed by a computer, cause the computer to perform the methods involved in the foregoing method embodiments and in any possible design thereof. The computer-readable storage medium is not limited; for example, it may be a RAM (random access memory) or a ROM (read-only memory).
  • Based on the same concept, this application further provides a computer program product that, when invoked and executed by a computer, performs the methods involved in the method embodiments and in any possible design thereof.
  • Based on the same concept, this application further provides a chip coupled to a transceiver and configured to perform the methods involved in the foregoing method embodiments and in any possible implementation thereof. Here, "coupling" means that two components are directly or indirectly combined with each other; the combination may be fixed or movable, and it may allow fluid, electricity, electrical signals, or other types of signals to be communicated between the two components.
  • Those skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of a hardware-only embodiment, a software-only embodiment, or an embodiment combining software and hardware. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps is executed on the computer or other programmable equipment to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Feedback Control In General (AREA)

Abstract

A neural network model processing method and apparatus, for compressing a neural network model without reducing its accuracy and effectiveness. The method includes: training to obtain a first low-bit neural network model, where the model includes at least two operation layers, the at least two operation layers include a first operation layer and a second operation layer, each operation layer includes at least one operation, values of parameters and/or data used for the operation are represented by N bits, and N is a positive integer less than 8; and compressing the model to obtain a second low-bit neural network model, where the compressed model includes at least one operation layer, the at least one operation layer includes a third operation layer, the third operation layer is equivalent to the first operation layer and the second operation layer, and the operation layers in the at least one operation layer other than the third operation layer are the same as the operation layers in the at least two operation layers other than the first operation layer and the second operation layer.

Description

Neural network model processing method and apparatus

Technical Field

This application relates to the field of neural network technology, and in particular, to a neural network model processing method and apparatus.
Background

With the development of artificial intelligence (AI) and neural network (NN) technology, neural network models are widely used in AI applications in fields such as image processing, speech recognition, and natural language processing, so the number of AI applications using neural network models grows year by year.

However, the parameters of a neural network model are usually on the order of millions, tens of millions, or hundreds of millions. This places high demands on the storage and computing capabilities of the terminal devices that run AI applications using neural network models, which limits the use of neural network models on terminal devices.

At present, the number of parameters of the neural network model is usually reduced to compress the model. Although the model can be compressed to some extent in this way, compressing it by this method reduces its accuracy and effectiveness.
Summary

This application provides a neural network model processing method and apparatus, for compressing a neural network model without reducing its accuracy and effectiveness.

According to a first aspect, this application provides a neural network model processing method, applicable to a server or a terminal device. The method includes: training to obtain a first low-bit neural network model, where the first low-bit neural network model includes at least two operation layers, the at least two operation layers include a first operation layer and a second operation layer, each of the at least two operation layers includes at least one operation, values of parameters and/or data used for the at least one operation are represented by N bits, and N is a positive integer less than 8; and compressing the first low-bit neural network model to obtain a second low-bit neural network model, where the second low-bit neural network model includes at least one operation layer, the at least one operation layer includes a third operation layer, the third operation layer is equivalent to the first operation layer and the second operation layer, and the operation layers in the at least one operation layer other than the third operation layer are the same as the operation layers in the at least two operation layers other than the first operation layer and the second operation layer.

With the above method, the trained first low-bit neural network model can be compressed, and the third operation layer included in the compressed second low-bit neural network model is equivalent to the first operation layer and the second operation layer of the first low-bit neural network model before compression. In this way, the neural network model is compressed by reducing its operation layers; because the operation layers of the model before and after compression are equivalent, the model can be compressed without reducing its accuracy and effectiveness.
In a possible design, compressing the first low-bit neural network model to obtain the second low-bit neural network model includes: finding the first operation layer and the second operation layer among the at least two operation layers; merging the first operation layer and the second operation layer to obtain the third operation layer, where the input of the first operation layer is the same as the input of the third operation layer, the output of the first operation layer is the input of the second operation layer, and the output of the second operation layer is the same as the output of the third operation layer; and constructing the second low-bit neural network model from the third operation layer and the operation layers in the at least two operation layers other than the first operation layer and the second operation layer.

In a possible design, the first operation layer includes at least one first operation and the second operation layer includes at least one second operation. Based on this design, merging the first operation layer and the second operation layer to obtain the third operation layer includes: merging the at least one first operation with the at least one second operation according to a preset rule to obtain at least one third operation; and constructing the third operation layer from the at least one third operation, where the third operation layer includes the at least one third operation.

In a possible design, the first operation layer includes at least one first operation and the second operation layer includes at least one second operation. Based on this design, merging the first operation layer and the second operation layer to obtain the third operation layer includes: constructing the third operation layer from the at least one first operation and the at least one second operation, where the third operation layer includes the at least one first operation and the at least one second operation.

In a possible design, the second low-bit neural network model is stored or sent.
According to a second aspect, this application provides a neural network model processing method, applicable to a terminal device. The method includes: obtaining, by the terminal device, a second low-bit neural network model; and updating, by the terminal device, a first low-bit neural network model to the second low-bit neural network model.

The first low-bit neural network model includes at least two operation layers, the at least two operation layers include a first operation layer and a second operation layer, each of the at least two operation layers includes at least one operation, values of parameters and/or data used for the at least one operation are represented by N bits, and N is a positive integer less than 8; the second low-bit neural network model includes at least one operation layer, the at least one operation layer includes a third operation layer, the third operation layer is equivalent to the first operation layer and the second operation layer, and the operation layers in the at least one operation layer other than the third operation layer are the same as the operation layers in the at least two operation layers other than the first operation layer and the second operation layer.

In a possible design, the second low-bit neural network model is a neural network model constructed from the third operation layer and the operation layers in the at least two operation layers other than the first operation layer and the second operation layer; the third operation layer is an operation layer obtained by merging the first operation layer and the second operation layer; the input of the first operation layer is the same as the input of the third operation layer, the output of the first operation layer is the input of the second operation layer, and the output of the second operation layer is the same as the output of the third operation layer.

In a possible design, the first operation layer includes at least one first operation, the second operation layer includes at least one second operation, the third operation layer includes at least one third operation, and the at least one third operation is an operation obtained by merging the at least one first operation with the at least one second operation according to a preset rule.

In a possible design, the first operation layer includes at least one first operation, the second operation layer includes at least one second operation, and the third operation layer includes the at least one first operation and the at least one second operation.

In a possible design, obtaining, by the terminal device, the second low-bit neural network model includes: receiving, by the terminal device, the second low-bit neural network model from a server; or obtaining, by the terminal device, the second low-bit neural network model locally.
According to a third aspect, this application provides a neural network model processing method, applicable to a neural network model processing system. The method includes: a server trains to obtain a first low-bit neural network model, where the first low-bit neural network model includes at least two operation layers, the at least two operation layers include a first operation layer and a second operation layer, each of the at least two operation layers includes at least one operation, values of parameters and/or data used for the at least one operation are represented by N bits, and N is a positive integer less than 8; the server compresses the first low-bit neural network model to obtain a second low-bit neural network model, where the second low-bit neural network model includes at least one operation layer, the at least one operation layer includes a third operation layer, the third operation layer is equivalent to the first operation layer and the second operation layer, and the operation layers in the at least one operation layer other than the third operation layer are the same as the operation layers in the at least two operation layers other than the first operation layer and the second operation layer; the server sends the second low-bit neural network model to a terminal device; and the terminal device updates the locally stored first low-bit neural network model with the second low-bit neural network model.
According to a fourth aspect, this application provides a neural network model processing apparatus. The apparatus includes a processing unit, where the processing unit is configured to train to obtain a first low-bit neural network model, where the first low-bit neural network model includes at least two operation layers, the at least two operation layers include a first operation layer and a second operation layer, each of the at least two operation layers includes at least one operation, values of parameters and/or data used for the at least one operation are represented by N bits, and N is a positive integer less than 8; and the processing unit is further configured to compress the first low-bit neural network model to obtain a second low-bit neural network model, where the second low-bit neural network model includes at least one operation layer, the at least one operation layer includes a third operation layer, the third operation layer is equivalent to the first operation layer and the second operation layer, and the operation layers in the at least one operation layer other than the third operation layer are the same as the operation layers in the at least two operation layers other than the first operation layer and the second operation layer.

In a possible design, the processing unit is specifically configured to: find the first operation layer and the second operation layer among the at least two operation layers; merge the first operation layer and the second operation layer to obtain the third operation layer, where the input of the first operation layer is the same as the input of the third operation layer, the output of the first operation layer is the input of the second operation layer, and the output of the second operation layer is the same as the output of the third operation layer; and construct the second low-bit neural network model from the third operation layer and the operation layers in the at least two operation layers other than the first operation layer and the second operation layer.

In a possible design, the first operation layer includes at least one first operation and the second operation layer includes at least one second operation; based on this design, the processing unit is specifically configured to: merge the at least one first operation with the at least one second operation according to a preset rule to obtain at least one third operation; and construct the third operation layer from the at least one third operation, where the third operation layer includes the at least one third operation.

In a possible design, the first operation layer includes at least one first operation and the second operation layer includes at least one second operation; based on this design, the processing unit is specifically configured to: construct the third operation layer from the at least one first operation and the at least one second operation, where the third operation layer includes the at least one first operation and the at least one second operation.

In a possible design, the apparatus further includes a storage unit, where the storage unit is configured to store the second low-bit neural network model; or the apparatus further includes a transceiver unit, where the transceiver unit is configured to send the second low-bit neural network model.
According to a fifth aspect, this application provides a neural network model processing apparatus. The apparatus includes an obtaining unit and a processing unit, where the obtaining unit is configured to obtain a second low-bit neural network model, and the processing unit is configured to update a first low-bit neural network model to the second low-bit neural network model.

The first low-bit neural network model includes at least two operation layers, the at least two operation layers include a first operation layer and a second operation layer, each of the at least two operation layers includes at least one operation, values of parameters and/or data used for the at least one operation are represented by N bits, and N is a positive integer less than 8; the second low-bit neural network model includes at least one operation layer, the at least one operation layer includes a third operation layer, the third operation layer is equivalent to the first operation layer and the second operation layer, and the operation layers in the at least one operation layer other than the third operation layer are the same as the operation layers in the at least two operation layers other than the first operation layer and the second operation layer.

In a possible design, the second low-bit neural network model is a neural network model constructed from the third operation layer and the operation layers in the at least two operation layers other than the first operation layer and the second operation layer; the third operation layer is an operation layer obtained by merging the first operation layer and the second operation layer; the input of the first operation layer is the same as the input of the third operation layer, the output of the first operation layer is the input of the second operation layer, and the output of the second operation layer is the same as the output of the third operation layer.

In a possible design, the first operation layer includes at least one first operation, the second operation layer includes at least one second operation, the third operation layer includes at least one third operation, and the at least one third operation is an operation obtained by merging the at least one first operation with the at least one second operation according to a preset rule.

In a possible design, the first operation layer includes at least one first operation, the second operation layer includes at least one second operation, and the third operation layer includes the at least one first operation and the at least one second operation.

In a possible design, the apparatus further includes a transceiver unit, where the transceiver unit is configured to receive the second low-bit neural network model from a server; or the processing unit is further configured to obtain the second low-bit neural network model locally.
According to a sixth aspect, this application provides a neural network model processing system, including a server and a terminal device.

The server trains to obtain a first low-bit neural network model, where the first low-bit neural network model includes at least two operation layers, the at least two operation layers include a first operation layer and a second operation layer, each of the at least two operation layers includes at least one operation, values of parameters and/or data used for the at least one operation are represented by N bits, and N is a positive integer less than 8; the server compresses the first low-bit neural network model to obtain a second low-bit neural network model, where the second low-bit neural network model includes at least one operation layer, the at least one operation layer includes a third operation layer, the third operation layer is equivalent to the first operation layer and the second operation layer, and the operation layers in the at least one operation layer other than the third operation layer are the same as the operation layers in the at least two operation layers other than the first operation layer and the second operation layer; the server sends the second low-bit neural network model to the terminal device; and the terminal device updates the locally stored first low-bit neural network model with the second low-bit neural network model.
According to a seventh aspect, the embodiments of this application further provide a neural network model processing apparatus that has the function of implementing the method of the first aspect. The function may be implemented by hardware or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the function.

In a possible design, the structure of the neural network model processing apparatus may include a processor and a memory, where the processor is configured to perform the method mentioned in the first aspect, and the memory is coupled to the processor and stores the program instructions and data necessary for the apparatus.

According to an eighth aspect, the embodiments of this application further provide a neural network model processing apparatus that has the function of implementing the method of the second aspect. The function may be implemented by hardware or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the function.

In a possible design, the structure of the neural network model processing apparatus may include a processor and a memory, where the processor is configured to perform the method mentioned in the second aspect, and the memory is coupled to the processor and stores the program instructions and data necessary for the apparatus.

According to a ninth aspect, this application provides a neural network model processing system, including the neural network model processing apparatus of the seventh aspect and the neural network model processing apparatus of the eighth aspect.

According to a tenth aspect, the embodiments of this application further provide a computer storage medium storing computer-executable instructions that, when invoked by a computer, cause the computer to perform the method provided in the first aspect or in any design of the first aspect, or the method provided in the second aspect or in any design of the second aspect.

According to an eleventh aspect, the embodiments of this application further provide a computer program product storing instructions that, when run on a computer, cause the computer to perform the method described in the first aspect or in any possible design of the first aspect, or the method provided in the second aspect or in any design of the second aspect.

According to a twelfth aspect, the embodiments of this application further provide a chip coupled to a memory and configured to read and execute program instructions stored in the memory, to implement any of the methods mentioned in the first aspect, the second aspect, or the third aspect.
Brief Description of Drawings

FIG. 1a is a schematic diagram of a neural network model according to an embodiment of this application;
FIG. 1b is a schematic diagram of another neural network model according to an embodiment of this application;
FIG. 2 is a schematic structural diagram of a computer apparatus to which an embodiment of this application is applicable;
FIG. 3 is a flowchart of a neural network model processing method according to an embodiment of this application;
FIG. 4a is a flowchart of another neural network model processing method according to an embodiment of this application;
FIG. 4b is a schematic diagram of still another neural network model according to an embodiment of this application;
FIG. 4c is a schematic diagram of still another neural network model according to an embodiment of this application;
FIG. 4d is a schematic diagram of still another neural network model according to an embodiment of this application;
FIG. 5 is a flowchart of another neural network model processing method according to an embodiment of this application;
FIG. 6 is a schematic structural diagram of a neural network model processing apparatus according to an embodiment of this application;
FIG. 7 is a schematic structural diagram of another neural network model processing apparatus according to an embodiment of this application;
FIG. 8 is a schematic structural diagram of still another neural network model processing apparatus according to an embodiment of this application;
FIG. 9 is a schematic structural diagram of still another neural network model processing apparatus according to an embodiment of this application.
Detailed Description of Embodiments

This application is further described in detail below with reference to the accompanying drawings.

The embodiments of this application provide a neural network model processing method and apparatus, which may be applied to, but are not limited to, fields such as automatic speech recognition (ASR), natural language processing (NLP), optical character recognition (OCR), and image processing. In these fields, the number of AI applications using neural network models grows year by year, and in practice these AI applications need to be deployed on various terminal devices. However, the parameters of a neural network model are usually on the order of millions, tens of millions, or hundreds of millions, which places high demands on the storage and computing capabilities of the terminal devices running these AI applications and limits the use of neural network models on terminal devices. To reduce the storage space these AI applications occupy on terminal devices and to shorten the time needed to run them, the prior art compresses the neural network model by reducing its parameters; however, reducing the model parameters inevitably affects the model's precision, so after compression by this method the accuracy and effectiveness of the neural network model decrease. With the neural network model processing method provided in the embodiments of this application, the neural network model can be compressed without reducing its accuracy and effectiveness. The method and the apparatus are based on the same inventive concept; because the principles by which they solve the problem are similar, the implementations of the apparatus and the method may refer to each other, and repeated parts are not described again.
The following explains the concepts involved in the embodiments of this application, for ease of understanding by those skilled in the art:

1) A neural network imitates the behavioral characteristics of animal neural networks and processes data with a structure similar to the synaptic connections of the brain. As a mathematical operation model, a neural network model is formed by a large number of interconnected nodes (also called neurons). A neural network model may consist of an input layer, hidden layers, and an output layer, as shown in FIG. 1a, where the input layer provides the model's input data, the output layer produces the model's output data, and the hidden layers, formed by the connections of many nodes between the input layer and the output layer, perform operations on the input data. A hidden layer may consist of one or more layers. The numbers of hidden layers and nodes in a neural network model are directly related to the complexity of the problem the network actually solves and to the numbers of nodes in the input layer and the output layer. It should be noted that, because the input layer, the hidden layers, and the output layer all need to execute corresponding operations, in the embodiments of this application all of them may be described as operation layers. FIG. 1b is a schematic diagram of another neural network model according to an embodiment of this application, where each operation layer has an input and an output. For two adjacent operation layers (such as operation layer 1 and operation layer 2 in FIG. 1b), the output of the former operation layer is the input of the latter. Each operation layer may include at least one operation, and an operation processes the input parameters of that layer; after a parameter is input into an operation layer, the layer may first store the parameter and read it out to execute the corresponding operation when needed.

2) An AI application is an application or program in the AI field; the AI applications involved in the embodiments of this application mainly refer to AI applications that use a neural network model.

Usually, a neural network model whose performance has become stable after a large amount of training is widely used in AI applications, and these AI applications are deployed on various terminal devices so that neural network models can be applied in various fields. Because training a neural network is a complex process, the platform on which a neural network model is trained and the platform on which it is deployed can usually be separated. The neural network model processing method provided in the embodiments of this application may be implemented on the training platform or on the deployment platform, which is not limited in this application. For example, the training platform may include, but is not limited to, a computer apparatus, a server, or a cloud service platform; the computer apparatus may include, for example, a personal computer (PC), a desktop computer, a tablet computer, or an in-vehicle computer. For example, the deployment platform may include, but is not limited to, a terminal device, a computer apparatus, or a server. The terminal device may be a device that provides voice and/or data connectivity to a user, for example a handheld device with a wireless connection function or a processing device connected to a wireless modem. The terminal device may include user equipment (UE), a wireless terminal device, a mobile terminal device, a subscriber unit, an access point (AP), a remote terminal device, an access terminal device, a user terminal device, a user agent, or a user device. Examples include a mobile phone (or "cellular" phone), a computer with a mobile terminal device, a portable, pocket-sized, handheld, computer-built-in, or vehicle-mounted mobile apparatus, and a smart wearable device; for example, a personal communication service (PCS) phone, a cordless phone, a session initiation protocol (SIP) phone, a wireless local loop (WLL) station, or a personal digital assistant (PDA). Constrained devices are also included, such as devices with low power consumption, limited storage capability, or limited computing capability, for example information sensing devices such as barcode readers, radio frequency identification (RFID) devices, sensors, global positioning system (GPS) devices, and laser scanners.
The following describes the neural network model processing method provided in this application by using its implementation on a computer apparatus as an example.

Referring to FIG. 2, FIG. 2 is a schematic structural diagram of a possible computer apparatus to which the embodiments of this application are applicable. As shown in FIG. 2, the computer apparatus includes components such as a processor 210, a memory 220, a communication module 230, an input unit 240, a display unit 250, and a power supply 260. Those skilled in the art can understand that the structure shown in FIG. 2 does not constitute a limitation; the computer apparatus provided in the embodiments of this application may include more or fewer components than shown in FIG. 2, combine some components, or use a different arrangement of components.

The following describes each component of the computer apparatus in detail with reference to FIG. 2:

The communication module 230 may connect to other devices through a wireless or physical connection to implement data sending and receiving by the computer apparatus. Optionally, the communication module 230 may include any one or a combination of a radio frequency (RF) circuit, a wireless fidelity (WiFi) module, a communication interface, and a Bluetooth module, which is not limited in the embodiments of this application.

The memory 220 may be configured to store program instructions and data. The processor 210 performs various functional applications and data processing of the computer apparatus by running the program instructions stored in the memory 220. The program instructions include program instructions that enable the processor 210 to execute the neural network model processing methods provided in the following embodiments of this application.

Optionally, the memory 220 may mainly include a program storage area and a data storage area. The program storage area may store the operating system, various applications, program instructions, and the like; the data storage area may store various data such as neural networks. In addition, the memory 220 may include a high-speed random access memory and may also include a non-volatile memory, such as a magnetic disk storage device, a flash memory device, or another solid-state storage device.

The input unit 240 may be configured to receive information such as data or operation instructions input by a user. Optionally, the input unit 240 may include input devices such as a touch panel, function keys, a physical keyboard, a mouse, a camera, and a monitor.

The display unit 250 may implement human-computer interaction and is configured to display, through a user interface, content such as information input by the user and information provided to the user. The display unit 250 may include a display panel 251. Optionally, the display panel 251 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like.

Further, when the input unit includes a touch panel, the touch panel may cover the display panel 251; when the touch panel detects a touch event on or near it, the event is sent to the processor 210 to determine its type so that the corresponding operation is performed.

The processor 210 is the control center of the computer apparatus and connects the above components through various interfaces and lines. The processor 210 may complete the various functions of the computer apparatus and implement the methods provided in the embodiments of this application by executing the program instructions stored in the memory 220 and calling the data stored in the memory 220.

Optionally, the processor 210 may include one or more processing units. In one implementation, the processor 210 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, applications, and the like, and the modem processor mainly handles wireless communication; it can be understood that the modem processor may also not be integrated into the processor 210. In the embodiments of this application, the processing unit may compress the neural network model. For example, the processor 210 may be a central processing unit (CPU), a graphics processing unit (GPU), or a combination of a CPU and a GPU. The processor 210 may also be an artificial intelligence (AI) chip that supports neural network processing, such as a network processor unit (NPU) or a tensor processing unit (TPU). The processor 210 may further include a hardware chip, which may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), a digital signal processing (DSP) device, or a combination thereof; the PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.

The computer apparatus further includes a power supply 260 (such as a battery) for powering the components. Optionally, the power supply 260 may be logically connected to the processor 210 through a power management system, so that functions such as charging and discharging of the computer apparatus are implemented through the power management system.

Although not shown, the computer apparatus may further include components such as a camera, sensors, and an audio collector, which are not described here.

It should be noted that the above computer apparatus is merely an example of a device to which the methods provided in the embodiments of this application are applicable. It should be understood that the methods provided in the embodiments of this application may also be applied to devices other than the above computer apparatus, such as terminal devices, servers, or cloud servers, which is not limited in this application.
The neural network model processing method provided in the embodiments of this application is applicable to the computer apparatus shown in FIG. 2 and also to other devices (such as servers or terminal devices). Referring to FIG. 3, the method provided in this application is described using a neural network model processing apparatus as the executing entity; the specific flow of the method may include:

Step 101: The neural network model processing apparatus trains to obtain a first low-bit neural network model, where the first low-bit neural network model includes at least two operation layers, the at least two operation layers include a first operation layer and a second operation layer, each of the at least two operation layers includes at least one operation, values of parameters and/or data used for the at least one operation are represented by N bits, and N is a positive integer less than 8.

In the embodiments of this application, a low-bit neural network model is a neural network model in which the values of the parameters and/or data of the operations involved are represented with a positive-integer number of bits smaller than 8, for example a binarized neural network model or a ternarized neural network model.

In the embodiments of this application, the first operation layer and the second operation layer may be operation layers that are predefined as mergeable. In practice, which operation layers may be merged can be defined based on a preset rule; for example, the preset rule may be that the first operation layer and the second operation layer are adjacent operation layers and the operations in them are linear operations. For example, it may be predefined that a directly connected batch normalization layer (BN), scale layer (scale), and low-bit activation layer (binary activation layer, BinAct) may be merged; that a directly connected low-bit convolution layer (binary convolution layer, BinConv), BinAct layer, and BinConv layer may be merged; that a directly connected low-bit convolution layer with bias (binary convolution with bias layer, BinConvBias) and BinAct layer may be merged; that a directly connected BinConvBias layer, pooling layer (Pool), and BinAct layer may be merged; or that a directly connected convolution layer (Conv), BN layer, and scale layer may be merged.
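As an editorial illustration only (not part of the patent text), the predefined mergeable patterns listed above can be pictured as a small registry that a compression pass scans for. The following Python sketch assumes a model is represented as a plain sequence of layer-type names; the strings are shorthand for the layers named above, not identifiers from any real framework:

```python
# Hypothetical registry of directly connected layer sequences that are
# predefined as mergeable, taken from the examples in the paragraph above.
MERGEABLE_PATTERNS = [
    ("BN", "Scale", "BinAct"),          # batch normalization -> scale -> binary activation
    ("BinConv", "BinAct", "BinConv"),   # binary conv -> binary activation -> binary conv
    ("BinConvBias", "BinAct"),          # binary conv with bias -> binary activation
    ("BinConvBias", "Pool", "BinAct"),  # binary conv with bias -> pooling -> binary activation
    ("Conv", "BN", "Scale"),            # convolution -> batch normalization -> scale
]

def find_mergeable_runs(layer_types):
    """Yield (start_index, pattern) for every predefined pattern that occurs
    as a run of directly connected layers in the model's layer sequence."""
    for start in range(len(layer_types)):
        for pattern in MERGEABLE_PATTERNS:
            if tuple(layer_types[start:start + len(pattern)]) == pattern:
                yield start, pattern

# Example: the first three layers of the model in FIG. 4b form the
# BN -> Scale -> BinAct pattern.
print(list(find_mergeable_runs(["BN", "Scale", "BinAct", "Pool", "Conv"])))
```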
Step 102: The neural network model processing apparatus compresses the first low-bit neural network model to obtain a second low-bit neural network model, where the second low-bit neural network model includes at least one operation layer, the at least one operation layer includes a third operation layer, the third operation layer is equivalent to the first operation layer and the second operation layer, and the operation layers in the at least one operation layer other than the third operation layer are the same as the operation layers in the at least two operation layers other than the first operation layer and the second operation layer.

It can be understood that in this application a trained low-bit neural network model is compressed, for example a trained binarized neural network model. Compared with methods that directly compress a floating-point neural network model, the method of this application not only reduces the operation layers of the model; in addition, because the parameters and/or data in a low-bit neural network model are represented with a positive-integer number of bits smaller than 8, whereas the parameters and/or data in a floating-point model are represented with 8 or more bits, the method of this application can greatly reduce the storage space occupied by the compressed model.

It should be noted that in this application two operation layers being the same means that the operations and input parameters included in the two operation layers are exactly the same.
In the embodiments of this application, the neural network model processing apparatus may compress the first low-bit neural network model to obtain the second low-bit neural network model by using, but not limited to, the following method:

Step 1021: The neural network model processing apparatus finds the first operation layer and the second operation layer among the at least two operation layers included in the first low-bit neural network model.

Step 1022: The neural network model processing apparatus merges the first operation layer and the second operation layer to obtain the third operation layer. The input of the first operation layer is the same as the input of the third operation layer, the output of the first operation layer is the input of the second operation layer, and the output of the second operation layer is the same as the output of the third operation layer. That the output of the first operation layer is the input of the second operation layer can be understood as the first and second operation layers being adjacent operation layers, such as operation layers 1 and 2, operation layers 2 and 3, or operation layers 3 and 4 in FIG. 1b.

It should be noted that the embodiments of this application use merging two operation layers as an example; in practice, three operation layers, four operation layers, or more may also be merged, and the number of merged operation layers is not limited in the embodiments of this application.

Step 1023: The neural network model processing apparatus constructs the second low-bit neural network model from the third operation layer and the operation layers in the at least two operation layers other than the first operation layer and the second operation layer.
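A minimal sketch of steps 1021 to 1023, added here for illustration; the `Layer` representation, the `mergeable` pair set, and the helper names are assumptions made for this sketch, not the patent's implementation (two-layer merges are shown for brevity, though the note above allows longer runs):

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

@dataclass
class Layer:
    kind: str
    op: Callable  # the layer's operation, applied to the layer's input

def merge(first: Layer, second: Layer) -> Layer:
    """Step 1022: build a third layer equivalent to running `first` then
    `second`; its input is the first layer's input and its output is the
    second layer's output."""
    return Layer(kind=f"{first.kind}+{second.kind}",
                 op=lambda x, f=first.op, s=second.op: s(f(x)))

def compress(layers: List[Layer], mergeable: Set[Tuple[str, str]]) -> List[Layer]:
    """Steps 1021 and 1023: find adjacent mergeable pairs, replace each with
    the merged third layer, and keep every other layer unchanged."""
    out, i = [], 0
    while i < len(layers):
        if i + 1 < len(layers) and (layers[i].kind, layers[i + 1].kind) in mergeable:
            out.append(merge(layers[i], layers[i + 1]))  # third operation layer
            i += 2
        else:
            out.append(layers[i])  # layers other than the first and second stay the same
            i += 1
    return out
```

With a registry like the one sketched earlier, `mergeable` could be derived from the predefined patterns restricted to two-layer runs.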
In the embodiments of this application, the first operation layer may include at least one first operation, and the second operation layer may include at least one second operation.

In this application, the first operation layer and the second operation layer may be merged into the third operation layer in either of the following two ways.

First implementation: The neural network model processing apparatus merges the at least one first operation with the at least one second operation according to a preset rule to obtain at least one third operation, and constructs the third operation layer from the at least one third operation; in this case the third operation layer includes the at least one third operation. It can be understood that in the first implementation, when the first and second operation layers are merged into the third operation layer, the at least one first operation included in the first operation layer and the at least one second operation included in the second operation layer are at the same time merged into at least one third operation according to predefined rules. In the embodiments of this application, the at least one third operation is equivalent to the at least one first operation together with the at least one second operation; after the operations of the layers are merged in this way, fewer operations are executed when running the compressed neural network model, which improves running efficiency.

It should be noted that in the embodiments of this application the preset rule for merging a first operation and a second operation depends on the first and second operations themselves; it can be understood that different operations are merged with different preset rules, as illustrated in the example below, which is not repeated here.

Second implementation: The neural network model processing apparatus constructs the third operation layer from the at least one first operation and the at least one second operation; in this case the third operation layer includes the at least one first operation and the at least one second operation. It can be understood that in the second implementation only the first and second operation layers are merged, without merging the operations they include; although this does not reduce the number of operations during model processing, after the first and second operation layers are merged, the parameter that the first operation layer passes to the second operation layer no longer needs to be stored, which saves storage space.
In an optional implementation, after compressing the first low-bit neural network model to obtain the second low-bit neural network model, the neural network model processing apparatus may further store or send the second low-bit neural network model, for example, send it to a terminal device.

With the neural network model processing method provided in the embodiments of this application, when the neural network model is compressed, its operation layers are compressed equivalently; that is, the model is compressed by reducing its operation layers, and a compressed equivalent model is obtained. This ensures that the neural network model can be compressed without reducing its accuracy and effectiveness.
The following illustrates the neural network model processing method of FIG. 3 with an example in which three operation layers are merged. Referring to FIG. 4a, the method includes:

Step 201: The neural network model processing apparatus trains to obtain the first low-bit neural network model shown in FIG. 4b, which includes operation layer 1, operation layer 2, operation layer 3, operation layer 4, and operation layer 5, where operation layers 1, 2, and 3 are predefined as mergeable. Operation layer 1 includes a first operation: (input of operation layer 1 − mu)/sqrt(delta + epsilon); assuming the input of operation layer 1 is x, the output after operation layer 1 is y = (x − mu)/sqrt(delta + epsilon), where mu is the trained batch mean, delta is the trained batch variance, and epsilon is an arbitrarily small fixed constant. Operation layer 2 includes a second operation: alpha * (input of operation layer 2) + beta; because operation layers 1 and 2 are adjacent and, as shown in FIG. 4b, operation layer 2 follows operation layer 1, the input of operation layer 2 equals the output of operation layer 1, that is, y = (x − mu)/sqrt(delta + epsilon), so the output after operation layer 2 is z = alpha * (x − mu)/sqrt(delta + epsilon) + beta, where alpha is the trained coefficient of operation layer 2 and beta is the trained bias. Operation layer 3 includes a third operation: output 1 when the input of operation layer 3 is greater than or equal to zero, and output −1 when the input of operation layer 3 is less than zero; because operation layers 2 and 3 are adjacent and, as shown in FIG. 4b, operation layer 3 follows operation layer 2, the input of operation layer 3 equals the output of operation layer 2, that is, z = alpha * (x − mu)/sqrt(delta + epsilon) + beta.
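For clarity, the equivalence used in the next step can be derived directly from the three operations above (a derivation added by the editor; the symbols μ, δ, ε, α, β are mu, delta, epsilon, alpha, beta from the example):

$$y=\frac{x-\mu}{\sqrt{\delta+\epsilon}},\qquad z=\alpha y+\beta,$$

$$z\ge 0 \iff \alpha\,(x-\mu)\ge -\beta\sqrt{\delta+\epsilon} \iff \alpha x\ge \alpha\mu-\beta\sqrt{\delta+\epsilon}=\text{thresh}.$$

Both sides are multiplied by √(δ+ε) > 0, which preserves the inequality, so no assumption about the sign of α is needed, and thresh matches the expression given in step 202 below.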
Step 202: The neural network model processing apparatus compresses the first low-bit neural network model shown in FIG. 4b to obtain the second low-bit neural network model shown in FIG. 4c. As shown in FIG. 4c, the second low-bit neural network model includes operation layer k, operation layer 4, and operation layer 5, where operation layer k is equivalent to operation layers 1, 2, and 3 shown in FIG. 4b. Operation layer k includes a k-th operation: output 1 when alpha * x is greater than or equal to thresh, and output −1 when alpha * x is less than thresh, where thresh = sqrt(delta + epsilon) * (−beta) + alpha * mu. The k-th operation is obtained by merging the first operation, the second operation, and the third operation; it can be understood that the k-th operation is equivalent to the first, second, and third operations, where equivalence here means that, for the same input, the output obtained through the k-th operation is the same as the output obtained by applying the first operation, the second operation, and the third operation in sequence.
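The numerical equivalence can be checked with a short NumPy sketch (illustrative only; the parameter values are invented for the check, and float rounding could in principle disagree only for inputs landing exactly on the threshold):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1000).astype(np.float32)  # input of operation layer 1
mu, delta, epsilon = 0.3, 1.7, 1e-5               # trained batch mean/variance, small constant
alpha, beta = 0.8, -0.2                           # trained scale coefficient and bias

# Uncompressed path: operation layers 1, 2, and 3 executed in sequence.
y = (x - mu) / np.sqrt(delta + epsilon)           # first operation (batch normalization)
z = alpha * y + beta                              # second operation (scale)
out_layers = np.where(z >= 0, 1, -1)              # third operation (binary activation)

# Compressed path: the single k-th operation with the merged threshold.
thresh = np.sqrt(delta + epsilon) * (-beta) + alpha * mu
out_merged = np.where(alpha * x >= thresh, 1, -1)

assert np.array_equal(out_layers, out_merged)
```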
It should be noted that FIG. 4a to FIG. 4c merely illustrate merging three operation layers; in practice multiple operation layers may be merged, and if the model structure contains repeated mergeable layer structures, multiple repeated mergeable structures may each be merged. Taking FIG. 4d as an example, operation layers 1, 2, and 3 in FIG. 4d are predefined as mergeable; in practice, only the mergeable structure on the left of FIG. 4d (operation layer 1–operation layer 2–operation layer 3) may be merged, or only the mergeable structure on the right of FIG. 4d may be merged, or, of course, both mergeable structures may be merged.

With the above method, after the operations of the layers are merged, fewer operations are executed when running the compressed neural network model; because the operations before and after model compression are fully equivalent, running efficiency can be improved without reducing the accuracy and effectiveness of the neural network model.

The example of FIG. 4a above uses the first implementation of merging operation layers. The second implementation is illustrated below with reference to FIG. 4b. In the second implementation, only operation layers 1, 2, and 3 are merged, without merging the operations they include, so operation layer k obtained by merging operation layers 1, 2, and 3 still includes the first operation, the second operation, and the third operation, which are executed separately. Although this merging manner does not reduce the number of operations during model processing, after operation layers 1, 2, and 3 are merged, operation layer 1 can feed its output y = (x − mu)/sqrt(delta + epsilon) directly into the second operation without storing the output parameter y; similarly, operation layer 2 can feed its output z = alpha * (x − mu)/sqrt(delta + epsilon) + beta directly into the third operation without storing the output parameter z. This reduces memory usage when running the model.
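A sketch of this second implementation (an assumed structure added for illustration, not the patent's code): the three operations are kept intact inside one merged layer, so y and z exist only as temporaries within a single call instead of being stored as inter-layer outputs.

```python
import numpy as np

class MergedLayerK:
    """Operation layer k of the second implementation: the first, second, and
    third operations are still executed separately, but inside one layer."""
    def __init__(self, mu, delta, epsilon, alpha, beta):
        self.mu, self.delta, self.epsilon = mu, delta, epsilon
        self.alpha, self.beta = alpha, beta

    def forward(self, x):
        y = (x - self.mu) / np.sqrt(self.delta + self.epsilon)  # first operation
        z = self.alpha * y + self.beta                          # second operation
        return np.where(z >= 0, 1, -1)                          # third operation

layer_k = MergedLayerK(mu=0.3, delta=1.7, epsilon=1e-5, alpha=0.8, beta=-0.2)
print(layer_k.forward(np.array([0.1, -2.0, 1.5])))
```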
In this application, the final neural network model obtained through the embodiment shown in FIG. 3 may be applied in another neural network model processing apparatus, so that that apparatus performs processing based on the finally obtained neural network model. On this basis, an embodiment of this application further provides another neural network model processing method, implemented based on the final neural network model obtained in the embodiment shown in FIG. 3. As shown in FIG. 5, this method is described using a terminal device as the executing entity; its specific flow may include the following steps:

Step 301: The terminal device obtains a second low-bit neural network model.

In this embodiment of this application, the terminal device may obtain the second low-bit neural network model by the following methods.

Method 1: The terminal device receives the second low-bit neural network model from a server. It can be understood that in method 1 the server may be an example of the neural network model processing apparatus to which the method in FIG. 3 applies.

Method 2: The terminal device obtains the second low-bit neural network model locally. With method 2, the second low-bit neural network model may have been obtained by the terminal device itself through compression using the method shown in FIG. 3.

Step 302: The terminal device updates a first low-bit neural network model to the second low-bit neural network model. In this embodiment of this application, updating can be understood as replacing. In this way, the compressed second low-bit neural network model can be used on the terminal device. Because the compressed second low-bit neural network model has fewer operation layers than the first low-bit neural network model before compression, with correspondingly fewer operations to execute and less storage space needed for operation data, the computing efficiency of the terminal device can be improved without reducing the accuracy and effectiveness of the neural network model.

It should be noted that when the executing entity of the method shown in FIG. 3 is a terminal device (for example, a mobile phone), the processing method shown in FIG. 5 may continue to be performed after the second low-bit neural network model is obtained.
Based on the foregoing embodiments, an embodiment of this application further provides a neural network model processing apparatus, configured to implement the neural network model processing method provided in the embodiment shown in FIG. 3. Referring to FIG. 6, the neural network model processing apparatus 600 includes a processing unit 601, where:

the processing unit 601 is configured to train to obtain a first low-bit neural network model, where the first low-bit neural network model includes at least two operation layers, the at least two operation layers include a first operation layer and a second operation layer, each of the at least two operation layers includes at least one operation, values of parameters and/or data used for the at least one operation are represented by N bits, and N is a positive integer less than 8;

the processing unit 601 is further configured to compress the first low-bit neural network model to obtain a second low-bit neural network model, where the second low-bit neural network model includes at least one operation layer, the at least one operation layer includes a third operation layer, the third operation layer is equivalent to the first operation layer and the second operation layer, and the operation layers in the at least one operation layer other than the third operation layer are the same as the operation layers in the at least two operation layers other than the first operation layer and the second operation layer.

The first operation layer and the second operation layer may be operation layers that are predefined as mergeable.

In an optional implementation, the processing unit 601 is specifically configured to: find the first operation layer and the second operation layer among the at least two operation layers; merge the first operation layer and the second operation layer to obtain the third operation layer, where the input of the first operation layer is the same as the input of the third operation layer, the output of the first operation layer is the input of the second operation layer, and the output of the second operation layer is the same as the output of the third operation layer; and construct the second low-bit neural network model from the third operation layer and the operation layers in the at least two operation layers other than the first operation layer and the second operation layer.

In an optional implementation, the first operation layer includes at least one first operation and the second operation layer includes at least one second operation; based on this, the processing unit 601 is specifically configured to: merge the at least one first operation with the at least one second operation according to a preset rule to obtain at least one third operation; and construct the third operation layer from the at least one third operation, where the third operation layer includes the at least one third operation.

In an optional implementation, the first operation layer includes at least one first operation and the second operation layer includes at least one second operation; based on this, the processing unit 601 is specifically configured to: construct the third operation layer from the at least one first operation and the at least one second operation, where the third operation layer includes the at least one first operation and the at least one second operation.

In an optional implementation, the apparatus further includes a storage unit 602, where the storage unit 602 is configured to store the second low-bit neural network model; or the apparatus further includes a transceiver unit 603, where the transceiver unit 603 is configured to send the second low-bit neural network model.
Based on the foregoing embodiments, an embodiment of this application further provides another neural network model processing apparatus, configured to implement the processing method provided in the embodiment shown in FIG. 5. Referring to FIG. 7, the neural network model processing apparatus 700 includes an obtaining unit 701 and a processing unit 702, where:

the obtaining unit 701 is configured to obtain a second low-bit neural network model;

the processing unit 702 is configured to update a first low-bit neural network model to the second low-bit neural network model;

where the first low-bit neural network model includes at least two operation layers, the at least two operation layers include a first operation layer and a second operation layer, each of the at least two operation layers includes at least one operation, values of parameters and/or data used for the at least one operation are represented by N bits, and N is a positive integer less than 8; the second low-bit neural network model includes at least one operation layer, the at least one operation layer includes a third operation layer, the third operation layer is equivalent to the first operation layer and the second operation layer, and the operation layers in the at least one operation layer other than the third operation layer are the same as the operation layers in the at least two operation layers other than the first operation layer and the second operation layer.

In an optional implementation, the second low-bit neural network model is a neural network model constructed from the third operation layer and the operation layers in the at least two operation layers other than the first operation layer and the second operation layer; the third operation layer is an operation layer obtained by merging the first operation layer and the second operation layer; the input of the first operation layer is the same as the input of the third operation layer, the output of the first operation layer is the input of the second operation layer, and the output of the second operation layer is the same as the output of the third operation layer.

In an optional implementation, the first operation layer includes at least one first operation, the second operation layer includes at least one second operation, the third operation layer includes at least one third operation, and the at least one third operation is an operation obtained by merging the at least one first operation with the at least one second operation according to a preset rule.

In an optional implementation, the first operation layer includes at least one first operation, the second operation layer includes at least one second operation, and the third operation layer includes the at least one first operation and the at least one second operation.

In an optional implementation, the apparatus further includes a transceiver unit 703, where the transceiver unit 703 is configured to receive the second low-bit neural network model from a server; or the processing unit 702 is further configured to obtain the second low-bit neural network model locally.
It should be noted that the division of units in the embodiments of this application is illustrative and is merely a division by logical function; other division manners may be used in actual implementation. The functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

When the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of this application essentially, or the part contributing to the existing technology, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of this application. The aforementioned storage media include any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Based on the above embodiments, an embodiment of this application further provides a neural network model processing apparatus, configured to implement the neural network model processing method shown in FIG. 3. Referring to FIG. 8, the neural network model processing apparatus 800 includes a processor 801 and a memory 802, where:

the processor 801 may be a CPU, a GPU, or a combination of a CPU and a GPU; the processor 801 may also be an AI chip that supports neural network processing, such as an NPU or a TPU; the processor 801 may further include a hardware chip, which may be an ASIC, a PLD, a DSP, or a combination thereof, and the PLD may be a CPLD, an FPGA, a GAL, or any combination thereof. It should be noted that the processor 801 is not limited to the cases listed above; the processor 801 may be any processing device capable of implementing the neural network model processing method shown in FIG. 3.

The processor 801 and the memory 802 are connected to each other. Optionally, the processor 801 and the memory 802 are connected to each other through a bus 803; the bus 803 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on; for ease of presentation, only one thick line is used in FIG. 8, but this does not mean that there is only one bus or one type of bus.

When the processor 801 is configured to implement the neural network model processing method provided in FIG. 3 of the embodiments of this application, it performs the following operations:

training to obtain a first low-bit neural network model, where the first low-bit neural network model includes at least two operation layers, the at least two operation layers include a first operation layer and a second operation layer, each of the at least two operation layers includes at least one operation, values of parameters and/or data used for the at least one operation are represented by N bits, and N is a positive integer less than 8;

compressing the first low-bit neural network model to obtain a second low-bit neural network model, where the second low-bit neural network model includes at least one operation layer, the at least one operation layer includes a third operation layer, the third operation layer is equivalent to the first operation layer and the second operation layer, and the operation layers in the at least one operation layer other than the third operation layer are the same as the operation layers in the at least two operation layers other than the first operation layer and the second operation layer.

The first operation layer and the second operation layer may be operation layers that are predefined as mergeable.

In an optional implementation, the processor 801 may also perform other operations; for details, refer to the specific descriptions of step 101 and step 102 in the embodiment shown in FIG. 3, which are not repeated here.

The memory 802 is configured to store programs and data. Specifically, a program may include program code, and the program code includes computer operation instructions. The memory 802 may include a random access memory (RAM) and may also include a non-volatile memory, such as at least one disk memory. The processor 801 executes the program stored in the memory 802 to realize the above functions, thereby implementing the method shown in FIG. 3.

It should be noted that when the neural network model processing apparatus shown in FIG. 8 is applied to a computer apparatus, it may be embodied as the computer apparatus shown in FIG. 2; in this case, the processor 801 may be the same as the processor 210 shown in FIG. 2, and the memory 802 may be the same as the memory 220 shown in FIG. 2.
Based on the above embodiments, an embodiment of this application further provides another neural network model processing apparatus, configured to implement the method shown in FIG. 5. Referring to FIG. 9, the neural network model processing apparatus 900 includes a processor 901 and a memory 902, where:

the processor 901 may be a CPU, a GPU, or a combination of a CPU and a GPU; the processor 901 may also be an AI chip that supports neural network processing, such as an NPU or a TPU; the processor 901 may further include a hardware chip, which may be an ASIC, a PLD, a DSP, or a combination thereof, and the PLD may be a CPLD, an FPGA, a GAL, or any combination thereof. It should be noted that the processor 901 is not limited to the cases listed above; the processor 901 may be any processing device capable of updating a neural network model.

The processor 901 and the memory 902 are connected to each other. Optionally, the processor 901 and the memory 902 are connected to each other through a bus 903; the bus 903 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on; for ease of presentation, only one thick line is used in FIG. 9, but this does not mean that there is only one bus or one type of bus.

When the processor 901 is configured to implement the method provided in the embodiments of this application, it may perform the following operations:

obtaining a second low-bit neural network model;

updating a first low-bit neural network model to the second low-bit neural network model;

where the first low-bit neural network model includes at least two operation layers, the at least two operation layers include a first operation layer and a second operation layer, each of the at least two operation layers includes at least one operation, values of parameters and/or data used for the at least one operation are represented by N bits, and N is a positive integer less than 8; the second low-bit neural network model includes at least one operation layer, the at least one operation layer includes a third operation layer, the third operation layer is equivalent to the first operation layer and the second operation layer, and the operation layers in the at least one operation layer other than the third operation layer are the same as the operation layers in the at least two operation layers other than the first operation layer and the second operation layer.

In an optional implementation, the processor 901 may also perform other operations; for details, refer to the specific descriptions of step 301 and step 302 in the embodiment shown in FIG. 5, which are not repeated here.

The memory 902 is configured to store programs and data. Specifically, a program may include program code, and the program code includes computer operation instructions. The memory 902 may include a random access memory (RAM) and may also include a non-volatile memory, such as at least one disk memory. The processor 901 executes the program stored in the memory 902 to realize the above functions, thereby implementing the processing method shown in FIG. 5.

It should be noted that when the neural network model processing apparatus shown in FIG. 9 is applied to a computer apparatus, it may be embodied as the computer apparatus shown in FIG. 2; in this case, the processor 901 may be the same as the processor 210 shown in FIG. 2, and the memory 902 may be the same as the memory 220 shown in FIG. 2.
Based on the same concept as the foregoing method embodiments, an embodiment of this application further provides a computer-readable storage medium storing instructions that, when invoked and executed by a computer, cause the computer to perform the methods involved in the foregoing method embodiments and in any possible design of the method embodiments. The computer-readable storage medium is not limited in the embodiments of this application; for example, it may be a RAM (random-access memory) or a ROM (read-only memory).

Based on the same concept as the foregoing method embodiments, this application further provides a computer program product that, when invoked and executed by a computer, can perform the methods involved in the method embodiments and in any possible design of the foregoing method embodiments.

Based on the same concept as the foregoing method embodiments, this application further provides a chip coupled to a transceiver and configured to perform the methods involved in the foregoing method embodiments and in any possible implementation thereof, where "coupling" means that two components are directly or indirectly combined with each other; the combination may be fixed or movable, and it may allow fluid, electricity, electrical signals, or other types of signals to be communicated between the two components.
Those skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of a hardware-only embodiment, a software-only embodiment, or an embodiment combining software and hardware. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.

This application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, may be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps is executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of this application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications falling within the scope of this application.

Obviously, those skilled in the art can make various modifications and variations to the embodiments of this application without departing from the scope of the embodiments of this application. Thus, if these modifications and variations fall within the scope of the claims of this application and their equivalent technologies, this application is also intended to include them.

Claims (25)

  1. A neural network model processing method, applied to a server or a terminal device, comprising:
    training to obtain a first low-bit neural network model, wherein the first low-bit neural network model comprises at least two operation layers, the at least two operation layers comprise a first operation layer and a second operation layer, each of the at least two operation layers comprises at least one operation, values of parameters and/or data used for the at least one operation are represented by N bits, and N is a positive integer less than 8; and
    compressing the first low-bit neural network model to obtain a second low-bit neural network model, wherein the second low-bit neural network model comprises at least one operation layer, the at least one operation layer comprises a third operation layer, the third operation layer is equivalent to the first operation layer and the second operation layer, and operation layers in the at least one operation layer other than the third operation layer are the same as operation layers in the at least two operation layers other than the first operation layer and the second operation layer.
  2. The method according to claim 1, wherein compressing the first low-bit neural network model to obtain the second low-bit neural network model comprises:
    finding the first operation layer and the second operation layer among the at least two operation layers;
    merging the first operation layer and the second operation layer to obtain the third operation layer, wherein an input of the first operation layer is the same as an input of the third operation layer, an output of the first operation layer is an input of the second operation layer, and an output of the second operation layer is the same as an output of the third operation layer; and
    constructing the second low-bit neural network model from the third operation layer and the operation layers in the at least two operation layers other than the first operation layer and the second operation layer.
  3. The method according to claim 2, wherein the first operation layer comprises at least one first operation and the second operation layer comprises at least one second operation; and
    merging the first operation layer and the second operation layer to obtain the third operation layer comprises:
    merging the at least one first operation with the at least one second operation according to a preset rule to obtain at least one third operation; and
    constructing the third operation layer from the at least one third operation, wherein the third operation layer comprises the at least one third operation.
  4. The method according to claim 2, wherein the first operation layer comprises at least one first operation and the second operation layer comprises at least one second operation; and
    merging the first operation layer and the second operation layer to obtain the third operation layer comprises:
    constructing the third operation layer from the at least one first operation and the at least one second operation, wherein the third operation layer comprises the at least one first operation and the at least one second operation.
  5. The method according to any one of claims 1 to 4, further comprising:
    storing or sending the second low-bit neural network model.
  6. A neural network model processing method, comprising:
    obtaining, by a terminal device, a second low-bit neural network model; and
    updating, by the terminal device, a first low-bit neural network model to the second low-bit neural network model,
    wherein the first low-bit neural network model comprises at least two operation layers, the at least two operation layers comprise a first operation layer and a second operation layer, each of the at least two operation layers comprises at least one operation, values of parameters and/or data used for the at least one operation are represented by N bits, and N is a positive integer less than 8; and the second low-bit neural network model comprises at least one operation layer, the at least one operation layer comprises a third operation layer, the third operation layer is equivalent to the first operation layer and the second operation layer, and operation layers in the at least one operation layer other than the third operation layer are the same as operation layers in the at least two operation layers other than the first operation layer and the second operation layer.
  7. The method according to claim 6, wherein the second low-bit neural network model is a neural network model constructed from the third operation layer and the operation layers in the at least two operation layers other than the first operation layer and the second operation layer, the third operation layer is an operation layer obtained by merging the first operation layer and the second operation layer, an input of the first operation layer is the same as an input of the third operation layer, an output of the first operation layer is an input of the second operation layer, and an output of the second operation layer is the same as an output of the third operation layer.
  8. The method according to claim 7, wherein the first operation layer comprises at least one first operation, the second operation layer comprises at least one second operation, the third operation layer comprises at least one third operation, and the at least one third operation is an operation obtained by merging the at least one first operation with the at least one second operation according to a preset rule.
  9. The method according to claim 7, wherein the first operation layer comprises at least one first operation, the second operation layer comprises at least one second operation, and the third operation layer comprises the at least one first operation and the at least one second operation.
  10. The method according to any one of claims 6 to 9, wherein obtaining, by the terminal device, the second low-bit neural network model comprises:
    receiving, by the terminal device, the second low-bit neural network model from a server; or
    obtaining, by the terminal device, the second low-bit neural network model locally.
  11. A neural network model processing method, comprising:
    training, by a server, to obtain a first low-bit neural network model, wherein the first low-bit neural network model comprises at least two operation layers, the at least two operation layers comprise a first operation layer and a second operation layer, each of the at least two operation layers comprises at least one operation, values of parameters and/or data used for the at least one operation are represented by N bits, and N is a positive integer less than 8;
    compressing, by the server, the first low-bit neural network model to obtain a second low-bit neural network model, wherein the second low-bit neural network model comprises at least one operation layer, the at least one operation layer comprises a third operation layer, the third operation layer is equivalent to the first operation layer and the second operation layer, and operation layers in the at least one operation layer other than the third operation layer are the same as operation layers in the at least two operation layers other than the first operation layer and the second operation layer;
    sending, by the server, the second low-bit neural network model to a terminal device; and
    updating, by the terminal device, a locally stored first low-bit neural network model with the second low-bit neural network model.
  12. A neural network model processing apparatus, comprising:
    a processing unit, configured to train to obtain a first low-bit neural network model, wherein the first low-bit neural network model comprises at least two operation layers, the at least two operation layers comprise a first operation layer and a second operation layer, each of the at least two operation layers comprises at least one operation, values of parameters and/or data used for the at least one operation are represented by N bits, and N is a positive integer less than 8;
    wherein the processing unit is further configured to compress the first low-bit neural network model to obtain a second low-bit neural network model, wherein the second low-bit neural network model comprises at least one operation layer, the at least one operation layer comprises a third operation layer, the third operation layer is equivalent to the first operation layer and the second operation layer, and operation layers in the at least one operation layer other than the third operation layer are the same as operation layers in the at least two operation layers other than the first operation layer and the second operation layer.
  13. The apparatus according to claim 12, wherein the processing unit is specifically configured to:
    find the first operation layer and the second operation layer among the at least two operation layers;
    merge the first operation layer and the second operation layer to obtain the third operation layer, wherein an input of the first operation layer is the same as an input of the third operation layer, an output of the first operation layer is an input of the second operation layer, and an output of the second operation layer is the same as an output of the third operation layer; and
    construct the second low-bit neural network model from the third operation layer and the operation layers in the at least two operation layers other than the first operation layer and the second operation layer.
  14. The apparatus according to claim 13, wherein the first operation layer comprises at least one first operation and the second operation layer comprises at least one second operation; and
    the processing unit is specifically configured to:
    merge the at least one first operation with the at least one second operation according to a preset rule to obtain at least one third operation; and
    construct the third operation layer from the at least one third operation, wherein the third operation layer comprises the at least one third operation.
  15. The apparatus according to claim 13, wherein the first operation layer comprises at least one first operation and the second operation layer comprises at least one second operation; and
    the processing unit is specifically configured to:
    construct the third operation layer from the at least one first operation and the at least one second operation, wherein the third operation layer comprises the at least one first operation and the at least one second operation.
  16. The apparatus according to any one of claims 12 to 15, wherein the apparatus further comprises a storage unit configured to store the second low-bit neural network model; or
    the apparatus further comprises a transceiver unit configured to send the second low-bit neural network model.
  17. A neural network model processing apparatus, comprising:
    an obtaining unit, configured to obtain a second low-bit neural network model; and
    a processing unit, configured to update a first low-bit neural network model to the second low-bit neural network model,
    wherein the first low-bit neural network model comprises at least two operation layers, the at least two operation layers comprise a first operation layer and a second operation layer, each of the at least two operation layers comprises at least one operation, values of parameters and/or data used for the at least one operation are represented by N bits, and N is a positive integer less than 8; and the second low-bit neural network model comprises at least one operation layer, the at least one operation layer comprises a third operation layer, the third operation layer is equivalent to the first operation layer and the second operation layer, and operation layers in the at least one operation layer other than the third operation layer are the same as operation layers in the at least two operation layers other than the first operation layer and the second operation layer.
  18. The apparatus according to claim 17, wherein the second low-bit neural network model is a neural network model constructed from the third operation layer and the operation layers in the at least two operation layers other than the first operation layer and the second operation layer, the third operation layer is an operation layer obtained by merging the first operation layer and the second operation layer, an input of the first operation layer is the same as an input of the third operation layer, an output of the first operation layer is an input of the second operation layer, and an output of the second operation layer is the same as an output of the third operation layer.
  19. The apparatus according to claim 18, wherein the first operation layer comprises at least one first operation, the second operation layer comprises at least one second operation, the third operation layer comprises at least one third operation, and the at least one third operation is an operation obtained by merging the at least one first operation with the at least one second operation according to a preset rule.
  20. The apparatus according to claim 18, wherein the first operation layer comprises at least one first operation, the second operation layer comprises at least one second operation, and the third operation layer comprises the at least one first operation and the at least one second operation.
  21. The apparatus according to any one of claims 17 to 20, wherein the apparatus further comprises a transceiver unit configured to receive the second low-bit neural network model from a server; or
    the processing unit is further configured to:
    obtain the second low-bit neural network model locally.
  22. A neural network model processing system, comprising a server and a terminal device, wherein:
    the server trains to obtain a first low-bit neural network model, wherein the first low-bit neural network model comprises at least two operation layers, the at least two operation layers comprise a first operation layer and a second operation layer, each of the at least two operation layers comprises at least one operation, values of parameters and/or data used for the at least one operation are represented by N bits, and N is a positive integer less than 8;
    the server compresses the first low-bit neural network model to obtain a second low-bit neural network model, wherein the second low-bit neural network model comprises at least one operation layer, the at least one operation layer comprises a third operation layer, the third operation layer is equivalent to the first operation layer and the second operation layer, and operation layers in the at least one operation layer other than the third operation layer are the same as operation layers in the at least two operation layers other than the first operation layer and the second operation layer;
    the server sends the second low-bit neural network model to the terminal device; and
    the terminal device updates a locally stored first low-bit neural network model with the second low-bit neural network model.
  23. A computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 11.
  24. A computer program product, wherein the computer program product, when invoked by a computer, causes the computer to perform the method according to any one of claims 1 to 11.
  25. A chip, wherein the chip is coupled to a memory and configured to read and execute program instructions stored in the memory, to implement the method according to any one of claims 1 to 11.
PCT/CN2019/076374 2019-02-27 2019-02-27 Neural network model processing method and apparatus WO2020172829A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
US17/434,563 US20220121936A1 (en) 2019-02-27 2019-02-27 Neural Network Model Processing Method and Apparatus
PCT/CN2019/076374 WO2020172829A1 (zh) 2019-02-27 2019-02-27 Neural network model processing method and apparatus
EP19917294.1A EP3907662A4 (en) 2019-02-27 2019-02-27 METHOD AND DEVICE FOR PROCESSING A NEURAL NETWORK MODEL
CN201980031862.1A CN112189205A (zh) 2019-02-27 2019-02-27 Neural network model processing method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/076374 WO2020172829A1 (zh) 2019-02-27 2019-02-27 Neural network model processing method and apparatus

Publications (1)

Publication Number Publication Date
WO2020172829A1 true WO2020172829A1 (zh) 2020-09-03

Family

ID=72238781

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/076374 WO2020172829A1 (zh) 2019-02-27 2019-02-27 一种神经网络模型处理方法及装置

Country Status (4)

Country Link
US (1) US20220121936A1 (zh)
EP (1) EP3907662A4 (zh)
CN (1) CN112189205A (zh)
WO (1) WO2020172829A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113055017A (zh) * 2019-12-28 2021-06-29 Huawei Technologies Co., Ltd. Data compression method and computing device
US20220067102A1 (en) * 2020-09-03 2022-03-03 International Business Machines Corporation Reasoning based natural language interpretation

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106796668A (zh) * 2016-03-16 2017-05-31 Hong Kong Applied Science and Technology Research Institute Co., Ltd. Method and system for bit-depth reduction in artificial neural networks
CN107395211A (zh) * 2017-09-12 2017-11-24 Zhengzhou Yunhai Information Technology Co., Ltd. Data processing method and apparatus based on a convolutional neural network model
US9941900B1 (en) * 2017-10-03 2018-04-10 Dropbox, Inc. Techniques for general-purpose lossless data compression using a recurrent neural network
CN108470213A (zh) * 2017-04-20 2018-08-31 Tencent Technology (Shenzhen) Co., Ltd. Deep neural network configuration method and deep neural network configuration apparatus
CN109389212A (zh) * 2018-12-30 2019-02-26 Nanjing University Reconfigurable activation-quantization-pooling system for low-bit-width convolutional neural networks
CN109389214A (zh) * 2017-08-11 2019-02-26 Google LLC Neural network accelerator with parameters resident on chip

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107545889B (zh) * 2016-06-23 2020-10-23 Huawei Device Co., Ltd. Method, apparatus, and terminal device for optimizing a model suitable for pattern recognition

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106796668A (zh) * 2016-03-16 2017-05-31 Hong Kong Applied Science and Technology Research Institute Co., Ltd. Method and system for bit-depth reduction in artificial neural networks
CN108470213A (zh) * 2017-04-20 2018-08-31 Tencent Technology (Shenzhen) Co., Ltd. Deep neural network configuration method and deep neural network configuration apparatus
CN109389214A (zh) * 2017-08-11 2019-02-26 Google LLC Neural network accelerator with parameters resident on chip
CN107395211A (zh) * 2017-09-12 2017-11-24 Zhengzhou Yunhai Information Technology Co., Ltd. Data processing method and apparatus based on a convolutional neural network model
US9941900B1 (en) * 2017-10-03 2018-04-10 Dropbox, Inc. Techniques for general-purpose lossless data compression using a recurrent neural network
CN109389212A (zh) * 2018-12-30 2019-02-26 Nanjing University Reconfigurable activation-quantization-pooling system for low-bit-width convolutional neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3907662A4 *

Also Published As

Publication number Publication date
CN112189205A (zh) 2021-01-05
US20220121936A1 (en) 2022-04-21
EP3907662A4 (en) 2022-01-19
EP3907662A1 (en) 2021-11-10

Similar Documents

Publication Publication Date Title
CN106104673B (zh) Low-resource-footprint adaptation and personalization of deep neural networks
CN111401406B (zh) Neural network training method, video frame processing method, and related device
CN111428520B (zh) Text translation method and apparatus
CN108319599A (zh) Human-machine dialogue method and apparatus
CN111428645B (zh) Human-body key point detection method and apparatus, electronic device, and storage medium
WO2020172829A1 (zh) Neural network model processing method and apparatus
CN113268572A (zh) Question answering method and apparatus
CN110309339B (zh) Picture tag generation method and apparatus, terminal, and storage medium
CN112434188A (zh) Data integration method and apparatus for heterogeneous databases, and storage medium
CN117540205A (zh) Model training method, related apparatus, and storage medium
CN112035671A (zh) State detection method and apparatus, computer device, and storage medium
CN104412262A (zh) Method and apparatus for providing task-based service recommendations
CN116204672A (zh) Image recognition and model training methods, apparatuses, devices, and storage medium
CN110490295B (zh) Data processing method and processing apparatus
CN110019648B (zh) Method and apparatus for training data, and storage medium
CN111310461B (zh) Event element extraction method, apparatus, device, and storage medium
CN110866114B (zh) Object behavior recognition method and apparatus, and terminal device
CN108595172A (zh) Method for improving game code reusability, terminal apparatus, and storage medium
CN113111008B (zh) Test case generation method and apparatus
US11556768B2 (en) Optimization of sparsified neural network layers for semi-digital crossbar architectures
CN114360528A (zh) Speech recognition method and apparatus, computer device, and storage medium
CN112750427B (zh) Image processing method and apparatus, and storage medium
CN115525554B (zh) Automated model testing method, system, and storage medium
CN115982336B (zh) Dynamic dialogue state graph learning method, apparatus, system, and storage medium
CN112925963B (zh) Data recommendation method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19917294

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019917294

Country of ref document: EP

Effective date: 20210806

NENP Non-entry into the national phase

Ref country code: DE