WO2020057000A1 - Network quantization method, service processing method and related products - Google Patents

Network quantization method, service processing method and related products

Info

Publication number
WO2020057000A1
Authority
WO
WIPO (PCT)
Prior art keywords
layer
network
branch
convolutional network
deep neural
Prior art date
Application number
PCT/CN2018/124834
Other languages
French (fr)
Chinese (zh)
Inventor
周争光
王孝宇
吕旭涛
黄轩
Original Assignee
深圳云天励飞技术有限公司
Priority date
Filing date
Publication date
Application filed by 深圳云天励飞技术有限公司
Publication of WO2020057000A1 publication Critical patent/WO2020057000A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06N 3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Definitions

  • the present invention relates to the field of machine learning, and particularly to the field of deep neural networks, and in particular to a network quantization method, a service processing method based on a deep neural network, a network quantization device, a service processing device based on a deep neural network, and a network device.
  • Deep neural networks (DNNs) have achieved remarkable results in many computer vision and natural language processing tasks.
  • In recent years, DNNs have been increasingly applied to mobile phones and embedded devices.
  • However, high-performance DNNs require a large amount of storage space and computation, which makes running DNNs on mobile devices more challenging. Therefore, more and more network compression and acceleration methods have been proposed to reduce the storage space of the network and improve its running speed without significantly reducing DNN performance.
  • network quantization is an effective compression method, which uses a small number of bits to represent each weight or each layer's activation values in the network, so that the network can be computed efficiently on a CPU (Central Processing Unit), GPU (Graphics Processing Unit), or FPGA (Field-Programmable Gate Array).
  • the embodiments of the present invention provide a network quantization method, a deep neural network-based service processing method, and related products, which make the deep neural network fluctuate less during quantization training and yield a trained network with higher accuracy, thereby facilitating the smooth execution of the service processing procedure.
  • an embodiment of the present invention provides a network quantization method, including:
  • obtaining an original deep neural network to be quantized, the original deep neural network including a multi-layer convolutional network, each layer of the convolutional network including a first branch and a second branch, the second branch being a full-precision convolution structure;
  • an embodiment of the present invention provides a service processing method based on a deep neural network, including:
  • receiving a service request, the service request carrying a business object to be processed;
  • the service request includes any of the following: an image processing request, a face recognition request, a visual processing request, and a natural language recognition processing request;
  • an embodiment of the present invention provides a network quantization apparatus, including:
  • an obtaining unit, configured to obtain an original deep neural network to be quantized; the original deep neural network includes a multi-layer convolutional network, each layer of the convolutional network includes a first branch and a second branch, and the second branch is a full-precision convolution structure;
  • a quantization unit, configured to perform quantization training on each layer of the convolutional network, and perform attenuation processing on each layer of the convolutional network according to a scaling factor, wherein the scaling factor decreases as the number of training steps of the quantization training increases;
  • a processing unit, configured to remove the second branch from each layer of the convolutional network in the original deep neural network when all layers have completed quantization training and the scaling factor has decreased to zero, to obtain the quantized target deep neural network.
  • an embodiment of the present invention provides a service processing apparatus based on a deep neural network, including:
  • a request receiving unit configured to receive a service request, the service request carrying a business object to be processed;
  • the service request includes any of the following: an image processing request, a face recognition request, a visual processing request, and a natural language recognition processing request;
  • a business processing unit is configured to call a target deep neural network to process the business object to obtain a business processing result.
  • the target deep neural network is obtained by using the network quantization method in the above aspect;
  • a result output unit configured to output the service processing result.
  • an embodiment of the present invention provides a network device, including:
  • a processor adapted to implement one or more instructions
  • a computer storage medium storing one or more first instructions, where the one or more first instructions are adapted to be loaded by the processor and execute the following network quantization method:
  • obtaining an original deep neural network to be quantized, the original deep neural network including a multi-layer convolutional network, each layer of the convolutional network including a first branch and a second branch, and the second branch being a full-precision convolution structure;
  • the computer storage medium stores one or more second instructions, and the one or more second instructions are suitable for being loaded by the processor and executing a business processing method based on a deep neural network as follows:
  • receiving a service request, the service request carrying a business object to be processed;
  • the service request includes any of the following: an image processing request, a face recognition request, a visual processing request, and a natural language recognition processing request;
  • the original deep neural network includes a multi-layer convolutional network, and each layer of the convolutional network includes a first branch and a second branch, and the second branch is a full-precision convolution structure;
  • Each layer of the convolutional network is subjected to quantization training, and each layer of the convolutional network is subjected to attenuation processing according to a scaling factor; wherein the scaling factor decreases as the number of training steps of the quantization training increases;
  • the second branch in each layer of the convolutional network in the original deep neural network is removed to obtain a quantized target deep neural network.
  • the second branch of the full-precision convolution structure is set in the convolutional network of each layer of the original deep neural network, which makes the output of the convolutional network of each layer have a stronger expression ability;
  • Each layer of the convolutional network is subjected to quantized training and attenuation processing, which can make the fluctuation of the network during quantized training smaller, the trained network has higher accuracy, and the obtained target deep neural network has better network performance.
  • a service request may be received, the service request carrying a business object to be processed; the service request includes any of the following: an image processing request, a face recognition request, a visual processing request, and a natural language recognition processing request; a target deep neural network is called to process the business object to obtain a business processing result, wherein the target deep neural network is obtained using the network quantization method; and the business processing result is output;
  • since the target deep neural network used for business processing is obtained through the network quantization method, it has better network performance and higher accuracy, which can effectively improve the efficiency and quality of business processing.
  • FIG. 1 is a flowchart of a network quantization method according to an embodiment of the present invention
  • FIG. 2 is a schematic structural diagram of an initial deep neural network according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of an original deep neural network according to an embodiment of the present invention.
  • FIG. 4 is another schematic structural diagram of an original deep neural network according to an embodiment of the present invention.
  • FIG. 5 is a flowchart of step s102 shown in FIG. 1;
  • FIG. 6 is another flowchart of step s102 shown in FIG. 1;
  • FIG. 7 is a flowchart of a deep neural network-based service processing method according to an embodiment of the present invention.
  • 8a-8c are application scenario diagrams of a deep neural network-based service processing method according to an embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of a network quantization apparatus according to an embodiment of the present invention.
  • FIG. 10 is a schematic structural diagram of a deep neural network-based service processing apparatus according to an embodiment of the present invention.
  • FIG. 11 is a schematic structural diagram of a network device according to an embodiment of the present invention.
  • An embodiment of the present invention proposes a network quantization method. Referring to FIG. 1, the method specifically includes the following steps:
  • the original deep neural network includes a multi-layer convolutional network; each layer of the convolutional network includes a first branch and a second branch, and the second branch is a full-precision convolution structure.
  • the original deep neural network is obtained by adding branches to the initial deep neural network.
  • the initial deep neural network is a commonly used deep neural network.
  • the initial deep neural network includes a multi-layer convolutional network, and each layer of the convolutional network has a single structure.
  • the initial deep neural network includes an L-layer convolutional network, where L is a positive integer; any layer of the convolutional network is denoted as the layer-l convolutional network, where l is a positive integer and 1 ≤ l ≤ L. The structure of the layer-l convolutional network of the initial deep neural network is shown in Figure 2.
  • the network parameters include weights and activation values.
  • the network parameters of this single structure are quantized to a limited number of values. For example, if an 8-bit fixed-point number is used to quantize the weights and activation values of the initial deep neural network, the weights and activation values are quantized to a limited number of values determined by the 8-bit fixed-point number; for another example, if a 2-bit fixed-point number is used to quantize the weights and activation values of the initial deep neural network, the weights and activation values are quantized to a limited number of values determined by the 2-bit fixed-point number.
  • the initial deep neural network has only a single structure, and the network parameters of that single structure are quantized to a limited number of values, which weakens the expressive ability of the network's output, increases the difficulty of quantization training, and makes the quantization training process fluctuate greatly.
  • the embodiment of the present invention improves the initial deep neural network, and determines the original single structure of each layer of the convolutional network in the initial deep neural network as the first branch. On this basis, a second branch is added to form the original deep neural network.
  • the original deep neural network includes a multi-layer convolutional network, and each layer of the convolutional network includes a first branch and a second branch.
  • the original deep neural network includes an L-layer convolutional network, where L is a positive integer; any layer of the convolutional network is denoted as the layer-l convolutional network, where l is a positive integer and 1 ≤ l ≤ L.
  • the first branch may be a convolution structure of fixed-point quantization accuracy.
  • the so-called fixed-point quantization precision refers to the precision range determined by the fixed-point bit width; for example, if an 8-bit fixed-point number is used for network quantization, the fixed-point quantization precision is the precision range determined by the 8-bit fixed-point number; if a 2-bit fixed-point number is used for network quantization, the fixed-point quantization precision is the precision range determined by the 2-bit fixed-point number.
  • the first branch in the layer-l convolutional network includes a weight quantization unit, a first convolution unit, a first activation unit, and an activation value quantization unit; the weight quantization unit uses a weight quantization function to perform quantization training on the weights of layer l; the first convolution unit is configured to perform a convolution operation on the quantized weights of layer l and the output value of layer l-1; the first activation unit is configured to perform non-linear processing on the output value of the first convolution unit; and the activation value quantization unit uses an activation value quantization function to perform quantization training on the activation values of the convolutional network.
  • the second branch is a full-precision convolution structure.
  • the so-called full precision refers to the precision range determined by floating-point numbers; for example, if a 32-bit floating-point number is used, full precision refers to the precision range determined by the 32-bit floating-point number.
  • the second branch in the layer-l convolutional network includes a second convolution unit, a second activation unit, and a scaling unit; the second convolution unit is used to perform a convolution operation on the weights of layer l and the output value of layer l-1; the second activation unit is configured to perform non-linear processing on the output value of the second convolution unit; and the scaling unit is configured to perform attenuation processing on the second branch.
  • the first branch and the second branch of each layer of the convolutional network in the original deep neural network may be combined by first adding and then quantizing, as shown in FIG. 3;
  • alternatively, the first branch and the second branch of each layer of the convolutional network in the original deep neural network may be combined by first quantizing and then adding, as shown in FIG. 4.
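  • The following minimal sketch, written against PyTorch, illustrates one possible reading of the two-branch layer of FIG. 3 (first add, then quantize); the module name, the use of ReLU, and the quantization helpers q_w and q_a are assumptions made for illustration, not the patent's mandated implementation:

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class TwoBranchConvLayer(nn.Module):
            # Layer l of the original deep neural network: a quantized first branch
            # plus a full-precision second branch scaled by a decaying factor.
            def __init__(self, in_ch, out_ch, q_w, q_a):
                super().__init__()
                self.w1 = nn.Parameter(torch.randn(out_ch, in_ch, 3, 3) * 0.1)  # first-branch weights W_l^1
                self.w2 = nn.Parameter(torch.randn(out_ch, in_ch, 3, 3) * 0.1)  # second-branch weights W_l^2 (full precision)
                self.q_w, self.q_a = q_w, q_a  # weight / activation quantization functions

            def forward(self, x_prev, factor):
                # First branch: quantized weights -> convolution -> non-linearity.
                y1 = F.relu(F.conv2d(x_prev, self.q_w(self.w1), padding=1))
                # Second branch: full-precision convolution -> non-linearity -> scaling unit.
                y2 = factor * F.relu(F.conv2d(x_prev, self.w2, padding=1))
                # FIG. 3 order: sum the two branches, then quantize the activations.
                return self.q_a(y1 + y2)

  • During training the scaling factor passed to forward decays towards zero, so the layer gradually degenerates into the pure first branch that remains in the target network.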
  • S102 Perform quantitative training on each layer of the convolutional network, and perform attenuation processing on each layer of the convolutional network according to a scaling factor; wherein the scaling factor decreases as the number of training steps of the quantization training increases.
  • the process of quantization training for a given convolutional network includes quantizing the weights of the convolutional network using a weight quantization function and/or quantizing the activation values of the convolutional network using an activation value quantization function. Since each layer of the convolutional network in the original deep neural network in this embodiment includes two branches, in step s102 the weights and activation values of the first branch of each layer are quantization trained, while the weights and activation values of the second branch are not quantization trained and only attenuation processing is performed according to the scaling factor.
  • in step s102, the scaling factor may be obtained by the following steps a1-a2: a1, obtaining the current number of training steps of the quantization training of the layer-l convolutional network; a2, calling a cosine attenuation function to calculate the scaling factor corresponding to that number of training steps.
  • the quantization training starts from the input layer of the original deep neural network and iterates successively layer by layer.
  • Each layer is subjected to quantization training according to the method described in step s102, that is, the first branch of each layer is subjected to quantization training, and attenuation processing is performed on the second branch of each layer according to the scaling factor, until all layers of the original deep neural network have completed quantization training and the scaling factor of the second branch has decreased to zero.
  • when the scaling factor has decreased to zero, the second branch has no effect on the entire network.
  • therefore, the second branch of each layer can be removed; specifically, the second branch is removed from each layer of the convolutional network, so that each layer of the convolutional network only includes the first branch.
  • the convolutional networks of all layers, after removal of the second branch, constitute the quantized target deep neural network.
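  • A rough sketch of the training schedule described above is given below; it assumes the TwoBranchConvLayer and quantize_uniform helpers from the previous sketches, and the layer sizes, step count, and decay schedule are placeholders rather than details taken from the patent:

        import torch

        q_w = lambda w: quantize_uniform(w, 8)   # weight quantization function (8-bit assumed)
        q_a = lambda a: quantize_uniform(a, 8)   # activation quantization function (8-bit assumed)
        layers = [TwoBranchConvLayer(16, 16, q_w, q_a) for _ in range(4)]   # a toy L = 4 layer network

        total_steps = 100
        for step in range(total_steps):
            # The scaling factor decreases as the number of training steps increases;
            # a linear decay stands in here for the cosine attenuation function named later.
            factor = 1.0 - step / (total_steps - 1)
            x = torch.randn(8, 16, 32, 32)       # stand-in for a training batch
            for layer in layers:                 # the forward pass proceeds layer by layer
                x = layer(x, factor)
            # ... compute the task loss on x and update the weights here; quantization-aware
            # training normally uses a straight-through estimator for the rounding step ...

        # After training, the factor has decayed to zero, the second branch contributes nothing,
        # and each layer keeps only its quantized first branch in the target network.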
  • the original deep neural network includes a multi-layer convolutional network, and each layer of the convolutional network includes a first branch and a second branch, and the second branch is a full-precision convolution structure;
  • Each layer of the convolutional network is subjected to quantization training, and each layer of the convolutional network is subjected to attenuation processing according to a scaling factor; wherein the scaling factor decreases as the number of training steps of the quantization training increases;
  • the second branch in each layer of the convolutional network in the original deep neural network is removed to obtain a quantized target deep neural network.
  • the second branch of the full-precision convolution structure is set in the convolutional network of each layer of the original deep neural network, which makes the output of the convolutional network of each layer have a stronger expression ability;
  • Each layer of the convolutional network is subjected to quantized training and attenuation processing, which can make the fluctuation of the network during quantized training smaller, the trained network has higher accuracy, and the obtained target deep neural network has better network performance.
  • step s102 may specifically include the following steps s501-s508:
  • Quantization training is performed on the input parameters of the layer-l convolutional network in the first branch of the layer-l convolutional network by using a weight quantization function, to obtain the first output parameter of the first branch in the layer-l convolutional network.
  • step s502 specifically includes the following steps:
  • the first convolution unit receives the quantized weight value of the first branch of the layer-l convolutional network and the output value of layer l-1, and convolves the two values to obtain the first convolution value of the first branch;
  • the first activation unit performs non-linear processing on the first convolution value of the first branch by using a non-linear function, to obtain the first output parameter of the first branch.
  • in step b1, the fixed-point quantization method is used to perform quantization training on the weights of the first branch of the layer-l convolutional network, to obtain the quantized weight value of the first branch of the layer-l convolutional network.
  • W_l^1 represents the weight of the first branch of the layer-l convolutional network, and Q_w represents the weight quantization function of the fixed-point quantization method, so the quantized weight may be written as Q_w(W_l^1).
  • in step b2, the first convolution unit receives the quantized weight value of the first branch of the layer-l convolutional network and the output value of layer l-1, denoted x_{l-1}, and convolves the two values, which may be written as Q_w(W_l^1) * x_{l-1}.
  • in step b3, the first activation unit performs non-linear processing on the first convolution value of the first branch by using a non-linear function f (for example, a ReLU function), which may be written as f(Q_w(W_l^1) * x_{l-1}).
  • S503: transmitting the input parameters of the layer-l convolutional network to the second branch in the layer-l convolutional network for training, to obtain the original output parameter of the second branch in the layer-l convolutional network;
  • step s503 specifically includes:
  • the second convolution unit receives the weight of the second branch of the l-th layer of the convolutional network and the output value of the l-1 layer, and convolves the two values to obtain the second convolution value of the second branch;
  • the second activation unit performs non-linear processing on the second convolution value by using a non-linear function, to obtain the original output parameter of the second branch of the layer-l convolutional network.
  • the weight of the second branch of the layer-l convolutional network is expressed by a full-precision model.
  • the second activation unit uses a non-linear function (such as a ReLU function) to perform non-linear processing on the second convolution value.
  • in step c1, the second convolution unit receives the weight of the second branch of the layer-l convolutional network, denoted W_l^2, and the output value x_{l-1} of layer l-1, and convolves the two values, which may be written as W_l^2 * x_{l-1}.
  • in step c2, the second activation unit performs non-linear processing on the second convolution value by using the non-linear function f, which may be written as f(W_l^2 * x_{l-1}).
  • S504: Obtain a corresponding scaling factor according to the number of training steps during the quantization training of the layer-l convolutional network;
  • S505: Attenuate the original output parameter of the second branch in the layer-l convolutional network by using the obtained scaling factor, to obtain the second output parameter of the second branch in the layer-l convolutional network;
  • in step s505, the obtained scaling factor, denoted factor, is used to attenuate the original output parameter of the second branch in the layer-l convolutional network, which may be written as factor * f(W_l^2 * x_{l-1});
  • S506: Sum the first output parameter and the second output parameter to obtain an intermediate parameter;
  • in step s506, the first output parameter and the second output parameter are summed to obtain an intermediate parameter a_l, which may be written as a_l = f(Q_w(W_l^1) * x_{l-1}) + factor * f(W_l^2 * x_{l-1});
  • S507: Quantize the intermediate parameter by using an activation value quantization function to obtain the output parameter of the layer-l convolutional network;
  • in step s507, quantization training is performed on the intermediate parameter by using the activation value quantization function Q_a, so the output of the layer-l convolutional network may be written as Q_a(a_l).
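  • To make the role of the scaling factor concrete, the short numerical sketch below evaluates the intermediate parameter a_l for a single scalar position at several factor values; the numbers are invented purely for illustration:

        # One scalar position of the two branch outputs (illustrative values only).
        y1 = 0.50          # first branch: f(Q_w(W_l^1) * x_{l-1})
        y2 = 0.37          # second branch: f(W_l^2 * x_{l-1})

        for factor in (1.0, 0.5, 0.0):
            a_l = y1 + factor * y2        # step s506: sum of the two branches
            print(factor, round(a_l, 3))  # 1.0 -> 0.87, 0.5 -> 0.685, 0.0 -> 0.5

        # As the factor decays to zero the layer output approaches the pure quantized
        # branch, so removing the second branch at the end does not change the network.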
  • the original deep neural network includes a multi-layer convolutional network, and each layer of the convolutional network includes a first branch and a second branch, and the second branch is a full-precision convolution structure;
  • Each layer of the convolutional network is subjected to quantization training, and each layer of the convolutional network is subjected to attenuation processing according to the scaling factor, until all layers of the convolutional network have completed quantization training and the scaling factor has decreased to zero; the scaling factor decreases as the number of training steps of the quantization training increases.
  • step s102 specifically includes the following steps s601-s607:
  • S601 Obtain input parameters of a layer l convolutional network, where the input parameters include weights and activation values;
  • S602: Quantization training is performed on the input parameters of the layer-l convolutional network in the first branch of the layer-l convolutional network by using a weight quantization function and an activation value quantization function, to obtain the first output parameter of the first branch in the layer-l convolutional network.
  • step s602 specifically includes the following steps:
  • the first convolution unit receives the quantized weight value of the first branch of the layer-l convolutional network and the output value of layer l-1, and convolves the two values to obtain the first convolution value of the first branch;
  • the first activation unit performs non-linear processing on the first convolution value of the first branch by using a non-linear function, to obtain the activation value of the first branch;
  • in step d1, a fixed-point quantization method is used to obtain the quantized weight value of the first branch of the layer-l convolutional network, which may be written as Q_w(W_l^1), where W_l^1 represents the weight of the first branch of the layer-l convolutional network and Q_w represents the weight quantization function of the fixed-point quantization method.
  • in step d2, the first convolution unit receives the quantized weight value of the first branch of the layer-l convolutional network and the output value x_{l-1} of layer l-1, and convolves the two values, which may be written as Q_w(W_l^1) * x_{l-1}.
  • in step d3, the first activation unit performs non-linear processing on the first convolution value of the first branch by using the non-linear function f, which may be written as f(Q_w(W_l^1) * x_{l-1}).
  • in step d4, the activation value quantization function Q_a of the fixed-point quantization method is used to perform quantization training on the activation value of the first branch, so the first output parameter may be written as Q_a(f(Q_w(W_l^1) * x_{l-1})).
  • step s603 specifically includes:
  • the second convolution unit receives the weight of the second branch of the layer-l convolutional network and the output value of layer l-1, and convolves the two values to obtain the second convolution value;
  • the second activation unit performs non-linear processing on the second convolution value by using a non-linear function, to obtain the original output parameter of the second branch of the layer-l convolutional network.
  • in step e1, the second convolution unit receives the weight W_l^2 of the second branch of the layer-l convolutional network and the output value x_{l-1} of layer l-1, and convolves the two values, which may be written as W_l^2 * x_{l-1}.
  • in step e2, the second activation unit performs non-linear processing on the second convolution value by using the non-linear function f, which may be written as f(W_l^2 * x_{l-1}).
  • S605: Attenuate the original output parameter of the second branch in the layer-l convolutional network by using the obtained scaling factor, to obtain the second output parameter of the second branch in the layer-l convolutional network;
  • in step s605, the obtained scaling factor, denoted factor, is used to attenuate the original output parameter of the second branch in the layer-l convolutional network, which may be written as factor * f(W_l^2 * x_{l-1});
  • S606: Sum the first output parameter and the second output parameter to obtain the output parameter of the layer-l convolutional network;
  • in step s606, the first output parameter and the second output parameter are summed, so the output of the layer-l convolutional network may be written as Q_a(f(Q_w(W_l^1) * x_{l-1})) + factor * f(W_l^2 * x_{l-1}).
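  • For comparison with the FIG. 3 sketch above, a minimal forward pass for the FIG. 4 ordering (quantize inside the first branch, then add the scaled second branch) could look as follows; it reuses the assumed helpers from the earlier sketches and is an illustration only:

        import torch
        import torch.nn.functional as F

        def layer_forward_fig4(x_prev, w1, w2, factor, q_w, q_a):
            # First branch: quantized weights -> conv -> ReLU -> activation quantization.
            y1 = q_a(torch.relu(F.conv2d(x_prev, q_w(w1), padding=1)))
            # Second branch: full-precision conv -> ReLU -> scaling unit.
            y2 = factor * torch.relu(F.conv2d(x_prev, w2, padding=1))
            # FIG. 4 order: quantize first, then sum; this is the output of layer l.
            return y1 + y2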
  • the original deep neural network includes a multi-layer convolutional network, and each layer of the convolutional network includes a first branch and a second branch, and the second branch is a full-precision convolution structure;
  • Each layer of the convolutional network is subjected to quantization training, and each layer of the convolutional network is subjected to attenuation processing according to the scaling factor, until all layers of the convolutional network have completed quantization training and the scaling factor has decreased to zero; the scaling factor decreases as the number of training steps of the quantization training increases.
  • the second branch of the full-precision convolution structure is set in the convolutional network of each layer of the original deep neural network, which makes the output of the convolutional network of each layer have a stronger expression ability;
  • Each layer of the convolutional network is subjected to quantized training and attenuation processing, so that the fluctuations in the quantized training of the network are smaller, and the trained network has higher accuracy.
  • the obtained target deep neural network can converge to a better local optimum and has better network performance.
  • an embodiment of the present invention proposes a service processing method based on a deep neural network. Referring to FIG. 7, the method specifically includes the following steps:
  • S701 Receive a service request, where the service request carries a business object to be processed; the service request includes any of the following: an image processing request, a face recognition request, a visual processing request, and a natural language recognition processing request. It can be understood that the services to be processed may include, but are not limited to, image processing services, face recognition services, visual processing services, and natural language recognition services.
  • S702 Invoke a target deep neural network to process the business object to obtain a business processing result.
  • the target deep neural network is obtained by using the network quantization method.
  • the target deep neural network may be obtained by using the network quantization method shown in FIG. 1 to FIG. 6.
  • the target deep neural network may be set in a network device.
  • the network device may include, but is not limited to, a terminal device, an embedded device, a network server, and the like.
  • the terminal device may include, but is not limited to, a smart phone, a tablet computer, and a mobile wearable device;
  • the embedded device may include, but is not limited to, a DSP (Digital Signal Processing) chip device.
  • the network device invokes the target deep neural network, and transmits the to-be-processed business object (such as an image or a face image) carried by the service request as an input parameter to the target deep neural network for corresponding business processing, to obtain the business processing result.
  • the business processing result corresponds one-to-one to the business object; for example, if the business is image processing, the corresponding business processing result may include, but is not limited to, image blurring, sharpening, and edge detection; for another example, if the business is face recognition, the corresponding business processing result may be matched face images, associated identity information found by search, and the like.
  • a service request may be received, the service request carrying a business object to be processed; the service request includes any of the following: an image processing request, a face recognition request, a visual processing request, and a natural language recognition processing request; a target deep neural network is called to process the business object to obtain a business processing result, wherein the target deep neural network is obtained using the network quantization method; and the business processing result is output;
  • since the target deep neural network used for business processing is obtained through the network quantization method, it has better network performance and higher accuracy, which can effectively improve the efficiency and quality of business processing.
  • an embodiment of the present invention provides an application scenario of a deep neural network-based service processing method.
  • please refer to FIG. 8a to FIG. 8c, taking a face recognition service as an example, where the target deep neural network is set in a face recognition APP (application) on a mobile phone.
  • the processing steps of the face recognition service are as follows: (1) the user uses a mobile phone with a camera APP and a face recognition APP installed, opens the camera APP, and clicks to take a photo, as shown in FIG. 8a.
  • (2) the face recognition APP calls the target deep neural network to perform face recognition processing on the face photo taken by the user, as shown in FIG. 8b; after the processing is completed, the face recognition result is output, as shown in FIG. 8c.
  • a face recognition request may be received, the face recognition request carrying a face image to be processed; the target deep neural network is called to perform recognition processing on the face image to obtain a face recognition result, wherein the target deep neural network is obtained by using the network quantization method; and the face recognition result is output. By implementing the embodiment of the present invention, since the target deep neural network used for face recognition processing is obtained through the network quantization method, it has better network performance and higher accuracy, which can effectively improve the efficiency of face recognition processing and ensure the accuracy of face recognition.
  • an embodiment of the present invention provides a network quantization apparatus.
  • the apparatus may be a computer program running on a network device, and may be applied to the network quantization method shown in FIG. 1, FIG. 5, and FIG. 6 above, to perform the corresponding steps in the network quantization method.
  • the device may include:
  • the obtaining unit 101 is configured to obtain an original deep neural network to be quantized; the original deep neural network includes a multi-layer convolutional network, each layer of the convolutional network includes a first branch and a second branch, and the second branch is a full-precision convolution structure;
  • a quantization unit 102, configured to perform quantization training on each layer of the convolutional network, and perform attenuation processing on each layer of the convolutional network according to a scaling factor, wherein the scaling factor decreases as the number of training steps of the quantization training increases;
  • a processing unit 103, configured to remove the second branch from each layer of the convolutional network in the original deep neural network when all layers have completed quantization training and the scaling factor has decreased to zero, to obtain the quantized target deep neural network.
  • the obtaining unit 101 is specifically configured to:
  • obtaining an initial deep neural network, where each layer of the convolutional network of the initial deep neural network includes a first branch, and the first branch is a convolution structure of fixed-point quantization precision;
  • a second branch is set for each layer of the initial deep neural network to obtain the original deep neural network.
  • the quantization unit 102 is specifically configured to:
  • a weight quantization function is used to perform quantization training on the input parameters of the layer-l convolutional network in the first branch of the layer-l convolutional network, to obtain the first output parameter of the first branch in the layer-l convolutional network;
  • the output parameters of the layer-l convolutional network are determined as the input parameters of the layer-(l+1) convolutional network, and the above steps are repeated until all layers of the convolutional network have completed quantization training and the scaling factor has decreased to zero.
  • the quantization unit 102 is specifically configured to:
  • the weight quantization function and the activation value quantization function are used to perform quantization training on the input parameters of the layer-l convolutional network in the first branch of the layer-l convolutional network, to obtain the first output parameter of the first branch in the layer-l convolutional network;
  • the output parameters of the layer-l convolutional network are determined as the input parameters of the layer-(l+1) convolutional network, and the above steps are repeated until all layers of the convolutional network have completed quantization training and the scaling factor has decreased to zero.
  • the quantization unit 102 is specifically configured to:
  • a cosine attenuation function is called to calculate the scaling factor corresponding to the number of training steps in the quantization training of the layer-l convolutional network.
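  • The patent names a cosine attenuation function but its exact form is not reproduced here; a common cosine decay of the following kind is shown purely as an assumed example of how the scaling factor could fall from 1 to 0 as the number of training steps increases:

        import math

        def scaling_factor(step, total_steps):
            # Cosine decay from 1.0 at step 0 to 0.0 at the final training step.
            progress = min(step, total_steps) / total_steps
            return 0.5 * (1.0 + math.cos(math.pi * progress))

        # The factor decreases monotonically with the number of training steps:
        # step 0 -> 1.0, step total_steps // 2 -> 0.5, step total_steps -> 0.0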
  • the original deep neural network includes a multi-layer convolutional network, and each layer of the convolutional network includes a first branch and a second branch, and the second branch is a full-precision convolution structure;
  • Each layer of the convolutional network is subjected to quantization training, and each layer of the convolutional network is subjected to attenuation processing according to a scaling factor; wherein the scaling factor decreases as the number of training steps of the quantization training increases;
  • the second branch in each layer of the convolutional network in the original deep neural network is removed to obtain a quantized target deep neural network.
  • the second branch of the full-precision convolution structure is set in the convolutional network of each layer of the original deep neural network, which makes the output of the convolutional network of each layer have a stronger expression ability;
  • Each layer of the convolutional network is subjected to quantized training and attenuation processing, which can make the fluctuation of the network during quantized training smaller, the trained network has higher accuracy, and the obtained target deep neural network has better network performance.
  • an embodiment of the present invention provides a deep neural network-based service processing apparatus.
  • the apparatus may be a computer program running on a network device, and may be applied to the deep neural network-based service processing method shown in FIG. 7 above, to execute the corresponding steps in the deep neural network-based service processing method.
  • the device may include:
  • the request receiving unit 201 is configured to receive a service request, where the service request carries a business object to be processed; the service request includes any of the following: an image processing request, a face recognition request, a visual processing request, and a natural language recognition processing request ;
  • a business processing unit 202 is configured to call a target deep neural network to process the business object to obtain a service processing result; wherein the target deep neural network is obtained by using the network quantization method;
  • the result output unit 203 is configured to output the service processing result.
  • a service request may be received, the service request carrying a business object to be processed; the service request includes any of the following: an image processing request, a face recognition request, a visual processing request, and a natural language recognition processing request; a target deep neural network is called to process the business object to obtain a business processing result, wherein the target deep neural network is obtained using the network quantization method; and the business processing result is output;
  • since the target deep neural network used for business processing is obtained through the network quantization method, it has better network performance and higher accuracy, which can effectively improve the efficiency and quality of business processing.
  • the embodiment of the present invention further provides a network device that can be applied to the network quantization method shown in FIG. 1, FIG. 5, and FIG. 6 above and to the deep neural network-based business processing method shown in FIG. 7, to perform the corresponding steps in the network quantization method and the deep neural network-based business processing method.
  • the internal structure of the network device may include a processor, a network interface, and a computer storage medium.
  • the processor, the communication interface, and the computer storage medium in the network device may be connected through a bus or other methods.
  • the communication interface is a medium for implementing interaction and information exchange between network equipment and external equipment.
  • a processor (or CPU, Central Processing Unit) is the computing core and control core of the network device; it is suitable for implementing one or more instructions, and is specifically suitable for loading and executing one or more instructions to implement the corresponding method flow or corresponding function;
  • a computer storage medium (Memory) is a memory device in a server and is used to store programs and data. It can be understood that the computer storage medium herein may include both a built-in storage medium of a network device and an extended storage medium supported by the network device.
  • the computer storage medium provides a storage space that stores an operating system of a network device.
  • one or more instructions suitable for being loaded and executed by the processor are stored in the storage space, and these instructions may be one or more computer programs (including program code).
  • the computer storage medium here may be a high-speed RAM memory or a non-volatile memory, for example, at least one disk memory; optionally, it may also be at least one computer storage medium located far away from the foregoing processor.
  • the computer storage medium stores one or more first instructions
  • the processor loads and executes the one or more first instructions stored in the computer storage medium to implement the network quantization method shown in FIG. 1, FIG. 5, or FIG. 6 above:
  • obtaining an original deep neural network to be quantized, the original deep neural network including a multi-layer convolutional network, each layer of the convolutional network including a first branch and a second branch, and the second branch being a full-precision convolution structure;
  • the acquiring the original deep neural network to be quantified includes:
  • obtaining an initial deep neural network, where each layer of the convolutional network of the initial deep neural network includes a first branch, and the first branch is a convolution structure of fixed-point quantization precision;
  • a second branch is set for each layer of the initial deep neural network to obtain the original deep neural network.
  • the original deep neural network includes an L-layer convolutional network, where L is a positive integer; any one of the convolutional networks is denoted as the layer-l convolutional network, where l is a positive integer and 1 ≤ l ≤ L;
  • full precision includes floating-point precision and fixed-point quantization precision.
  • performing quantized training on each layer of the convolutional network and performing attenuation processing on each layer of the convolutional network according to a scaling factor includes:
  • a weight quantization function is used to perform quantization training on the input parameters of the layer-l convolutional network in the first branch of the layer-l convolutional network, to obtain the first output parameter of the first branch in the layer-l convolutional network;
  • the output parameters of the layer-l convolutional network are determined as the input parameters of the layer-(l+1) convolutional network, and the above steps are repeated until all layers of the convolutional network have completed quantization training and the scaling factor has decreased to zero.
  • performing quantized training on each layer of the convolutional network and performing attenuation processing on each layer of the convolutional network according to a scaling factor includes:
  • the weight quantization function and the activation value quantization function are used to perform quantization training on the input parameters of the layer-l convolutional network in the first branch of the layer-l convolutional network, to obtain the first output parameter of the first branch in the layer-l convolutional network;
  • the output parameters of the layer-l convolutional network are determined as the input parameters of the layer-(l+1) convolutional network, and the above steps are repeated until all layers of the convolutional network have completed quantization training and the scaling factor has decreased to zero.
  • obtaining the corresponding scaling factor according to the number of training steps during the quantization training of the layer-l convolutional network includes:
  • a cosine attenuation function is called to calculate the scaling factor corresponding to the number of training steps in the quantization training of the layer-l convolutional network.
  • the original deep neural network includes a multi-layer convolutional network, and each layer of the convolutional network includes a first branch and a second branch, and the second branch is a full-precision convolution structure;
  • Each layer of the convolutional network is subjected to quantization training, and each layer of the convolutional network is subjected to attenuation processing according to a scaling factor; wherein the scaling factor decreases as the number of training steps of the quantization training increases;
  • the second branch in each layer of the convolutional network in the original deep neural network is removed to obtain a quantized target deep neural network.
  • the second branch of the full-precision convolution structure is set in the convolutional network of each layer of the original deep neural network, which makes the output of the convolutional network of each layer have a stronger expression ability;
  • Each layer of the convolutional network is subjected to quantized training and attenuation processing, which can make the fluctuation of the network during quantized training smaller, the trained network has higher accuracy, and the obtained target deep neural network has better network performance.
  • the computer storage medium stores one or more second instructions
  • the processor loads and executes the one or more second instructions stored in the computer storage medium to implement the deep neural network-based service processing method shown in FIG. 7 above:
  • receiving a service request, the service request carrying a business object to be processed;
  • the service request includes any of the following: an image processing request, a face recognition request, a visual processing request, and a natural language recognition processing request;
  • a service request may be received, the service request carrying a business object to be processed; the service request includes any of the following: an image processing request, a face recognition request, a visual processing request, and a natural language recognition processing request; a target deep neural network is called to process the business object to obtain a business processing result, wherein the target deep neural network is obtained using the network quantization method; and the business processing result is output;
  • since the target deep neural network used for business processing is obtained through the network quantization method, it has better network performance and higher accuracy, which can effectively improve the efficiency and quality of business processing.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present invention provide a network quantization method, a service processing method and related products. The method comprises: obtaining an original deep neural network to be quantized, the original deep neural network comprising a multi-layer convolutional network, each layer of the convolutional network comprising a first branch and a second branch, and the second branch being a full-precision convolution structure; performing quantization training on each layer of the convolutional network, and performing attenuation processing on each layer of the convolutional network according to a scaling factor, wherein the scaling factor decreases as the number of training steps of the quantization training increases; and when all layers of the convolutional network have completed quantization training and the scaling factor has decreased to zero, removing the second branch in each layer of the convolutional network in the original deep neural network to obtain a quantized target deep neural network. According to the present invention, fluctuations during fixed-point quantization training of the network can be made smaller and the trained network has higher precision, facilitating the smooth execution of service processing.

Description

网络量化方法、业务处理方法及相关产品Network quantification method, business processing method and related products
本申请要求于2018年9月19日提交中国专利局,申请号为201811092329.X、发明名称为“网络量化方法、业务处理方法及相关产品”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims the priority of a Chinese patent application filed on September 19, 2018 with the Chinese Patent Office under the application number 201811092329.X and the invention name "Network Quantization Method, Business Processing Method and Related Products", the entire contents of which are hereby incorporated by reference Incorporated in this application.
技术领域Technical field
本发明涉及机器学习领域,具体涉及深度神经网络领域,尤其涉及一种网络量化方法、一种基于深度神经网络的业务处理方法、一种网络量化装置、一种基于深度神经网络的业务处理装置及一种网络设备。The present invention relates to the field of machine learning, and particularly to the field of deep neural networks, and in particular, to a network quantization method, a service processing method based on a deep neural network, a network quantization device, a service processing device based on a deep neural network, and A network device.
背景技术Background technique
深度神经网络(Deep Neural Network,DNN)在很多计算机视觉、自然语言处理任务中取得了瞩目的成绩,近些年,DNN则越来越多地被应用到移动手机和嵌入式设备上。然而,高性能的DNN却需要大量的存储空间和计算量等,这就使得移动设备运行DNN更具挑战。因此,越来越多的网络压缩与加速方法被提出,旨在保证DNN性能无明显降低的条件下减少网络的存储空间并提高网络的运行速度。其中,网络量化是一种有效的压缩方法,即用少的比特数来表示网络中的每个权值或者每层的激活值,从而可以高效地在CPU(Central Processing Unit,中央处理器)、GPU(Graphics Processing Unit,图形处理器)和FPGA(Field-Programmable Gate Array,现场可编程门阵列)上计算。但是,目前的网络量化方法往往波动较大,训练出来的网络精度也较低。Deep neural network (DNN) has achieved remarkable results in many computer vision and natural language processing tasks. In recent years, DNN has been increasingly applied to mobile phones and embedded devices. However, high-performance DNNs require a large amount of storage space and calculations, which makes running DNNs on mobile devices more challenging. Therefore, more and more network compression and acceleration methods have been proposed to reduce the storage space of the network and improve the speed of the network without significantly reducing the performance of the DNN. Among them, network quantization is an effective compression method, which uses a small number of bits to represent each weight value or activation value of each layer in the network, so that it can be efficiently used in the CPU (Central Processing Unit, Central Processing Unit), GPU (Graphics, Processing Unit) and FPGA (Field-Programmable Gate Array, field programmable gate array). However, current network quantization methods tend to fluctuate greatly, and the accuracy of trained networks is also low.
发明内容Summary of the Invention
本发明实施例提供了一种网络量化方法、基于深度神经网络的业务处理方法及相关产品,能够使深度神经网络在网络量化训练时的波动更小,训练出来的网络精度更高,有利于业务处理过程的顺利执行。The embodiments of the present invention provide a network quantization method, a deep neural network-based service processing method, and related products, which can make the deep neural network less volatile during network quantization training, and the trained network has higher accuracy and is beneficial to the business Smooth execution of the process.
一方面,本发明实施例提供了一种网络量化方法,包括:In one aspect, an embodiment of the present invention provides a network quantization method, including:
获取待量化的原始深度神经网络,所述原始深度神经网络包括多层卷积网络,每一层卷积网络均包括第一分支和第二分支,所述第二分支为全精度卷积结构;Obtaining an original deep neural network to be quantified, the original deep neural network including a multi-layer convolutional network, each layer of the convolutional network including a first branch and a second branch, the second branch being a full-precision convolution structure;
对每一层卷积网络进行量化训练,并按照缩放因子对每一层卷积网络进行衰减处理;其中,所述缩放因子随所述量化训练的训练步数的增加而减小;Perform quantization training on each layer of the convolutional network, and perform attenuation processing on each layer of the convolutional network according to a scaling factor; wherein the scaling factor decreases as the number of training steps of the quantization training increases;
当所有层卷积网络均完成量化训练且所述缩放因子减小至零时,将所述原始深度神经网络中每一层卷积网络中的所述第二分支进行移除处理,得到量化后的目标深度神经网络。When all layers of the convolutional network have completed quantization training and the scaling factor is reduced to zero, the second branch in each layer of the convolutional network in the original deep neural network is removed, and after quantization is obtained Target deep neural network.
另一方面,本发明实施例提供了一种基于深度神经网络的业务处理方法,包括:In another aspect, an embodiment of the present invention provides a service processing method based on a deep neural network, including:
接收业务请求,所述业务请求携带待处理的业务对象;所述业务请求包括以下任一种:图像处理请求、人脸识别请求、视觉处理请求及自然语言识别处理请求;Receiving a service request, the service request carrying a business object to be processed; the service request includes any of the following: an image processing request, a face recognition request, a visual processing request, and a natural language recognition processing request;
调用目标深度神经网络对所述业务对象进行处理,得到业务处理结果;其中,所述目标深度神经网络是采用上述方面的网络量化方法获得的;Calling a target deep neural network to process the business object to obtain a business processing result; wherein the target deep neural network is obtained by using the network quantization method of the above aspect;
输出所述业务处理结果。Output the business processing result.
In yet another aspect, an embodiment of the present invention provides a network quantization apparatus, including:
an obtaining unit, configured to obtain an original deep neural network to be quantized, where the original deep neural network includes a multi-layer convolutional network, any layer of the convolutional network includes a first branch and a second branch, and the second branch is a full-precision convolution structure;
a quantization unit, configured to perform quantization training on each layer of the convolutional network and perform attenuation processing on each layer of the convolutional network according to a scaling factor, where the scaling factor decreases as the number of training steps of the quantization training increases; and
a processing unit, configured to: when all layers of the convolutional network have completed quantization training and the scaling factor has decreased to zero, remove the second branch from each layer of the convolutional network in the original deep neural network to obtain a quantized target deep neural network.
In yet another aspect, an embodiment of the present invention provides a service processing apparatus based on a deep neural network, including:
a request receiving unit, configured to receive a service request, the service request carrying a service object to be processed, where the service request includes any one of the following: an image processing request, a face recognition request, a visual processing request, and a natural language recognition processing request;
a service processing unit, configured to invoke a target deep neural network to process the service object to obtain a service processing result, where the target deep neural network is obtained by using the network quantization method of the above aspect; and
a result output unit, configured to output the service processing result.
In yet another aspect, an embodiment of the present invention provides a network device, including:
a processor, adapted to implement one or more instructions; and
a computer storage medium, the computer storage medium storing one or more first instructions, where the one or more first instructions are adapted to be loaded by the processor to execute the following network quantization method:
obtaining an original deep neural network to be quantized, the original deep neural network including a multi-layer convolutional network, where any layer of the convolutional network includes a first branch and a second branch, and the second branch is a full-precision convolution structure;
performing quantization training on each layer of the convolutional network, and performing attenuation processing on each layer of the convolutional network according to a scaling factor, where the scaling factor decreases as the number of training steps of the quantization training increases; and
when all layers of the convolutional network have completed quantization training and the scaling factor has decreased to zero, removing the second branch from each layer of the convolutional network in the original deep neural network to obtain a quantized target deep neural network;
or, the computer storage medium stores one or more second instructions, where the one or more second instructions are adapted to be loaded by the processor to execute the following service processing method based on a deep neural network:
receiving a service request, the service request carrying a service object to be processed, where the service request includes any one of the following: an image processing request, a face recognition request, a visual processing request, and a natural language recognition processing request;
invoking a target deep neural network to process the service object to obtain a service processing result, where the target deep neural network is obtained by using the network quantization method of the above aspect; and
outputting the service processing result.
In the network quantization process of the embodiments of the present invention, the original deep neural network includes a multi-layer convolutional network, each layer of the convolutional network includes a first branch and a second branch, and the second branch is a full-precision convolution structure; quantization training is performed on each layer of the convolutional network, and attenuation processing is performed on each layer of the convolutional network according to a scaling factor, where the scaling factor decreases as the number of training steps of the quantization training increases; when all layers of the convolutional network have completed quantization training and the scaling factor has decreased to zero, the second branch is removed from each layer of the convolutional network in the original deep neural network to obtain a quantized target deep neural network. By setting a second branch with a full-precision convolution structure in each layer of the convolutional network of the original deep neural network, the output of each layer gains stronger expressive power; performing quantization training and attenuation processing on each layer then makes the network fluctuate less during quantization training, yields a trained network with higher accuracy, and gives the resulting target deep neural network better network performance.
In the service processing process based on a deep neural network of the embodiments of the present invention, a service request carrying a service object to be processed can be received, where the service request includes any one of the following: an image processing request, a face recognition request, a visual processing request, and a natural language recognition processing request; a target deep neural network is invoked to process the service object to obtain a service processing result, where the target deep neural network is obtained by using the network quantization method; and the service processing result is output. Since the target deep neural network used for service processing is obtained through the network quantization method, it has better network performance and higher accuracy, which effectively improves the efficiency of service processing and guarantees the quality of service processing.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, the accompanying drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these drawings without creative efforts.
FIG. 1 is a flowchart of a network quantization method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an initial deep neural network according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an original deep neural network according to an embodiment of the present invention;
FIG. 4 is another schematic structural diagram of an original deep neural network according to an embodiment of the present invention;
FIG. 5 is a flowchart of a specific implementation of step s102 shown in FIG. 1;
FIG. 6 is a flowchart of another specific implementation of step s102 shown in FIG. 1;
FIG. 7 is a flowchart of a service processing method based on a deep neural network according to an embodiment of the present invention;
FIG. 8a to FIG. 8c are application scenario diagrams of a service processing method based on a deep neural network according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a network quantization apparatus according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of a service processing apparatus based on a deep neural network according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of a network device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts shall fall within the protection scope of the present invention.
An embodiment of the present invention proposes a network quantization method. Referring to FIG. 1, the method specifically includes the following steps:
s101: obtain an original deep neural network to be quantized, where the original deep neural network includes a multi-layer convolutional network, each layer of the convolutional network includes a first branch and a second branch, and the second branch is a full-precision convolution structure.
The original deep neural network is obtained by adding branches to an initial deep neural network. The initial deep neural network is a common deep neural network at present; it includes a multi-layer convolutional network, and each layer of the convolutional network has a single structure. For ease of description, assume that the initial deep neural network includes L layers of convolutional networks, where L is a positive integer, and any one of the layers is denoted as the layer-l convolutional network, where l is a positive integer and 1 ≤ l ≤ L. The structure of the layer-l convolutional network of the initial deep neural network is shown in FIG. 2. Because the initial deep neural network has only a single structure, during network quantization training the network parameters of that single structure (including the weights and the activation values) are all quantized to a finite set of values. For example, if 8-bit fixed-point numbers are used as required to quantize the weights and activation values of the initial deep neural network, the weights and activation values are quantized to the finite set of values determined by 8-bit fixed-point numbers; likewise, if 2-bit fixed-point numbers are used, the weights and activation values are quantized to the finite set of values determined by 2-bit fixed-point numbers. It can be seen that the initial deep neural network has only a single structure whose network parameters are all quantized to a finite set of values, which weakens the expressive power of the output of the initial deep neural network, increases the difficulty of quantization training, and causes large fluctuations in the quantization training process. To reduce the fluctuation and the difficulty of quantization training, the embodiments of the present invention improve the initial deep neural network: the original single structure of each layer of the convolutional network in the initial deep neural network is taken as the first branch, and a second branch is added on this basis, thereby forming the original deep neural network.
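The patent does not spell out the concrete form of the fixed-point quantization functions; the following is a minimal sketch of one commonly used k-bit uniform quantizer that could play the role of the weight quantization function Q_w and the activation value quantization function Q_a. The min/max scaling scheme and the straight-through gradient trick are assumptions for illustration only.

```python
# A minimal sketch of a k-bit uniform fixed-point quantizer (assumed form of Q_w / Q_a).
import torch

def quantize_fixed_point(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Quantize a tensor to the finite set of values representable with num_bits."""
    levels = 2 ** num_bits - 1                      # number of quantization steps
    x_min, x_max = x.min(), x.max()
    scale = (x_max - x_min).clamp(min=1e-8) / levels
    q = torch.round((x - x_min) / scale)            # integer code in [0, levels]
    x_q = q * scale + x_min                         # de-quantized value
    # Straight-through estimator: forward pass uses x_q, gradients flow to x unchanged.
    return x + (x_q - x).detach()
```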
The original deep neural network includes a multi-layer convolutional network, and each layer of the convolutional network includes a first branch and a second branch. Specifically, the original deep neural network includes L layers of convolutional networks, where L is a positive integer, and any one of the layers is denoted as the layer-l convolutional network, where l is a positive integer and 1 ≤ l ≤ L. The first branch may be a fixed-point-quantization-precision convolution structure, where fixed-point quantization precision refers to the precision range determined by fixed-point numbers; for example, if 8-bit fixed-point numbers are used for network quantization as described above, the fixed-point quantization precision is the precision range determined by 8-bit fixed-point numbers, and if 2-bit fixed-point numbers are used, it is the precision range determined by 2-bit fixed-point numbers. The first branch of the layer-l convolutional network includes a weight quantization unit, a first convolution unit, a first activation unit and an activation value quantization unit, where the weight quantization unit performs quantization training on the weight of layer l using a weight quantization function; the first convolution unit is configured to convolve the quantized weight of layer l with the output value of layer l-1; the first activation unit is configured to perform non-linear processing on the output value of the first convolution unit; and the activation value quantization unit performs quantization training on the convolutional network using an activation value quantization function. The second branch is a full-precision convolution structure, where full precision refers to the precision range determined by floating-point numbers; for example, if 32-bit floating-point numbers are used as required, full precision refers to the precision range determined by 32-bit floating-point numbers. The second branch of the layer-l convolutional network includes a second convolution unit, a second activation unit and a scaling unit, where the second convolution unit is configured to convolve the weight of layer l with the output value of layer l-1; the second activation unit is configured to perform non-linear processing on the output value of the second convolution unit; and the scaling unit is configured to perform attenuation processing on the second branch. In a feasible implementation, the first branch and the second branch of each layer of the convolutional network in the original deep neural network may be combined by first adding and then quantizing, as shown in FIG. 3; in another feasible implementation, the first branch of each layer of the convolutional network may be quantized first and then added to the second branch, as shown in FIG. 4. A code sketch of such a two-branch layer follows.
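The following is a minimal PyTorch-style sketch of one two-branch layer, assuming the quantize_fixed_point() function sketched above. The class name, the add_then_quantize flag, the 3x3 kernel and the ReLU activations are illustrative assumptions and are not taken from the patent.

```python
# A minimal sketch of one layer of the original deep neural network (two branches).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchConvLayer(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, num_bits: int = 8,
                 add_then_quantize: bool = True):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)  # first branch (to be quantized)
        self.conv2 = nn.Conv2d(in_ch, out_ch, 3, padding=1)  # second branch (full precision)
        self.num_bits = num_bits
        self.add_then_quantize = add_then_quantize            # FIG. 3 order vs FIG. 4 order
        self.factor = 1.0                                      # scaling factor, decayed externally

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # First branch: quantized weights -> convolution -> non-linear activation.
        w_q = quantize_fixed_point(self.conv1.weight, self.num_bits)
        a1 = F.relu(F.conv2d(x, w_q, self.conv1.bias, padding=1))
        # Second branch: full-precision convolution -> activation -> scaling.
        # When the scaling factor has decayed to zero the branch contributes nothing
        # and can later be pruned away.
        a2 = self.factor * F.relu(self.conv2(x)) if self.factor > 0.0 else 0.0
        if self.add_then_quantize:                   # FIG. 3 order: add, then quantize
            return quantize_fixed_point(a1 + a2, self.num_bits)
        return quantize_fixed_point(a1, self.num_bits) + a2   # FIG. 4 order
```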
s102: perform quantization training on each layer of the convolutional network, and perform attenuation processing on each layer of the convolutional network according to a scaling factor, where the scaling factor decreases as the number of training steps of the quantization training increases.
The process of performing quantization training on a convolutional network includes quantizing the weights of the convolutional network using a weight quantization function and/or quantizing the activation values of the convolutional network using an activation value quantization function. Because each layer of the convolutional network of the original deep neural network in this embodiment includes two branches, in step s102 quantization training is performed on both the weights and the activation values of the first branch of each layer, whereas neither the weights nor the activation values of the second branch undergo quantization training; the second branch is only attenuated according to the scaling factor.
In a specific implementation, for any layer of the convolutional network, step s102 may use the following steps a1-a2 to obtain the scaling factor (a sketch of one possible decay schedule follows these steps):
a1: obtain the number of training steps of the layer-l convolutional network during quantization training;
a2: invoke a cosine decay function to calculate the scaling factor corresponding to the number of training steps of the layer-l convolutional network during quantization training.
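The patent only states that a cosine decay function is invoked; the concrete schedule below, decaying from 1.0 at step 0 to 0.0 at total_steps, is an assumed, commonly used variant shown for illustration.

```python
# A minimal sketch of a cosine decay schedule for the scaling factor.
import math

def cosine_decay_factor(step: int, total_steps: int) -> float:
    """Scaling factor that decreases with the training step and reaches zero."""
    step = min(step, total_steps)
    return 0.5 * (1.0 + math.cos(math.pi * step / total_steps))

# Example: the factor shrinks as training progresses.
# cosine_decay_factor(0, 1000)    -> 1.0
# cosine_decay_factor(500, 1000)  -> 0.5
# cosine_decay_factor(1000, 1000) -> 0.0
```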
s103: when all layers of the convolutional network have completed quantization training and the scaling factor has decreased to zero, remove the second branch from each layer of the convolutional network in the original deep neural network to obtain a quantized target deep neural network.
The quantization training starts from the input layer of the original deep neural network and iterates layer by layer. Each layer is trained according to the method described in step s102, that is, quantization training is performed on the first branch of each layer while the second branch of each layer is attenuated according to the scaling factor, until all layers of the original deep neural network have completed the quantization training and the scaling factor of the second branch has decreased to zero. When the scaling factor has decreased to zero, the second branch no longer contributes to the network, so once the network converges the second branch of each layer can be removed. Specifically, the full-precision convolution structure of the second branch in each layer of the convolutional network is deleted (i.e., the second branch shown by the dashed lines in FIG. 3 or FIG. 4 is pruned); after deletion, each layer of the convolutional network contains only the first branch, and the layers with the second branches removed constitute the quantized target deep neural network. A sketch of this training-and-pruning loop follows.
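The following is a minimal sketch of the training-and-pruning loop, assuming the TwoBranchConvLayer and cosine_decay_factor sketches above and assuming the model is an nn.Sequential of such layers followed by a classification head. The optimizer, loss function and the use of one global step (rather than per-layer scheduling) are simplifying assumptions.

```python
# A minimal sketch of quantization training followed by removal of the second branches.
import torch
import torch.nn as nn

def quantization_train(model: nn.Sequential, data_loader, total_steps: int) -> nn.Sequential:
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    criterion = nn.CrossEntropyLoss()   # assumes the model ends with a classification head
    step = 0
    while step < total_steps:
        for x, y in data_loader:
            factor = cosine_decay_factor(step, total_steps)
            for layer in model:                      # attenuate every second branch
                if isinstance(layer, TwoBranchConvLayer):
                    layer.factor = factor
            loss = criterion(model(x), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            step += 1
            if step >= total_steps:
                break
    # The scaling factor has reached zero: the second branches contribute nothing,
    # so they are pruned and only the quantized first branches remain.
    for layer in model:
        if isinstance(layer, TwoBranchConvLayer):
            layer.factor = 0.0
            del layer.conv2                          # remove the full-precision branch
    return model
```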
In the network quantization process of this embodiment of the present invention, the original deep neural network includes a multi-layer convolutional network, each layer of the convolutional network includes a first branch and a second branch, and the second branch is a full-precision convolution structure; quantization training is performed on each layer of the convolutional network, and attenuation processing is performed on each layer according to the scaling factor, where the scaling factor decreases as the number of training steps of the quantization training increases; when all layers of the convolutional network have completed quantization training and the scaling factor has decreased to zero, the second branch is removed from each layer of the convolutional network in the original deep neural network to obtain the quantized target deep neural network. By setting a second branch with a full-precision convolution structure in each layer of the convolutional network of the original deep neural network, the output of each layer gains stronger expressive power; performing quantization training and attenuation processing on each layer then makes the network fluctuate less during quantization training, yields a trained network with higher accuracy, and gives the resulting target deep neural network better network performance.
With respect to the structure of the original deep neural network shown in FIG. 3, an embodiment of the present invention provides a flowchart of a specific implementation of step s102 shown in FIG. 1. Referring to FIG. 5, step s102 may specifically include the following steps s501-s508 (a code sketch of this forward pass follows step s508):
s501: obtain the input parameters of the layer-l convolutional network, where the input parameters include weights and activation values;
s502: perform quantization training on the input parameters of the layer-l convolutional network in the first branch of the layer-l convolutional network using the weight quantization function, to obtain the first output parameter of the first branch of the layer-l convolutional network;
In a feasible implementation, step s502 specifically includes the following steps:
b1: obtain the quantized weight of the first branch of the layer-l convolutional network using the fixed-point quantization method;
b2: the first convolution unit receives the quantized weight of the first branch of the layer-l convolutional network and the output value of layer l-1, and convolves the two values to obtain the first convolution value of the first branch;
b3: the first activation unit performs non-linear processing on the first convolution value of the first branch using a non-linear function, to obtain the first output parameter of the first branch.
As described in step s102, the fixed-point quantization method uses the low-precision quantization model to perform quantization training on the weight of the first branch of the layer-l convolutional network, yielding the quantized weight of the first branch. In step b1, the quantized weight is computed as follows:

\hat{W}_l^1 = Q_w(W_l^1)

where \hat{W}_l^1 denotes the quantized weight of the first branch of the layer-l convolutional network, W_l^1 denotes the weight of the first branch of the layer-l convolutional network, and Q_w denotes the weight quantization function of the fixed-point quantization method.
In step b2, the first convolution unit receives the quantized weight of the first branch of the layer-l convolutional network and the output value of layer l-1, and convolves the two values as follows:

X_l^1 = \hat{W}_l^1 \ast \hat{A}_{l-1}

where X_l^1 denotes the first convolution value of the first branch and \hat{A}_{l-1} denotes the output value of layer l-1.
In step b3, the first activation unit performs non-linear processing on the first convolution value of the first branch using a non-linear function, as follows:

A_l^1 = \phi(X_l^1)

where A_l^1 denotes the first output parameter of the first branch and \phi denotes the activation function of the first activation unit.
s503: transmit the input parameters of the layer-l convolutional network to the second branch of the layer-l convolutional network for training, to obtain the original output parameter of the second branch of the layer-l convolutional network;
In a feasible implementation, step s503 specifically includes:
c1: the second convolution unit receives the weight of the second branch of the layer-l convolutional network and the output value of layer l-1, and convolves the two values to obtain the second convolution value of the second branch;
c2: the second activation unit performs non-linear processing on the second convolution value using a non-linear function, to obtain the original output parameter of the second branch of the layer-l convolutional network.
As described in step s102, the weight of the second branch of the layer-l convolutional network is represented with the full-precision model, and the second activation unit performs non-linear processing on the second convolution value using a non-linear function (such as the ReLU function). In step c1, the second convolution unit receives the weight of the second branch of the layer-l convolutional network and the output value of layer l-1, and convolves the two values as follows:

X_l^2 = W_l^2 \ast \hat{A}_{l-1}

where X_l^2 denotes the second convolution value of the second branch, W_l^2 denotes the weight of the second branch of the layer-l convolutional network, and \hat{A}_{l-1} denotes the output value of layer l-1.
In step c2, the second activation unit performs non-linear processing on the second convolution value using a non-linear function, as follows:

A_l^2 = \phi'(X_l^2)

where A_l^2 denotes the original output parameter of the second branch and \phi' denotes the activation function of the second activation unit.
s504: obtain the corresponding scaling factor according to the number of training steps of the layer-l convolutional network during quantization training;
s505: perform attenuation processing on the original output parameter of the second branch of the layer-l convolutional network using the obtained scaling factor, to obtain the second output parameter of the second branch of the layer-l convolutional network;
In step s505, the original output parameter of the second branch of the layer-l convolutional network is attenuated using the obtained scaling factor, as follows:

\tilde{A}_l^2 = factor \cdot A_l^2

where factor denotes the obtained scaling factor and \tilde{A}_l^2 denotes the second output parameter of the second branch.
s506: sum the first output parameter and the second output parameter to obtain an intermediate parameter;
In step s506, the first output parameter and the second output parameter are summed to obtain the intermediate parameter, as follows:

A_l = A_l^1 + \tilde{A}_l^2

where A_l denotes the intermediate parameter.
s507: perform quantization training on the intermediate parameter using the activation value quantization function, to obtain the output parameter of the layer-l convolutional network;
In step s507, quantization training is performed on the intermediate parameter using the activation value quantization function, as follows:

\hat{A}_l = Q_a(A_l)

where \hat{A}_l denotes the output parameter of the layer-l convolutional network and Q_a denotes the activation value quantization function of the fixed-point quantization method.
s508: determine the output parameter of the layer-l convolutional network as the input parameter of the layer-(l+1) convolutional network, and repeat the above steps until all layers of the convolutional network have completed quantization training and the scaling factor has decreased to zero.
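The following is a direct, minimal transcription of steps s501-s507 for one layer in the FIG. 3 order (sum the branches, then quantize), assuming the quantize_fixed_point() and cosine_decay_factor() sketches above. Taking Q_w and Q_a to be the same quantizer, and using ReLU for \phi and \phi', are assumptions for illustration.

```python
# One-layer forward pass in the FIG. 3 order, with comments mapping code to the steps above.
import torch
import torch.nn.functional as F

def layer_forward_fig3(A_prev: torch.Tensor,     # \hat{A}_{l-1}: output of layer l-1
                       W1: torch.Tensor,         # W_l^1: weight of the first branch
                       W2: torch.Tensor,         # W_l^2: weight of the second branch
                       step: int, total_steps: int, num_bits: int = 8) -> torch.Tensor:
    W1_q = quantize_fixed_point(W1, num_bits)        # b1: \hat{W}_l^1 = Q_w(W_l^1)
    X1 = F.conv2d(A_prev, W1_q, padding=1)           # b2: X_l^1 = \hat{W}_l^1 * \hat{A}_{l-1}
    A1 = F.relu(X1)                                  # b3: A_l^1 = phi(X_l^1)
    X2 = F.conv2d(A_prev, W2, padding=1)             # c1: X_l^2 = W_l^2 * \hat{A}_{l-1}
    A2 = F.relu(X2)                                  # c2: A_l^2 = phi'(X_l^2)
    factor = cosine_decay_factor(step, total_steps)  # s504
    A2_scaled = factor * A2                          # s505: \tilde{A}_l^2 = factor * A_l^2
    A_l = A1 + A2_scaled                             # s506: A_l = A_l^1 + \tilde{A}_l^2
    return quantize_fixed_point(A_l, num_bits)       # s507: \hat{A}_l = Q_a(A_l)
```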
In the network quantization process of this embodiment of the present invention, the original deep neural network includes a multi-layer convolutional network, each layer of the convolutional network includes a first branch and a second branch, and the second branch is a full-precision convolution structure; quantization training is performed on each layer of the convolutional network, and attenuation processing is performed on each layer according to the scaling factor, until all layers of the convolutional network have completed quantization training and the scaling factor has decreased to zero, where the scaling factor decreases as the number of training steps of the quantization training increases. By adding a second branch with a full-precision convolution structure to each layer of the convolutional network of the original deep neural network, the output of each layer gains stronger expressive power; performing quantization training and attenuation processing on each layer then makes the network fluctuate less during quantization training, yields a trained network with higher accuracy, and allows the obtained target deep neural network to converge to a better local optimum with better network performance.
With respect to the structure of the original deep neural network shown in FIG. 4, an embodiment of the present invention provides a flowchart of another specific implementation of step s102 shown in FIG. 1. Referring to FIG. 6, step s102 specifically includes the following steps s601-s607 (a code sketch of this variant follows step s607):
s601: obtain the input parameters of the layer-l convolutional network, where the input parameters include weights and activation values;
s602: perform quantization training on the input parameters of the layer-l convolutional network in the first branch of the layer-l convolutional network using the weight quantization function and the activation value quantization function, to obtain the first output parameter of the first branch of the layer-l convolutional network;
In a feasible implementation, step s602 specifically includes the following steps:
d1: obtain the quantized weight of the first branch of the layer-l convolutional network using the fixed-point quantization method;
d2: the first convolution unit receives the quantized weight of the first branch of the layer-l convolutional network and the output value of layer l-1, and convolves the two values to obtain the first convolution value of the first branch;
d3: the first activation unit performs non-linear processing on the first convolution value of the first branch using a non-linear function, to obtain the activation value of the first branch;
d4: perform quantization training on the activation value of the first branch using the activation value quantization function, to obtain the first output parameter of the first branch of the layer-l convolutional network.
In step d1, the quantized weight of the first branch of the layer-l convolutional network is obtained using the fixed-point quantization method, as follows:

\hat{W}_l^1 = Q_w(W_l^1)

where \hat{W}_l^1 denotes the quantized weight of the first branch of the layer-l convolutional network, W_l^1 denotes the weight of the first branch of the layer-l convolutional network, and Q_w denotes the weight quantization function of the fixed-point quantization method.
In step d2, the first convolution unit receives the quantized weight of the first branch of the layer-l convolutional network and the output value of layer l-1, and convolves the two values as follows:

X_l^1 = \hat{W}_l^1 \ast \hat{A}_{l-1}

where X_l^1 denotes the first convolution value of the first branch and \hat{A}_{l-1} denotes the output value of layer l-1.
In step d3, the first activation unit performs non-linear processing on the first convolution value of the first branch using a non-linear function, as follows:

A_l^1 = \phi(X_l^1)

where A_l^1 denotes the activation value of the first branch obtained in step d3 and \phi denotes the activation function of the first activation unit.
In step d4, quantization training is performed on the activation value of the first branch using the activation value quantization function, as follows:

\hat{A}_l^1 = Q_a(A_l^1)

where \hat{A}_l^1 denotes the first output parameter of the first branch of the layer-l convolutional network and Q_a denotes the activation value quantization function of the fixed-point quantization method.
s603: transmit the input parameters of the layer-l convolutional network to the second branch of the layer-l convolutional network for training, to obtain the original output parameter of the second branch of the layer-l convolutional network;
In a feasible implementation, step s603 specifically includes:
e1: the second convolution unit receives the weight of the second branch of the layer-l convolutional network and the output value of layer l-1, and convolves the two values to obtain the second convolution value;
e2: the second activation unit performs non-linear processing on the second convolution value using a non-linear function, to obtain the original output parameter of the second branch of the layer-l convolutional network.
In step e1, the second convolution unit receives the weight of the second branch of the layer-l convolutional network and the output value of layer l-1, and convolves the two values as follows:

X_l^2 = W_l^2 \ast \hat{A}_{l-1}

where X_l^2 denotes the second convolution value of the second branch, W_l^2 denotes the weight of the second branch of the layer-l convolutional network, and \hat{A}_{l-1} denotes the output value of layer l-1.
In step e2, the second activation unit performs non-linear processing on the second convolution value using a non-linear function, as follows:

A_l^2 = \phi'(X_l^2)

where A_l^2 denotes the original output parameter of the second branch and \phi' denotes the activation function of the second activation unit.
s604: obtain the corresponding scaling factor according to the number of training steps of the layer-l convolutional network during quantization training;
s605: perform attenuation processing on the original output parameter of the second branch of the layer-l convolutional network using the obtained scaling factor, to obtain the second output parameter of the second branch of the layer-l convolutional network;
In step s605, the original output parameter of the second branch of the layer-l convolutional network is attenuated using the obtained scaling factor, as follows:

\tilde{A}_l^2 = factor \cdot A_l^2

where factor denotes the obtained scaling factor and \tilde{A}_l^2 denotes the second output parameter of the second branch.
s606: sum the first output parameter and the second output parameter to obtain the output parameter of the layer-l convolutional network;
In step s606, the first output parameter and the second output parameter are summed as follows:

\hat{A}_l = \hat{A}_l^1 + \tilde{A}_l^2

where \hat{A}_l denotes the output parameter of the layer-l convolutional network.
s607: determine the output parameter of the layer-l convolutional network as the input parameter of the layer-(l+1) convolutional network, and repeat the above steps until all layers of the convolutional network have completed quantization training and the scaling factor has decreased to zero.
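The following is the same one-layer transcription for the FIG. 4 order (quantize the first branch, then add the scaled second branch), covering steps d1-d4, e1-e2 and s604-s606, under the same assumptions as the FIG. 3 sketch above.

```python
# One-layer forward pass in the FIG. 4 order.
import torch
import torch.nn.functional as F

def layer_forward_fig4(A_prev: torch.Tensor, W1: torch.Tensor, W2: torch.Tensor,
                       step: int, total_steps: int, num_bits: int = 8) -> torch.Tensor:
    W1_q = quantize_fixed_point(W1, num_bits)        # d1: \hat{W}_l^1 = Q_w(W_l^1)
    A1 = F.relu(F.conv2d(A_prev, W1_q, padding=1))   # d2-d3: A_l^1 = phi(\hat{W}_l^1 * \hat{A}_{l-1})
    A1_q = quantize_fixed_point(A1, num_bits)        # d4: \hat{A}_l^1 = Q_a(A_l^1)
    A2 = F.relu(F.conv2d(A_prev, W2, padding=1))     # e1-e2: A_l^2 = phi'(W_l^2 * \hat{A}_{l-1})
    factor = cosine_decay_factor(step, total_steps)  # s604
    return A1_q + factor * A2                        # s605-s606: \hat{A}_l = \hat{A}_l^1 + factor * A_l^2
```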
In the network quantization process of this embodiment of the present invention, the original deep neural network includes a multi-layer convolutional network, each layer of the convolutional network includes a first branch and a second branch, and the second branch is a full-precision convolution structure; quantization training is performed on each layer of the convolutional network, and attenuation processing is performed on each layer according to the scaling factor, until all layers of the convolutional network have completed quantization training and the scaling factor has decreased to zero, where the scaling factor decreases as the number of training steps of the quantization training increases. By setting a second branch with a full-precision convolution structure in each layer of the convolutional network of the original deep neural network, the output of each layer gains stronger expressive power; performing quantization training and attenuation processing on each layer then makes the network fluctuate less during quantization training, yields a trained network with higher accuracy, and allows the obtained target deep neural network to converge to a better local optimum with better network performance.
Based on the description of the above embodiments of the network quantization method, an embodiment of the present invention proposes a service processing method based on a deep neural network. Referring to FIG. 7, the method specifically includes the following steps:
s701: receive a service request, the service request carrying a service object to be processed, where the service request includes any one of the following: an image processing request, a face recognition request, a visual processing request, and a natural language recognition processing request. It can be understood that the services to be processed may include, but are not limited to, an image processing service, a face recognition service, a visual processing service, a natural language recognition service, and the like.
s702: invoke a target deep neural network to process the service object to obtain a service processing result, where the target deep neural network is obtained by using the foregoing network quantization method.
The target deep neural network may be obtained by using the network quantization method shown in FIG. 1 to FIG. 6 and may be deployed in a network device, where the network device may include, but is not limited to, a terminal device, an embedded device, a network server, and the like. The terminal device may include, but is not limited to, a smartphone, a tablet computer, a mobile wearable device, and the like; the embedded device may include, but is not limited to, a DSP (Digital Signal Processing) chip device and the like. When a service request is received, the network device invokes the target deep neural network and transmits the service object to be processed (such as an image or a face image) carried in the service request to the target deep neural network as an input parameter for the corresponding service processing, to obtain a service processing result. A sketch of such an invocation follows.
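The following is a minimal sketch of how a network device might invoke the quantized target network on an image-like service object; the preprocessing, model loading and result handling shown here are illustrative assumptions and are not specified in the patent.

```python
# A minimal sketch of invoking the target deep neural network for one service object.
import torch

def process_service_object(model: torch.nn.Module, image: torch.Tensor) -> int:
    """Run the quantized target deep neural network on one image-like service object."""
    model.eval()
    with torch.no_grad():                    # inference only, no gradients needed
        scores = model(image.unsqueeze(0))   # add a batch dimension
    return scores.argmax(dim=1).item()       # e.g. a class/identity index as the result
```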
s703: output the service processing result.
The service processing result corresponds one-to-one to the service object. For example, if the service object relates to image processing, the corresponding service processing result may include, but is not limited to, image blurring, sharpening, edge detection, and the like; for another example, if the service object relates to face recognition, the corresponding service processing result is a matched face image, the associated identity information found, and the like.
In the service processing process based on a deep neural network of this embodiment of the present invention, a service request carrying a service object to be processed can be received, where the service request includes any one of the following: an image processing request, a face recognition request, a visual processing request, and a natural language recognition processing request; the target deep neural network is invoked to process the service object to obtain a service processing result, where the target deep neural network is obtained by using the network quantization method; and the service processing result is output. Since the target deep neural network used for service processing is obtained through the network quantization method, it has better network performance and higher accuracy, which effectively improves the efficiency of service processing and guarantees the quality of service processing.
Based on the above service processing method based on a deep neural network, an embodiment of the present invention provides an application scenario of the method. Referring to FIG. 8a to FIG. 8c, taking a face recognition service as an example, assume that the target deep neural network is deployed in a face recognition APP (application) on a mobile phone. The processing steps of the face recognition service are as follows: (1) the user uses a mobile phone on which a camera APP and the face recognition APP are installed, opens the camera APP, and taps to take a photo, as shown in FIG. 8a; (2) after taking the photo, the user opens the face recognition APP and selects face recognition processing, and the face recognition APP invokes the target deep neural network to perform face recognition processing on the face photo taken by the user, as shown in FIG. 8b; (3) after the processing is completed, the face recognition result is output, as shown in FIG. 8c.
In the service processing process based on a deep neural network of this embodiment of the present invention, a face recognition request carrying a face image to be processed can be received; the target deep neural network is invoked to perform recognition processing on the face image to obtain a face recognition result, where the target deep neural network is obtained by using the network quantization method; and the face recognition result is output. Since the target deep neural network used for face recognition processing is obtained through the network quantization method, it has better network performance and higher accuracy, which effectively improves the efficiency of face recognition processing and ensures its accuracy.
Based on the description of the above embodiments of the network quantization method, an embodiment of the present invention provides a network quantization apparatus. The apparatus may be a computer program running on a network device and may be applied to the network quantization methods shown in FIG. 1, FIG. 5 and FIG. 6 to perform the corresponding steps therein. Referring to FIG. 9, the apparatus may include:
an obtaining unit 101, configured to obtain an original deep neural network to be quantized, where the original deep neural network includes a multi-layer convolutional network, any layer of the convolutional network includes a first branch and a second branch, and the second branch is a full-precision convolution structure;
a quantization unit 102, configured to perform quantization training on each layer of the convolutional network and perform attenuation processing on each layer of the convolutional network according to a scaling factor, where the scaling factor decreases as the number of training steps of the quantization training increases; and
a processing unit 103, configured to: when all layers of the convolutional network have completed quantization training and the scaling factor has decreased to zero, remove the second branch from each layer of the convolutional network in the original deep neural network to obtain a quantized target deep neural network.
In an embodiment, the obtaining unit 101 is specifically configured to:
obtain an initial deep neural network, where the initial deep neural network includes a multi-layer convolutional network, each layer of the convolutional network includes a first branch, and the first branch is a fixed-point-quantization-precision convolution structure; and
set a second branch for each layer of the convolutional network of the initial deep neural network to obtain the original deep neural network.
In another embodiment, the quantization unit 102 is specifically configured to:
obtain the input parameters of the layer-l convolutional network, where the input parameters include weights and activation values;
perform quantization training on the input parameters of the layer-l convolutional network in the first branch of the layer-l convolutional network using the weight quantization function, to obtain the first output parameter of the first branch of the layer-l convolutional network;
transmit the input parameters of the layer-l convolutional network to the second branch of the layer-l convolutional network for training, to obtain the original output parameter of the second branch of the layer-l convolutional network;
obtain the corresponding scaling factor according to the number of training steps of the layer-l convolutional network during quantization training;
perform attenuation processing on the original output parameter of the second branch of the layer-l convolutional network using the obtained scaling factor, to obtain the second output parameter of the second branch of the layer-l convolutional network;
sum the first output parameter and the second output parameter to obtain an intermediate parameter;
perform quantization training on the intermediate parameter using the activation value quantization function, to obtain the output parameter of the layer-l convolutional network; and
determine the output parameter of the layer-l convolutional network as the input parameter of the layer-(l+1) convolutional network, and repeat the above steps until all layers of the convolutional network have completed quantization training and the scaling factor has decreased to zero.
In still another embodiment, the quantization unit 102 is specifically configured to:
obtain the input parameters of the l-th layer convolutional network, where the input parameters include weights and activation values;
perform quantization training on the input parameters of the l-th layer convolutional network in the first branch of the l-th layer convolutional network by using a weight quantization function and an activation-value quantization function, to obtain a first output parameter of the first branch in the l-th layer convolutional network;
transmit the input parameters of the l-th layer convolutional network to the second branch in the l-th layer convolutional network for training, to obtain an original output parameter of the second branch in the l-th layer convolutional network;
obtain a corresponding scaling factor according to the number of training steps of the quantization training of the l-th layer convolutional network;
attenuate the original output parameter of the second branch in the l-th layer convolutional network by using the calculated scaling factor, to obtain a second output parameter of the second branch in the l-th layer convolutional network;
sum the first output parameter and the second output parameter to obtain an output parameter of the l-th layer convolutional network;
determine the output parameter of the l-th layer convolutional network as the input parameter of the (l+1)-th layer convolutional network, and repeat the above steps until all layers of the convolutional network have completed quantization training and the scaling factor has decreased to zero.
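For this variant, where both the weight quantization function and the activation-value quantization function are applied inside the first branch and the branch sum is used directly as the layer output, a correspondingly reduced sketch is shown below (again Python with PyTorch; the placeholder quantizer is an assumption, not the patent's function).

```python
import torch
import torch.nn.functional as F


def fake_quant(x, num_bits=8):
    # Placeholder symmetric uniform quantizer on [-1, 1]; illustrative only.
    levels = 2 ** (num_bits - 1) - 1
    return torch.round(torch.clamp(x, -1.0, 1.0) * levels) / levels


def layer_forward_v2(x, w, scale, num_bits=8):
    # First branch: quantize both the activation values and the weights.
    first_out = F.conv2d(fake_quant(x, num_bits), fake_quant(w, num_bits), padding=1)
    # Second branch: full-precision output, attenuated by the scaling factor.
    second_out = scale * F.conv2d(x, w, padding=1)
    # The sum of the two branch outputs is the output parameter of layer l,
    # which becomes the input parameter of layer l+1.
    return first_out + second_out
```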
In still another embodiment, the quantization unit 102 is specifically configured to:
obtain the number of training steps of the quantization training of the l-th layer convolutional network;
call a cosine decay function to calculate the scaling factor corresponding to the number of training steps of the quantization training of the l-th layer convolutional network.
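A minimal sketch of such a cosine decay schedule is given below; the total number of training steps over which the factor decays is an assumed hyper-parameter, since the disclosure only requires that the scaling factor decrease toward zero as the step count grows.

```python
import math


def cosine_decay_scale(step, total_steps):
    """Scaling factor for the second (full-precision) branch: starts at 1.0
    and decays to 0.0 as the quantization-training step count increases."""
    step = min(step, total_steps)
    return 0.5 * (1.0 + math.cos(math.pi * step / total_steps))


# Example: the factor shrinks monotonically toward zero.
print([round(cosine_decay_scale(s, 100), 3) for s in (0, 25, 50, 75, 100)])
# -> [1.0, 0.854, 0.5, 0.146, 0.0]
```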
In the network quantization process of the embodiments of the present invention, the original deep neural network includes a multi-layer convolutional network, each layer of the convolutional network includes a first branch and a second branch, and the second branch is a full-precision convolution structure; quantization training is performed on each layer of the convolutional network, and each layer of the convolutional network is attenuated according to a scaling factor, where the scaling factor decreases as the number of training steps of the quantization training increases; when all layers of the convolutional network have completed quantization training and the scaling factor has decreased to zero, the second branch in each layer of the convolutional network in the original deep neural network is removed, to obtain a quantized target deep neural network. By implementing the embodiments of the present invention, a second branch with a full-precision convolution structure is provided in each layer of the convolutional network of the original deep neural network, so that the output of each layer of the convolutional network has a stronger representation capability; each layer of the convolutional network is then subjected to quantization training and attenuation processing, which makes the fluctuations during quantization training smaller, the trained network more accurate, and the network performance of the obtained target deep neural network better.
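The removal step itself is purely structural: once the scaling factor has reached zero, the second branch no longer contributes to any layer output, so deleting it changes nothing numerically. The sketch below shows this pruning over a toy, dict-based network description; the representation is an assumption for illustration only.

```python
def remove_second_branches(original_network):
    """original_network: list of layers, each {'first_branch': ..., 'second_branch': ...}."""
    target_network = []
    for layer in original_network:
        pruned = dict(layer)
        pruned.pop('second_branch', None)   # drop the full-precision branch
        target_network.append(pruned)
    return target_network


# Example with two toy layers.
net = [{'first_branch': 'q_conv1', 'second_branch': 'fp_conv1'},
       {'first_branch': 'q_conv2', 'second_branch': 'fp_conv2'}]
print(remove_second_branches(net))
# -> [{'first_branch': 'q_conv1'}, {'first_branch': 'q_conv2'}]
```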
Based on the foregoing description of the embodiments of the deep-neural-network-based service processing method, an embodiment of the present invention provides a deep-neural-network-based service processing apparatus. The apparatus may be a computer program running on a network device and may be applied to the deep-neural-network-based service processing method shown in FIG. 7, so as to perform the corresponding steps in that method. Referring to FIG. 10, the apparatus may include:
a request receiving unit 201, configured to receive a service request, where the service request carries a service object to be processed, and the service request includes any one of the following: an image processing request, a face recognition request, a visual processing request, and a natural language recognition processing request;
a service processing unit 202, configured to call a target deep neural network to process the service object to obtain a service processing result, where the target deep neural network is obtained by using the foregoing network quantization method;
a result output unit 203, configured to output the service processing result.
In the deep-neural-network-based service processing process of the embodiments of the present invention, a service request may be received, where the service request carries a service object to be processed and includes any one of the following: an image processing request, a face recognition request, a visual processing request, and a natural language recognition processing request; a target deep neural network is called to process the service object to obtain a service processing result, where the target deep neural network is obtained by using the foregoing network quantization method; and the service processing result is output. Because the target deep neural network used for service processing is obtained through the network quantization method, it has better network performance and higher accuracy, which can effectively improve the efficiency of service processing and ensure the quality of service processing.
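A minimal dispatch skeleton for this service processing flow might look as follows; the request fields, the set of request-type names, and the target_dnn callable are assumptions made for the sketch and are not an interface defined by this disclosure.

```python
SUPPORTED_REQUESTS = {"image_processing", "face_recognition",
                      "visual_processing", "natural_language_recognition"}


def handle_service_request(request, target_dnn):
    """request: dict with 'type' and 'payload'; target_dnn: the quantized target network as a callable."""
    if request.get("type") not in SUPPORTED_REQUESTS:
        raise ValueError("unsupported service request type")
    # Call the quantized target deep neural network on the carried service object.
    result = target_dnn(request["payload"])
    # Output the service processing result (here: simply return it to the caller).
    return result


# Example usage with a stand-in network.
print(handle_service_request({"type": "face_recognition", "payload": [0.1, 0.2]},
                             target_dnn=lambda obj: {"match": True}))
```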
Based on the foregoing descriptions of the embodiments of the network quantization method and apparatus and of the deep-neural-network-based service processing method and apparatus, an embodiment of the present invention further provides a network device, which may be applied to the network quantization method shown in FIG. 1, FIG. 5 and FIG. 6 and to the deep-neural-network-based service processing method shown in FIG. 7, so as to perform the corresponding steps in the network quantization method and in the deep-neural-network-based service processing method. Referring to FIG. 11, the internal structure of the network device may include a processor, a network interface and a computer storage medium, where the processor, the communication interface and the computer storage medium in the network device may be connected through a bus or in other manners.
The communication interface is a medium for interaction and information exchange between the network device and external devices. The processor (or CPU, Central Processing Unit) is the computing core and control core of the network device; it is adapted to implement one or more instructions, and in particular to load and execute one or more instructions so as to implement a corresponding method flow or function. The computer storage medium (memory) is a memory device in the server, used to store programs and data. It can be understood that the computer storage medium here may include a built-in storage medium of the network device, and may also include an extended storage medium supported by the network device. The computer storage medium provides storage space, and the storage space stores the operating system of the network device. In addition, one or more instructions suitable for being loaded and executed by the processor are stored in the storage space, and these instructions may be one or more computer programs (including program code). It should be noted that the computer storage medium here may be a high-speed RAM memory, or may be a non-volatile memory, for example, at least one disk memory; optionally, it may also be at least one computer storage medium located away from the foregoing processor.
In an embodiment, the computer storage medium stores one or more first instructions, and the processor loads and executes the one or more first instructions stored in the computer storage medium to implement the corresponding steps in the flow of the network quantization method shown in FIG. 1, FIG. 5 or FIG. 6. In a specific implementation, the one or more first instructions in the computer storage medium are loaded by the processor to perform the following steps:
obtaining an original deep neural network to be quantized, where the original deep neural network includes a multi-layer convolutional network, each layer of the convolutional network includes a first branch and a second branch, and the second branch is a full-precision convolution structure;
performing quantization training on each layer of the convolutional network, and attenuating each layer of the convolutional network according to a scaling factor, where the scaling factor decreases as the number of training steps of the quantization training increases;
when all layers of the convolutional network have completed quantization training and the scaling factor has decreased to zero, removing the second branch from each layer of the convolutional network in the original deep neural network, to obtain a quantized target deep neural network.
In another embodiment, the obtaining an original deep neural network to be quantized includes:
obtaining an initial deep neural network, where the initial deep neural network includes a multi-layer convolutional network, each layer of the convolutional network includes a first branch, and the first branch is a fixed-point quantization precision convolution structure;
setting a second branch for each layer of the convolutional network of the initial deep neural network, to obtain the original deep neural network.
In still another embodiment, the original deep neural network includes an L-layer convolutional network, where L is a positive integer; any layer of the convolutional network therein is denoted as the l-th layer convolutional network, where l is a positive integer and 1 ≤ l ≤ L;
where full precision includes floating-point precision and fixed-point quantization precision.
In still another embodiment, the performing quantization training on each layer of the convolutional network and attenuating each layer of the convolutional network according to a scaling factor includes:
obtaining the input parameters of the l-th layer convolutional network, where the input parameters include weights and activation values;
performing quantization training on the input parameters of the l-th layer convolutional network in the first branch of the l-th layer convolutional network by using a weight quantization function, to obtain a first output parameter of the first branch in the l-th layer convolutional network;
transmitting the input parameters of the l-th layer convolutional network to the second branch in the l-th layer convolutional network for training, to obtain an original output parameter of the second branch in the l-th layer convolutional network;
obtaining a corresponding scaling factor according to the number of training steps of the quantization training of the l-th layer convolutional network;
attenuating the original output parameter of the second branch in the l-th layer convolutional network by using the obtained scaling factor, to obtain a second output parameter of the second branch in the l-th layer convolutional network;
summing the first output parameter and the second output parameter to obtain an intermediate parameter;
performing quantization training on the intermediate parameter by using an activation-value quantization function, to obtain an output parameter of the l-th layer convolutional network;
determining the output parameter of the l-th layer convolutional network as the input parameter of the (l+1)-th layer convolutional network, and repeating the above steps until all layers of the convolutional network have completed quantization training and the scaling factor has decreased to zero.
In still another embodiment, the performing quantization training on each layer of the convolutional network and attenuating each layer of the convolutional network according to a scaling factor includes:
obtaining the input parameters of the l-th layer convolutional network, where the input parameters include weights and activation values;
performing quantization training on the input parameters of the l-th layer convolutional network in the first branch of the l-th layer convolutional network by using a weight quantization function and an activation-value quantization function, to obtain a first output parameter of the first branch in the l-th layer convolutional network;
transmitting the input parameters of the l-th layer convolutional network to the second branch in the l-th layer convolutional network for training, to obtain an original output parameter of the second branch in the l-th layer convolutional network;
obtaining a corresponding scaling factor according to the number of training steps of the quantization training of the l-th layer convolutional network;
attenuating the original output parameter of the second branch in the l-th layer convolutional network by using the calculated scaling factor, to obtain a second output parameter of the second branch in the l-th layer convolutional network;
summing the first output parameter and the second output parameter to obtain an output parameter of the l-th layer convolutional network;
determining the output parameter of the l-th layer convolutional network as the input parameter of the (l+1)-th layer convolutional network, and repeating the above steps until all layers of the convolutional network have completed quantization training and the scaling factor has decreased to zero.
In still another embodiment, the obtaining a corresponding scaling factor according to the number of training steps of the quantization training of the l-th layer convolutional network includes:
obtaining the number of training steps of the quantization training of the l-th layer convolutional network;
calling a cosine decay function to calculate the scaling factor corresponding to the number of training steps of the quantization training of the l-th layer convolutional network.
In the network quantization process of the embodiments of the present invention, the original deep neural network includes a multi-layer convolutional network, each layer of the convolutional network includes a first branch and a second branch, and the second branch is a full-precision convolution structure; quantization training is performed on each layer of the convolutional network, and each layer of the convolutional network is attenuated according to a scaling factor, where the scaling factor decreases as the number of training steps of the quantization training increases; when all layers of the convolutional network have completed quantization training and the scaling factor has decreased to zero, the second branch in each layer of the convolutional network in the original deep neural network is removed, to obtain a quantized target deep neural network. By implementing the embodiments of the present invention, a second branch with a full-precision convolution structure is provided in each layer of the convolutional network of the original deep neural network, so that the output of each layer of the convolutional network has a stronger representation capability; each layer of the convolutional network is then subjected to quantization training and attenuation processing, which makes the fluctuations during quantization training smaller, the trained network more accurate, and the network performance of the obtained target deep neural network better.
In an embodiment, the computer storage medium stores one or more second instructions, and the processor loads and executes the one or more second instructions stored in the computer storage medium to implement the corresponding steps in the flow of the deep-neural-network-based service processing method shown in FIG. 7. In a specific implementation, the one or more second instructions in the computer storage medium are loaded by the processor to perform the following steps:
receiving a service request, where the service request carries a service object to be processed, and the service request includes any one of the following: an image processing request, a face recognition request, a visual processing request, and a natural language recognition processing request;
calling a target deep neural network to process the service object to obtain a service processing result, where the target deep neural network is obtained by using the foregoing network quantization method;
outputting the service processing result.
In the deep-neural-network-based service processing process of the embodiments of the present invention, a service request may be received, where the service request carries a service object to be processed and includes any one of the following: an image processing request, a face recognition request, a visual processing request, and a natural language recognition processing request; a target deep neural network is called to process the service object to obtain a service processing result, where the target deep neural network is obtained by using the foregoing network quantization method; and the service processing result is output. Because the target deep neural network used for service processing is obtained through the network quantization method, it has better network performance and higher accuracy, which can effectively improve the efficiency of service processing and ensure the quality of service processing.

Claims (10)

  1. A network quantization method, characterized in that the method comprises:
    obtaining an original deep neural network to be quantized, wherein the original deep neural network comprises a multi-layer convolutional network, each layer of the convolutional network comprises a first branch and a second branch, and the second branch is a full-precision convolution structure;
    performing quantization training on each layer of the convolutional network, and attenuating each layer of the convolutional network according to a scaling factor, wherein the scaling factor decreases as the number of training steps of the quantization training increases;
    when all layers of the convolutional network have completed quantization training and the scaling factor has decreased to zero, removing the second branch from each layer of the convolutional network in the original deep neural network, to obtain a quantized target deep neural network.
  2. The method according to claim 1, characterized in that the obtaining an original deep neural network to be quantized comprises:
    obtaining an initial deep neural network, wherein the initial deep neural network comprises a multi-layer convolutional network, each layer of the convolutional network comprises a first branch, and the first branch is a fixed-point quantization precision convolution structure;
    setting a second branch for each layer of the convolutional network of the initial deep neural network, to obtain the original deep neural network.
  3. The method according to claim 1, characterized in that the original deep neural network comprises an L-layer convolutional network, L being a positive integer; any layer of the convolutional network therein is denoted as the l-th layer convolutional network, l being a positive integer and 1 ≤ l ≤ L;
    wherein full precision comprises floating-point precision.
  4. The method according to claim 3, characterized in that the performing quantization training on each layer of the convolutional network and attenuating each layer of the convolutional network according to a scaling factor comprises:
    obtaining input parameters of the l-th layer convolutional network, the input parameters comprising weights and activation values;
    performing quantization training on the input parameters of the l-th layer convolutional network in the first branch of the l-th layer convolutional network by using a weight quantization function, to obtain a first output parameter of the first branch in the l-th layer convolutional network;
    transmitting the input parameters of the l-th layer convolutional network to the second branch in the l-th layer convolutional network for training, to obtain an original output parameter of the second branch in the l-th layer convolutional network;
    obtaining a corresponding scaling factor according to the number of training steps of the quantization training of the l-th layer convolutional network;
    attenuating the original output parameter of the second branch in the l-th layer convolutional network by using the obtained scaling factor, to obtain a second output parameter of the second branch in the l-th layer convolutional network;
    summing the first output parameter and the second output parameter to obtain an intermediate parameter;
    performing quantization training on the intermediate parameter by using an activation-value quantization function, to obtain an output parameter of the l-th layer convolutional network;
    determining the output parameter of the l-th layer convolutional network as the input parameter of the (l+1)-th layer convolutional network, and repeating the above steps until all layers of the convolutional network have completed quantization training and the scaling factor has decreased to zero.
  5. The method according to claim 3, characterized in that the performing quantization training on each layer of the convolutional network and attenuating each layer of the convolutional network according to a scaling factor comprises:
    obtaining input parameters of the l-th layer convolutional network, the input parameters comprising weights and activation values;
    performing quantization training on the input parameters of the l-th layer convolutional network in the first branch of the l-th layer convolutional network by using a weight quantization function and an activation-value quantization function, to obtain a first output parameter of the first branch in the l-th layer convolutional network;
    transmitting the input parameters of the l-th layer convolutional network to the second branch in the l-th layer convolutional network for training, to obtain an original output parameter of the second branch in the l-th layer convolutional network;
    obtaining a corresponding scaling factor according to the number of training steps of the quantization training of the l-th layer convolutional network;
    attenuating the original output parameter of the second branch in the l-th layer convolutional network by using the calculated scaling factor, to obtain a second output parameter of the second branch in the l-th layer convolutional network;
    summing the first output parameter and the second output parameter to obtain an output parameter of the l-th layer convolutional network;
    determining the output parameter of the l-th layer convolutional network as the input parameter of the (l+1)-th layer convolutional network, and repeating the above steps until all layers of the convolutional network have completed quantization training and the scaling factor has decreased to zero.
  6. The method according to claim 4 or 5, characterized in that the obtaining a corresponding scaling factor according to the number of training steps of the quantization training of the l-th layer convolutional network comprises:
    obtaining the number of training steps of the quantization training of the l-th layer convolutional network;
    calling a cosine decay function to calculate the scaling factor corresponding to the number of training steps of the quantization training of the l-th layer convolutional network.
  7. A service processing method based on a deep neural network, characterized in that the method comprises:
    receiving a service request, wherein the service request carries a service object to be processed, and the service request comprises any one of the following: an image processing request, a face recognition request, a visual processing request, and a natural language recognition processing request;
    calling a target deep neural network to process the service object to obtain a service processing result, wherein the target deep neural network is obtained by using the network quantization method according to any one of claims 1-6;
    outputting the service processing result.
  8. A network quantization apparatus, characterized in that the apparatus comprises:
    an obtaining unit, configured to obtain an original deep neural network to be quantized, wherein the original deep neural network comprises a multi-layer convolutional network, each layer of the convolutional network comprises a first branch and a second branch, and the second branch is a full-precision convolution structure;
    a quantization unit, configured to perform quantization training on each layer of the convolutional network, and attenuate each layer of the convolutional network according to a scaling factor, wherein the scaling factor decreases as the number of training steps of the quantization training increases;
    a processing unit, configured to remove the second branch from each layer of the convolutional network in the original deep neural network when all layers of the convolutional network have completed quantization training and the scaling factor has decreased to zero, to obtain a quantized target deep neural network.
  9. A service processing apparatus based on a deep neural network, characterized by comprising:
    a request receiving unit, configured to receive a service request, wherein the service request carries a service object to be processed, and the service request comprises any one of the following: an image processing request, a face recognition request, a visual processing request, and a natural language recognition processing request;
    a service processing unit, configured to call a target deep neural network to process the service object to obtain a service processing result, wherein the target deep neural network is obtained by using the network quantization method according to any one of claims 1-6;
    a result output unit, configured to output the service processing result.
  10. A network device, characterized by comprising:
    a processor, adapted to implement one or more instructions; and
    a computer storage medium, wherein the computer storage medium stores one or more first instructions, the one or more first instructions being adapted to be loaded by the processor to execute the network quantization method according to any one of claims 1-6; or the computer storage medium stores one or more second instructions, the one or more second instructions being adapted to be loaded by the processor to execute the deep-neural-network-based service processing method according to claim 7.
PCT/CN2018/124834 2018-09-19 2018-12-28 Network quantization method, service processing method and related products WO2020057000A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811092329.XA CN110929865B (en) 2018-09-19 2018-09-19 Network quantification method, service processing method and related product
CN201811092329.X 2018-09-19

Publications (1)

Publication Number Publication Date
WO2020057000A1 true WO2020057000A1 (en) 2020-03-26

Family

ID=69855094

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/124834 WO2020057000A1 (en) 2018-09-19 2018-12-28 Network quantization method, service processing method and related products

Country Status (2)

Country Link
CN (1) CN110929865B (en)
WO (1) WO2020057000A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111523526A (en) * 2020-07-02 2020-08-11 杭州雄迈集成电路技术股份有限公司 Target detection method, computer equipment and readable storage medium
CN112712164B (en) * 2020-12-30 2022-08-26 上海熠知电子科技有限公司 Non-uniform quantization method of neural network
CN116187420B (en) * 2023-05-04 2023-07-25 上海齐感电子信息科技有限公司 Training method, system, equipment and medium for lightweight deep neural network

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10373050B2 (en) * 2015-05-08 2019-08-06 Qualcomm Incorporated Fixed point neural network based on floating point neural network quantization
US20180046903A1 (en) * 2016-08-12 2018-02-15 DeePhi Technology Co., Ltd. Deep processing unit (dpu) for implementing an artificial neural network (ann)
CN106778684A (en) * 2017-01-12 2017-05-31 易视腾科技股份有限公司 deep neural network training method and face identification method
CN106971160A (en) * 2017-03-23 2017-07-21 西京学院 Winter jujube disease recognition method based on depth convolutional neural networks and disease geo-radar image
CN107368857A (en) * 2017-07-24 2017-11-21 深圳市图芯智能科技有限公司 Image object detection method, system and model treatment method, equipment, terminal
CN108491927A (en) * 2018-03-16 2018-09-04 新智认知数据服务有限公司 A kind of data processing method and device based on neural network
CN108510467B (en) * 2018-03-28 2022-04-08 西安电子科技大学 SAR image target identification method based on depth deformable convolution neural network

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160328645A1 (en) * 2015-05-08 2016-11-10 Qualcomm Incorporated Reduced computational complexity for fixed point neural network
US20170076195A1 (en) * 2015-09-10 2017-03-16 Intel Corporation Distributed neural networks for scalable real-time analytics
US20170286830A1 (en) * 2016-04-04 2017-10-05 Technion Research & Development Foundation Limited Quantized neural network training and inference
CN107480770A (en) * 2017-07-27 2017-12-15 中国科学院自动化研究所 The adjustable neutral net for quantifying bit wide quantifies the method and device with compression
CN108256632A (en) * 2018-01-29 2018-07-06 百度在线网络技术(北京)有限公司 Information processing method and device

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200272162A1 (en) * 2019-02-21 2020-08-27 Nvidia Corporation Quantizing autoencoders in a neural network
US11977388B2 (en) * 2019-02-21 2024-05-07 Nvidia Corporation Quantizing autoencoders in a neural network
CN113780513B (en) * 2020-06-10 2024-05-03 杭州海康威视数字技术股份有限公司 Network model quantization and reasoning method and device, electronic equipment and storage medium
CN113780513A (en) * 2020-06-10 2021-12-10 杭州海康威视数字技术股份有限公司 Network model quantification and inference method and device, electronic equipment and storage medium
CN111985495A (en) * 2020-07-09 2020-11-24 珠海亿智电子科技有限公司 Model deployment method, device, system and storage medium
CN111985495B (en) * 2020-07-09 2024-02-02 珠海亿智电子科技有限公司 Model deployment method, device, system and storage medium
CN112861602B (en) * 2020-12-10 2023-05-26 华南理工大学 Face living body recognition model compression and transplantation method based on depth separable convolution
CN112861602A (en) * 2020-12-10 2021-05-28 华南理工大学 Face living body recognition model compression and transplantation method based on depth separable convolution
CN112766456B (en) * 2020-12-31 2023-12-26 平安科技(深圳)有限公司 Quantization method, device and equipment for floating-point deep neural network and storage medium
CN112766456A (en) * 2020-12-31 2021-05-07 平安科技(深圳)有限公司 Quantification method, device, equipment and storage medium of floating point type deep neural network
CN113128440A (en) * 2021-04-28 2021-07-16 平安国际智慧城市科技股份有限公司 Target object identification method, device, equipment and storage medium based on edge equipment
CN116911350A (en) * 2023-09-12 2023-10-20 苏州浪潮智能科技有限公司 Quantification method based on graph neural network model, task processing method and task processing device
CN116911350B (en) * 2023-09-12 2024-01-09 苏州浪潮智能科技有限公司 Quantification method based on graph neural network model, task processing method and task processing device

Also Published As

Publication number Publication date
CN110929865B (en) 2021-03-05
CN110929865A (en) 2020-03-27

Similar Documents

Publication Publication Date Title
WO2020057000A1 (en) Network quantization method, service processing method and related products
CN110363279B (en) Image processing method and device based on convolutional neural network model
WO2020233010A1 (en) Image recognition method and apparatus based on segmentable convolutional network, and computer device
WO2020228522A1 (en) Target tracking method and apparatus, storage medium and electronic device
WO2022193432A1 (en) Model parameter updating method, apparatus and device, storage medium, and program product
CN109978137B (en) Processing method of convolutional neural network
WO2020056718A1 (en) Quantization method and apparatus for neural network model in device
WO2021135715A1 (en) Image compression method and apparatus
WO2020119188A1 (en) Program detection method, apparatus and device, and readable storage medium
WO2020207174A1 (en) Method and apparatus for generating quantized neural network
US20210176174A1 (en) Load balancing device and method for an edge computing network
CN110795235B (en) Method and system for deep learning and cooperation of mobile web
CN106980967A (en) Payment processing method and processing device
WO2022088063A1 (en) Method and apparatus for quantizing neural network model, and method and apparatus for processing data
CN111967608A (en) Data processing method, device, equipment and storage medium
WO2021127982A1 (en) Speech emotion recognition method, smart device, and computer-readable storage medium
CN113780549A (en) Quantitative model training method, device, medium and terminal equipment for overflow perception
WO2023206889A1 (en) Model inference methods and apparatuses, devices, and storage medium
CN111461302A (en) Data processing method, device and storage medium based on convolutional neural network
CN110211017B (en) Image processing method and device and electronic equipment
WO2021081854A1 (en) Convolution operation circuit and convolution operation method
US20230196086A1 (en) Increased precision neural processing element
CN112397086A (en) Voice keyword detection method and device, terminal equipment and storage medium
WO2021238289A1 (en) Sequence processing method and apparatus
US20220138528A1 (en) Data processing method for neural network accelerator, device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18934130

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18934130

Country of ref document: EP

Kind code of ref document: A1