WO2023137710A1 - Neural network training method, image processing method and device, system, and medium - Google Patents

Neural network training method, image processing method and device, system, and medium Download PDF

Info

Publication number
WO2023137710A1
WO2023137710A1 · PCT/CN2022/073246
Authority
WO
WIPO (PCT)
Prior art keywords
auxiliary
layer
output value
neural network
convolutional layer
Prior art date
Application number
PCT/CN2022/073246
Other languages
French (fr)
Chinese (zh)
Inventor
聂谷洪
肖立睿
陈铂
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司
Priority to PCT/CN2022/073246 priority Critical patent/WO2023137710A1/en
Publication of WO2023137710A1 publication Critical patent/WO2023137710A1/en

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology

Definitions

  • the present application relates to the technical field of image processing, in particular, to a neural network training method, image processing method, device, system and storage medium.
  • neural network technology is applied to all aspects of life, such as using neural network technology for image recognition (such as face recognition or expression recognition, etc.) tasks, or image classification tasks.
  • quantization acceleration, that is, by quantizing the floating-point values in the neural network, the redundant precision of the data is trimmed away, so that floating-point calculations are converted into bit operations (or small-integer calculations), which not only reduces network storage but also provides a large speedup.
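  • To make the bit-operation point concrete, the following is a minimal illustrative sketch (not part of the patent disclosure): once values are constrained to {−1, +1} and packed into machine words, a multiply-accumulate reduces to XNOR plus popcount.

```python
# Illustrative sketch: a dot product over {-1, +1} vectors packed LSB-first
# into integers (+1 -> bit 1, -1 -> bit 0), computed with XNOR + popcount
# instead of floating-point multiplies.

def binary_dot(a_bits: int, w_bits: int, n: int) -> int:
    xnor = ~(a_bits ^ w_bits) & ((1 << n) - 1)  # 1 where the signs agree
    matches = bin(xnor).count("1")              # popcount
    return 2 * matches - n                      # agreements minus disagreements

# a = [+1, -1, +1, +1] -> 0b1101, w = [+1, +1, -1, +1] -> 0b1011
assert binary_dot(0b1101, 0b1011, 4) == 0       # (+1) + (-1) + (-1) + (+1) = 0
```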
  • the 8-bit quantized neural network (that is, the floating-point value in the neural network is quantized to 8 bits) is currently a more commonly used quantization model.
  • the 8-bit quantized neural network still occupies most of the available resources during inference, which hinders small devices from performing other tasks.
  • one of the objectives of the present application is to provide a neural network training method, image processing method, device, system and storage medium.
  • the embodiment of the present application provides a training method of a neural network, the neural network is used to process computer vision tasks, the method includes:
  • the neural network includes at least one residual block; the convolution layer in the residual block performs a binary convolution operation, and the number of quantized bits of the input value and the output value of the residual block is 1 bit.
  • the embodiment of the present application provides an image processing method, including:
  • the image to be processed is input into a pre-trained neural network to obtain image processing results; wherein the neural network includes at least one residual block; the convolution layer in the residual block performs binary convolution operation, and the quantized bit numbers of the input value and the output value of the residual block are 1 bit.
  • the embodiment of the present application provides a neural network training device, the neural network is used to process computer vision tasks, and the device includes:
  • a memory for storing executable instructions;
  • one or more processors;
  • when the one or more processors execute the executable instructions, they are individually or collectively configured to execute the method described in the first aspect.
  • an image processing device including:
  • a memory for storing executable instructions;
  • one or more processors;
  • when the one or more processors execute the executable instructions, they are individually or collectively configured to execute the method described in the second aspect.
  • the embodiment of the present application provides an image processing system, including the image processing device described in the fourth aspect and a movable platform;
  • the movable platform is provided with a photographing device, and the movable platform is used to send the image captured by the photographing device to the image processing device.
  • an embodiment of the present application provides a computer-readable storage medium, the computer-readable storage medium stores executable instructions, and when the executable instructions are executed by a processor, the method described in the first aspect or the second aspect is implemented.
  • the embodiments of the present application provide a neural network training method, image processing method, device, system, and storage medium.
  • a neural network is obtained by training with labeled image samples, and the neural network is used to process computer vision tasks; the neural network includes at least one residual block; the convolutional layer in the residual block performs binary convolution operations, which reduces computational complexity, and the quantized bit numbers of the input value and output value of the residual block are both 1 bit, which helps to reduce the operation bandwidth and the storage footprint of the neural network and has universal applicability, for example to small devices with limited computing and storage resources (such as wearable devices, mobile terminals or small drones); moreover, the amount of data involved in the computation is reduced and the latency is lower, which can meet the real-time requirements of some scenarios.
  • Figure 1, Figure 2 and Figure 3 are schematic diagrams of three different application scenarios of neural networks for processing different computer vision tasks provided by embodiments of the present application;
  • Fig. 4 is a schematic flow chart of a neural network training method provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a residual block provided by an embodiment of the present application.
  • FIG. 6A is a schematic structural diagram of a residual block including an addition layer provided by an embodiment of the present application.
  • FIG. 6B is a schematic structural diagram of a residual block including a fusion layer provided by an embodiment of the present application.
  • Fig. 7 is a schematic structural diagram of the auxiliary operators introduced into the residual block during the training process, provided by an embodiment of the present application;
  • FIG. 8A is a schematic structural diagram of introducing auxiliary operators and auxiliary parameters into the training process of the convolutional layer provided by the embodiment of the present application;
  • Fig. 8B is a schematic structural diagram of the convolutional layer after training is completed, with the auxiliary operators and auxiliary parameters eliminated and/or fused, provided by an embodiment of the present application;
  • FIG. 9 is a schematic structural diagram of a 1-bit systolic array provided by an embodiment of the present application.
  • FIG. 10 is a schematic flowchart of an image processing method provided in an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a neural network training device provided by an embodiment of the present application.
  • the embodiment of the present application trains to obtain a neural network, which is used to process computer vision tasks.
  • the neural network includes at least one residual block, the convolution layer in the residual block performs binary convolution operation, and the quantized bit numbers of the input value and the output value of the residual block are both 1 bit.
  • the number of bits of the data in the neural network is further quantized.
  • the convolutional layer performs binary convolution operations.
  • the input value and output value of the residual block are both 1-bit values, which helps to reduce the computing bandwidth and the storage capacity of the neural network, and has universal applicability.
  • the neural network trained in the embodiment of the present application includes a residual block that can replace the non-binarized residual block in the related art, so it can be transferred to most computer vision tasks with little or no adjustment and has strong portability. That is to say, the neural network trained in the embodiment of the present application can be used to perform one or more of the following computer vision tasks: an image classification task, an image localization task, a target detection task, a target tracking task, a semantic segmentation task, an instance segmentation task or a super-resolution reconstruction task.
  • the neural network can be applied to different scenarios or different devices based on the computer vision tasks it handles.
  • the neural network trained in the present application can be installed on a movable platform, which includes but is not limited to drones, unmanned vehicles, mobile robots, unmanned ships or gimbals.
  • the movable platform includes a photographing device, and after the movable platform acquires the image taken by the photographing device, the image can be input into the neural network, so that the neural network can perform computer vision tasks based on the input image; the computer vision tasks include but are not limited to image positioning tasks, target detection tasks, or target tracking tasks.
  • the unmanned aerial vehicle is equipped with a photographing device, and the photographing device at least includes a photosensitive element, such as a complementary metal-oxide-semiconductor (CMOS) sensor or a charge-coupled device (CCD) sensor;
  • the images captured by the photographing device include but are not limited to color images, grayscale images, infrared images, or depth images.
  • the photographing device 200 is used to photograph the target object 300 being followed.
  • the neural network can perform target detection based on the image to obtain obstacle information, so that the UAV replans the flight path 400 based on the obstacle information, achieving obstacle-avoiding flight and ensuring the flight safety of the UAV.
  • the neural network trained in this application can be installed in the terminal device.
  • the terminal device acquires the image to be processed, it can input the image to be processed into the neural network, so that the neural network can perform computer vision tasks based on the input image; the computer vision tasks include but are not limited to semantic segmentation tasks, instance segmentation tasks, or super-resolution reconstruction tasks.
  • the terminal devices include, but are not limited to, smartphones/mobile phones, tablet computers, personal digital assistants (PDAs), laptop computers, desktop computers, media content players, video game stations/systems, virtual reality systems, augmented reality systems, and wearable devices (for example, watches, glasses, gloves, or headgear such as hats, helmets, virtual reality headsets or other head-mounted devices).
  • the terminal device is connected in communication with a mobile platform (such as a UAV), and the UAV 100 can transmit the captured image to the terminal device 500.
  • the terminal device 500 can input the received image into the neural network for super-resolution reconstruction processing, so as to obtain an image with better image quality (such as a higher-resolution image).
  • a neural network training method provided in the embodiments of the present application may be applied to a training device.
  • the training device may be an electronic device with data processing capabilities, such as a computer, a server, a cloud server or a terminal; it may also be a computer chip or an integrated circuit with data processing capabilities, such as a central processing unit (CPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA).
  • FIG. 4 is a schematic flowchart of a neural network training method provided by the embodiment of the present application.
  • the neural network is used to process computer vision tasks, and the method can be performed by a training device; the method includes:
  • in step S101, an image sample carrying a label is obtained.
  • in step S102, a preset neural network is trained according to the image sample carrying the label; wherein the neural network includes at least one residual block; the convolutional layer in the residual block performs binary convolution operations, and the quantized bit numbers of the input value and the output value of the residual block are both 1 bit.
  • the convolutional layer in the residual block performs a binary convolution operation.
  • the input value and output value of the residual block are both 1-bit values, which helps to reduce the computing bandwidth and the storage capacity of the neural network, and has universal applicability. For example, it is suitable for small devices with limited operating resources and storage resources (such as wearable devices, mobile terminals or small drones, etc.), and the amount of data involved in the calculation is reduced, and the delay is lower, which can meet the real-time requirements in some scenarios.
  • the number of quantized bits of the input value and the output value of the residual block is 1 bit (bit);
  • the residual block includes a fusion layer 20, at least one convolutional layer 10 located in the main branch, and at least one convolutional layer 10 located in the jumper branch; wherein, the number of convolutional layers 10 in the main branch is more than the number of convolutional layers 10 in the jumper branch.
  • the convolution layer 10 performs a binary convolution operation, and the quantization bits of the input value and the weight value of the convolution layer 10 are both 1 bit, which helps to reduce the operation bandwidth and improve the operation efficiency.
  • the fusion layer 20 is used to fuse the output value of the convolutional layer 10 in the main branch and the output value of the convolutional layer 10 in the jumper branch.
  • the quantized bit number of the output value of the convolutional layer 10 is 1 bit, so that the input value of the next convolutional layer 10 is already 1 bit without requiring PReLU, Sign or other nonlinear function processing; this avoids frequent precision conversion, reduces the overall amount of data movement, and also helps improve computing efficiency and reduce computing resources.
  • the quantized bit number of the output value of the convolutional layer 10 may be greater than 1 bit, for example any of the following: 2 bits, 4 bits, 8 bits or 16 bits; it can be set according to the actual application scenario.
  • the fusion layer 20 can convert the fusion result obtained by fusing the output value of the convolution layer 10 in the main branch and the output value of the convolution layer 10 in the jumper branch into 1 bit, that is, the quantization bit number of the output value of the fusion layer 20 is 1 bit.
  • the fusion layer 20 uses data whose quantization bit number is greater than 1 bit for fusion, which is beneficial to improve fusion accuracy.
  • the fusion layer 20 includes an addition layer 21 or a splicing layer 22 .
  • the addition layer 21 is used to add the output values of at least two convolutional layers 10 connected to the addition layer 21, for example, the output value of the convolutional layer 10 of the main branch and the output value of the convolutional layer 10 of the jumper branch can be added bit by bit.
  • the output value of the convolutional layer 10 of the main branch and the output value of the convolutional layer 10 of the jumper branch can be added bit by bit.
  • the splicing layer 22 is used to splice the output values of at least two convolutional layers 10 connected to the splicing layer 22, for example, the output values corresponding to the convolutional layer 10 of the main branch and the convolutional layer 10 of the jumper branch can be spliced in cascade.
  • the residual block provided by this embodiment has a simple structure.
  • the quantized bit numbers of the input value and the weight value of the convolutional layer 10 in the residual block are both 1 bit, that is, the convolutional layer 10 performs binary convolution operations, and fusion at more than 1 bit occurs only in the fusion layer 20; this simplicity helps reduce data movement and bandwidth and gives lower latency.
  • if the fusion layer 20 is a splicing layer 22, the residual block further includes at least one convolutional layer 10 after the splicing layer 22; the output value of the residual block is then output by the convolutional layer 10 following the splicing layer 22, and the quantized bit number of the output value of the residual block is 1 bit.
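  • As a concrete reading of the structure described above, the following PyTorch sketch (class and layer names are hypothetical, not from the patent) shows a residual block in the style of FIG. 6A: binary convolutions in the main branch and the jumper branch, an addition layer that fuses at more than 1 bit, and a final re-binarization so that the block's output is 1 bit.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SignSTE(torch.autograd.Function):
    """Binarize to {-1, +1}; straight-through gradient inside [-1, 1]."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)
    @staticmethod
    def backward(ctx, grad):
        (x,) = ctx.saved_tensors
        return grad * (x.abs() <= 1).float()

class BinaryConv2d(nn.Conv2d):
    """Convolution whose weights are binarized on the fly; the input is
    assumed to already be 1-bit quantized, i.e. in {-1, +1}."""
    def forward(self, x):
        w_b = SignSTE.apply(self.weight)
        return F.conv2d(x, w_b, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

class BinaryResidualBlock(nn.Module):
    """Sketch of FIG. 6A: 1-bit in, 1-bit out, fusion at higher precision."""
    def __init__(self, channels: int):
        super().__init__()
        # the main branch has more convolutional layers than the jumper branch
        self.conv1 = BinaryConv2d(channels, channels, 3, padding=1, bias=False)
        self.conv2 = BinaryConv2d(channels, channels, 3, padding=1, bias=False)
        self.jumper = BinaryConv2d(channels, channels, 1, bias=False)
    def forward(self, x):                  # x is already binarized (1 bit)
        y = SignSTE.apply(self.conv1(x))   # intermediate output back to 1 bit
        y = self.conv2(y)                  # kept at >1 bit for accurate fusion
        fused = y + self.jumper(x)         # addition layer (fusion layer)
        return SignSTE.apply(fused)        # residual block output is 1 bit

block = BinaryResidualBlock(16)
out = block(torch.sign(torch.randn(1, 16, 32, 32)))   # values in {-1, +1}
```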
  • the residual block provided by this embodiment can be used to replace the residual block of the non-binarization operation in the related art, and it can be transferred to most computer vision tasks without adjustment or with a small amount of adjustment, and has strong portability.
  • the neural network obtained by training in this embodiment and including the residual block with the structure shown in FIG. 6A or FIG. 6B can be used to perform different computer vision tasks, such as image classification tasks, image positioning tasks, target detection tasks, target tracking tasks, semantic segmentation tasks, instance segmentation tasks or super-resolution reconstruction tasks, etc.
  • corresponding labels can be determined based on expected computer vision tasks, and the labels corresponding to different computer vision tasks are different.
  • for example, for a semantic segmentation task, the label is the category to which each pixel in the image sample belongs.
  • for an image localization or target detection task, the label is the position information of an object in the image sample.
  • in the embodiment of the present application, auxiliary parameters and auxiliary operators are introduced during the training process of the neural network to assist the training of the convolutional layer, thereby helping to improve network performance.
  • the weight of the convolutional layer corresponds to a first auxiliary parameter and a second auxiliary parameter; wherein, the first auxiliary parameter is used to control the degree of quantization of the floating-point weight of the convolutional layer into 1 bit; the second auxiliary parameter is used to indicate the degree of scaling of the quantized weight.
  • a first auxiliary operator and a second auxiliary operator are introduced, and the floating-point weight of the convolution layer is quantized by using the first auxiliary operator and the second auxiliary operator; wherein, the first auxiliary operator is used to quantize the floating-point weight of the convolution layer into 1 bit according to the first auxiliary parameter in the forward transfer process; the second auxiliary operator is used to determine the sign of the quantized weight.
  • the first auxiliary operator includes a Tanh function; the second auxiliary operator includes a sign function.
  • the output value of the convolutional layer corresponds to a third auxiliary parameter, a fourth auxiliary parameter, and at least two fifth auxiliary parameters;
  • the third auxiliary parameter is used to control the quantization degree of quantizing the output value of the convolutional layer into 1 bit;
  • the fourth auxiliary parameter is used to indicate the scaling degree of the quantized output value;
  • the at least two fifth auxiliary parameters are different offsets of the output value.
  • the training device uses the third auxiliary operator, the fourth auxiliary operator and the second auxiliary operator to quantize the floating-point output value of the convolutional layer 10; wherein the third auxiliary operator is used to perform nonlinear processing on the floating-point output value of the convolutional layer 10 according to the third auxiliary parameter and one of the fifth auxiliary parameters; the fourth auxiliary operator is used to clamp the nonlinearly processed value, offset by the other fifth auxiliary parameter; and the second auxiliary operator is used to determine the sign of the quantized output value.
  • the third auxiliary operator includes an activation function
  • the fourth auxiliary operator includes a hard-tanh function
  • the second auxiliary operator includes a sign function.
  • the weight of the convolutional layer is W ∈ R^{N×C×K×K}, the input value is X ∈ R^{B×H×W×C}, and the output value is A = Conv(W, X), that is, the output value is the result of the convolution operation between the weight of the convolutional layer and the input value;
  • N, C, H, W, B, and K are the output channel, input channel, height, width, batch size, and kernel size, respectively.
  • during training, the weight of the convolutional layer is denoted W, the floating-point weight is W_f, the quantized weight is W_b, the first auxiliary parameter is α, and the second auxiliary parameter is β; following the operators above, the forward-pass quantization can be written as W_b = Sign(Tanh(α ⊙ W_f)), with β applied as a per-channel scale.
  • α can adopt different ranges (such as α ∈ R^N, R^{N×C}, or R^{N×C×K×K}) to adjust the required degree of over-parameterization, and the size of each α coefficient controls the sharpness of the approximation to binarization.
  • β ∈ R^N is a scaling factor for each output channel, compensating for the magnitude difference between W_f and W_b ∈ {−1, 1}.
  • α and β can be adjusted during training.
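  • The following sketch renders the weight over-parameterization in code; the symbols α and β follow the reconstruction above, and the exact placement of the per-channel scale is an assumption. Tanh makes the binarization approximately differentiable, and a straight-through trick carries the gradient past the Sign.

```python
import torch

def quantize_weight(w_f, alpha, beta):
    """Training-time weight quantization sketch.
    w_f:   floating-point weight, shape (N, C, K, K)
    alpha: first auxiliary parameter; larger values sharpen the Tanh
           approximation to binarization (per-channel or per-element)
    beta:  second auxiliary parameter, per-output-channel scale in R^N
    """
    w_soft = torch.tanh(alpha * w_f)        # smooth surrogate, in (-1, 1)
    w_b = torch.sign(w_soft)                # binarized weight in {-1, +1}
    w_b = w_soft + (w_b - w_soft).detach()  # straight-through gradient
    return beta.view(-1, 1, 1, 1) * w_b     # compensate magnitude difference

w_f = torch.randn(8, 4, 3, 3, requires_grad=True)
alpha = torch.ones(8, 1, 1, 1, requires_grad=True)  # one value per out-channel
beta = torch.full((8,), 0.1, requires_grad=True)
w_q = quantize_weight(w_f, alpha, beta)
w_q.sum().backward()                        # alpha and beta receive gradients
```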
  • auxiliary parameters and auxiliary operators are introduced into the weight of the convolutional layer during the training process, which expands the network capacity and helps to improve network performance.
  • the output value of the convolutional layer is A, the floating-point output value is A_f, the quantized output value is A_b, the third auxiliary parameter is γ, the fourth auxiliary parameter is δ, and the at least two fifth auxiliary parameters include b_0 and b_1.
  • A_b = Sign(Htanh(PReLU(γ ⊙ A_f + b_0) + b_1))   (3);
  • Htanh is the hard-tanh function
  • the hard-tanh function clamps the input to [−1, 1] during the forward pass and passes the gradient straight through within that range during the backward pass.
  • the PReLU function and the Sign function use Straight-Through-Estimator (STE) to calculate the gradient.
  • the transformation process described above can reshape the input distribution and help regulate the training process of the neural network.
  • auxiliary parameters and auxiliary operators are introduced into the output value of the convolutional layer during the training process, which expands the network capacity and helps to improve network performance.
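  • A sketch of the training-time output quantization of equation (3); the parameter shapes and per-channel layout are assumptions, while the operator chain PReLU, hard-tanh and Sign follows the description above.

```python
import torch
import torch.nn.functional as F

def quantize_activation(a_f, gamma, b0, b1, prelu_weight):
    """A_b = Sign(Htanh(PReLU(gamma * A_f + b0) + b1)), cf. eq. (3).
    gamma scales the floating-point output A_f; b0 and b1 are the two
    offsets (the fifth auxiliary parameters). Sign uses a straight-through
    gradient so all auxiliary parameters remain trainable."""
    y = F.prelu(gamma * a_f + b0, prelu_weight)  # third auxiliary operator
    y = F.hardtanh(y + b1, -1.0, 1.0)            # fourth: clamp to [-1, 1]
    a_b = torch.sign(y)                          # second: take the sign
    return y + (a_b - y).detach()                # straight-through estimator

a_f = torch.randn(2, 8, 16, 16)                           # B x C x H x W
gamma = torch.ones(1, 8, 1, 1)                            # per-channel scale
b0 = torch.zeros(1, 8, 1, 1)
b1 = torch.zeros(1, 8, 1, 1)
a_b = quantize_activation(a_f, gamma, b0, b1, torch.tensor([0.25]))
```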
  • batch normalization (BatchNorm) can be used during the training process to process (for example, whiten) the scaled floating-point output value A_f, so as to counter changes in the data distribution.
  • the auxiliary parameters and auxiliary operators can be absorbed into the regular parameters of the convolutional layer, so that the operation efficiency will not be reduced while ensuring the accuracy of the neural network.
  • as shown in FIG. 8A, auxiliary parameters and auxiliary operators are introduced during training to expand the network capacity.
  • the auxiliary parameters and auxiliary operators can be absorbed into the regular parameters of the convolution layer, so that the convolution layer shown in FIG. 8B can be obtained.
  • the convolution layer performs binary convolution operations.
  • the input value of the convolution layer and the quantization bit number of the weight value are both 1 bit.
  • the training device may eliminate and/or fuse the auxiliary parameters and auxiliary operators corresponding to the weight value of the convolutional layer; the quantized weight of the convolutional layer then simplifies to being determined by the sign of the first auxiliary parameter and the sign of the floating-point weight.
  • the auxiliary parameters and auxiliary operators corresponding to the weights in the training process can be absorbed into a simple form during the inference process.
  • binarization only cares about the relative value of two numbers, regardless of their magnitude.
  • W_b = Sign(α) ⊙ Sign(W_f)
  • the training device may eliminate and/or fuse auxiliary parameters and auxiliary operators corresponding to the output value of the convolutional layer.
  • the auxiliary operators are shown in the dashed box in FIG. 7 (the auxiliary parameters are not shown in FIG. 7); they can be fused or eliminated after the neural network training is completed to obtain the structure shown in FIG. 5.
  • the quantized output value of the convolutional layer simplifies to being determined by the sign of the third auxiliary parameter and the sign of a preset difference; the preset difference is the difference between the floating-point output value and a preset parameter; the preset parameter is determined according to the third auxiliary parameter and the at least two fifth auxiliary parameters.
  • the auxiliary parameters and auxiliary operators corresponding to the output values during training can likewise be absorbed into a simple form during inference; essentially, binarization only cares about the relative value of two numbers, regardless of their magnitude.
  • A_b = Sign(Htanh(PReLU(γ ⊙ A_f + b_0) + b_1)) can be simplified channel-wise to A_b(n) = Sign(γ(n)) · Sign(A_f(n) − τ(n)), where A_b(n) is the quantized output value (i.e., the binarized output value) of channel n of the convolutional layer,
  • and τ(n) ∈ R is a threshold depending on b_0(n), b_1(n) and γ(n) of channel n.
  • the reason this simplification holds is that, given b_0(n), b_1(n) and γ(n), all transformations in equation (3) are monotonic, so only the zero point τ(n) needs to be solved to determine the sign A_b(n).
  • the operation process of the neural network then only involves bitwise convolution and a threshold comparison (against τ(n)), which simplifies the inference form while keeping the network accuracy unchanged.
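  • The per-channel zero point τ(n) can be found by inverting each monotonic step of equation (3). The following is a sketch under the assumption of a positive PReLU slope; the closed form of τ(n) is a reconstruction, not quoted from the patent.

```python
import numpy as np

def channel_threshold(gamma, b0, b1, slope):
    """Solve the zero point tau of eq. (3) for one channel, so that
    Sign(Htanh(PReLU(gamma*a + b0) + b1)) == Sign(gamma) * Sign(a - tau).
    Assumes slope > 0 so every step is strictly monotonic."""
    v = -b1
    u = v if v > 0 else v / slope   # invert the PReLU
    return (u - b0) / gamma         # invert the affine transform

gamma, b0, b1, slope = 1.5, 0.2, -0.1, 0.25
tau = channel_threshold(gamma, b0, b1, slope)

a_f = np.linspace(-1.0, 1.0, 5)
a_b = np.sign(gamma) * np.sign(a_f - tau)        # simplified inference form
z = gamma * a_f + b0
full = np.sign(np.clip(np.where(z > 0, z, slope * z) + b1, -1.0, 1.0))
assert np.array_equal(a_b, full)                 # matches the full eq. (3)
```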
  • the convolution output can be approximated as Conv(W_f, A_f) ≈ (β ⊗ δ) · Conv(W_b, A_b).
  • the second auxiliary parameter (β) corresponding to the weight of the convolutional layer and the fourth auxiliary parameter (δ) corresponding to the output value of the convolutional layer are per-channel scaling factors, (β ⊗ δ) ∈ R^N, which can be carried into the next layer and absorbed during the processing of the activation function of the next convolutional layer.
  • taking the PReLU function as the activation function as an example, the second auxiliary parameter (β) and the fourth auxiliary parameter (δ) can be absorbed into the PReLU operation of the next layer, so they do not need to be calculated during inference.
  • the batch normalization process is also suitable for parameter fusion.
  • the auxiliary parameters and auxiliary operators corresponding to the output values of the convolutional layers may be fused during the batch normalization process.
  • let κ, ι ∈ R^N be the scale and bias of the batch normalization (BatchNorm) process, and let x be the floating-point output value A_f(n) of the convolutional layer;
  • the simplification principle for the PReLU function is the same as the simplification principle of equation (3) above.
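  • A small sketch of the same absorption idea applied to batch normalization (with the symbols κ and ι as assumed above): because Sign only depends on which side of zero a value falls, the whole normalization collapses into a per-channel threshold at inference time.

```python
# Sign(kappa * (a - mu) / sigma + iota) == Sign(kappa) * Sign(a - tau_bn)
# with tau_bn = mu - iota * sigma / kappa (assumed notation), so BatchNorm
# folds into a single per-channel threshold compare.

def fold_batchnorm(kappa, iota, mu, sigma):
    return mu - iota * sigma / kappa   # new per-channel threshold

kappa, iota, mu, sigma = 0.8, 0.3, 0.1, 2.0
tau_bn = fold_batchnorm(kappa, iota, mu, sigma)
for a in (-2.0, -0.6, 0.0, 1.0):
    bn = kappa * (a - mu) / sigma + iota
    assert (bn > 0) == ((kappa > 0) == (a > tau_bn))
```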
  • the number of channels of the input value and/or output value of the residual block can be further expanded during the training process; since the expanded channels are binarized, the memory occupied is still less than that of an 8-bit quantized neural network.
  • in order to improve the accuracy of the neural network, the neural network can also include at least one network block whose input value or output value has a quantized bit number greater than 1 bit, such as but not limited to 2 bits, 4 bits, 8 bits or 16 bits; the quantized bit numbers of the input value and the weight of the convolutional layer in the network block are likewise greater than 1 bit.
  • the network block can be located at any position in the neural network, and this embodiment does not impose any limitation on this; for example, network blocks are arranged at the beginning and end of the neural network, with at least one residual block in the middle. Alternatively, a network block may also be set at an intermediate position in the neural network, which may be set according to the actual application scenario.
  • the number of channels of the input value and/or output value of the residual block can be set to be greater than the number of channels of the input value and/or output value of the network block.
  • the number of channels of the input value and/or output value of the residual block is 2 times or 4 times the number of channels of the input value and/or output value of the network block.
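  • A back-of-the-envelope illustration with hypothetical sizes: even after a 4× channel expansion, 1-bit feature maps occupy half the memory of an 8-bit baseline.

```python
# Illustrative numbers only: feature-map storage, 8-bit baseline vs. a
# binarized residual block whose channel count has been expanded 4x.
h, w, c = 56, 56, 64                # hypothetical feature-map size
mem_8bit = h * w * c * 8            # bits for the 8-bit network block
mem_1bit_4x = h * w * (4 * c) * 1   # bits for the expanded 1-bit block
print(mem_1bit_4x / mem_8bit)       # 0.5
```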
  • the convolution operation of the convolution layer in the residual block is performed through a specified systolic array; the systolic array includes a plurality of processing units supporting 1-bit operation. Considering that the quantization bit numbers of the input value and the output value of the convolution layer are both 1 bit, the input bandwidth and the output bandwidth of the systolic array can be set to be the same. Exemplarily, the systolic array may be a square array to ensure the same bandwidth for data loading and writing.
  • the systolic array includes multiple input lines for 1-bit data input.
  • the weights of the convolutional layers are stored in NHWC format.
  • the weights of the convolutional layers can also be stored in NCHW format.
  • the MAC utilization rate differs with data multiplexing and data arrangement (the utilization of the NCHW arrangement depends on whether the W (width) dimension is divisible by the MAC size, and the utilization of the NHWC arrangement depends on whether the C (channel) dimension is divisible by the MAC size).
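  • The divisibility effect can be sketched as follows (illustrative numbers only): utilization is the fraction of each MAC tile that carries real data.

```python
import math

def mac_utilization(dim: int, mac_size: int) -> float:
    """Fraction of the MAC actually used when `dim` elements are processed
    in tiles of `mac_size` (padding wastes the remainder of the last tile)."""
    tiles = math.ceil(dim / mac_size)
    return dim / (tiles * mac_size)

# NHWC: utilization is governed by the channel dimension C
print(mac_utilization(dim=64, mac_size=16))   # 1.0  (C divisible by MAC size)
print(mac_utilization(dim=48, mac_size=32))   # 0.75 (C not divisible)
```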
  • FIG. 9 shows a schematic diagram of a 1-bit systolic array, which is used to perform bit-by-bit convolution operations according to the weights and input values of the convolutional layer.
  • the 1-bit systolic array is obtained by adapting a 16bit×16bit systolic array.
  • a 1-bit PE is the smallest processing unit of the systolic array;
  • the 1bit systolic array includes multiple 1bit PEs.
  • each 8-bit input line is replaced with eight 1-bit input lines, basically creating 8 times as many rows in the 1-bit design; since the array is square, the number of columns is also 8 times larger. Compared with the 8-bit design, this increases the number of PEs by a factor of 64, giving a theoretical 64× acceleration and significantly improving data processing efficiency.
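  • The 64× figure follows from simple arithmetic, sketched below; the 16×16 base array size is an illustrative assumption.

```python
# Replacing each 8-bit input line with eight 1-bit lines multiplies the rows
# by 8; keeping the array square multiplies the columns by 8 as well, so the
# PE count (and hence the theoretical peak throughput) grows 64x.
rows_8bit = cols_8bit = 16                    # hypothetical 8-bit array
rows_1bit = rows_8bit * 8                     # 8 one-bit lines per 8-bit line
cols_1bit = cols_8bit * 8                     # square array: columns scale too
speedup = (rows_1bit * cols_1bit) / (rows_8bit * cols_8bit)
print(speedup)                                # 64.0
```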
  • the embodiment of the present application also provides an image processing method, including:
  • in step S201, an image to be processed is acquired.
  • in step S202, the image to be processed is input into a pre-trained neural network to obtain an image processing result; wherein the neural network includes at least one residual block; the convolutional layer in the residual block performs binary convolution operations, and the quantized bit numbers of the input value and output value of the residual block are both 1 bit.
  • the neural network is trained based on the training method shown in the embodiment shown in FIG. 4 .
  • for relevant parts, refer to the description of the embodiment shown in FIG. 4, which will not be repeated here.
  • the neural network is used to process any one of the following computer vision tasks: image classification task, image localization task, object detection task, object tracking task, semantic segmentation task, instance segmentation task or super-resolution reconstruction task.
  • the embodiment of the present application also provides a neural network training device, the neural network is used for processing computer vision tasks, and the device includes:
  • a memory 102 for storing executable instructions;
  • one or more processors 101;
  • when the one or more processors 101 execute the executable instructions, they are individually or collectively configured to execute the above method.
  • the processor 101 executes the executable instructions included in the memory 102.
  • the processor 101 can be a central processing unit (CPU), and can also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the memory 102 stores executable instructions of the training method of the neural network
  • the memory 102 may include at least one type of storage medium
  • the storage medium includes a flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc.
  • the device may cooperate with a network storage device that performs a storage function of the memory through a network connection.
  • the memory 102 may be an internal storage unit of the device, such as a hard disk or internal memory of the device.
  • the memory 102 can also be an external storage device of the apparatus, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card equipped on the apparatus; further, the memory 102 may also include both an internal storage unit of the apparatus and an external storage device.
  • the memory 102 is used to store computer programs and other data required by the device.
  • the memory 102 can also be used to temporarily store data that has been output or will be output.
  • when the one or more processors 101 execute the executable instructions, they are individually or jointly configured to:
  • obtain an image sample carrying a label, and train a preset neural network according to the image sample; wherein the neural network includes at least one residual block; the convolutional layer in the residual block performs binary convolution operations, and the quantized bit numbers of the input value and the output value of the residual block are both 1 bit.
  • the residual block includes a fusion layer, at least one convolutional layer located in the main branch, and at least one convolutional layer located in the jumper branch; the quantized bit numbers of the input value and the weight value of the convolutional layer are both 1 bit; the fusion layer is used to fuse the output value of the convolutional layer in the main branch and the output value of the convolutional layer in the jumper branch; wherein the number of convolutional layers in the main branch is greater than the number of convolutional layers in the jumper branch.
  • if the next layer of the convolutional layer is the fusion layer, the quantized bit number of the output value of the convolutional layer is greater than 1 bit; if the next layer of the convolutional layer is not the fusion layer, the quantized bit number of the output value of the convolutional layer is 1 bit.
  • the number of quantization bits of the output value of the fusion layer is 1 bit.
  • the number of quantized bits of the output value of the convolutional layer is any of the following: 2 bits, 4 bits, 8 bits or 16 bits.
  • the fusion layer includes an addition layer or a splicing layer; the addition layer is used to add the output values of at least two convolutional layers connected to the addition layer; the splicing layer is used to splice the output values of at least two convolutional layers connected to the splicing layer.
  • if the fusion layer is an addition layer, the output value of the residual block is output by the addition layer; if the fusion layer is a splicing layer, the output value of the residual block is output by the convolutional layer following the splicing layer.
  • labels corresponding to different computer vision tasks are different.
  • the computer vision task includes any one or more of the following: image classification task, image localization task, object detection task, object tracking task, semantic segmentation task, instance segmentation task or super-resolution reconstruction task.
  • auxiliary parameters and auxiliary operators are introduced to assist the training of the convolutional layer; after the training of the neural network is completed, the auxiliary parameters and auxiliary operators are absorbed into the parameters of the convolutional layer.
  • the weight of the convolutional layer corresponds to a first auxiliary parameter and a second auxiliary parameter; wherein, the first auxiliary parameter is used to control the degree of quantization of the floating-point weight of the convolutional layer into 1 bit; the second auxiliary parameter is used to indicate the scaling degree of the quantized weight.
  • the processor is further configured to: during the training process, use a first auxiliary operator and a second auxiliary operator to quantize the floating-point weight of the convolutional layer; wherein the first auxiliary operator is used to quantize the floating-point weight of the convolutional layer into 1 bit according to the first auxiliary parameter in the forward pass process; and the second auxiliary operator is used to determine the sign of the quantized weight.
  • the first auxiliary operator includes a Tanh function; the second auxiliary operator includes a sign function.
  • the processor is further configured to: after the neural network training is completed, eliminate and/or fuse the auxiliary parameters and auxiliary operators corresponding to the weight values of the convolutional layer; the quantized weight of the convolutional layer then simplifies to being determined by the sign of the first auxiliary parameter and the sign of the floating-point weight.
  • the output value of the convolutional layer corresponds to a third auxiliary parameter, a fourth auxiliary parameter, and at least two fifth auxiliary parameters;
  • the third auxiliary parameter is used to control the quantization degree of quantizing the output value of the convolutional layer into 1 bit;
  • the fourth auxiliary parameter is used to indicate the scaling degree of the quantized output value;
  • the at least two fifth auxiliary parameters are different offsets of the output value.
  • the processor is further configured to: during the training process, use a third auxiliary operator, a fourth auxiliary operator, and a second auxiliary operator to quantize the floating-point output value of the convolutional layer; wherein the third auxiliary operator is used to perform nonlinear processing on the floating-point output value of the convolutional layer according to the third auxiliary parameter and one of the fifth auxiliary parameters; the fourth auxiliary operator is used to clamp the nonlinearly processed value, offset by the other fifth auxiliary parameter; and the second auxiliary operator is used to determine the sign of the quantized output value.
  • the third auxiliary operator includes an activation function
  • the fourth auxiliary operator includes a hard-tanh function
  • the second auxiliary operator includes a sign function
  • the processor is further configured to: after the neural network training is completed, eliminate and/or fuse the auxiliary parameters and auxiliary operators corresponding to the output values of the convolutional layer; the quantized output value of the convolutional layer then simplifies to being determined by the sign of the third auxiliary parameter and the sign of a preset difference; the preset difference is the difference between the floating-point output value and a preset parameter; the preset parameter is determined according to the third auxiliary parameter and the at least two fifth auxiliary parameters.
  • the second auxiliary parameter corresponding to the weight of the convolutional layer and the fourth auxiliary parameter corresponding to the output value of the convolutional layer can be absorbed during the processing of the activation function of the next convolutional layer.
  • the neural network further includes at least one network block, and the number of quantized bits of the input value or output value of the network block is greater than 1 bit.
  • the number of channels of the input value and/or output value of the residual block is greater than the number of channels of the input value and/or output value of the network block.
  • the convolution operation of the convolution layer in the residual block is performed through a specified systolic array; the systolic array includes a plurality of processing units supporting 1-bit operation.
  • the input bandwidth and the output bandwidth of the systolic array are the same.
  • the systolic array is a square array.
  • the systolic array includes a plurality of input lines for 1-bit data input.
  • the weights of the convolutional layers are stored in NHWC format.
  • an image processing device including:
  • a memory for storing executable instructions;
  • one or more processors;
  • when the one or more processors execute the executable instructions, they are individually or jointly configured to:
  • the image to be processed is input into a pre-trained neural network to obtain image processing results; wherein the neural network includes at least one residual block; the convolution layer in the residual block performs binary convolution operation, and the quantized bit numbers of the input value and the output value of the residual block are 1 bit.
  • the neural network is used to process any one of the following computer vision tasks: image classification task, image localization task, object detection task, object tracking task, semantic segmentation task, instance segmentation task or super-resolution reconstruction task.
  • an embodiment of the present application also provides an image processing system, including the above-mentioned image processing device and a movable platform;
  • the movable platform is provided with a photographing device, and the movable platform is used to send the image captured by the photographing device to the image processing device.
  • the movable platform includes, but is not limited to, a drone, an unmanned vehicle, a mobile robot, an unmanned ship, or a gimbal.
  • the image processing device may be integrated in a mobile platform, as shown in FIG. 1 or FIG. 2 .
  • the image processing apparatus may be installed in a terminal device, and the terminal device is communicatively connected to the movable platform.
  • the terminal device may be, for example, a remote controller of the movable platform, as shown in FIG. 3 .
  • non-transitory computer-readable storage medium including instructions, such as a memory including instructions, which are executable by a processor of an apparatus to perform the above method.
  • the non-transitory computer readable storage medium may be ROM, random access memory (RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, and the like.
  • also provided is a non-transitory computer-readable storage medium which, when the instructions in the storage medium are executed by a processor of a terminal, enables the terminal to execute the above method.
  • the various implementations described herein can be implemented in a computer-readable medium using, for example, computer software, hardware, or any combination thereof.
  • the embodiments described herein may be implemented using at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a processor, a controller, a microcontroller, a microprocessor, or an electronic unit designed to perform the functions described herein.
  • an embodiment such as a procedure or a function may be implemented with a separate software module that allows at least one function or operation to be performed.
  • the software codes can be implemented by a software application (or program) written in any suitable programming language, stored in memory and executed by a controller.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A neural network training method, an image processing method and device, a system, and a storage medium. The neural network training method comprises: obtaining an image sample carrying a label; and training a preset neural network according to the image sample carrying the label, wherein the neural network comprises at least one residual block, a convolutional layer in the residual block performs binary convolution operation, and the quantization bit number of an input value of the residual block and the quantization bit number of an output value of the residual block are both 1 bit. The operation bandwidth and the storage capacity of the neural network are reduced.

Description

神经网络的训练方法、图像处理方法、装置、系统及介质Neural network training method, image processing method, device, system and medium 技术领域technical field
本申请涉及图像处理技术领域,具体而言,涉及一种神经网络的训练方法、图像处理方法、装置、系统及存储介质。The present application relates to the technical field of image processing, in particular, to a neural network training method, image processing method, device, system and storage medium.
背景技术Background technique
随着技术的发展,神经网络技术应用于生活中的方方面面,比如利用神经网络技术进行图像识别(诸如人脸识别或者表情识别等)任务、或者图像分类任务等。With the development of technology, neural network technology is applied to all aspects of life, such as using neural network technology for image recognition (such as face recognition or expression recognition, etc.) tasks, or image classification tasks.
然而,神经网络的运行是一个计算密集和存储密集的过程。为了减少网络存储量,提高运行效率,其中一个改进方向是量化加速,即通过对神经网络中的浮点值进行量化处理,裁剪掉数据的冗余精度,使得浮点数计算转换为位操作(或者小整数计算),不仅能够减少网络的存储,而且能够大幅度进行加速。However, the operation of a neural network is a computationally and memory intensive process. In order to reduce the amount of network storage and improve operating efficiency, one of the improvement directions is quantization acceleration, that is, by quantizing the floating-point values in the neural network, the redundant precision of the data is cut out, so that floating-point calculations are converted into bit operations (or small integer calculations), which can not only reduce network storage, but also greatly accelerate.
其中,8bit量化神经网络(即将神经网络中的浮点值量化成8bit)是目前较为常用的量化模型。但对于运行资源和存储资源受限的小型设备(如可穿戴设备、移动终端或者小型无人机等)而言,8bit量化神经网络在推理过程中仍需占据大部分资源,从而影响小型设备执行其他任务。Among them, the 8-bit quantized neural network (that is, the floating-point value in the neural network is quantized to 8 bits) is currently a more commonly used quantization model. However, for small devices with limited operating resources and storage resources (such as wearable devices, mobile terminals, or small drones, etc.), the 8-bit quantized neural network still needs to occupy most of the resources during the inference process, thereby affecting small devices to perform other tasks.
发明内容Contents of the invention
有鉴于此,本申请的目的之一是提供一种神经网络的训练方法、图像处理方法、装置、系统及存储介质。In view of this, one of the objectives of the present application is to provide a neural network training method, image processing method, device, system and storage medium.
第一方面,本申请实施例提供了一种神经网络的训练方法,所述神经网络用于处理计算机视觉任务,所述方法包括:In the first aspect, the embodiment of the present application provides a training method of a neural network, the neural network is used to process computer vision tasks, the method includes:
获取携带有标签的图像样本;Obtain image samples with labels;
根据所述携带有标签的图像样本,对预设的神经网络进行训练;Training a preset neural network according to the labeled image samples;
其中,所述神经网络包括有至少一个残差块;所述残差块中的卷积层进行二进制卷积运算,所述残差块的输入值和输出值的量化比特数均为1比特。Wherein, the neural network includes at least one residual block; the convolution layer in the residual block performs a binary convolution operation, and the number of quantized bits of the input value and the output value of the residual block is 1 bit.
第二方面,本申请实施例提供了一种图像处理方法,包括:In a second aspect, the embodiment of the present application provides an image processing method, including:
获取待处理图像;Get the image to be processed;
将所述待处理图像输入预先训练的神经网络中,获取图像处理结果;其中,所述神经网络包括有至少一个残差块;所述残差块中的卷积层进行二进制卷积运算,所述残差块的输入值和输出值的量化比特数均为1比特。The image to be processed is input into a pre-trained neural network to obtain image processing results; wherein the neural network includes at least one residual block; the convolution layer in the residual block performs binary convolution operation, and the quantized bit numbers of the input value and the output value of the residual block are 1 bit.
第三方面,本申请实施例提供了一种神经网络的训练装置,所述神经网络用于处理计算机视觉任务,所述装置包括:In a third aspect, the embodiment of the present application provides a neural network training device, the neural network is used to process computer vision tasks, and the device includes:
用于存储可执行指令的存储器;memory for storing executable instructions;
一个或多个处理器;one or more processors;
其中,所述一个或多个处理器执行所述可执行指令时,被单独地或共同地配置成执行第一方面所述的方法。Wherein, when the one or more processors execute the executable instructions, they are individually or collectively configured to execute the method described in the first aspect.
第四方面,本申请实施例提供了一种图像处理装置,包括:In a fourth aspect, the embodiment of the present application provides an image processing device, including:
用于存储可执行指令的存储器;memory for storing executable instructions;
一个或多个处理器;one or more processors;
其中,所述一个或多个处理器执行所述可执行指令时,被单独地或共同地配置成执行第二方面所述的方法。Wherein, when the one or more processors execute the executable instructions, they are individually or collectively configured to execute the method described in the second aspect.
第五方面,本申请实施例提供了一种图像处理系统,包括第四方面所述的图像处理装置和可移动平台;In the fifth aspect, the embodiment of the present application provides an image processing system, including the image processing device and the mobile platform described in the fourth aspect;
所述可移动平台设置有拍摄装置,所述可移动平台用于将拍摄装置拍摄的图像发送给所述图像处理装置。The movable platform is provided with a photographing device, and the movable platform is used to send the image captured by the photographing device to the image processing device.
第六方面,本申请实施例提供了一种计算机可读存储介质,所述计算机可读存储介质存储有可执行指令,所述可执行指令被处理器执行时实现第一方面或第二方面所述的方法。In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium, the computer-readable storage medium stores executable instructions, and when the executable instructions are executed by a processor, the method described in the first aspect or the second aspect is implemented.
本申请实施例所提供的一种神经网络的训练方法、图像处理方法、装置、系统及存储介质。利用携带有标签的图像样本训练得到一神经网络以及利用该神经网络处理计算机视觉任务;其中,所述神经网络包括有至少一个残差块;所述残差块中的卷积层进行二进制卷积运算,降低了计算复杂度,所述残差块的输入值和输出值的量化比特数均为1比特,有助于减少运算带宽以及神经网络的存储量,具有普遍适用性,比如适用于运行资源和存储资源受限的小型设备(如可穿戴设备、移动终端或者小型无人机等),并且参与运算的数据量减少了,延迟更低,能够满足某些场景下的实时性要求。The embodiments of the present application provide a neural network training method, image processing method, device, system, and storage medium. A neural network is obtained by training image samples with labels and the neural network is used to process computer vision tasks; wherein, the neural network includes at least one residual block; the convolutional layer in the residual block performs binary convolution operation, which reduces the computational complexity, and the quantization bits of the input value and output value of the residual block are 1 bit, which helps to reduce the computing bandwidth and the storage capacity of the neural network, and has universal applicability. The amount is reduced, the delay is lower, and it can meet the real-time requirements in some scenarios.
附图说明Description of drawings
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。In order to more clearly illustrate the technical solutions in the embodiments of the present application, the accompanying drawings that need to be used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings in the following description are only some embodiments of the present application. For those of ordinary skill in the art, other drawings can also be obtained based on these drawings without paying creative labor.
Fig. 1, Fig. 2, and Fig. 3 are schematic diagrams of three different application scenarios of neural networks processing different computer vision tasks according to embodiments of the present application;
Fig. 4 is a schematic flowchart of a neural network training method according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a residual block according to an embodiment of the present application;
Fig. 6A is a schematic structural diagram of a residual block including an addition layer according to an embodiment of the present application;
Fig. 6B is a schematic structural diagram of a residual block including a concatenation layer according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a residual block with auxiliary operators introduced during training according to an embodiment of the present application;
Fig. 8A is a schematic structural diagram of a convolutional layer with auxiliary operators and auxiliary parameters introduced during training according to an embodiment of the present application;
Fig. 8B is a schematic structural diagram of the convolutional layer after training, with the auxiliary operators and auxiliary parameters eliminated and/or fused, according to an embodiment of the present application;
Fig. 9 is a schematic structural diagram of a 1-bit systolic array according to an embodiment of the present application;
Fig. 10 is a schematic flowchart of an image processing method according to an embodiment of the present application;
Fig. 11 is a schematic structural diagram of a neural network training device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments of the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.
An 8-bit quantized neural network of the related art still occupies most of a small device's resources during inference, which hinders the small device from performing other tasks. The embodiments of the present application therefore train a neural network for processing computer vision tasks that includes at least one residual block, where the convolutional layers in the residual block perform binary convolution and the input and output values of the residual block are both quantized to 1 bit. In this embodiment, the bit width of the data in the neural network is quantized further: the convolutional layers perform binary convolution, and the input and output values of the residual block are 1-bit values. This reduces the arithmetic bandwidth and the storage footprint of the network, making it broadly applicable, for example to small devices with limited computing and storage resources (such as wearable devices, mobile terminals, or small drones). Because less data takes part in each operation, latency is lower, which satisfies the real-time requirements of certain scenarios.
The residual block in the neural network trained by the embodiments of the present application can replace the non-binarized residual blocks of the related art, so the network can be transferred to most computer vision tasks with no or only minor adjustment, giving it strong portability. That is, the trained neural network can be used to perform one or more of the following computer vision tasks: image classification, image localization, object detection, object tracking, semantic segmentation, instance segmentation, super-resolution reconstruction, and so on.
Depending on the computer vision task it processes, the neural network can thus be applied in different scenarios or on different devices.
In an exemplary embodiment, the trained neural network can be deployed on a movable platform, including but not limited to a drone, an unmanned vehicle, a mobile robot, an unmanned vessel, or a gimbal. The movable platform includes a photographing device; after acquiring an image captured by the photographing device, the movable platform can feed the image into the neural network so that the neural network performs a computer vision task on it. The computer vision task includes but is not limited to image localization, object detection, or object tracking.
In one example, taking the movable platform being a drone as an example, the drone carries a photographing device that includes at least a photosensitive element, for example a complementary metal oxide semiconductor (CMOS) sensor or a charge-coupled device (CCD) sensor. The images captured by the photographing device include but are not limited to color images, grayscale images, infrared images, or depth images.
Referring to Fig. 1, taking the neural network performing an object tracking task as an example: in a follow-shooting scenario, the drone 100 uses the photographing device 200 to capture the tracked target object 300. After the drone 100 acquires the image captured by the photographing device 200, the neural network performs object tracking on the image to obtain the trajectory of the target object 300, so that the drone 100 can accurately follow and film the target object 300 based on that trajectory.
Referring to Fig. 2, taking the neural network performing an object detection task as an example: while the drone 100 executes a flight mission, the photographing device captures the current flight environment. After the drone 100 acquires the captured image, the neural network performs object detection on the image to obtain obstacle information, so that the drone can re-plan the flight path 400 based on the obstacle information, achieving obstacle-avoiding flight and ensuring flight safety.
In another exemplary embodiment, the trained neural network can be installed on a terminal device. For example, after acquiring an image to be processed, the terminal device can feed the image into the neural network so that the neural network performs a computer vision task on it; the computer vision task includes but is not limited to semantic segmentation, instance segmentation, or super-resolution reconstruction. The terminal device includes but is not limited to a smartphone/mobile phone, a tablet computer, a personal digital assistant (PDA), a laptop computer, a desktop computer, a media content player, a video game station/system, a virtual reality system, an augmented reality system, a wearable device (for example a watch, glasses, gloves, headwear such as a hat, helmet, virtual reality headset, augmented reality headset, head-mounted device (HMD), or headband, a pendant, an armband, a leg band, shoes, or a vest), a remote controller, or any other type of device.
In one example, referring to Fig. 3, the terminal device is communicatively connected to a movable platform (for example a drone). The drone 100 can transmit captured images to the terminal device 500. Taking the neural network performing a super-resolution reconstruction task as an example, the terminal device 500 can feed a received image into the neural network for resolution reconstruction so as to obtain an image of better quality (for example, a higher-resolution image).
In some embodiments, the neural network training method provided by the embodiments of the present application can be applied in a training device. Exemplarily, the training device may be an electronic device with data processing capability, such as a computer, a server, a cloud server, or a terminal; it may also be a computer chip or integrated circuit with data processing capability, such as a central processing unit (CPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA).
The training method provided by the embodiments of the present application is described next. As shown in Fig. 4, Fig. 4 is a schematic flowchart of a neural network training method provided by an embodiment of the present application. The neural network is used to process computer vision tasks, and the method, which may be executed by a training device, includes:
In step S101, image samples carrying labels are acquired.
In step S102, a preset neural network is trained according to the labeled image samples; wherein the neural network includes at least one residual block, the convolutional layers in the residual block perform binary convolution, and the input and output values of the residual block are both quantized to 1 bit.
In this embodiment, the convolutional layers in the residual block perform binary convolution, and the input and output values of the residual block are 1-bit values. This reduces the arithmetic bandwidth and the storage footprint of the network, making it broadly applicable, for example to small devices with limited computing and storage resources (such as wearable devices, mobile terminals, or small drones). Because less data takes part in each operation, latency is lower, which satisfies the real-time requirements of certain scenarios.
In some embodiments, referring to Fig. 5, the input and output values of the residual block are both quantized to 1 bit. The residual block includes a fusion layer 20, at least one convolutional layer 10 on the main branch, and at least one convolutional layer 10 on the skip branch, where the main branch contains more convolutional layers 10 than the skip branch. The convolutional layers 10 perform binary convolution, so the input values and weight values of each convolutional layer 10 are both quantized to 1 bit, which helps reduce arithmetic bandwidth and improve computational efficiency. The fusion layer 20 fuses the output value of the convolutional layer 10 on the main branch with the output value of the convolutional layer 10 on the skip branch.
If the layer following a convolutional layer 10 is not the fusion layer 20, the output value of that convolutional layer 10 is quantized to 1 bit. The input of every convolutional layer 10 is therefore already 1 bit and requires no PReLU, Sign, or other nonlinear processing after the convolutional layer 10; this avoids frequent precision conversion, reduces the overall amount of data movement, eliminates the need for extra circuitry, and helps improve computational efficiency while saving computing resources.
If the layer following a convolutional layer 10 is the fusion layer 20, the output value of that convolutional layer 10 is quantized to more than 1 bit, for example to any of 2 bits, 4 bits, 8 bits, or 16 bits; the quantization bit width of the output value of the convolutional layer 10 can be set according to the actual application scenario. The fusion layer 20 then converts the result of fusing the output values of the main-branch and skip-branch convolutional layers 10 back to 1 bit, i.e., the output value of the fusion layer 20 is quantized to 1 bit. Because the fusion layer 20 fuses data whose quantization bit width is greater than 1 bit, fusion precision is improved.
Exemplarily, referring to Fig. 6A and Fig. 6B, the fusion layer 20 includes an addition layer 21 or a concatenation layer 22. In Fig. 6A, the addition layer 21 adds the output values of at least two convolutional layers 10 connected to it, for example adding the output value of the main-branch convolutional layer 10 and the output value of the skip-branch convolutional layer 10 element-wise. In Fig. 6B, the concatenation layer 22 concatenates the output values of at least two convolutional layers 10 connected to it, for example concatenating the respective output values of the main-branch and skip-branch convolutional layers 10 in cascade. The residual block provided by this embodiment has a simple structure: the input values and weight values of the convolutional layers 10 in the residual block are all quantized to 1 bit, i.e., the convolutional layers 10 perform binary convolution, and only the fusion layer 20 processes data wider than 1 bit. This simplicity helps reduce data movement and bandwidth and lowers latency.
Exemplarily, referring to Fig. 6A, if the fusion layer 20 is an addition layer 21, the output value of the residual block is output by the addition layer 21 and is quantized to 1 bit. Referring to Fig. 6B, the residual block further includes at least one convolutional layer 10 after the concatenation layer 22; if the fusion layer 20 is a concatenation layer 22, the output value of the residual block is output by the convolutional layer 10 following the concatenation layer 22 and is quantized to 1 bit.
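To make the fused 1-bit topology concrete, the following is a minimal PyTorch-style sketch of a residual block with an addition fusion layer (cf. Fig. 6A). It is an illustrative assumption rather than the reference implementation: the class names (SignSTE, BinaryConv2d, BinaryResidualBlock), the choice of two main-branch layers, and the straight-through sign gradient are all supplied here for illustration.

    import torch
    import torch.nn as nn

    class SignSTE(torch.autograd.Function):
        """Sign with a straight-through estimator: binarize in the forward
        pass, pass the gradient through where the input lies in [-1, 1]."""
        @staticmethod
        def forward(ctx, x):
            ctx.save_for_backward(x)
            return torch.sign(x)

        @staticmethod
        def backward(ctx, grad_out):
            x, = ctx.saved_tensors
            return grad_out * (x.abs() <= 1).to(grad_out.dtype)

    class BinaryConv2d(nn.Conv2d):
        """Convolution whose weights are binarized to {-1, +1} on the fly."""
        def forward(self, x):
            w_b = SignSTE.apply(self.weight)
            return nn.functional.conv2d(x, w_b, self.bias, self.stride,
                                        self.padding, self.dilation, self.groups)

    class BinaryResidualBlock(nn.Module):
        """1-bit residual block with an addition fusion layer (cf. Fig. 6A)."""
        def __init__(self, channels):
            super().__init__()
            # The main branch holds more conv layers than the skip branch.
            self.main = nn.ModuleList(
                [BinaryConv2d(channels, channels, 3, padding=1, bias=False)
                 for _ in range(2)])
            self.skip = BinaryConv2d(channels, channels, 1, bias=False)

        def forward(self, x_1bit):
            a = x_1bit
            for i, conv in enumerate(self.main):
                a = conv(a)
                if i < len(self.main) - 1:   # layers not feeding the fusion layer
                    a = SignSTE.apply(a)     # re-binarize to 1 bit
            s = self.skip(x_1bit)            # feeds the fusion layer, kept wider than 1 bit
            fused = a + s                    # fusion layer: element-wise addition
            return SignSTE.apply(fused)      # block output quantized back to 1 bit

In this sketch, only the outputs of the layers feeding the addition layer stay wider than 1 bit; every other tensor crossing a layer boundary is re-binarized, mirroring the bandwidth argument above.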
The residual block provided by this embodiment can replace the non-binarized residual blocks of the related art and can be transferred to most computer vision tasks with no or only minor adjustment, giving it strong portability. A neural network trained in this embodiment that includes residual blocks structured as in Fig. 6A or Fig. 6B can perform different computer vision tasks, such as image classification, image localization, object detection, object tracking, semantic segmentation, instance segmentation, or super-resolution reconstruction.
Based on the needs of the actual application scenario, the labels used during training can be determined according to the desired computer vision task, and different computer vision tasks correspond to different labels. In one example, in a semantic segmentation task, the label is the category to which each pixel of the image sample belongs. In another example, in an object detection task, the label is the position information of the object in the image sample. During training, the preset neural network processes an image sample to obtain a predicted value, and the training device then adjusts the parameters of the neural network based on the difference between the predicted value and the label of the image sample to obtain a trained neural network.
In some embodiments, in order to expand network capacity and improve network accuracy, the embodiments of the present application introduce auxiliary parameters and auxiliary operators during training to assist the training of the convolutional layers, thereby helping to improve network performance.
Regarding the weights of a convolutional layer: during training, the weights of the convolutional layer are associated with a first auxiliary parameter and a second auxiliary parameter, where the first auxiliary parameter controls the degree to which the floating-point weights of the convolutional layer are quantized to 1 bit, and the second auxiliary parameter represents the scaling of the quantized weights.
During training, a first auxiliary operator and a second auxiliary operator are introduced and used to quantize the floating-point weights of the convolutional layer. The first auxiliary operator quantizes the floating-point weights of the convolutional layer to 1 bit during the forward pass according to the first auxiliary parameter; the second auxiliary operator determines the sign of the quantized weights. Exemplarily, the first auxiliary operator includes the Tanh function, and the second auxiliary operator includes the sign function.
Regarding the output values of the neural network: the output value of a convolutional layer is associated with a third auxiliary parameter, a fourth auxiliary parameter, and at least two fifth auxiliary parameters. The third auxiliary parameter controls the degree to which the output value of the convolutional layer is quantized to 1 bit; the fourth auxiliary parameter represents the scaling of the quantized output value; the at least two fifth auxiliary parameters are different offsets of the output value.
During training, referring to Fig. 7, the training device quantizes the floating-point output value of the convolutional layer 10 using a third auxiliary operator, a fourth auxiliary operator, and the second auxiliary operator. The third auxiliary operator applies a nonlinearity to the floating-point output value of the convolutional layer 10 according to the third auxiliary parameter and one of the fifth auxiliary parameters; the fourth auxiliary operator quantizes the result output by the third auxiliary operator to 1 bit during the forward pass according to the other fifth auxiliary parameter; the second auxiliary operator determines the sign of the quantized output value. Exemplarily, the third auxiliary operator includes an activation function, the fourth auxiliary operator includes the hard-tanh function, and the second auxiliary operator includes the sign function.
In an exemplary embodiment, let the weights of the convolutional layer be W, the input value be X, and the output value be A. Suppose the convolutional layer has weights W ∈ R^(N×C×K×K) and input X ∈ R^(B×H×W×C), and the output is A = Conv(W, X), i.e., the result of convolving the weights of the convolutional layer with the input, where N, C, H, W, B, and K are respectively the number of output channels, the number of input channels, the height, the width, the batch size, and the kernel size.
In one example, let the weights of the convolutional layer be W, the floating-point weights be W_f, the quantized weights be W_b, the first auxiliary parameter be α, and the second auxiliary parameter be λ. During training, the floating-point weights W_f are trained, and the Tanh function is used to approximately binarize them during the forward pass. Referring to Fig. 8A:
W_b = Sign(Tanh(α·W_f))   (1);
W_f ≈ λ·W_b   (2).
Exemplarily, α can take different shapes (for example ∈ R^N, R^(N×C), or R^(N×C×K×K)) to adjust the desired degree of over-parameterization, and the magnitude of each α coefficient controls how sharply the approximation approaches binarization. λ ∈ R^N is a per-output-channel scaling factor that compensates for the magnitude difference between W_f and W_b ∈ {-1, 1}. Both α and λ can be adjusted during training. Introducing auxiliary parameters and auxiliary operators for the weights of the convolutional layer during training expands network capacity and helps improve network performance.
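As a sketch of equations (1) and (2), assuming a PyTorch-style framework (the function name and the chosen α shape are illustrative, not prescribed by the description):

    import torch

    def binarize_weight(w_f, alpha, lam):
        """Training-time weight binarization following equations (1)-(2).

        w_f:   floating-point weights, shape (N, C, K, K)
        alpha: first auxiliary parameter, broadcastable to w_f; its magnitude
               controls how sharply Tanh approaches a hard binarization
        lam:   second auxiliary parameter, per-output-channel scale, shape (N,)
        """
        # eq. (1); in a full training loop an STE would bypass sign in backward
        w_b = torch.sign(torch.tanh(alpha * w_f))
        return lam.view(-1, 1, 1, 1) * w_b   # eq. (2): W_f ≈ λ·W_b

    w_f = torch.randn(8, 4, 3, 3)
    alpha = torch.full((8, 1, 1, 1), 5.0)    # one of the admissible shapes for α
    lam = torch.ones(8)
    print(binarize_weight(w_f, alpha, lam).shape)   # torch.Size([8, 4, 3, 3])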
In one example, let the output value of the convolutional layer be A, the floating-point output value be A_f, the quantized output value be A_b, the third auxiliary parameter be τ, the fourth auxiliary parameter be κ, and the at least two fifth auxiliary parameters include b_0 and b_1. During training, since both the approximated weights and the convolution are floating point, the output is real-valued and must be binarized before entering the next layer. The real-valued output A_f is approximated by the binarized activation A_b through the following series of transformations. Referring to Fig. 8A:
A_b = Sign(Htanh(PReLU(τ·A_f + b_0) + b_1))   (3);
A_f ≈ κ·A_b   (4).
Here Htanh is the hard-tanh function, which clamps its input to [-1, 1] during the forward pass; a sinusoidal curve is used in the backward pass. The PReLU and Sign functions use the straight-through estimator (STE) to compute gradients. This chain of transformations reshapes the input distribution and helps regularize the training of the neural network. Introducing auxiliary parameters and auxiliary operators for the output value of the convolutional layer during training expands network capacity and helps improve network performance.
Moreover, to speed up convergence, referring to Fig. 8A, batch normalization (BatchNorm) is applied during training to process (for example, whiten) the floating-point output value A_f obtained after the scaling-factor transformation, addressing shifts in the data distribution.
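The training-time activation path of equations (3) and (4), including the BatchNorm step of Fig. 8A, might look like the following sketch, assuming PyTorch; the module name, parameter shapes, and the identity-gradient STE idiom are assumptions for illustration:

    import torch
    import torch.nn as nn

    class ActivationBinarizer(nn.Module):
        """Training-time activation path of equations (3)-(4):
        A_b = Sign(Htanh(PReLU(τ·BN(A_f) + b_0) + b_1)),  A_f ≈ κ·A_b."""
        def __init__(self, channels):
            super().__init__()
            self.bn = nn.BatchNorm2d(channels)     # whitening step from Fig. 8A
            self.prelu = nn.PReLU(channels)        # third auxiliary operator
            self.tau = nn.Parameter(torch.ones(1, channels, 1, 1))    # third aux. parameter
            self.b0 = nn.Parameter(torch.zeros(1, channels, 1, 1))    # fifth aux. parameters
            self.b1 = nn.Parameter(torch.zeros(1, channels, 1, 1))
            self.kappa = nn.Parameter(torch.ones(1, channels, 1, 1))  # fourth aux. parameter

        def forward(self, a_f):
            x = self.prelu(self.tau * self.bn(a_f) + self.b0)
            x = nn.functional.hardtanh(x + self.b1)  # fourth aux. operator, clamps to [-1, 1]
            a_b = (torch.sign(x) - x).detach() + x   # sign forward, identity gradient (STE)
            return self.kappa * a_b                  # eq. (4) scale, absorbed at inference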
In some embodiments, after the training of the neural network is completed, the auxiliary parameters and auxiliary operators can be absorbed into the regular parameters of the convolutional layer, so that computational efficiency is not reduced while the accuracy of the neural network is preserved. Exemplarily, auxiliary parameters and auxiliary operators are introduced during training, as in the embodiment shown in Fig. 8A, to expand network capacity; after training is completed, they are absorbed into the regular parameters of the convolutional layer, yielding the convolutional layer shown in Fig. 8B, which performs binary convolution with both its input values and weight values quantized to 1 bit.
Regarding the weights of the convolutional layer: after the training of the neural network is completed, the training device can eliminate and/or fuse the auxiliary parameters and auxiliary operators associated with the weight values of the convolutional layer, whereby the quantized weights of the convolutional layer can be simplified to: determined by the sign of the first auxiliary parameter and the sign of the floating-point weights.
In one example, the auxiliary parameters and auxiliary operators associated with the weights during training can be absorbed into a simple form for inference. In essence, binarization cares only about the relative value of two numbers and ignores their magnitudes. Using this property, equation (1), i.e., W_b = Sign(Tanh(α·W_f)), simplifies to W_b = Sign(α)·Sign(W_f). That is, knowing only the sign of the first auxiliary parameter and the sign of the floating-point weight is enough to determine the quantized (i.e., binarized) weight.
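This weight-side fold reduces to a one-liner; a sketch in the same PyTorch style (valid for nonzero α and weights):

    import torch

    def fold_weight_binarization(alpha, w_f):
        """Inference-time form of eq. (1):
        Sign(Tanh(α·W_f)) == Sign(α)·Sign(W_f),
        because binarization depends only on relative sign, never magnitude."""
        return torch.sign(alpha) * torch.sign(w_f)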
Regarding the output value of the convolutional layer: after the training of the neural network is completed, the training device can eliminate and/or fuse the auxiliary parameters and auxiliary operators associated with the output value of the convolutional layer. For example, the three auxiliary operators shown in the dashed box in Fig. 7, together with the auxiliary parameters not shown in Fig. 7, can be fused or eliminated after training to obtain the structure shown in Fig. 5. The quantized output value of the convolutional layer can then be simplified to: determined by the sign of the third auxiliary parameter and the sign of a preset difference, where the preset difference is the difference between the floating-point output value and a preset parameter, and the preset parameter is determined from the third auxiliary parameter and the at least two fifth auxiliary parameters.
In one example, the auxiliary parameters and auxiliary operators associated with the output value during training can be absorbed into a simple form for inference. In essence, binarization cares only about the relative value of two numbers and ignores their magnitudes. Using this property, for each output channel n, equation (3), i.e., A_b = Sign(Htanh(PReLU(τ·A_f + b_0) + b_1)), simplifies to A_b(n) = Sign(τ(n))·Sign(A_f(n) − θ(n)). That is, knowing only the sign of the third auxiliary parameter and the sign of the difference between the floating-point output value and the preset parameter θ(n) is enough to determine the quantized (i.e., binarized) output value of the convolutional layer.
Here θ(n) ∈ R is a threshold that depends on b_0(n), b_1(n), and τ(n) of channel n. This simplification holds because, given b_0(n), b_1(n), and τ(n), all the transformations in equation (3) are monotonic, so the sign A_b(n) is determined simply by solving for the zero crossing θ(n). After training is completed, running the neural network involves only bitwise convolution and the thresholds θ(n), simplifying the inference form while keeping network accuracy unchanged.
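For one channel, the zero crossing can even be found in closed form by inverting the monotonic chain; a sketch under the simplifying assumptions that τ and the PReLU slope are positive (the Sign(τ(n)) factor handles the general case), with hypothetical helper names:

    import math

    def solve_theta(tau, b0, b1, slope):
        """Zero crossing θ of f(a) = Htanh(PReLU(τ·a + b0) + b1) for one channel.
        Every stage of eq. (3) is monotonic, so the sign flips exactly once,
        where PReLU(τ·a + b0) == -b1."""
        y = -b1
        z = y if y >= 0 else y / slope   # invert PReLU (slope assumed > 0)
        return (z - b0) / tau

    def binarize_inference(a_f, tau, theta):
        """Simplified per-channel inference:
        A_b(n) = Sign(τ(n)) · Sign(A_f(n) − θ(n))."""
        return math.copysign(1.0, tau) * math.copysign(1.0, a_f - theta)

    theta = solve_theta(tau=2.0, b0=0.5, b1=-0.3, slope=0.25)   # θ = -0.1
    print(binarize_inference(1.0, 2.0, theta))    # 1.0
    print(binarize_inference(-1.0, 2.0, theta))   # -1.0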
Finally, the convolution output can be approximated as Conv(W_f, A_f) ≈ (κ·λ)·Conv(W_b, A_b). The second auxiliary parameter (λ) associated with the weights of the convolutional layer and the fourth auxiliary parameter (κ) associated with its output value are per-channel scaling factors, (κ·λ) ∈ R^N, which can be carried into the next layer and absorbed during the processing of the next convolutional layer's activation function. Taking the PReLU function as the activation function, for example, λ and κ can be absorbed into the PReLU operation of the next layer and therefore need not be computed during inference.
Exemplarily, the batch normalization step is also amenable to parameter fusion. In one example, the auxiliary parameters and auxiliary operators associated with the output value of the convolutional layer can be fused into the batch normalization step. Letting γ, β ∈ R^N be the BatchNorm scale and bias, we have: PReLU(τ·BN(x) + b_0) + b_1 = PReLU(τ·(γ·x + β) + b_0) + b_1 = PReLU((τ·γ)·x + (τ·β + b_0)) + b_1, where x is the output value A_f(n) of the convolutional layer; the simplification of the PReLU function follows the same principle as that of equation (3) above.
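The BatchNorm fold amounts to combining two affine maps into one; a minimal sketch, with scalar per-channel values and a hypothetical function name:

    def fuse_bn_affine(tau, b0, gamma, beta):
        """Fold the BatchNorm scale/shift (γ, β) into the auxiliary affine map:
        PReLU(τ·(γ·x + β) + b0) + b1 == PReLU((τ·γ)·x + (τ·β + b0)) + b1.
        Returns the fused per-channel scale and bias applied directly to x."""
        return tau * gamma, tau * beta + b0

    scale, bias = fuse_bn_affine(tau=2.0, b0=0.1, gamma=0.5, beta=-0.2)
    print(scale, bias)   # 1.0 -0.3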
In some embodiments, in order to expand network capacity and improve network accuracy, the number of channels of the input and/or output values of the residual block can also be expanded during training; because the expanded channels are binarized, the network still occupies less memory than an 8-bit quantized neural network.
In some embodiments, to improve the accuracy of the neural network, the neural network may further include at least one network block whose input or output values are quantized to more than 1 bit, for example (but not limited to) 2 bits, 4 bits, 8 bits, or 16 bits; the input values and weights of the convolutional layers in such a network block are likewise quantized to more than 1 bit, for example a convolutional layer performing 8-bit convolution.
It can be understood that the network block may be located anywhere in the neural network, and this embodiment imposes no restriction on this; for example, network blocks may be placed at the beginning and end of the neural network with at least one residual block in between, or a network block may be placed in the middle of the neural network, as appropriate for the actual application scenario.
In a possible implementation, in order to expand network capacity and improve network accuracy, the number of channels of the input and/or output values of the residual block can be set larger than that of the network block during training. In one example, the residual block has, for instance, 2 or 4 times as many input and/or output channels as the network block.
In some embodiments, the convolution operations of the convolutional layers in the residual block are performed by a designated systolic array that includes multiple processing elements supporting 1-bit operations. Since the input and output values of the convolutional layers are both quantized to 1 bit, the input bandwidth and output bandwidth of the systolic array can be set to be the same. Exemplarily, the systolic array may be a square array to ensure that the bandwidths for data loading and writing are identical.
To make full use of the bandwidth, the systolic array includes multiple input lines each carrying 1-bit data.
Exemplarily, the weights of the convolutional layer are stored in NHWC format. Of course, the weights may also be stored in NCHW format. With data reuse, the MAC utilization differs between data layouts (NCHW and NHWC): the utilization of the NCHW layout depends on whether the W (width) dimension is divisible by the MAC size, while the utilization of the NHWC layout depends on whether the C (channel) dimension is divisible by the MAC size.
In one example, referring to Fig. 9, Fig. 9 shows a schematic diagram of a 1-bit systolic array used to perform bitwise convolution on the weights and input values of the convolutional layer. The 1-bit systolic array is obtained by modifying a 16-bit × 16-bit systolic array: each 8-bit PE (the smallest processing element of a systolic array) is replaced with a 1-bit PE, so the 1-bit systolic array contains multiple 1-bit PEs. To make full use of the input bandwidth, each 8-bit input line is replaced with eight 1-bit input lines, essentially creating 8× as many rows in the 1-bit design. Since the array is square, the number of columns is also 8×. Compared with the 8-bit design, this means 64× as many PEs and a theoretical 64× speedup, significantly improving data processing efficiency.
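With {-1, +1} values packed as bits, the multiply-accumulate performed by each 1-bit PE reduces to XNOR plus popcount; the following is a minimal sketch of that per-PE operation (the encoding +1 → 1, -1 → 0 and the function name are assumptions for illustration):

    def bitwise_dot(x_bits, w_bits, n):
        """1-bit dot product via XNOR + popcount, the operation a 1-bit PE performs.

        x_bits, w_bits: integers whose n low bits encode a {-1, +1} vector.
        Returns the same value as sum(x_i * w_i) over the two {-1, +1} vectors.
        """
        matches = bin(~(x_bits ^ w_bits) & ((1 << n) - 1)).count("1")  # XNOR, then popcount
        return 2 * matches - n                                         # matches minus mismatches

    # x = [+1, -1, +1, +1] -> 0b1011,  w = [+1, +1, -1, +1] -> 0b1101
    print(bitwise_dot(0b1011, 0b1101, 4))   # 0, matching the {-1, +1} dot product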
The technical features of the above implementations can be combined arbitrarily; as long as the combinations of features are not contradictory, any such combination also falls within the scope of this disclosure.
Correspondingly, referring to Fig. 10, an embodiment of the present application further provides an image processing method, including:
In step S201, an image to be processed is acquired.
In step S202, the image to be processed is input into a pre-trained neural network to obtain an image processing result; wherein the neural network includes at least one residual block, the convolutional layers in the residual block perform binary convolution, and the input and output values of the residual block are both quantized to 1 bit.
The neural network is trained by the training method shown in the embodiment of Fig. 4; for related details, refer to the description of the embodiment of Fig. 4, which is not repeated here.
In some embodiments, the neural network is used to process any one of the following computer vision tasks: image classification, image localization, object detection, object tracking, semantic segmentation, instance segmentation, or super-resolution reconstruction.
Correspondingly, referring to Fig. 11, an embodiment of the present application further provides a neural network training device, the neural network being used to process computer vision tasks, the device including:
a memory 102 for storing executable instructions;
one or more processors 101;
wherein, when the one or more processors 101 execute the executable instructions, they are individually or jointly configured to perform the method described above.
The processor 101 executes the executable instructions contained in the memory 102. The processor 101 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 102 stores the executable instructions of the neural network training method. The memory 102 may include at least one type of storage medium, including flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, and so on. Moreover, the device may cooperate with a network storage device that performs the storage function of the memory over a network connection. The memory 102 may be an internal storage unit of the device, such as its hard disk or internal memory, or an external storage device of the device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the device. Further, the memory 102 may include both an internal storage unit and an external storage device. The memory 102 stores the computer program and other data required by the device, and may also temporarily store data that has been or will be output.
In some embodiments, when the one or more processors 101 execute the executable instructions, they are individually or jointly configured to:
acquire image samples carrying labels;
train a preset neural network according to the labeled image samples;
wherein the neural network includes at least one residual block; the convolutional layers in the residual block perform binary convolution, and the input and output values of the residual block are both quantized to 1 bit.
Exemplarily, the residual block includes a fusion layer, at least one convolutional layer on the main branch, and at least one convolutional layer on the skip branch; the input values and weight values of the convolutional layers are both quantized to 1 bit; the fusion layer fuses the output value of the convolutional layer on the main branch with the output value of the convolutional layer on the skip branch; and the main branch contains more convolutional layers than the skip branch.
Exemplarily, if the layer following a convolutional layer is the fusion layer, the output value of that convolutional layer is quantized to more than 1 bit; if the layer following a convolutional layer is not the fusion layer, the output value of that convolutional layer is quantized to 1 bit.
Exemplarily, the output value of the fusion layer is quantized to 1 bit.
Exemplarily, if the layer following a convolutional layer is the fusion layer, the output value of that convolutional layer is quantized to any of the following: 2 bits, 4 bits, 8 bits, or 16 bits.
Exemplarily, the fusion layer includes an addition layer or a concatenation layer; the addition layer adds the output values of at least two convolutional layers connected to it; the concatenation layer concatenates the output values of at least two convolutional layers connected to it.
Exemplarily, if the fusion layer is an addition layer, the output value of the residual block is output by the addition layer; if the fusion layer is a concatenation layer, the output value of the residual block is output by the convolutional layer following the concatenation layer.
Exemplarily, different computer vision tasks correspond to different labels.
Exemplarily, the computer vision task includes any one or more of the following: image classification, image localization, object detection, object tracking, semantic segmentation, instance segmentation, or super-resolution reconstruction.
Exemplarily, auxiliary parameters and auxiliary operators are introduced during training to assist the training of the convolutional layers; after the training of the neural network is completed, the auxiliary parameters and auxiliary operators are absorbed into the parameters of the convolutional layers.
Exemplarily, during training, the weights of the convolutional layer are associated with a first auxiliary parameter and a second auxiliary parameter, where the first auxiliary parameter controls the degree to which the floating-point weights of the convolutional layer are quantized to 1 bit, and the second auxiliary parameter represents the scaling of the quantized weights.
Exemplarily, the processor is further configured to: during training, quantize the floating-point weights of the convolutional layer using a first auxiliary operator and a second auxiliary operator, where the first auxiliary operator quantizes the floating-point weights of the convolutional layer to 1 bit during the forward pass according to the first auxiliary parameter, and the second auxiliary operator determines the sign of the quantized weights. Exemplarily, the first auxiliary operator includes the Tanh function, and the second auxiliary operator includes the sign function.
Exemplarily, the processor is further configured to: after the training of the neural network is completed, eliminate and/or fuse the auxiliary parameters and auxiliary operators associated with the weight values of the convolutional layer, whereby the quantized weights of the convolutional layer can be simplified to: determined by the sign of the first auxiliary parameter and the sign of the floating-point weights.
Exemplarily, the output value of the convolutional layer is associated with a third auxiliary parameter, a fourth auxiliary parameter, and at least two fifth auxiliary parameters; the third auxiliary parameter controls the degree to which the output value of the convolutional layer is quantized to 1 bit; the fourth auxiliary parameter represents the scaling of the quantized output value; the at least two fifth auxiliary parameters are different offsets of the output value.
Exemplarily, the processor is further configured to: during training, quantize the floating-point output value of the convolutional layer using a third auxiliary operator, a fourth auxiliary operator, and the second auxiliary operator, where the third auxiliary operator applies a nonlinearity to the floating-point output value of the convolutional layer according to the third auxiliary parameter and one of the fifth auxiliary parameters, the fourth auxiliary operator quantizes the result output by the third auxiliary operator to 1 bit during the forward pass according to the other fifth auxiliary parameter, and the second auxiliary operator determines the sign of the quantized output value.
Exemplarily, the third auxiliary operator includes an activation function, the fourth auxiliary operator includes the hard-tanh function, and the second auxiliary operator includes the sign function.
Exemplarily, the processor is further configured to: after the training of the neural network is completed, eliminate and/or fuse the auxiliary parameters and auxiliary operators associated with the output value of the convolutional layer, whereby the quantized output value of the convolutional layer can be simplified to: determined by the sign of the third auxiliary parameter and the sign of a preset difference, where the preset difference is the difference between the floating-point output value and a preset parameter, and the preset parameter is determined from the third auxiliary parameter and the at least two fifth auxiliary parameters.
Exemplarily, the second auxiliary parameter associated with the weights of the convolutional layer and the fourth auxiliary parameter associated with its output value can be absorbed during the processing of the next convolutional layer's activation function.
Exemplarily, the neural network further includes at least one network block whose input or output values are quantized to more than 1 bit.
Exemplarily, the number of channels of the input and/or output values of the residual block is greater than the number of channels of the input and/or output values of the network block.
Exemplarily, the convolution operations of the convolutional layers in the residual block are performed by a designated systolic array that includes multiple processing elements supporting 1-bit operations.
Exemplarily, the input bandwidth and output bandwidth of the systolic array are the same.
Exemplarily, the systolic array is a square array.
Exemplarily, the systolic array includes multiple input lines each carrying 1-bit data.
Exemplarily, the weights of the convolutional layer are stored in NHWC format.
For the implementation of the functions and roles of each unit in the above device, refer to the implementation of the corresponding steps in the above method, which is not repeated here.
Correspondingly, an embodiment of the present application further provides an image processing device, including:
a memory for storing executable instructions;
one or more processors;
wherein, when the one or more processors execute the executable instructions, they are individually or jointly configured to:
acquire an image to be processed;
input the image to be processed into a pre-trained neural network to obtain an image processing result, wherein the neural network includes at least one residual block, the convolutional layers in the residual block perform binary convolution, and the input and output values of the residual block are both quantized to 1 bit.
Exemplarily, the neural network is used to process any one of the following computer vision tasks: image classification, image localization, object detection, object tracking, semantic segmentation, instance segmentation, or super-resolution reconstruction.
For the implementation of the functions and roles of each unit in the above device, refer to the implementation of the corresponding steps in the above method, which is not repeated here.
Correspondingly, an embodiment of the present application further provides an image processing system, including the above image processing device and a movable platform;
the movable platform is provided with a photographing device, and the movable platform is configured to send images captured by the photographing device to the image processing device.
Exemplarily, the movable platform includes but is not limited to a drone, an unmanned vehicle, a mobile robot, an unmanned vessel, or a gimbal.
In one example, the image processing device may be integrated into the movable platform, as shown in Fig. 1 or Fig. 2. In another example, the image processing device may be installed in a terminal device communicatively connected to the movable platform; the terminal device may, for example, be a remote controller of the movable platform, as shown in Fig. 3.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example a memory including instructions executable by the processor of a device to perform the above method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
A non-transitory computer-readable storage medium which, when its instructions are executed by the processor of a terminal, enables the terminal to perform the above method.
The various implementations described here can be realized using a computer-readable medium such as computer software, hardware, or any combination thereof. For a hardware implementation, the implementations described here can be realized using at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a processor, a controller, a microcontroller, a microprocessor, or an electronic unit designed to perform the functions described here. For a software implementation, an implementation such as a procedure or function may be realized with a separate software module that performs at least one function or operation. The software code may be implemented by a software application (or program) written in any suitable programming language, stored in a memory, and executed by a controller.
It should be noted that in this document, relational terms such as first and second are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between them. The terms "comprise", "include", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device comprising a set of elements includes not only those elements but also other elements not expressly listed, or elements inherent to the process, method, article, or device. Without further limitation, an element qualified by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device comprising it.
The method and device provided by the embodiments of the present application have been described in detail above. Specific examples were used here to explain the principles and implementations of the present application, and the description of the above embodiments is intended only to help understand the method and its core ideas. Meanwhile, those of ordinary skill in the art may, based on the ideas of the present application, make changes to the specific implementations and application scope. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (32)

  1. A training method for a neural network, wherein the neural network is used to process a computer vision task, the method comprising:
    acquiring image samples carrying labels; and
    training a preset neural network according to the image samples carrying labels;
    wherein the neural network includes at least one residual block; the convolutional layer in the residual block performs a binary convolution operation, and the input value and the output value of the residual block are each quantized to 1 bit.
  2. The method according to claim 1, wherein the residual block includes a fusion layer, at least one convolutional layer located in a main branch, and at least one convolutional layer located in a shortcut branch; the input values and weight values of the convolutional layers are each quantized to 1 bit;
    the fusion layer is used to fuse the output value of the convolutional layer in the main branch with the output value of the convolutional layer in the shortcut branch;
    wherein the number of convolutional layers in the main branch is greater than the number of convolutional layers in the shortcut branch.
  3. The method according to claim 2, wherein:
    if the layer following the convolutional layer is the fusion layer, the output value of the convolutional layer is quantized to more than 1 bit; and
    if the layer following the convolutional layer is not the fusion layer, the output value of the convolutional layer is quantized to 1 bit.
  4. The method according to claim 2 or 3, wherein the output value of the fusion layer is quantized to 1 bit.
  5. The method according to claim 3, wherein, if the layer following the convolutional layer is the fusion layer, the output value of the convolutional layer is quantized to any one of the following: 2 bits, 4 bits, 8 bits, or 16 bits.
  6. The method according to claim 2, wherein the fusion layer includes an addition layer or a concatenation layer;
    the addition layer is used to add the output values of at least two convolutional layers connected to the addition layer; and
    the concatenation layer is used to concatenate the output values of at least two convolutional layers connected to the concatenation layer.
  7. The method according to claim 6, wherein, if the fusion layer is an addition layer, the output value of the residual block is output by the addition layer; and
    if the fusion layer is a concatenation layer, the output value of the residual block is output by the convolutional layer following the concatenation layer.
  8. The method according to claim 1, wherein different computer vision tasks correspond to different labels.
  9. The method according to claim 1 or 8, wherein the computer vision task includes any one or more of the following: an image classification task, an image localization task, an object detection task, an object tracking task, a semantic segmentation task, an instance segmentation task, or a super-resolution reconstruction task.
  10. The method according to claim 2, wherein, during training, auxiliary parameters and auxiliary operators are introduced to assist the training of the convolutional layer; and
    after the training of the neural network is completed, the auxiliary parameters and auxiliary operators are absorbed into the parameters of the convolutional layer.
  11. The method according to claim 10, wherein, during training, the weights of the convolutional layer correspond to a first auxiliary parameter and a second auxiliary parameter;
    wherein the first auxiliary parameter is used to control the degree to which the floating-point weights of the convolutional layer are quantized to 1 bit; and
    the second auxiliary parameter is used to represent the scaling of the quantized weights.
  12. The method according to claim 11, further comprising:
    during training, quantizing the floating-point weights of the convolutional layer using a first auxiliary operator and a second auxiliary operator;
    wherein the first auxiliary operator is used to quantize the floating-point weights of the convolutional layer to 1 bit according to the first auxiliary parameter during the forward pass; and
    the second auxiliary operator is used to determine the sign of the quantized weights.
  13. The method according to claim 12, wherein the first auxiliary operator includes a Tanh function, and the second auxiliary operator includes a sign function.
  14. The method according to claim 12, further comprising:
    after the training of the neural network is completed, eliminating and/or fusing the auxiliary parameters and auxiliary operators corresponding to the weight values of the convolutional layer;
    wherein the quantized weights of the convolutional layer can then be simplified to being determined by the sign of the first auxiliary parameter and the sign of the floating-point weights.
  15. The method according to claim 8, wherein the output value of the convolutional layer corresponds to a third auxiliary parameter, a fourth auxiliary parameter, and at least two fifth auxiliary parameters;
    the third auxiliary parameter is used to control the degree to which the output value of the convolutional layer is quantized to 1 bit;
    the fourth auxiliary parameter is used to represent the scaling of the quantized output value; and
    the at least two fifth auxiliary parameters are different offsets of the output value.
  16. The method according to claim 15, further comprising:
    during training, quantizing the floating-point output value of the convolutional layer using a third auxiliary operator, a fourth auxiliary operator, and the second auxiliary operator;
    wherein the third auxiliary operator is used to apply a nonlinearity to the floating-point output value of the convolutional layer according to the third auxiliary parameter and one of the fifth auxiliary parameters;
    the fourth auxiliary operator is used, during the forward pass, to quantize the result output by the third auxiliary operator to 1 bit according to the other fifth auxiliary parameter; and
    the second auxiliary operator is used to determine the sign of the quantized output value.
  17. The method according to claim 16, wherein the third auxiliary operator includes an activation function, the fourth auxiliary operator includes a hard-tanh function, and the second auxiliary operator includes a sign function.
  18. The method according to claim 16, further comprising:
    after the training of the neural network is completed, eliminating and/or fusing the auxiliary parameters and auxiliary operators corresponding to the output value of the convolutional layer;
    wherein the quantized output value of the convolutional layer can then be simplified to being determined by the sign of the third auxiliary parameter and the sign of a preset difference; the preset difference is the difference between the floating-point output value and a preset parameter; and the preset parameter is determined from the third auxiliary parameter and the at least two fifth auxiliary parameters.
  19. The method according to claim 10, wherein the second auxiliary parameter corresponding to the weights of the convolutional layer and the fourth auxiliary parameter corresponding to the output value of the convolutional layer can be absorbed during the activation-function processing of the next convolutional layer.
  20. The method according to claim 1, wherein the neural network further includes at least one network block, and the input value or output value of the network block is quantized to more than 1 bit.
  21. The method according to claim 20, wherein the number of channels of the input value and/or output value of the residual block is greater than the number of channels of the input value and/or output value of the network block.
  22. The method according to claim 1 or 2, wherein the convolution operation of the convolutional layer in the residual block is performed by a designated systolic array; and
    the systolic array includes a plurality of processing elements supporting 1-bit operations.
  23. The method according to claim 22, wherein the input bandwidth and the output bandwidth of the systolic array are the same.
  24. The method according to claim 22 or 23, wherein the systolic array is a square array.
  25. The method according to claim 22, wherein the systolic array includes a plurality of input lines for 1-bit data input.
  26. The method according to claim 22, wherein the weights of the convolutional layer are stored in NHWC format.
  27. An image processing method, comprising:
    acquiring an image to be processed; and
    inputting the image to be processed into a pre-trained neural network to obtain an image processing result; wherein the neural network includes at least one residual block; the convolutional layer in the residual block performs a binary convolution operation, and the input value and the output value of the residual block are each quantized to 1 bit.
  28. The method according to claim 27, wherein the neural network is used to process any one of the following computer vision tasks: an image classification task, an image localization task, an object detection task, an object tracking task, a semantic segmentation task, an instance segmentation task, or a super-resolution reconstruction task.
  29. A training device for a neural network, wherein the neural network is used to process a computer vision task, the device comprising:
    a memory for storing executable instructions; and
    one or more processors;
    wherein the one or more processors, when executing the executable instructions, are individually or jointly configured to perform the method according to any one of claims 1 to 26.
  30. An image processing device, comprising:
    a memory for storing executable instructions; and
    one or more processors;
    wherein the one or more processors, when executing the executable instructions, are individually or jointly configured to perform the method according to claim 27 or 28.
  31. An image processing system, comprising the image processing device according to claim 30 and a movable platform;
    wherein the movable platform is provided with a photographing device, and the movable platform is used to send images captured by the photographing device to the image processing device.
  32. A computer-readable storage medium storing executable instructions that, when executed by a processor, implement the method according to any one of claims 1 to 28.
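To make the block structure recited in claims 2-7 concrete, a minimal training-time sketch in PyTorch follows. It assumes an addition-type fusion layer, two binary convolutions on the main branch against one on the shortcut branch, a straight-through sign estimator, and batch normalization standing in for the auxiliary scales and offsets of claims 10-19; all names and concrete choices here are illustrative assumptions, not part of the disclosure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinarySign(torch.autograd.Function):
    """sign() with a straight-through estimator so gradients can flow."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # pass gradients only inside the hard-tanh window |x| <= 1
        return grad_out * (x.abs() <= 1).to(grad_out.dtype)

def binarize(x):
    return BinarySign.apply(x)

class BinaryConv2d(nn.Conv2d):
    """Convolution whose weights are binarized on the fly (1-bit weights)."""

    def forward(self, x):
        return F.conv2d(x, binarize(self.weight), self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

class BinaryResidualBlock(nn.Module):
    """Main branch: two binary convs; shortcut branch: one binary conv
    (fewer than the main branch, as claim 2 requires). Only the fused
    result is re-quantized to 1 bit; the convs feeding the addition keep
    a higher-precision output, mirroring claims 3-5."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = BinaryConv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = BinaryConv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv_skip = BinaryConv2d(channels, channels, 1, bias=False)
        self.bn_skip = nn.BatchNorm2d(channels)

    def forward(self, x_1bit):  # x_1bit assumed to hold values in {-1, +1}
        m = binarize(self.bn1(self.conv1(x_1bit)))   # intermediate: 1 bit
        m = self.bn2(self.conv2(m))                  # feeds fusion: multi-bit
        s = self.bn_skip(self.conv_skip(x_1bit))     # feeds fusion: multi-bit
        return binarize(m + s)                       # fusion output: 1 bit
```

A concatenation-type fusion layer (claims 6-7) would instead join the two branch outputs with torch.cat along the channel axis, with the block's 1-bit output then produced by the convolutional layer that follows the concatenation.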
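The weight-side auxiliary machinery of claims 11-14 can be sketched as follows, reusing the binarize helper defined above. Here a stands for the first auxiliary parameter and gamma for the second; this exact parameterization is an assumption of the sketch rather than a quotation of the disclosed training procedure.

```python
import torch

def quantize_weight_train(w, a, gamma):
    """Training-time weight quantization (claims 11-13):
    Tanh (first auxiliary operator) softly binarizes according to a,
    sign (second auxiliary operator) fixes the binary value,
    and gamma rescales the result."""
    soft = torch.tanh(a * w)   # first auxiliary operator
    hard = binarize(soft)      # second auxiliary operator (sign + STE)
    return gamma * hard

def quantize_weight_infer(w, a):
    """After training the chain collapses (claim 14), since
    sign(tanh(a * w)) == sign(a) * sign(w) for nonzero arguments;
    gamma can be absorbed downstream, as claim 19 notes."""
    return torch.sign(a) * torch.sign(w)
```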
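The output-side counterpart of claims 15-18 can be sketched in the same style, with t as the third auxiliary parameter, beta as the fourth, and b1, b2 as the two fifth auxiliary parameters (offsets). Claim 17 only requires some activation function for the third auxiliary operator; the PReLU used here is an assumption of this sketch.

```python
import torch
import torch.nn.functional as F

def quantize_activation_train(y, t, beta, b1, b2, slope):
    """Training-time quantization of a conv output y (claims 15-17).

    t     -- third auxiliary parameter (quantization sharpness)
    beta  -- fourth auxiliary parameter (scale of the quantized output)
    b1, b2 -- the two fifth auxiliary parameters (offsets)
    slope -- PReLU slope tensor (the choice of PReLU is an assumption)
    """
    z = F.prelu(t * (y - b1), slope)  # third operator: activation, uses t and b1
    h = F.hardtanh(z - b2)            # fourth operator: hard-tanh, uses b2
    q = binarize(h)                   # second operator: sign (+ STE)
    return beta * q

def quantize_activation_infer(y, t, c):
    """Claim 18: because the activation above is monotonic, the whole chain
    collapses to a threshold test sign(t) * sign(y - c), where the preset
    parameter c is precomputed from t, b1 and b2."""
    return torch.sign(t) * torch.sign(y - c)
```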
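Claims 22-26 concern execution on a 1-bit systolic array with weights stored in NHWC format. The claims do not explain the layout choice, but a plausible reading (an assumption of this sketch) is that NHWC keeps the channel axis innermost in memory, so the sign bits that a 1-bit processing element consumes together are contiguous and can be packed into whole words feeding the array's 1-bit input lines:

```python
import numpy as np

def pack_nhwc_1bit(x: np.ndarray) -> np.ndarray:
    """Pack a {-1, +1} tensor of shape (N, H, W, C) into bytes along C.

    With NHWC the channel axis is innermost, so packing along it yields
    contiguous words, one byte per group of 8 channels."""
    if x.shape[-1] % 8 != 0:
        raise ValueError("this sketch assumes C is a multiple of 8")
    bits = (x > 0).astype(np.uint8)    # map +1 -> bit 1, -1 -> bit 0
    return np.packbits(bits, axis=-1)  # shape (N, H, W, C // 8)

# Example: a 1x2x2x16 activation packs into 1x2x2x2 bytes.
x = np.where(np.random.default_rng(1).random((1, 2, 2, 16)) > 0.5, 1, -1)
assert pack_nhwc_1bit(x).shape == (1, 2, 2, 2)
```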
PCT/CN2022/073246 2022-01-21 2022-01-21 Neural network training method, image processing method and device, system, and medium WO2023137710A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/073246 WO2023137710A1 (en) 2022-01-21 2022-01-21 Neural network training method, image processing method and device, system, and medium


Publications (1)

Publication Number Publication Date
WO2023137710A1 (en)

Family

ID=87347505

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/073246 WO2023137710A1 (en) 2022-01-21 2022-01-21 Neural network training method, image processing method and device, system, and medium

Country Status (1)

Country Link
WO (1) WO2023137710A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190095795A1 (en) * 2017-03-15 2019-03-28 Samsung Electronics Co., Ltd. System and method for designing efficient super resolution deep convolutional neural networks by cascade network training, cascade network trimming, and dilated convolutions
CN109934761A (en) * 2019-01-31 2019-06-25 中山大学 Jpeg image steganalysis method based on convolutional neural networks
CN111247797A (en) * 2019-01-23 2020-06-05 深圳市大疆创新科技有限公司 Method and apparatus for image encoding and decoding
US20200267416A1 (en) * 2017-11-08 2020-08-20 Panasonic Intellectual Property Corporation Of America Image processor and image processing method
CN111783961A (en) * 2020-07-10 2020-10-16 中国科学院自动化研究所 Activation fixed point fitting-based convolutional neural network post-training quantization method and system
CN113408715A (en) * 2020-03-17 2021-09-17 杭州海康威视数字技术股份有限公司 Fixed-point method and device for neural network



Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22921151

Country of ref document: EP

Kind code of ref document: A1