CN113222108B - Target detection processing method, device and equipment - Google Patents

Info

Publication number
CN113222108B
CN113222108B (application CN202110252763.5A)
Authority
CN
China
Prior art keywords
quantization
layer
convolution
result
target detection
Prior art date
Legal status
Active
Application number
CN202110252763.5A
Other languages
Chinese (zh)
Other versions
CN113222108A
Inventor
曹健
夏立超
戴镇原
原浩强
赵东宇
Current Assignee
Peking University
Original Assignee
Peking University
Priority date
Filing date
Publication date
Application filed by Peking University
Priority to CN202110252763.5A
Publication of CN113222108A
Application granted
Publication of CN113222108B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation using electronic means
    • G06N3/08 Learning methods


Abstract

The invention provides a target detection processing method, apparatus, and device. The method includes: acquiring a first convolution result corresponding to the first convolution layer of a target detection model; performing low-bit quantization on the first convolution result to obtain a first quantization result; converting the first quantization result into an input work frame corresponding to a spiking neural network (SNN) chip; sending the input work frame to the SNN chip so that the chip performs operations according to the input work frame and obtains an output work frame; and determining a target detection result according to the output work frame. The method effectively reduces the memory occupied by the detection model, and the middle-layer operations of the target detection model are performed by the SNN chip, which effectively shortens neural network inference time, greatly reduces data transfer between storage and computation, effectively cuts transfer latency and energy consumption, increases data processing speed, and thereby improves the real-time performance of the system.

Description

Target detection processing method, device and equipment
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method, an apparatus, and a device for processing target detection.
Background
With the rapid development of artificial intelligence, target detection based on deep learning has gradually matured. However, as deep neural networks grow deeper, the network scale becomes very large and unfriendly to embedded devices, so the neural network model must be compressed in order to deploy a well-performing network on hardware with limited resources.
In the prior art, a neural network is usually quantized from 16-bit floating-point (float) data to 8-bit integer (int) data, but an 8-bit neural network is still fairly large and occupies considerable memory.
Disclosure of Invention
The embodiments of the present invention provide a target detection processing method, apparatus, and device to solve the prior-art problems of large model scale and high memory occupation.
In a first aspect, an embodiment of the present invention provides a method for processing target detection, including:
acquiring a first convolution result corresponding to the first convolution layer of a target detection model;
performing low-bit quantization on the first convolution result to obtain a first quantization result;
converting the first quantization result into an input work frame corresponding to a spiking neural network (SNN) chip;
sending the input work frame to the SNN chip, so that the SNN chip performs operations according to the input work frame to obtain an output work frame;
and determining a target detection result according to the output work frame.
In a second aspect, an embodiment of the present invention provides a target detection processing apparatus, including:
an acquisition module, configured to acquire a first convolution result corresponding to the first convolution layer of a target detection model;
a quantization module, configured to perform low-bit quantization on the first convolution result to obtain a first quantization result;
a conversion module, configured to convert the first quantization result into an input work frame corresponding to a spiking neural network (SNN) chip;
a sending module, configured to send the input work frame to the SNN chip so that the SNN chip performs operations according to the input work frame to obtain an output work frame;
and a first processing module, configured to determine a target detection result according to the output work frame.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory, a transceiver, and at least one processor;
the processor, the memory, and the transceiver are interconnected by a circuit;
the processor is connected to a spiking neural network (SNN) chip;
the memory stores computer-executable instructions, and the transceiver is configured to receive raw image data sent by an input device;
the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the method described in the first aspect and its various possible designs.
According to the target detection processing method, apparatus, and device provided by the embodiments of the invention, the neural network used for target detection is reduced to below 8 bits, which effectively reduces the memory occupied by the detection model. The quantized part of the target detection model is pulsed and deployed to a spiking neural network (SNN) chip, so the middle-layer operations of the model are performed by the chip. Because the SNN chip contains a large number of computing units, each of which can work independently and simultaneously, data can be processed in parallel, which effectively shortens neural network inference time, increases data processing speed, and improves the real-time performance of the system. Moreover, the SNN chip adopts in-memory computing, so using it as the computing unit greatly reduces data transfer between storage and computation, effectively cutting transfer latency and energy consumption.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for processing object detection according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a process flow of a neural network model for object detection according to an embodiment of the present invention;
FIG. 3 is a diagram showing the yolov3-tiny model parameters and corresponding layer sequence numbers according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a network structure of the yolov3-tiny model corresponding to FIG. 3 according to an embodiment of the present invention;
FIG. 5 is a diagram comparing the model indicator mAP when different numbers of layers are quantized, according to an embodiment of the invention;
FIG. 6 is a schematic diagram of a processing system for target detection based on a host controller according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a processing device for object detection according to an embodiment of the present invention;
FIG. 8 is a schematic diagram illustrating an exemplary configuration of a processing device for object detection according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Specific embodiments of the present invention have been shown by way of the above drawings and will be described in more detail below. The drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but rather to illustrate the inventive concepts to those skilled in the art by reference to the specific embodiments.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
First, the terms involved in the present invention will be explained:
DoReFa-Net: a convolutional neural network method that can train low-bit-width weights and activations using low-bit-width parameter gradients. In the back-propagation phase, the parameter gradients can be stochastically quantized to a low bit width before being passed on to the next convolution layer, so convolutions in both the forward and backward passes operate on low-bit-width weights and activations/gradients. DoReFa-Net can use low-bit-width convolution kernels to accelerate training and inference, and low-bit convolutions can be implemented efficiently on CPUs, FPGAs, ASICs, and GPUs.
Darknet: a deep learning framework that can be accelerated with the NNPACK acceleration package so that neural network operations run quickly on embedded devices. It supports splitting a network model and running the target detection network with mixed precision, e.g., head and tail convolution layers on a main controller and the low-bit-quantized middle layers on a spiking neural network (SNN) chip. NNPACK is an acceleration package for neural network computation that improves convolution-layer performance on multi-core CPU platforms.
Furthermore, the terms "first", "second", and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features. Unless explicitly defined otherwise, "a plurality" means two or more in the following description of the embodiments.
The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present invention will be described below with reference to the accompanying drawings.
An embodiment of the present invention provides a target detection processing method for target detection in various scenarios. The execution subject of this embodiment is a target detection processing apparatus, which may be disposed in an electronic device; the electronic device may be any computer device, such as an embedded device.
FIG. 1 is a flow chart of the target detection processing method provided by this embodiment; the method includes the following steps.
Step 101: obtain a first convolution result corresponding to the first convolution layer of the target detection model.
Specifically, the target detection model is a pulsed model obtained by low-bit quantizing and pulsing an initial detection model trained at full precision, where the low-bit quantization is achieved through quantization-aware training. The first and last convolution layers of the target detection model are neither quantized nor pulsed and are deployed on the target detection processing apparatus, while the middle layers are low-bit quantized, pulsed, and deployed on a spiking neural network (SNN) chip. During detection, the processing apparatus acquires raw image data and preprocesses it to obtain the input feature data (also called the input feature map) for the target detection model; this input feature data is a floating-point feature map. The apparatus feeds the input feature data into the first convolution layer and performs the first-layer convolution to obtain the first convolution result corresponding to that layer; because the first convolution layer is neither quantized nor pulsed, the first convolution result is a floating-point (float) result.
Step 102: perform low-bit quantization on the first convolution result to obtain a first quantization result.
Specifically, the low-bit quantization converts the first convolution result into a result of fewer than 8 bits using a quantization parameter obtained through quantization-aware training; for example, the floating-point first convolution result can be converted into a 4-bit, 3-bit, or 2-bit low-bit-width result. The quantization-aware training can be implemented on the basis of the DoReFa-Net quantization network.
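As a minimal sketch of this step, assuming a single learned per-tensor scale as the quantization parameter (the patent does not specify its exact form), a k-bit activation quantizer might look like:

```python
import numpy as np

def quantize_activation(x, bits=4, scale=1.0):
    """Map a float feature map to k-bit integer codes in [0, 2**bits - 1].

    `scale` stands in for the quantization parameter learned during
    quantization-aware training; its exact form here is an assumption.
    """
    n = 2 ** bits - 1
    x = np.clip(x / scale, 0.0, 1.0)          # normalize into [0, 1]
    return np.round(x * n).astype(np.uint8)   # low-bit integer codes
```

A 3-bit or 2-bit result follows by changing `bits`; the resulting integer codes are what is later encoded into input work frames.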
Step 103: convert the first quantization result into an input work frame corresponding to the spiking neural network (SNN) chip.
Specifically, the spiking neural network (SNN) chip must be deployed in advance: the quantized and pulsed part of the target detection model (i.e., the middle layers between the first and last convolution layers) is deployed to the SNN chip so that the chip can perform the model's middle-layer operations. Once deployed, the chip can be applied to target detection. The SNN chip is connected to the target detection processing apparatus and communicates with it through an interface that can be chosen according to actual requirements, e.g., FMC interface communication using an FPGA.
Because the first quantization result is low-bit-width data, it must be pulsed, i.e., converted into an input work frame, to suit the spiking neural network chip; specifically, the conversion can follow the input work frame format specified by the chip.
Step 104: send the input work frame to the spiking neural network (SNN) chip so that the chip performs operations according to the input work frame to obtain an output work frame.
Specifically, after the target detection processing apparatus obtains an input work frame adapted to the spiking neural network chip, it sends the frame to the chip; the chip completes the middle-layer operations to obtain an output work frame and sends that frame back to the apparatus.
Step 105: determine the target detection result according to the output work frame.
The target detection processing apparatus receives the output work frame sent by the spiking neural network chip, converts it into a low-bit-width feature map, converts that feature map into a floating-point feature map using the conversion parameters obtained from quantization-aware training, and then inputs the floating-point feature map into the last convolution layer to perform the last-layer convolution and obtain the last-layer convolution result.
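The flow of steps 101-105 can be sketched end to end with toy stand-ins for each stage; every function body below is an illustrative placeholder, not the patent's actual kernels or chip protocol:

```python
import numpy as np

def preprocess(img):                      # scale 8-bit pixels into [0, 1]
    return img.astype(np.float32) / 255.0

def first_conv(x):                        # placeholder full-precision first layer
    return x * 2.0 - 0.5

def quantize(x, bits=4):                  # step 102: low-bit quantization
    n = 2 ** bits - 1
    return np.round(np.clip(x, 0, 1) * n).astype(np.uint8)

def encode_work_frames(q):                # step 103: pulse only non-zero points
    ys, xs = np.nonzero(q)
    return [(int(y), int(x), int(q[y, x])) for y, x in zip(ys, xs)]

def run_on_chip(frames):                  # step 104: stand-in for the SNN chip
    return frames                         # (middle layers would run here)

def decode_work_frames(frames, shape):    # step 105: rebuild a feature map
    out = np.zeros(shape, dtype=np.uint8)
    for y, x, v in frames:
        out[y, x] = v
    return out

img = np.array([[0, 255], [128, 64]], dtype=np.uint8)
q = quantize(first_conv(preprocess(img)))
feat = decode_work_frames(run_on_chip(encode_work_frames(q)), img.shape)
```

Only the non-zero points travel to the chip, which is what makes the sparse frame encoding worthwhile.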
Optionally, the target detection processing apparatus may further send the target detection result to an output device or another device for subsequent processing, e.g., to a display device for display. In an access-control scenario, for example, the detection result may be sent to the access-control system to control the opening of a door; this can be set according to actual requirements.
According to the target detection processing method provided in this embodiment, the neural network used for target detection is reduced to below 8 bits, effectively reducing the memory occupied by the detection model. The quantized part of the model is pulsed and deployed to the spiking neural network (SNN) chip, which performs the middle-layer operations. Because the chip contains a large number of computing units, each of which can work independently and simultaneously, data can be processed in parallel, effectively shortening inference time, increasing data processing speed, and improving the real-time performance of the system. The SNN chip adopts in-memory computing, so using it as the computing unit greatly reduces data transfer between storage and computation, effectively cutting transfer latency and energy consumption.
To make the technical solution of the invention clearer, another embodiment further elaborates on the method provided above.
To reduce model size and computation, as one implementation, the method optionally further includes, before sending the input work frame to the spiking neural network (SNN) chip:
performing quantization-aware training, based on the DoReFa-Net quantization network, on an initial detection model obtained by full-precision training, to obtain a trained quantization model in which the first and last convolution layers are not quantized and the middle network layers are quantized; pulsing the middle network layers of the quantization model to obtain a pulsed model as the target detection model; and deploying the middle network layers of the target detection model to the SNN chip so that the chip can perform the middle-layer operations.
Specifically, FIG. 2 is a schematic diagram of the processing flow of the neural network model for target detection provided in this embodiment. To perform low-bit quantization (i.e., model compression) on the detection model, a pre-established detection network is first trained at full precision to obtain an initial detection model that meets a preset index requirement. The preset index may be the mAP (mean average precision): in multi-class target detection, each class yields a curve of precision versus recall, the AP is the area under that curve, and the mAP is the average of the APs over all classes. After the initial detection model is obtained, quantization-aware training based on the DoReFa-Net quantization network produces a trained quantization model. Because the first and last convolution layers are relatively sensitive with respect to the mAP, they are not low-bit quantized and remain at full precision, while the middle layers are quantized to a bit width below 8 bits, with the specific value set according to actual requirements. This embodiment takes 4-bit quantization as an example and quantizes the weights and activations of the middle layers to 4 bits, yielding the trained quantization model. To deploy the middle-layer part of the quantization model to the spiking neural network (SNN) chip, the quantization model is then pulsed to obtain a pulsed model as the target detection model. The target detection model can then be deployed: the first and last convolution layers on the target detection processing apparatus, and the pulsed middle layers on the SNN chip. This ensures that the sensitive convolution layers keep full-precision operation while the middle layers run on the SNN chip, reducing model scale while preserving detection accuracy. To deploy the SNN chip, the pulsed model can be converted into configuration frames and sent to the chip, which configures itself accordingly to obtain a fully configured chip.
Compressing the model on the basis of the DoReFa-Net quantization network allows the weights and activations of some of the model's layers to be quantized to bit widths below 8 bits, greatly reducing model scale and computation.
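For reference, the forward-pass weight quantizer that DoReFa-Net defines can be sketched as follows (the backward pass uses a straight-through estimator, which a training framework would supply; this NumPy version shows only the forward mapping):

```python
import numpy as np

def quantize_k(x, k):
    """Uniform k-bit quantizer for values in [0, 1], as in DoReFa-Net."""
    n = 2 ** k - 1
    return np.round(x * n) / n

def dorefa_weight(w, k):
    """k-bit DoReFa weight quantization: squash weights with tanh,
    normalize into [0, 1], quantize, then map back into [-1, 1]."""
    t = np.tanh(w)
    wq = quantize_k(t / (2.0 * np.max(np.abs(t))) + 0.5, k)
    return 2.0 * wq - 1.0
```

The quantized weights always lie in [-1, 1] on a uniform k-bit grid, which is what allows middle-layer convolutions to run on low-bit hardware.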
To further reduce model scale and computation, as one implementation and building on the above embodiment, the method optionally further includes, before the quantization-aware training of the full-precision initial detection model based on the DoReFa-Net quantization network: performing network clipping on the initial detection model obtained by full-precision training to obtain a clipped detection model. Correspondingly, the quantization-aware training step becomes: performing quantization-aware training on the clipped detection model, based on the DoReFa-Net quantization network, to obtain a trained quantization model.
Specifically, the initial detection model can also be compressed by network clipping, followed by low-bit quantization for further compression. Network clipping can be realized by proportionally reducing the number of convolutions or by channel pruning; for example, the number of channels (i.e., filters) in each layer of the initial detection model can be clipped to 1/2, 1/4, or 1/8, and an appropriate ratio (e.g., 1/4) is selected as the final clipping scheme by comparing the mAP index. The initial detection model is clipped to obtain a clipped detection model, and quantization-aware training based on the DoReFa-Net quantization network is then performed on the clipped model to obtain the trained quantization model.
Compressing the model with a combination of network clipping and the DoReFa-Net quantization network further reduces model scale and computation.
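A sketch of the filter-count clipping, assuming the per-layer channel counts are held in a simple list (how the real network configuration is edited is not specified here):

```python
def clip_filters(filters, denom):
    """Cut each layer's channel count to 1/denom (denom = 2, 4, or 8),
    keeping at least one filter per layer."""
    return [max(1, f // denom) for f in filters]

# Each candidate ratio would be retrained and compared by mAP to pick one.
candidates = {d: clip_filters([16, 32, 64, 128], d) for d in (2, 4, 8)}
```

Detection-head output channels would in practice be kept as-is, since they are tied to the number of classes and anchors.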
As another implementation, building on the foregoing embodiment, performing low-bit quantization on the first convolution result to obtain the first quantization result optionally includes:
performing low-bit quantization on the first convolution result based on the trained quantization parameters obtained from quantization-aware training, to obtain the first quantization result.
Specifically, in the quantization-aware training stage, the quantization parameters used to quantize the first convolution result into a low-bit result are obtained and stored; in practical application, low-bit quantization of the first convolution result can be performed with the stored parameters to obtain the first quantization result.
As another implementation, building on the foregoing embodiment, converting the first quantization result into an input work frame corresponding to the spiking neural network (SNN) chip optionally includes:
converting the first quantization result into an input work frame corresponding to the SNN chip according to a preset input work frame format.
Specifically, when data (such as the first quantization result) needs to be sent to the SNN chip, it must be encoded according to the chip's preset input work frame format so that it can be accepted by the chip; specifically, every non-zero feature point in the first quantization result is encoded into a corresponding input work frame according to the preset format.
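As an illustration only (the real frame format is chip-specified and not given here), non-zero 4-bit feature points could be packed into 32-bit frame words carrying their coordinates, channel, and value:

```python
import numpy as np

def to_work_frames(qmap):
    """Encode every non-zero point of a (channels, height, width) 4-bit
    feature map into one 32-bit word: y<<20 | x<<12 | c<<4 | value.
    This bit layout is a hypothetical stand-in for the chip's format."""
    frames = []
    for (c, y, x), v in np.ndenumerate(qmap):
        if v != 0:
            frames.append((y << 20) | (x << 12) | (c << 4) | int(v))
    return frames
```

Because zero-valued points are skipped, sparse activations produce proportionally fewer frames to transmit.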
As another implementation, building on the foregoing embodiment, obtaining the first convolution result corresponding to the first convolution layer of the target detection model optionally includes:
acquiring raw image data; preprocessing the raw image data to obtain input feature data corresponding to the target detection model; and performing the first-layer convolution on the input feature data to obtain the first convolution result corresponding to the first convolution layer.
Specifically, the target detection processing apparatus may receive the raw image data from an input device (e.g., any image capture device) or fetch it from a storage area where it was stored in advance; this can be set according to actual requirements and is not limited in this embodiment. Because attributes of the raw image data, such as size and format, may not meet the input requirements of the target detection model, preprocessing, such as image scaling and/or other related processing (set according to actual requirements), may be needed to obtain input feature data that meets those requirements. The input feature data is then fed into the first convolution layer for the first-layer convolution to obtain the first convolution result, and the subsequent processing follows the flow described above, which is not repeated here.
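Preprocessing for a yolov3-tiny-style model is typically a letterbox resize. The sketch below uses a nearest-neighbour resize in plain NumPy and assumes a 416x416 input and mid-gray padding, both of which are assumptions since the patent leaves the preprocessing unspecified:

```python
import numpy as np

def letterbox(img, size=416):
    """Scale an HxWxC uint8 image to fit in a size x size canvas while
    preserving aspect ratio, padding the rest with mid-gray (0.5)."""
    h, w = img.shape[:2]
    scale = size / max(h, w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    # nearest-neighbour resize via index arrays (no external libraries)
    ys = np.clip((np.arange(nh) / scale).astype(int), 0, h - 1)
    xs = np.clip((np.arange(nw) / scale).astype(int), 0, w - 1)
    resized = img[ys][:, xs].astype(np.float32) / 255.0
    canvas = np.full((size, size) + img.shape[2:], 0.5, dtype=np.float32)
    top, left = (size - nh) // 2, (size - nw) // 2
    canvas[top:top + nh, left:left + nw] = resized
    return canvas
```

The output is the floating-point input feature map that the first (full-precision) convolution layer consumes.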
Optionally, performing the first-layer convolution on the input feature data to obtain the first convolution result corresponding to the first convolution layer includes:
performing the first-layer convolution on the input feature data under the Darknet deep learning framework to obtain the first convolution result corresponding to the first convolution layer.
Specifically, to enable split deployment and mixed-precision operation of the target detection model, the model is implemented on the Darknet deep learning framework, which makes it possible to take the output of a given layer or feed data directly into a middle layer, and which ensures fast operation on embedded devices.
Optionally, the NNPACK acceleration package can be used to accelerate the Darknet deep learning framework, further increasing the operating speed of the target detection model on the embedded device.
As another implementation, building on the foregoing embodiment, determining the target detection result according to the output work frame optionally includes:
converting the output work frame into a first feature map; dequantizing the first feature map to obtain a numerical feature map; performing the last-layer convolution on the numerical feature map to obtain a last-layer convolution result; and post-processing the last-layer convolution result to obtain the target detection result.
Specifically, since the last convolution layer of the target detection model is deployed on the processing device for target detection and is full-precision (floating point), the output working frame produced by the pulse neural network chip cannot be fed directly into the last convolution layer. After the processing device for target detection receives the output working frame, it therefore converts the output working frame into a feature map that meets the input requirements of the last convolution layer: the output working frame is decoded according to the preset output working frame format of the pulse neural network chip to obtain a first feature map corresponding to the output working frame, which is a low-bit feature map. The first feature map is then dequantized into a numerical feature map, which can be input to the last convolution layer to perform the final convolution operation and obtain the final-layer convolution result. The final-layer convolution result implicitly contains information such as the type, position and confidence of one or more candidate frame areas detected by the target detection model, so post-processing is required. The post-processing mainly includes decoding and non-maximum suppression: decoding parses the final-layer convolution result into explicit information such as the type, position and confidence of each candidate frame; the non-maximum suppression operation then sorts the large number of candidate frames (i.e., the position information obtained by decoding) of the same target object by confidence and deletes candidate frames with a larger overlapping area and lower confidence, removing redundant candidate frames and thereby obtaining the optimal target detection result.
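The non-maximum suppression operation described above can be sketched as follows; the greedy, confidence-sorted form shown here is the standard algorithm, and the threshold value is an illustrative assumption.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Sort candidate boxes by confidence; keep a box only if it does not
    overlap an already-kept box by more than `iou_threshold`."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

For example, two heavily overlapping candidates for the same target collapse to the single higher-confidence one, while a distant candidate survives.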
In order to more clearly illustrate the technical solution of the present invention, an exemplary embodiment is provided in which the initial detection model of the target detection model is, for example, a yolov3-tiny model. Fig. 3 is a schematic diagram of the parameters and corresponding layer sequence numbers of the yolov3-tiny model provided for this embodiment, and fig. 4 is a schematic diagram of the network structure of the yolov3-tiny model corresponding to fig. 3. In fig. 3, layers denotes the layers and 0-23 are the layer numbers, where layers 0-15 and 17-22 constitute the initial detection model described above, and layers 16 and 23 are yolo layers responsible for decoding the model output, i.e., layers 16 and 23 belong to the post-processing part; filters denotes the number of channels, size the size, input the input and output the output; conv (i.e., convolutional) denotes a convolution layer and max (i.e., maxpooling or Maxpool) denotes a maximum pooling layer. Layers 17 and 20 are routing layers, which combine the outputs of earlier layers into a new layer: layer 17 (route) has the single parameter 13, meaning that the output (convolution result) of layer 13 is taken as a new layer, shown in fig. 4 by an output line connected to layer 13; layer 20 (route) has the two parameters 19 and 8, meaning that the output results of those two convolution layers are combined into a new layer, shown in fig. 4 by output lines from layers 19 and 8 joined at a concat layer; upsample denotes an upsampling layer. The specific network architecture of the yolov3-tiny model is prior art and is not repeated here. Taking detection of 1 class as an example, to predict a target frame and its type, the number of required feature maps (i.e., input feature maps of the yolo layer) is 3 x (4 + 1 + 1) = 18, where 3 is the number of anchors (i.e., anchor points or frames) corresponding to the yolo layer, 4 is the number of frame coordinates, the first 1 indicates whether a target (object) exists, and the other 1 is the number of classes. The output of the target detection model comprises tx, ty, tw, th, objectness and class-prob: tx and ty are the offsets of the frame's centre point from the upper-left corner of the cell of the original image (i.e., the input feature map of the target detection model) corresponding to the feature point; tw and th determine the length and width relative to the length and width of the anchor obtained by clustering; objectness is the confidence; and class-prob is the probability of belonging to a certain class. The function of the yolo layer is to decode this information: the yolo layer does not deduplicate the 3 anchor boxes, so if all three anchor boxes meet the condition, all three are written to the output, together with a dedicated memory address recording the number of detection frames that meet the condition, where "meeting the condition" means that the probability in the target-object judging information is larger than the threshold, in which case the class with the highest confidence among the types is selected as the type information.
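The yolo-layer decoding of tx, ty, tw, th, objectness and class-prob described above can be sketched for a single grid cell as follows; the anchor sizes, stride and helper name are illustrative assumptions, and the sigmoid/exp mapping is the standard YOLO decoding.

```python
import math

def decode_cell(raw, anchors, cx, cy, stride):
    """Decode one grid cell of a yolo layer output.

    `raw` holds 3 * (4 + 1 + 1) = 18 values: for each of the 3 anchors,
    (tx, ty, tw, th, objectness, class_prob).  (cx, cy) is the cell's
    upper-left corner in grid units and `stride` maps grid units to pixels.
    Anchor sizes here are illustrative, not the clustered values of the patent.
    """
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    out = []
    for a, (aw, ah) in enumerate(anchors):
        tx, ty, tw, th, obj, cls = raw[a * 6:(a + 1) * 6]
        bx = (cx + sigmoid(tx)) * stride   # centre offset from the cell corner
        by = (cy + sigmoid(ty)) * stride
        bw = aw * math.exp(tw)             # width/height scale the anchor box
        bh = ah * math.exp(th)
        out.append((bx, by, bw, bh, sigmoid(obj), sigmoid(cls)))
    return out
```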
The following applies the yolov3-tiny model to unmanned aerial vehicle detection, and describes the model compression flow in detail, the model compression flow specifically includes:
1. full-precision training unmanned aerial vehicle detection model
Specifically, before training the model, the real frames (ground truth) in the unmanned aerial vehicle training set are clustered using the K-means algorithm to obtain 9 anchor boxes suited to the current data set, adapting to the target sizes in the data set; full-precision training is then performed to obtain the initial detection model.
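The anchor-clustering step above can be sketched with a plain K-means over (width, height) pairs using an IoU-based distance; the data, seed and parameters below are illustrative assumptions, not the patent's actual training set.

```python
import random

def kmeans_anchors(wh_pairs, k=9, iters=50, seed=0):
    """Cluster ground-truth box (width, height) pairs into k anchor boxes.

    A plain K-means sketch assigning each box to the centre with the
    highest width/height IoU; implementation details of the patent's
    actual clustering are not reproduced here.
    """
    def iou_wh(a, b):
        inter = min(a[0], b[0]) * min(a[1], b[1])
        return inter / (a[0] * a[1] + b[0] * b[1] - inter)

    rng = random.Random(seed)
    centers = rng.sample(wh_pairs, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for wh in wh_pairs:
            best = max(range(k), key=lambda i: iou_wh(wh, centers[i]))
            clusters[best].append(wh)
        # Recompute each centre as the mean box of its cluster.
        centers = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(centers)
```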
2. Network clipping
The number of channels (i.e., the number of filters) of each network layer of the initial yolov3-tiny detection model is cut to 1/2, 1/4 or 1/8 of the original, the mAP values are compared, and a suitable model size is selected to obtain the cut detection model.
3. Model low bit quantization
The first and last convolution layers of the cut detection model are not quantized, while the weights and activations of the intermediate layers are all quantized to 4 bits. The weights and activations of the intermediate layers can also be quantized to other bit widths according to actual requirements, such as 2 bits, or binary or ternary values; 4 bits is taken as the example here.
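A minimal sketch of DoReFa-Net-style low-bit quantization of weights and activations, as used for the 4-bit intermediate layers above; only the forward pass is shown, and the straight-through gradient estimator used during quantization perception training is omitted.

```python
import numpy as np

def quantize_k(x, k_bits=4):
    """Uniform quantizer: map values in [0, 1] onto 2^k - 1 evenly
    spaced levels (forward pass only)."""
    n = float(2 ** k_bits - 1)
    return np.round(x * n) / n

def quantize_activation(a, k_bits=4):
    """Clip activations to [0, 1], then quantize, as DoReFa-Net does."""
    return quantize_k(np.clip(a, 0.0, 1.0), k_bits)

def quantize_weight(w, k_bits=4):
    """DoReFa-Net weight quantization: squash with tanh into [0, 1],
    quantize, then rescale back to [-1, 1]."""
    t = np.tanh(w)
    t = t / (2.0 * np.max(np.abs(t))) + 0.5
    return 2.0 * quantize_k(t, k_bits) - 1.0
```

With k_bits=4 every quantized activation is one of 16 levels, which is what allows the intermediate layers to run on low-bit hardware.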
Fig. 5 is a comparison diagram of the model index mAP for different quantization layer choices provided in this embodiment, in which the numbers in the first row are the layer numbers of the convolution layers and the gray portions mark convolution layers whose weights and activations undergo low-bit quantization. It can be seen that the first convolution layer of the model is relatively sensitive and is therefore not quantized, and the last convolution layers (layers 15 and 22 are both last convolution layers) are involved in the model output and are therefore not quantized either, while the weights and activations of the intermediate convolution layers are all quantized to 4 bits, so that the model scale is compressed and the operation amount of the model is reduced while the target detection precision is preserved.
Table 1 below shows the model mAP comparison results obtained by combining network clipping and low-bit quantization in this embodiment. Considering the model scale, the operation amount and the model mAP together, the yolov3-tiny quantization model with 1/4 of the channels and a full-precision head and tail with a 4-bit middle (i.e., the weights and activations of the first and last convolution layers use floating-point data, while the weights and activations of the intermediate convolution layers use 4-bit low-bit data) is selected for the subsequent work. "All 4-bit" means that all convolution layers are quantized to 4 bits; the specific 4-bit quantization is realized with the DoReFa-Net quantization network.
TABLE 1

Model            | All 4-bit | Full-precision head/tail, 4-bit middle
yolov3-tiny      | 0.980     | 0.974
yolov3-tiny, 1/2 | 0.949     | 0.945
yolov3-tiny, 1/4 | 0.863     | 0.919
In an exemplary embodiment, the processing device for target detection is disposed in an embedded device and implemented by the main controller of the embedded device, which may specifically be an ARM processor. Fig. 6 is a schematic structural diagram of the target detection processing system based on this main controller provided for this embodiment, in which the main controller is connected to the pulse neural network chip. The main controller is responsible for preprocessing, the first-layer convolution operation, low-bit quantization of the first convolution result, and pulsing to generate the input working frame required by the pulse neural network chip; the pulse neural network chip performs the operations of the intermediate layers according to the input working frame to obtain the output working frame and sends it to the main controller; and the main controller is then responsible for parsing, inverse quantization, the last-layer convolution operation and post-processing of the output working frame to obtain the final detection result. The specific processing procedure is not repeated here. Combining the ARM processor with the pulse neural network chip realizes mixed-precision operation of the same model, makes full use of the computing resources, preserves precision in the sensitive layers through partial low-bit quantization, effectively reduces the model scale while guaranteeing the target detection precision, and reduces model redundancy.
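The division of labour between the main controller and the pulse neural network chip described above can be sketched as a host-side pipeline; the chip is replaced here by an identity stub, and all function names are illustrative assumptions rather than the real system's API.

```python
import numpy as np

def snn_chip_run(input_frame):
    """Hypothetical stand-in for the chip's intermediate-layer computation;
    in the real system this runs on dedicated hardware."""
    return {"payload": input_frame["payload"]}  # identity placeholder

def detect(image, first_conv, pack, unpack, dequantize, last_conv, postprocess):
    """Host-side pipeline of the mixed-precision system: the main controller
    runs the full-precision head and tail, the chip runs the quantized
    middle layers.  All callables are supplied by the caller."""
    feat = first_conv(image)                 # full-precision first layer
    q = np.round(np.clip(feat, 0, 1) * 15)   # low-bit (4-bit) quantization
    out_frame = snn_chip_run(pack(q))        # intermediate layers on the chip
    fmap = dequantize(unpack(out_frame))     # inverse quantization
    return postprocess(last_conv(fmap))      # full-precision tail + post-processing
```

Wiring identity callables through this pipeline shows the quantize/dequantize round trip that the host performs around the chip.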
Table 2 shows the comparison results of different neural network frameworks for the main controller provided in this embodiment, specifically the results obtained by deploying yolov3-tiny models of different sizes on a Raspberry Pi, where fps is the number of frames processed per second and PyTorch is an open-source Python machine learning library based on Torch. The comparison shows that the smaller the model scale and the calculation amount, the faster Darknet's processing speed and the more obvious the acceleration effect, so computing the full-precision part of the model (i.e., the first and last convolution layers) with the Darknet neural network framework can effectively improve the data processing speed.
TABLE 2

Model            | PyTorch  | Darknet (NNPACK-accelerated)
yolov3-tiny      | 1.25 fps | 1.05 fps
yolov3-tiny, 1/2 | 2.68 fps | 2.74 fps
yolov3-tiny, 1/4 | 4.11 fps | 6.21 fps
yolov3-tiny, 1/8 | 6.16 fps | 11.95 fps
It should be noted that the embodiments in this description may each be implemented separately, or may be implemented in any combination where no conflict arises; the invention is not limited in this respect.
According to the target detection processing method provided by this embodiment, model compression is performed on the basis of the DoReFa-Net quantization network, and the weights and activations of part of the model layers can be quantized to a bit width lower than 8 bits, greatly reducing the model scale and operation amount; model compression combining network clipping with the DoReFa-Net quantization network further reduces the model scale and operation amount; implementing the target detection model on the Darknet deep learning framework enables offline deployment and mixed-precision operation of the model and ensures a faster operation speed on the embedded device; and the NNPACK acceleration package can further be adopted to accelerate the Darknet deep learning framework, further improving the operation speed of the target detection model on the embedded device.
Still another embodiment of the present invention provides a processing apparatus for object detection, configured to perform the method of the foregoing embodiment.
Fig. 7 is a schematic structural diagram of a processing device for object detection according to the present embodiment. The apparatus 30 includes: the device comprises an acquisition module 31, a quantization module 32, a conversion module 33, a transmission module 34 and a first processing module 35.
The acquisition module is used for acquiring a first convolution result corresponding to a first layer convolution layer of the target detection model; the quantization module is used for carrying out low-bit quantization on the first convolution result to obtain a first quantization result; the conversion module is used for converting the first quantization result into an input working frame corresponding to the pulse neural network chip; the transmitting module is used for transmitting the input working frame to the pulse neural network chip so that the pulse neural network chip can operate according to the input working frame to obtain an output working frame; and the first processing module is used for determining a target detection result according to the output working frame.
Specifically, the acquisition module may obtain original image data from an input device (such as a camera) through an interface between the processing device for target detection and the input device, preprocess the original image data to obtain input feature data corresponding to the target detection model, perform the first-layer convolution operation on the input feature data to obtain the first convolution result corresponding to the first convolution layer, and send the first convolution result to the quantization module; the quantization module performs low-bit quantization on the first convolution result to obtain the first quantization result and sends it to the conversion module; the conversion module converts the first quantization result into the input working frame corresponding to the pulse neural network chip and passes it to the sending module; the sending module sends the input working frame to the pulse neural network chip, which performs its operations according to the input working frame to obtain the output working frame and sends it back to the first processing module of the processing device for target detection; and the first processing module determines the target detection result according to the output working frame.
The specific manner in which the respective modules perform the operations in the apparatus of the present embodiment has been described in detail in the embodiments related to the method, and the same technical effects can be achieved, which will not be described in detail herein.
In order to make the device of the present invention clearer, a further embodiment of the present invention provides a further supplementary explanation of the device provided in the above embodiment.
As an implementation manner, as shown in fig. 8, an exemplary schematic structural diagram of the processing apparatus for object detection provided in this embodiment, on the basis of the foregoing embodiment, the apparatus may optionally further include a first training module 36, a second training module 37, a second processing module 38, and a deployment module 39.
The first training module is used for carrying out full-precision training on a pre-established target detection network to obtain an initial detection model; the second training module is used for carrying out quantization perception training on the initial detection model obtained through full-precision training based on the DoReFa-Net quantization network to obtain a trained quantization model, wherein the first layer convolution layer and the last layer convolution layer of the trained quantization model do not carry out quantization, and the middle network layer carries out quantization; the second processing module is used for pulsing an intermediate network layer of the quantization model to obtain a pulsed model as a target detection model; the deployment module is used for deploying the intermediate network layer of the target detection model to the impulse neural network chip so that the impulse neural network chip can operate the intermediate network layer.
Specifically, the first training module may perform full-precision training on a pre-established target detection network based on pre-prepared first training data to obtain the initial detection model, and send it to the second training module; the second training module performs quantization perception training on the initial detection model based on second training data to obtain the trained quantization model and sends it to the second processing module; the second processing module pulses the intermediate network layers of the quantization model to obtain the pulsed model as the target detection model and sends it to the deployment module; and the deployment module may, according to the configuration requirements of the pulse neural network chip, generate a configuration frame for the intermediate network layers of the target detection model that meets those requirements and send it to the pulse neural network chip.
Optionally, the functions from training to deployment may be implemented by the first processing module alone, or by the first processing module in combination with the acquisition module, the quantization module, the conversion module and the sending module, which may be set according to actual requirements.
Optionally, the second training module may be further configured to perform network clipping on the initial detection model obtained by the full-precision training to obtain a clipped detection model; correspondingly, the second training module is further used for carrying out quantization perception training on the cut detection model based on the DoReFa-Net quantization network to obtain a trained quantization model.
Specifically, after the second training module receives the initial detection model, the second training module can also perform model compression through network clipping to obtain a clipped detection model, and then perform quantization perception training on the clipped detection model based on the DoReFa-Net quantization network to obtain a trained quantization model.
As another implementation manner, on the basis of the foregoing embodiment, optionally, the quantization module is specifically configured to: and carrying out low-bit quantization on the first convolution result based on the trained quantization parameters obtained by the quantization perception training to obtain a first quantization result.
As another implementation manner, on the basis of the foregoing embodiment, optionally, the conversion module is specifically configured to: and converting the first quantization result into an input working frame corresponding to the pulse neural network chip according to a preset input working frame format.
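The preset input working frame format is chip-specific and not disclosed here; purely as an illustration of converting low-bit quantization results into byte frames, the sketch below packs two 4-bit codes per byte.

```python
def pack_4bit(values):
    """Pack a sequence of 4-bit integer codes (0..15) into bytes, two
    codes per byte.  The real chip's working-frame format is proprietary;
    this byte layout is purely illustrative."""
    if any(not 0 <= v <= 15 for v in values):
        raise ValueError("4-bit codes must lie in 0..15")
    padded = list(values) + [0] * (len(values) % 2)  # pad to an even count
    return bytes((padded[i] << 4) | padded[i + 1] for i in range(0, len(padded), 2))

def unpack_4bit(frame, count):
    """Inverse of pack_4bit: recover `count` 4-bit codes from the frame."""
    vals = []
    for b in frame:
        vals.extend(((b >> 4) & 0xF, b & 0xF))
    return vals[:count]
```

A real working frame would also carry addressing and control fields defined by the chip; only the payload packing is shown.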
As another implementation manner, on the basis of the foregoing embodiment, optionally, the obtaining module is specifically configured to:
acquiring original image data; preprocessing the original image data to obtain input feature data corresponding to the target detection model; and performing a first-layer convolution operation on the input feature data to obtain a first convolution result corresponding to the first convolution layer.
Optionally, the acquisition module may further include an acquisition sub-module, a preprocessing sub-module and a first-layer convolution sub-module. The acquisition sub-module is used for acquiring the original image data; the preprocessing sub-module is used for preprocessing the original image data to obtain the input feature data corresponding to the target detection model; and the first-layer convolution sub-module is used for performing the first-layer convolution operation on the input feature data to obtain the first convolution result corresponding to the first convolution layer.
Optionally, the acquisition module is specifically configured to: perform a first-layer convolution operation on the input feature data under the Darknet deep learning framework to obtain a first convolution result corresponding to the first convolution layer.
As another implementation manner, on the basis of the foregoing embodiment, optionally, the first processing module is specifically configured to:
converting the output working frame into a first feature map; performing inverse quantization on the first feature map to obtain a numerical feature map; carrying out the final layer convolution operation on the numerical feature map to obtain a final layer convolution result; and carrying out post-processing on the final layer of convolution result to obtain a target detection result.
Optionally, the first processing module may further include a decoding submodule, a dequantization submodule, a tail layer convolution submodule, and a post-processing submodule.
The decoding submodule is used for converting the output working frame into a first characteristic diagram and sending the first characteristic diagram to the inverse quantization submodule; the inverse quantization sub-module performs inverse quantization on the first feature map to obtain a numerical feature map and sends the numerical feature map to the tail layer convolution sub-module; the tail layer convolution submodule carries out the final layer convolution operation on the numerical characteristic diagram to obtain a final layer convolution result and sends the final layer convolution result to the post-processing submodule; and the post-processing sub-module carries out post-processing on the convolution result of the last layer to obtain a target detection result.
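The inverse quantization performed by the inverse quantization sub-module can be sketched as mapping low-bit codes back onto a numeric range; the scale and zero-point parameters below are assumptions standing in for the trained quantization parameters saved during quantization perception training.

```python
def dequantize(codes, k_bits=4, scale=1.0, zero_point=0.0):
    """Map low-bit integer codes back onto numeric values: code / (2^k - 1)
    recovers a value in [0, 1], which the stored scale/zero-point
    (assumed parameters) place back into the activation range."""
    n = float(2 ** k_bits - 1)
    return [c / n * scale + zero_point for c in codes]
```

The resulting numerical feature map is what the tail-layer convolution sub-module consumes at full precision.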
It should be noted that the embodiments in this description may each be implemented separately, or may be implemented in any combination where no conflict arises; the invention is not limited in this respect.
The specific manner in which the respective modules perform the operations in the apparatus of the present embodiment has been described in detail in the embodiments related to the method, and the same technical effects can be achieved, which will not be described in detail herein.
Still another embodiment of the present invention provides an electronic device configured to perform the method provided in the foregoing embodiment. The electronic device may be any implementable computer device, such as an embedded device.
Fig. 9 is a schematic structural diagram of an electronic device according to the present embodiment. The electronic device 50 includes: a memory 51, a transceiver 52, and at least one processor 53.
The processor, the memory and the transceiver are interconnected through a circuit; the processor is connected with the impulse neural network chip; the memory stores computer-executable instructions; a transceiver for receiving the original image data sent by the input device; at least one processor executes computer-executable instructions stored in a memory, causing the at least one processor to perform the method as provided in any one of the embodiments above.
Specifically, the electronic device may be connected to an input device, for example a camera: the camera captures raw image data and sends it to the electronic device, the transceiver of the electronic device receives the raw image data sent by the input device and passes it to the processor, and the processor reads and executes the computer-executable instructions stored in the memory, implementing the method provided in any of the embodiments above by interacting with the pulse neural network chip.
The electronic device of the present invention can be applied to any target detection scene, such as face detection (access control verification, payment authentication, etc.), football detection in a football stadium, marine vessel detection, and so on.
It should be noted that, the electronic device of this embodiment can implement the method provided in any of the foregoing embodiments, and can achieve the same technical effects, which is not described herein again.
Yet another embodiment of the present invention provides a computer-readable storage medium having stored therein computer-executable instructions that, when executed by a processor, implement a method as provided in any of the above embodiments.
It should be noted that, the computer readable storage medium of the present embodiment can implement the method provided in any of the above embodiments, and can achieve the same technical effects, which is not described herein.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It is to be understood that the invention is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (9)

1. A method of processing target detection, comprising:
acquiring a first convolution result corresponding to a first convolution layer of a target detection model;
performing low-bit quantization on the first convolution result to obtain a first quantization result;
converting the first quantization result into an input working frame corresponding to the pulse neural network chip;
the input working frame is sent to the impulse neural network chip, so that the impulse neural network chip carries out operation according to the input working frame to obtain an output working frame;
determining a target detection result according to the output working frame;
wherein, before sending the input working frame to the impulse neural network chip, the method further comprises:
based on a DoReFa-Net quantization network, performing quantization perception training on an initial detection model obtained through full-precision training to obtain a trained quantization model, wherein a first convolution layer and a last convolution layer of the trained quantization model are not quantized, and an intermediate network layer is quantized;
pulsing an intermediate network layer of the quantization model to obtain a pulsing model serving as the target detection model;
and deploying an intermediate network layer of the target detection model to the impulse neural network chip so that the impulse neural network chip can perform operation of the intermediate network layer.
2. The method of claim 1, wherein prior to performing quantization perception training on an initial detection model obtained by full-precision training based on a DoReFa-Net quantization network to obtain a trained quantization model, the method further comprises:
performing network clipping on the initial detection model obtained through full-precision training to obtain a clipped detection model;
Correspondingly, the DoReFa-Net quantization network-based quantized sensing training is performed on an initial detection model obtained through full-precision training to obtain a trained quantized model, and the method comprises the following steps:
and based on the DoReFa-Net quantization network, performing quantization perception training on the cut detection model to obtain a trained quantization model.
3. The method of claim 1, wherein low-bit quantizing the first convolution result to obtain a first quantized result comprises:
and carrying out low-bit quantization on the first convolution result based on the trained quantization parameter obtained by the quantized perception training to obtain the first quantization result, wherein the quantization parameter is a parameter which is obtained and stored in the quantized perception training stage and used for quantizing the first convolution result into a low-bit result.
4. The method of claim 1, wherein converting the first quantization result into an input working frame corresponding to a pulsed neural network chip, comprises:
and converting the first quantization result into an input working frame corresponding to the pulse neural network chip according to a preset input working frame format.
5. The method of claim 1, wherein acquiring a first convolution result corresponding to a first convolution layer of the target detection model comprises:
acquiring original image data;
preprocessing the original image data to obtain input characteristic data corresponding to the target detection model;
and performing first-layer convolution operation on the input characteristic data to obtain a first convolution result corresponding to the first-layer convolution layer.
6. The method of claim 5, wherein performing a first layer convolution operation on the input feature data to obtain a first convolution result corresponding to the first layer convolution layer, comprises:
and performing a first-layer convolution operation on the input characteristic data under a Darknet deep learning framework to obtain a first convolution result corresponding to the first-layer convolution layer.
7. The method according to any one of claims 1-6, wherein determining a target detection result from the output working frame comprises:
converting the output work frame into a first feature map;
performing inverse quantization on the first feature map to obtain a numerical feature map;
performing a final layer of convolution operation on the numerical feature map to obtain a final layer of convolution result;
and carrying out post-processing on the final layer of convolution result to obtain the target detection result.
8. A target detection processing apparatus, comprising:
an acquisition module configured to obtain a first convolution result corresponding to a first convolutional layer of a target detection model;
a quantization module configured to perform low-bit quantization on the first convolution result to obtain a first quantization result;
a conversion module configured to convert the first quantization result into an input working frame corresponding to a spiking neural network chip;
a sending module configured to send the input working frame to the spiking neural network chip, so that the spiking neural network chip operates on the input working frame to obtain an output working frame;
a first processing module configured to determine a target detection result from the output working frame;
a first training module configured to perform full-precision training on a pre-established target detection network to obtain an initial detection model;
a second training module configured to perform quantization-aware training on the initial detection model based on the DoReFa-Net quantization network to obtain a trained quantization model, wherein the first and last convolutional layers of the trained quantization model are not quantized and the intermediate network layers are quantized;
a second processing module configured to convert the intermediate network layers of the quantization model into spiking form, the resulting spiking model serving as the target detection model;
a deployment module configured to deploy the intermediate network layers of the target detection model onto the spiking neural network chip, so that the spiking neural network chip executes the intermediate network layers.
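The second training module cites DoReFa-Net by name. Its k-bit weight quantizer (forward pass only; training propagates gradients straight through the rounding) can be sketched as follows, independent of the patent's specific network:

```python
import numpy as np

def quantize_k(x, k):
    """DoReFa-Net uniform quantizer: x in [0, 1] -> k-bit grid in [0, 1]."""
    n = 2 ** k - 1
    return np.round(x * n) / n

def dorefa_weight(w, k):
    """DoReFa-Net k-bit weight quantization (forward pass).
    tanh squashes the weights, the max-normalization maps them into
    [0, 1], and the affine step maps the quantized grid back to [-1, 1]."""
    t = np.tanh(w)
    t = t / (2 * np.max(np.abs(t))) + 0.5
    return 2 * quantize_k(t, k) - 1

w = np.array([-1.0, -0.2, 0.0, 0.3, 1.0])
wq = dorefa_weight(w, k=2)
print(wq)
```

Per claim 8, this quantizer is applied only to the intermediate layers; the first and last convolutional layers stay full precision, which is why they run on the host rather than on the spiking neural network chip.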
9. An electronic device, comprising: a memory, a transceiver, and at least one processor;
wherein the processor, the memory, and the transceiver are interconnected by circuitry;
the processor is connected to the spiking neural network chip;
the memory stores computer-executable instructions; the transceiver is configured to receive raw image data sent by an input device; and
the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the method of any one of claims 1-7.
CN202110252763.5A 2021-03-09 2021-03-09 Target detection processing method, device and equipment Active CN113222108B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110252763.5A CN113222108B (en) 2021-03-09 2021-03-09 Target detection processing method, device and equipment

Publications (2)

Publication Number Publication Date
CN113222108A CN113222108A (en) 2021-08-06
CN113222108B 2024-04-16

Family

ID=77084891

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110252763.5A Active CN113222108B (en) 2021-03-09 2021-03-09 Target detection processing method, device and equipment

Country Status (1)

Country Link
CN (1) CN113222108B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743249B (en) * 2021-08-16 2024-03-26 北京佳服信息科技有限公司 Method, device and equipment for identifying violations and readable storage medium
CN113887706B (en) * 2021-09-30 2024-02-06 苏州浪潮智能科技有限公司 Method and device for low-bit quantization of one-stage target detection network
CN114611666B (en) * 2022-03-08 2024-05-31 安谋科技(中国)有限公司 Quantification method of NMS function, electronic equipment and medium
CN114997235A (en) * 2022-06-13 2022-09-02 脉冲视觉(北京)科技有限公司 Target detection processing method, device, equipment and medium based on pulse signal
CN117095385A (en) * 2023-08-02 2023-11-21 中国长城科技集团股份有限公司 Neural network chip performance-based detection method, device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110059733A (en) * 2019-04-01 2019-07-26 苏州科达科技股份有限公司 The optimization and fast target detection method, device of convolutional neural networks
CN110363297A (en) * 2019-07-05 2019-10-22 上海商汤临港智能科技有限公司 Neural metwork training and image processing method, device, equipment and medium
CN110718211A (en) * 2019-09-26 2020-01-21 东南大学 Keyword recognition system based on hybrid compressed convolutional neural network
CN110852964A (en) * 2019-10-30 2020-02-28 天津大学 Image bit enhancement method based on deep learning
CN111008701A (en) * 2019-12-03 2020-04-14 杭州嘉楠耘智信息科技有限公司 Data quantization method and device based on neural network and computer readable storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108345890B (en) * 2018-03-01 2022-10-28 腾讯科技(深圳)有限公司 Image processing method, device and related equipment
US11651192B2 (en) * 2019-02-12 2023-05-16 Apple Inc. Compressed convolutional neural network models

Also Published As

Publication number Publication date
CN113222108A (en) 2021-08-06

Similar Documents

Publication Publication Date Title
CN113222108B (en) Target detection processing method, device and equipment
CN112565777B (en) Deep learning model-based video data transmission method, system, medium and device
CN111008631B (en) Image association method and device, storage medium and electronic device
US20220284632A1 (en) Analysis device and computer-readable recording medium storing analysis program
CN113033524B (en) Occlusion prediction model training method and device, electronic equipment and storage medium
CN109040587A (en) It captures processing method, device, capture mechanism, equipment and storage medium
CN113885956B (en) Service deployment method and device, electronic equipment and storage medium
CN115965827A (en) Lightweight small target detection method and device integrating multi-scale features
CN111491167A (en) Image encoding method, transcoding method, device, equipment and storage medium
CN113807330B (en) Three-dimensional sight estimation method and device for resource-constrained scene
US20240233335A1 (en) Feature map processing method and related device
López-Tapia et al. Variational deep atmospheric turbulence correction for video
CN112132207A (en) Target detection neural network construction method based on multi-branch feature mapping
CN114640853B (en) Unmanned aerial vehicle image processing system that cruises
Lee et al. MPQ-YOLACT: Mixed-Precision Quantization for Lightweight YOLACT
CN114419249A (en) Object three-dimensional shape reconstruction method, device, equipment and storage medium
Akutsu et al. End-to-End Deep ROI Image Compression
US20240013426A1 (en) Image processing system and method for image processing
CN113159277B (en) Target detection method, device and equipment
CN116071376B (en) Image segmentation method, related device, equipment and storage medium
CN116051958A (en) Target detection method, NPU, AI edge computing device and storage medium
CN114998705B (en) Target detection method, system and in-memory computing chip
CN118446304A (en) Service data processing method, model training method and related equipment
CN118072150A (en) Yun Bian semantic collaborative segmentation model reasoning method
CN115331067A (en) Image processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant