CN112966815A - Target detection method, system and device based on a spiking neural network

Info

Publication number: CN112966815A
Application number: CN202110344988.3A
Authority: CN (China)
Prior art keywords: neural network, weight, image, layer, pulse
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 谭杰 (Tan Jie), 李经纬 (Li Jingwei)
Current and original assignee: Institute of Automation, Chinese Academy of Sciences
Priority/filing date: 2021-03-31
Publication date: 2021-06-15

Classifications

    • G06N3/045: Computing arrangements based on biological models; neural networks; architecture; combinations of networks
    • G06N3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N3/084: Learning methods; backpropagation, e.g. using gradient descent
    • G06V2201/07: Indexing scheme relating to image or video recognition or understanding; target detection


Abstract

The invention belongs to the fields of computer vision and neural computation, and specifically relates to a target detection method, system and device based on a spiking neural network. It aims to solve the problem that traditional deep learning models involve heavy computation and occupy large amounts of resources, making them difficult to deploy on edge devices, so that target detection cannot significantly reduce resource consumption while maintaining accuracy. The invention comprises the following steps: converting the pixel data of an image to be detected into spike train data; converting the weights of a pre-trained YOLOv3 network into weights suitable for a spiking neural network, and constructing a YOLOv3 spiking neural network based on those weights; and inputting the spike train data into the YOLOv3 spiking neural network to obtain the category and coordinate information of the target objects in the image to be detected. By converting a traditional convolutional neural network into a spiking neural network, the invention improves execution efficiency and reduces energy consumption while maintaining detection accuracy, and promotes the research and development of a new generation of artificial intelligence chips.

Description

Target detection method, system and device based on a spiking neural network
Technical Field
The invention belongs to the fields of computer vision and neural computation, and specifically relates to a target detection method, system and device based on a spiking neural network.
Background
Target detection is an important research direction in the field of computer vision, with concrete applications in many fields such as autonomous driving, video surveillance, industrial inspection, and aerospace. Before deep learning techniques emerged, researchers designed classifiers by hand to extract objects from images, with low accuracy. With the development of deep learning and neural networks, and with increasingly abundant computing resources, researchers have proposed a large number of novel networks that greatly improved the accuracy of target detection and drove the rapid development of the field. Typical detection networks are the YOLO series and the RCNN series. The YOLO series directly predicts the category and position information of targets; it is fast but comparatively less accurate. The RCNN series first generates region proposals and then extracts category and position information from those proposals; in contrast, it is more accurate but comparatively slower. Both families share a common problem: the models tend to be computationally expensive and resource-hungry, and are therefore difficult to deploy on edge devices.
With the development of related technologies such as artificial intelligence and neural computation, brain-inspired computing has attracted wide attention from researchers, giving rise to the third generation of neural networks: spiking neural networks. Traditional deep neural networks are inspired by the working mechanism of the human brain but remain far from the real thing: they transmit analog quantities, whereas in the human brain information is transmitted as spikes. Spiking neural networks are therefore closer to the brain's actual working mechanism.
Spiking neural networks have become a research focus in the field of artificial intelligence thanks to their many advantages: high computational efficiency, low energy consumption, small resource footprint, and ease of hardware implementation. Research on spiking neural networks can further advance the field of artificial intelligence and also promote the development of novel artificial intelligence chips with non-von-Neumann architectures. At present, spiking neural networks have succeeded at image classification and image segmentation, but have seen few applications in complex visual tasks. The main obstacle is that the neuron activation function of a spiking neural network is non-differentiable, making it difficult to train with backpropagation, and no other reasonable and efficient training method exists, so research on spiking neural networks has hit a bottleneck.
In general, existing target detection networks cannot effectively exploit the advantages of spiking neural networks (high computational efficiency, low energy consumption, small resource footprint, and ease of hardware implementation), and therefore cannot significantly reduce resource consumption while maintaining accuracy.
Disclosure of Invention
In order to solve the above problems in the prior art, namely that traditional deep learning models are computationally expensive and resource-hungry, hard to deploy on edge devices, and therefore unable to significantly reduce resource consumption while maintaining detection accuracy, the present invention provides a target detection method based on a spiking neural network, which includes:
step S10, acquiring an image to be detected, and converting its pixel data into spike train data; acquiring the weights of a pre-trained YOLOv3 network, and converting them into weights suitable for a spiking neural network;
step S20, constructing a YOLOv3 spiking neural network based on the weights suitable for the spiking neural network;
step S30, inputting the spike train data into the YOLOv3 spiking neural network to obtain the category and coordinate information of the target objects in the image to be detected.
In some preferred embodiments, in step S10, the pixel data of the image to be detected is converted into spike train data by the following method:
denote the theoretical maximum of the pixel data of the image to be detected as M and the current pixel value as x; the first pulse emission time is then t = M - x;
normalize the first pulse emission time:

$$t' = \frac{t - t_{min}}{t_{max} - t_{min}} \cdot T$$

where $t_{max}$ and $t_{min}$ are the maximum and minimum of the first pulse emission times, and T is the maximum of the normalized pulse emission times;
based on the normalized first pulse emission time, generate the remaining pulse emission times by following a Poisson process, yielding the spike train data converted from the pixel data of the image to be detected.
In some preferred embodiments, the weights of the pre-trained YOLOv3 network include the convolutional layer weights, leaky-ReLU layer weights, shortcut path weights, concat layer weights, and upsampling layer weights of the YOLOv3 network.
In some preferred embodiments, the convolutional layer weights are converted into weights suitable for the spiking neural network by:

$$\tilde{W}^{l} = W^{l} \cdot \frac{\lambda^{l-1}}{\lambda^{l}}$$

$$\tilde{b}^{l} = \frac{b^{l}}{\lambda^{l}}$$

where $W^{l}$ and $b^{l}$ are the convolution kernel and bias of layer l, and $\lambda^{l}$ and $\lambda^{l-1}$ are the maximum outputs of layers l and l-1, respectively.
In some preferred embodiments, the leaky-ReLU layer weights are converted into weights suitable for the spiking neural network by:

$$\Theta(V_{mem}) = \begin{cases} 1, & V_{mem} > V_{th} \\ -1, & V_{mem} < -V_{th}/\alpha \\ 0, & \text{otherwise} \end{cases}$$

where $V_{mem}$ is the membrane potential, $V_{th}$ is the positive threshold of the membrane potential, $-V_{th}/\alpha$ is the negative threshold, and α is a preset hyperparameter.
In some preferred embodiments, the shortcut path weights are converted into weights suitable for the spiking neural network by:

y = λ(W·X + b + X)

where X is the input tensor of a given layer in the pre-trained YOLOv3 network, W is the convolution kernel weight of that layer, b is its bias, y is the output of the layer after adding the shortcut path, and

$$\lambda = \frac{X_{max}}{y_{max}}$$

is a preset scale factor, with $X_{max}$ the maximum value of X and $y_{max}$ the maximum value of y.
In some preferred embodiments, the concat layer weights and upsampling layer weights are converted into weights suitable for the spiking neural network by:
the concat layer weights remain consistent with those of the pre-trained YOLOv3 network, and splicing is performed along the channel dimension;
the upsampling layer weights remain consistent with those of the pre-trained YOLOv3 network, and upsampling is performed by nearest-neighbor interpolation.
In another aspect of the present invention, a target detection system based on a spiking neural network is provided, which includes the following modules:
a weight storage module 100, configured to obtain the weights of the pre-trained YOLOv3 network and import them into the weight storage module;
a weight conversion module 200, configured to convert the weights stored by the weight storage module 100 into weights suitable for the spiking neural network;
an image input module 300, configured to acquire and input an image to be detected;
an image conversion module 400, configured to convert the input pixel data of the image to be detected into spike train data;
an SNN module 500, configured to construct a YOLOv3 spiking neural network based on the weights suitable for the spiking neural network, and to input the spike train data into the YOLOv3 spiking neural network to obtain the category and coordinate information of the target objects in the image to be detected;
an output display module 600, configured to output and display the image to be detected together with the category and coordinate information of the target objects in it.
In a third aspect of the present invention, an electronic device is provided, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the processor, the instructions being executed by the processor to implement the above target detection method based on a spiking neural network.
In a fourth aspect of the present invention, a computer-readable storage medium is provided, storing computer instructions to be executed by a computer to implement the above target detection method based on a spiking neural network.
The invention has the following beneficial effects:
(1) In the target detection method based on a spiking neural network of the present invention, an original convolutional neural network is first constructed and pre-trained until it can accurately recognize the objects to be detected in an image. The weight data of the convolutional neural network is then exported and converted into weights suitable for a spiking neural network, a spiking neural network capable of target detection is constructed from these weights, and the pixel data of the input image to be detected is converted into spike train data, so that the converted spiking neural network can detect the category and coordinate information of the objects in the image.
(2) The target detection method based on a spiking neural network avoids the problems that the neuron activation function of a spiking neural network is non-differentiable, that training with backpropagation is therefore difficult, and that no other reasonable and efficient training method exists.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading the following detailed description of non-limiting embodiments, made with reference to the accompanying drawings, in which:
FIG. 1 is a schematic flow chart of the target detection method based on a spiking neural network of the present invention;
FIG. 2 is a schematic structural diagram of the pre-trained YOLOv3 network in an embodiment of the target detection method based on a spiking neural network of the present invention;
FIG. 3 is a schematic structural diagram of the YOLOv3 spiking neural network constructed from weights suitable for the spiking neural network in an embodiment of the target detection method based on a spiking neural network of the present invention;
FIG. 4 is an example of the detection result of an embodiment of the target detection method based on a spiking neural network of the present invention;
FIG. 5 is a block diagram of the target detection system based on a spiking neural network of the present invention.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
The invention relates to a target detection method based on a spiking neural network, which includes the following steps:
step S10, acquiring an image to be detected, and converting its pixel data into spike train data; acquiring the weights of a pre-trained YOLOv3 network, and converting them into weights suitable for a spiking neural network;
step S20, constructing a YOLOv3 spiking neural network based on the weights suitable for the spiking neural network;
step S30, inputting the spike train data into the YOLOv3 spiking neural network to obtain the category and coordinate information of the target objects in the image to be detected.
In order to describe the target detection method based on a spiking neural network of the present invention more clearly, each step of an embodiment of the present invention is described in detail below with reference to FIG. 1.
The target detection method based on a spiking neural network of the first embodiment of the present invention includes steps S10-S30, each described in detail as follows:
Step S10, acquire an image to be detected, and convert its pixel data into spike train data.
In an embodiment of the present invention, a camera of model SNA-825ABC is used to capture real external image data, and the pixel data of the original image is converted into spike trains by the following method:
denote the theoretical maximum of the pixel data of the image to be detected as M and the current pixel value as x; the first pulse emission time is then given by formula (1):

t = M - x    (1)

normalize the first pulse emission time, as shown in formula (2):

$$t' = \frac{t - t_{min}}{t_{max} - t_{min}} \cdot T \qquad (2)$$

where $t_{max}$ and $t_{min}$ are the maximum and minimum of the first pulse emission times, and T is the maximum of the normalized pulse emission times;
based on the normalized first pulse emission time, generate the remaining pulse emission times by following a Poisson process, yielding the spike train data converted from the pixel data of the image to be detected.
After normalization, the first pulse emission times fall within [0, T]; after the first pulse, subsequent pulse emission times follow a Poisson process, which yields the spike train data converted from the pixels of the original image.
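A minimal Python sketch of this encoding follows. The function name, the 8-bit assumption (M = 255), and the fixed rate of the Poisson process are illustrative choices of ours, not values fixed by the patent.

```python
import numpy as np

def encode_image_to_spike_trains(image, T=100.0, poisson_rate=0.05, rng=None):
    """Encode 8-bit pixel data as spike trains (time-to-first-spike + Poisson)."""
    rng = rng or np.random.default_rng()
    M = 255.0                                 # theoretical maximum pixel value
    x = image.astype(np.float64).ravel()

    # First pulse emission time: brighter pixels fire earlier (t = M - x).
    t = M - x
    t_min, t_max = t.min(), t.max()
    span = max(t_max - t_min, 1e-9)           # guard against a constant image
    t_first = (t - t_min) / span * T          # normalize into [0, T]

    spike_trains = []
    for t0 in t_first:
        times = [t0]
        # Remaining emission times follow a Poisson process, i.e. i.i.d.
        # exponential inter-spike intervals with the assumed rate.
        while True:
            t_next = times[-1] + rng.exponential(1.0 / poisson_rate)
            if t_next > T:
                break
            times.append(t_next)
        spike_trains.append(np.asarray(times))
    return spike_trains
```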
The weights of the pre-trained YOLOv3 network are then obtained and converted into weights suitable for the spiking neural network.
First, a configuration file is constructed. To obtain the weight data of a pre-trained YOLOv3 network model, the network must first be built.
FIG. 2 is a schematic structural diagram of the pre-trained YOLOv3 network in an embodiment of the target detection method based on a spiking neural network of the present invention. A configuration file of the network structure is written according to the structure shown in FIG. 2. The configuration file first records the hyperparameters involved in training; in one embodiment of the present invention, the batch size is 16, images are padded and scaled to a width and height of 416, the momentum coefficient is 0.9, the weight decay coefficient is 0.0005, the saturation is 1.5, the exposure is 1.5, the hue is 0.1, and the learning rate is 0.001. Then, following the YOLOv3 network structure shown in FIG. 2, the parameter information of each module is written in sequence in depth-first-search order: for convolutional layers, the number of filter kernels, the kernel size, the stride, whether padding is needed, whether batch normalization is needed, and the type of activation function; for shortcut paths, which convolutional layer to superimpose with and what type of activation function to use; for concat layers, which convolutional layer to splice with; for upsampling layers, the upsampling stride; for the final output layers, the layer number and the parameters associated with non-maximum suppression.
Then, the network is built from the configuration file.
The configuration file constructed above is read in dictionary mode, with each module corresponding to one dictionary. After all module structures in the configuration file are read in sequence, a set of dictionaries holding all module information is produced. Since the dictionaries contain the connection information of each module, the configuration file, written in depth-first-search order, can be parsed back to build the final convolutional neural network structure, the YOLOv3 network.
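As an illustration of this "dictionary mode" reading, the sketch below parses a Darknet-style configuration file into a list of module dictionaries; the `[section]` / `key=value` syntax is the standard YOLOv3 cfg format and is assumed here rather than quoted from the patent.

```python
def parse_cfg(path):
    """Parse a Darknet-style cfg file into a list of dicts, one per module."""
    modules = []
    with open(path) as f:
        for raw in f:
            line = raw.split('#', 1)[0].strip()   # drop comments and whitespace
            if not line:
                continue
            if line.startswith('['):              # a new module, e.g. [convolutional]
                modules.append({'type': line[1:-1].strip()})
            else:                                  # a key=value attribute of that module
                key, value = line.split('=', 1)
                modules[-1][key.strip()] = value.strip()
    return modules

# modules[0] holds the [net] hyperparameters (batch, momentum, decay, ...);
# the remaining dicts describe conv, shortcut, concat/route, upsample and output layers.
```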
Next, the YOLOv3 network training data is preprocessed.
The achromatic (grayscale) images in the original dataset are removed, and the remaining images are padded and scaled to a width and height of 416; the labels undergo the same padding and scaling transform, yielding the preprocessed training dataset.
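A rough sketch of this preprocessing step, assuming centered square padding followed by scaling to 416 × 416 with the same transform applied to the box labels; the gray fill value and the nearest-neighbor resize are illustrative assumptions.

```python
import numpy as np

def letterbox(image, boxes, size=416, fill=128):
    """Pad an HxWx3 image to a square, resize to size x size, and remap boxes."""
    h, w = image.shape[:2]
    side = max(h, w)
    pad_top, pad_left = (side - h) // 2, (side - w) // 2

    canvas = np.full((side, side, 3), fill, dtype=image.dtype)
    canvas[pad_top:pad_top + h, pad_left:pad_left + w] = image

    # Nearest-neighbor resize keeps the sketch dependency-free.
    scale = size / side
    idx = np.floor(np.arange(size) / scale).astype(int).clip(0, side - 1)
    resized = canvas[idx][:, idx]

    # Boxes are [x1, y1, x2, y2] in pixels: shift by the padding, then scale.
    boxes = boxes.astype(np.float64)
    boxes[:, [0, 2]] = (boxes[:, [0, 2]] + pad_left) * scale
    boxes[:, [1, 3]] = (boxes[:, [1, 3]] + pad_top) * scale
    return resized, boxes
```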
Finally, the YOLOv3 network is pre-trained.
The built YOLOv3 network is loaded, the network loss function is defined, the preprocessed training dataset obtained above is imported, and training proceeds according to the hyperparameters defined at the beginning of the configuration file. This finally yields the model weight data of the original convolutional neural network (the pre-trained YOLOv3 network).
The weights of the pre-trained YOLOv3 network are then converted into weights suitable for the spiking neural network; they include the convolutional layer weights, leaky-ReLU layer weights, shortcut path weights, concat layer weights, and upsampling layer weights of the YOLOv3 network.
The convolutional layer weights are converted into weights suitable for the spiking neural network as shown in formulas (3) and (4):

$$\tilde{W}^{l} = W^{l} \cdot \frac{\lambda^{l-1}}{\lambda^{l}} \qquad (3)$$

$$\tilde{b}^{l} = \frac{b^{l}}{\lambda^{l}} \qquad (4)$$

where $W^{l}$ and $b^{l}$ are the convolution kernel and bias of layer l, and $\lambda^{l}$ and $\lambda^{l-1}$ are the maximum outputs of layers l and l-1, respectively.
It should be noted that the weights and biases above have already undergone the batch normalization process (the batch normalization parameters are folded into them).
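A compact sketch of this layer-wise normalization, applying formulas (3) and (4) to a list of BN-folded layers; the list-based data layout and the convention λ = 1 for the input are our assumptions.

```python
def convert_conv_weights(weights, biases, lambdas):
    """Normalize CNN weights for the SNN, following formulas (3) and (4).

    weights[l], biases[l]: BN-folded convolution kernel and bias of layer l.
    lambdas[l]: maximum output observed at layer l on calibration data.
    """
    snn_weights, snn_biases = [], []
    lam_prev = 1.0  # input spike trains are already normalized (assumption)
    for W, b, lam in zip(weights, biases, lambdas):
        snn_weights.append(W * lam_prev / lam)  # W~ = W * lambda_{l-1} / lambda_l
        snn_biases.append(b / lam)              # b~ = b / lambda_l
        lam_prev = lam
    return snn_weights, snn_biases
```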
The leaky-ReLU layer weights are converted into weights suitable for the spiking neural network as shown in formula (5):

$$\Theta(V_{mem}) = \begin{cases} 1, & V_{mem} > V_{th} \\ -1, & V_{mem} < -V_{th}/\alpha \\ 0, & \text{otherwise} \end{cases} \qquad (5)$$

where $V_{mem}$ is the membrane potential, $V_{th}$ is the positive threshold of the membrane potential, $-V_{th}/\alpha$ is the negative threshold, and α is a preset hyperparameter.
As the formula shows, if the membrane potential exceeds the positive threshold, a positive pulse is emitted; if the membrane potential falls below the negative threshold, a negative pulse is emitted. In one embodiment of the present invention, with α = 0.1, emitting a negative pulse requires accumulating a membrane potential whose absolute value is 10 times that required for a positive pulse.
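An integrate-and-fire sketch of this signed-spike rule; the patent specifies the two thresholds, while the reset-by-subtraction scheme used here is our assumption.

```python
class SignedIFNeuron:
    """Integrate-and-fire neuron with imbalanced positive/negative thresholds."""

    def __init__(self, v_th=1.0, alpha=0.1):
        self.v_th = v_th                 # positive threshold V_th
        self.v_th_neg = -v_th / alpha    # negative threshold -V_th/alpha
        self.v_mem = 0.0

    def step(self, input_current):
        """Integrate one time step of input; return +1, -1, or 0 (no spike)."""
        self.v_mem += input_current
        if self.v_mem > self.v_th:       # membrane potential above positive threshold
            self.v_mem -= self.v_th      # reset by subtraction (our assumption)
            return 1
        if self.v_mem < self.v_th_neg:   # membrane potential below negative threshold
            self.v_mem -= self.v_th_neg
            return -1
        return 0
```

With v_th = 1.0 and alpha = 0.1, a negative pulse is emitted only once the membrane potential reaches -10, i.e. 10 times the absolute value required for a positive pulse, matching the example above.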
In the original convolutional neural network (the pre-trained YOLOv3 network), a shortcut path directly superimposes the values of the preceding and following convolutional layers. For example, if the input tensor of a given layer in the original convolutional neural network is X, the convolution kernel weight of that layer is W, its bias is b, and its output is y, then after adding the shortcut path the output is computed as in formula (6):

y = λ(W·X + b + X)    (6)

where

$$\lambda = \frac{X_{max}}{y_{max}}$$

is a preset scale factor, with $X_{max}$ the maximum value of X and $y_{max}$ the maximum value of y.
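Under formula (6), the shortcut computation might look as follows; note that in an actual conversion λ would be preset from calibration statistics rather than computed per sample, as done here purely for illustration.

```python
import numpy as np

def scaled_shortcut(conv_out, x):
    """Superimpose a shortcut input onto a conv output and rescale, per formula (6).

    conv_out: the tensor W.X + b of the layer; x: the shortcut input tensor X.
    """
    y = conv_out + x
    lam = x.max() / y.max()   # lambda = X_max / y_max (preset in practice)
    return lam * y
```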
The concat layer weights and upsampling layer weights are converted into weights suitable for the spiking neural network as follows:
the concat layer weights remain consistent with those of the pre-trained YOLOv3 network, and splicing is performed along the channel dimension;
the upsampling layer weights remain consistent with those of the pre-trained YOLOv3 network, and upsampling is performed by nearest-neighbor interpolation.
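Both operations carry over unchanged from the CNN; a dependency-free sketch on C×H×W spike-count tensors (the tensor layout is our assumption):

```python
import numpy as np

def concat_channels(a, b):
    """Splice two CxHxW tensors along the channel dimension (concat layer)."""
    return np.concatenate([a, b], axis=0)

def upsample_nearest(x, stride=2):
    """Nearest-neighbor upsampling of a CxHxW tensor (upsampling layer)."""
    return x.repeat(stride, axis=1).repeat(stride, axis=2)
```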
Step S20, construct the YOLOv3 spiking neural network based on the weights suitable for the spiking neural network.
The converted weights are imported to obtain a YOLOv3 spiking neural network that can be used for target detection.
FIG. 3 shows a schematic structural diagram of the YOLOv3 spiking neural network constructed from weights suitable for the spiking neural network in an embodiment of the target detection method based on a spiking neural network of the present invention. In the figure, CBL denotes conv + leaky-ReLU; res unit denotes a residual module with a shortcut path added; res unit * n denotes n res unit modules; res1, res2, res4, res8, and resn denote 1, 2, 4, 8, and n res units appended after a CBL module; and y1, y2, and y3 denote the output results of different layers.
Step S30, input the spike train data into the YOLOv3 spiking neural network to obtain the category and coordinate information of the target objects in the image to be detected.
The spike train converted from the image to be detected is used as the input of the YOLOv3 spiking neural network, which outputs the category and coordinate information of the target objects. In an embodiment of the present invention using the COCO dataset as an example, the network outputs three kinds of tensors, of sizes 13 × 13 × 255, 26 × 26 × 255, and 52 × 52 × 255, used to detect large, medium, and small objects, respectively.
The third dimension of these tensors, 255, is obtained by formula (7):

3 × (5 + 80) = 255    (7)

Each grid cell corresponds to 3 prediction boxes; each prediction box needs 5 parameters, namely the center coordinates (x, y), the box width and height (w, h), and a confidence score, plus the probabilities of the 80 categories of the COCO dataset.
FIG. 4 shows an example of the detection result of an embodiment of the target detection method based on a spiking neural network of the present invention. As can be seen, the present invention detects each object category in the image well and provides fairly accurate bounding boxes (i.e., coordinate information) for the target objects.
Although the foregoing embodiments describe the steps in the above order, those skilled in the art will understand that, to achieve the effect of the embodiments, the steps need not be executed in that order; they may be executed simultaneously (in parallel) or in reverse order, and such simple variations fall within the scope of the present invention.
The target detection system based on a spiking neural network of the second embodiment of the present invention, as shown in FIG. 5, includes the following modules:
a weight storage module 100, configured to obtain the weights of the pre-trained YOLOv3 network and import them into the weight storage module;
a weight conversion module 200, configured to convert the weights stored by the weight storage module 100 into weights suitable for the spiking neural network;
an image input module 300, configured to acquire and input an image to be detected;
an image conversion module 400, configured to convert the input pixel data of the image to be detected into spike train data;
an SNN module 500, configured to construct a YOLOv3 spiking neural network based on the weights suitable for the spiking neural network, and to input the spike train data into the YOLOv3 spiking neural network to obtain the category and coordinate information of the target objects in the image to be detected;
an output display module 600, configured to output and display the image to be detected together with the category and coordinate information of the target objects in it.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiments, and will not be described herein again.
It should be noted that the target detection system based on a spiking neural network provided in the above embodiment is only illustrated by the division of the above functional modules. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the modules or steps in the embodiment of the present invention may be further decomposed or combined. For example, the modules of the above embodiment may be combined into one module, or further split into multiple sub-modules, so as to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing them, and are not to be construed as unduly limiting the present invention.
An electronic device according to a third embodiment of the present invention includes:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the processor, the instructions being executed by the processor to implement the above target detection method based on a spiking neural network.
A computer-readable storage medium according to a fourth embodiment of the present invention stores computer instructions to be executed by a computer to implement the above target detection method based on a spiking neural network.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Those of skill in the art will appreciate that the various illustrative modules and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that programs corresponding to the software modules and method steps may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The technical solutions of the present invention have thus been described with reference to the preferred embodiments shown in the drawings, but those skilled in the art will readily understand that the scope of the present invention is obviously not limited to these specific embodiments. Those skilled in the art may make equivalent changes or substitutions to the related technical features without departing from the principle of the present invention, and the technical solutions after such changes or substitutions fall within the protection scope of the present invention.

Claims (10)

1. A target detection method based on a spiking neural network, characterized by comprising the following steps:
step S10, acquiring an image to be detected, and converting its pixel data into spike train data; acquiring the weights of a pre-trained YOLOv3 network, and converting them into weights suitable for a spiking neural network;
step S20, constructing a YOLOv3 spiking neural network based on the weights suitable for the spiking neural network;
step S30, inputting the spike train data into the YOLOv3 spiking neural network to obtain the category and coordinate information of the target objects in the image to be detected.
2. The target detection method based on a spiking neural network according to claim 1, characterized in that in step S10 the pixel data of the image to be detected is converted into spike train data by:
denoting the theoretical maximum of the pixel data of the image to be detected as M and the current pixel value as x, and obtaining the first pulse emission time t = M - x;
normalizing the first pulse emission time:

$$t' = \frac{t - t_{min}}{t_{max} - t_{min}} \cdot T$$

where $t_{max}$ and $t_{min}$ are the maximum and minimum of the first pulse emission times, and T is the maximum of the normalized pulse emission times;
based on the normalized first pulse emission time, generating the remaining pulse emission times by following a Poisson process, yielding the spike train data converted from the pixel data of the image to be detected.
3. The target detection method based on a spiking neural network according to claim 1, characterized in that the weights of the pre-trained YOLOv3 network comprise the convolutional layer weights, leaky-ReLU layer weights, shortcut path weights, concat layer weights, and upsampling layer weights of the YOLOv3 network.
4. The target detection method based on a spiking neural network according to claim 3, characterized in that the convolutional layer weights are converted into weights suitable for the spiking neural network by:

$$\tilde{W}^{l} = W^{l} \cdot \frac{\lambda^{l-1}}{\lambda^{l}}$$

$$\tilde{b}^{l} = \frac{b^{l}}{\lambda^{l}}$$

where $W^{l}$ and $b^{l}$ are the convolution kernel and bias of layer l, and $\lambda^{l}$ and $\lambda^{l-1}$ are the maximum outputs of layers l and l-1, respectively.
5. The target detection method based on a spiking neural network according to claim 3, characterized in that the leaky-ReLU layer weights are converted into weights suitable for the spiking neural network by:

$$\Theta(V_{mem}) = \begin{cases} 1, & V_{mem} > V_{th} \\ -1, & V_{mem} < -V_{th}/\alpha \\ 0, & \text{otherwise} \end{cases}$$

where $V_{mem}$ is the membrane potential, $V_{th}$ is the positive threshold of the membrane potential, $-V_{th}/\alpha$ is the negative threshold, and α is a preset hyperparameter.
6. The target detection method based on a spiking neural network according to claim 3, characterized in that the shortcut path weights are converted into weights suitable for the spiking neural network by:

y = λ(W·X + b + X)

where X is the input tensor of a given layer in the pre-trained YOLOv3 network, W is the convolution kernel weight of that layer, b is its bias, y is the output of the layer after adding the shortcut path, and

$$\lambda = \frac{X_{max}}{y_{max}}$$

is a preset scale factor, with $X_{max}$ the maximum value of X and $y_{max}$ the maximum value of y.
7. The target detection method based on a spiking neural network according to claim 3, characterized in that the concat layer weights and upsampling layer weights are converted into weights suitable for the spiking neural network by:
the concat layer weights remaining consistent with those of the pre-trained YOLOv3 network, with splicing performed along the channel dimension;
the upsampling layer weights remaining consistent with those of the pre-trained YOLOv3 network, with upsampling performed by nearest-neighbor interpolation.
8. A target detection system based on a spiking neural network, characterized by comprising the following modules:
a weight storage module 100, configured to obtain the weights of the pre-trained YOLOv3 network and import them into the weight storage module;
a weight conversion module 200, configured to convert the weights stored by the weight storage module 100 into weights suitable for the spiking neural network;
an image input module 300, configured to acquire and input an image to be detected;
an image conversion module 400, configured to convert the input pixel data of the image to be detected into spike train data;
an SNN module 500, configured to construct a YOLOv3 spiking neural network based on the weights suitable for the spiking neural network, and to input the spike train data into the YOLOv3 spiking neural network to obtain the category and coordinate information of the target objects in the image to be detected;
an output display module 600, configured to output and display the image to be detected together with the category and coordinate information of the target objects in it.
9. An electronic device, characterized by comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the processor, the instructions being executed by the processor to implement the target detection method based on a spiking neural network according to any one of claims 1-7.
10. A computer-readable storage medium storing computer instructions, characterized in that the computer instructions are to be executed by a computer to implement the target detection method based on a spiking neural network according to any one of claims 1-7.
CN202110344988.3A, filed 2021-03-31: Target detection method, system and device based on a spiking neural network, Pending, published as CN112966815A (en)

Priority Applications (1)

CN202110344988.3A | priority date 2021-03-31 | filing date 2021-03-31 | Target detection method, system and device based on a spiking neural network

Publications (1)

CN112966815A | published 2021-06-15

Family

ID=76280328

Country Status (1)

CN | CN112966815A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
CN109871940A * | 2019-01-31 | 2019-06-11 | 清华大学 | A kind of multilayer training algorithm of impulsive neural networks
US20210064995A1 * | 2019-08-28 | 2021-03-04 | Robert Bosch Gmbh | Method, device and computer program for creating a pulsed neural network
CN111340181A * | 2020-02-11 | 2020-06-26 | 天津大学 | Deep double-threshold pulse neural network conversion training method based on enhanced pulse
CN112232486A * | 2020-10-19 | 2021-01-15 | 南京宁麒智能计算芯片研究院有限公司 | Optimization method of YOLO pulse neural network
CN112288080A * | 2020-11-18 | 2021-01-29 | 中国人民解放军国防科技大学 | Pulse neural network-oriented adaptive model conversion method and system

Non-Patent Citations (1)

SEIJOON KIM et al.: "Spiking-YOLO: Spiking Neural Network for Energy-Efficient Object Detection", The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20) *

Cited By (3)

Publication number | Priority date | Publication date | Assignee | Title
CN114018354A * | 2021-10-19 | 2022-02-08 | 湖南省计量检测研究院 | Dial image conversion method, system, equipment and medium for pointer type measuring instrument
CN114018354B | 2021-10-19 | 2024-04-16 | 湖南省计量检测研究院 | Method, system, equipment and medium for converting dial plate image of pointer type metering device (granted publication)
CN114997235A * | 2022-06-13 | 2022-09-02 | 脉冲视觉(北京)科技有限公司 | Target detection processing method, device, equipment and medium based on pulse signal


Legal Events

Code | Title | Description
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
RJ01 | Rejection of invention patent application after publication | Application publication date: 2021-06-15