CN109740619A - Neural network terminal operating method and device for target identification - Google Patents
- Publication number
- CN109740619A (application CN201811609115.5A)
- Authority
- CN
- China
- Prior art keywords
- layer
- convolutional layer
- parameter
- model
- neural networks
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention discloses a neural network terminal operating method and device for target identification. The method comprises: obtaining a trained deep convolutional neural network model; storing the model parameters of the deep convolutional neural network in an external DDR memory, wherein the model parameters include fully-connected layer parameters and preprocessed convolutional layer parameters; storing the model framework of the deep convolutional neural network in a system-on-chip FPGA, wherein the convolutional layers in the model framework are assigned to the programmable logic module PL, and the pooling layers, fully-connected layers and activation layers in the model framework are assigned to the processing system module PS; and processing a target image with the deep convolutional neural network model to identify the target. The invention solves the technical problem that processors in the related art cannot meet the service requirements of deep convolutional neural networks in target identification.
Description
Technical field
The invention belongs to the field of intelligent algorithms, and relates to a neural network terminal operating method and device for target identification.
Background technique
Infrared imaging is widely used in military seekers, where identifying the target while a munition closes in for a strike is critical. As the munition approaches, the imaged shape and outline of the target change considerably, so traditional image processing struggles to extract intrinsic features for identification. Automatic target recognition with a deep convolutional neural network can achieve a very high recognition rate, but because the parameter count is very large, storing the model parameters is a serious problem, and porting the network to a terminal also poses a major challenge.

In the related art, the deep convolutional neural network model is mostly compressed by pruning its parameters to fit the constraints of current hardware. But this optimization has limits: as deep convolutional neural networks evolve, the depth and width of network models keep growing, while the resources of the programmable logic module PL (Programmable Logic), such as block random access memory BRAM (block random access memory) and flip-flops FF (Flip Flop), grow far more slowly. Fully realizing a deep convolutional neural network in the PL during target identification is therefore impractical.

No effective solution to the above problem has yet been proposed.
Summary of the invention
The present invention provides a neural network terminal operating method and device for target identification, at least solving the technical problem that hardware processors in the related art cannot meet the service requirements of deep convolutional neural networks in target identification.

The technical solution of the invention is as follows: a neural network terminal operating method for target identification, comprising: obtaining a trained deep convolutional neural network model; storing the model parameters of the deep convolutional neural network in an external DDR memory, wherein the model parameters include fully-connected layer parameters and preprocessed convolutional layer parameters; storing the model framework of the deep convolutional neural network in a system-on-chip FPGA, wherein the convolutional layers in the model framework are assigned to the programmable logic module PL, and the pooling layers, fully-connected layers and activation layers in the model framework are assigned to the processing system module PS; and processing a target image with the deep convolutional neural network model to identify the target.
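The layer-to-module partitioning described above can be sketched as follows. This is a minimal illustrative model in Python; the names (`assign_layers`, `PL_LAYERS`, `PS_LAYERS`) are hypothetical and not taken from the patent:

```python
PL_LAYERS = {"conv"}                      # compute-heavy layers -> programmable logic
PS_LAYERS = {"pool", "fc", "activation"}  # lighter layers -> processing system

def assign_layers(model_layers):
    """Split a (name, kind) layer list into the PL and PS partitions."""
    placement = {"PL": [], "PS": []}
    for name, kind in model_layers:
        if kind in PL_LAYERS:
            placement["PL"].append(name)
        elif kind in PS_LAYERS:
            placement["PS"].append(name)
        else:
            raise ValueError(f"unknown layer kind: {kind}")
    return placement

layers = [("conv1", "conv"), ("pool1", "pool"), ("conv2", "conv"),
          ("fc1", "fc"), ("relu1", "activation")]
print(assign_layers(layers))
```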
Optionally, processing the target image with the deep convolutional neural network model to identify the target comprises: extracting the convolutional layer parameters to the PL and, based on the convolutional layers and the target image, performing the convolutional layer computation in the PL to obtain the feature data of the target image; and extracting the fully-connected layer parameters to the PS and, based on the pooling layers, fully-connected layers, activation layers and the feature data, computing in the PS to obtain and output the recognition result of the target.
Optionally, extracting the convolutional layer parameters to the PL and, based on the convolutional layers and the target image, performing the convolutional layer computation in the PL to obtain the feature data of the target image comprises: designing convolutional layer computing modules, each of which performs a convolution kernel computation, wherein there are multiple convolutional layers, each convolutional layer of the target image contains multiple convolution kernels, and there are multiple convolutional layer computing modules; assigning different convolutional layer computing modules to compute in parallel according to the PL; and obtaining the feature data of the target image from the target image and the multiple convolutional layer computing modules.
Optionally, designing a convolutional layer computing module comprises: computing a two-dimensional fast Fourier transform 2DFFT of the target image to obtain the frequency-domain image data of the target image; performing a complex multiplication of the frequency-domain image data with the convolutional layer parameters; and processing the product with a two-dimensional inverse fast Fourier transform 2DIFFT, extracting the real part of the result, and outputting it.
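A minimal numpy sketch of this FFT-based convolutional computing module may clarify the data flow: 2DFFT of the image, element-wise complex multiplication with the transformed kernel, 2DIFFT, then real-part extraction. Note this computes a circular convolution; the function names are illustrative, not from the patent:

```python
import numpy as np

def fft_conv2d(img, kernel):
    H, W = img.shape
    kernel_freq = np.fft.fft2(kernel, s=(H, W))  # kernel 2DFFT (the "preprocessing")
    img_freq = np.fft.fft2(img)                  # 2DFFT of the target image
    product = img_freq * kernel_freq             # element-wise complex multiplication
    return np.fft.ifft2(product).real            # 2DIFFT, then real-part extraction

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))
ker = rng.standard_normal((3, 3))
out = fft_conv2d(img, ker)
```

The result matches a direct circular convolution of the image with the zero-padded kernel.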
Optionally, computing the 2DFFT of the target image comprises: for the original image IMG(x, y), taking the data of each row in turn and applying a 1DFFT; then, for the result of the previous step, taking the data of each column in turn and applying a 1DFFT again. Processing the complex-multiplication product with the 2DIFFT comprises: for the product IMG_K(x, y), taking the data of each row in turn and applying a 1DIFFT; then, for the result of the previous step, taking the data of each column in turn and applying a 1DIFFT again. Here K1, P1, K2 and P2 are parameters after the frequency-domain transform.
Optionally, the preprocessing is: performing a 2DFFT computation on the convolutional layer parameters.
Optionally, each time a convolutional layer computing module in the PL finishes a computation, the PS is notified through the interrupt system to schedule the next computation.
Optionally, the parameter precision of the first convolutional layer is INT32, that of the second convolutional layer is INT16, that of the third convolutional layer is INT8, and that of the fourth and subsequent convolutional layers is INT4.
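The decreasing precision schedule can be illustrated with a small sketch. The symmetric uniform quantizer below is an assumption for demonstration only; the patent does not specify the quantization scheme:

```python
# Convolutional layer index -> bit width; fourth and later layers use INT4.
PRECISION_BITS = {1: 32, 2: 16, 3: 8}

def layer_bits(layer_idx):
    """INT32 -> INT16 -> INT8, then the minimum INT4 for all further layers."""
    return PRECISION_BITS.get(layer_idx, 4)

def quantize(weights, bits):
    """Map float weights onto signed integers of the given width (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(w) for w in weights) or 1.0
    scale = peak / qmax
    return [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights], scale

q, scale = quantize([0.5, -1.0, 0.25], layer_bits(3))  # third layer -> INT8
```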
According to another aspect of the present invention, another technical solution is also proposed: a neural network terminal operating device for target identification, comprising: an obtaining module for obtaining a trained deep convolutional neural network model; a first access module for storing the model parameters of the deep convolutional neural network in an external DDR memory, wherein the model parameters include fully-connected layer parameters and preprocessed convolutional layer parameters; a second access module for storing the model framework of the deep convolutional neural network in a system-on-chip FPGA, wherein the convolutional layers in the model framework are assigned to the programmable logic module PL, and the pooling layers, fully-connected layers and activation layers in the model framework are assigned to the processing system module PS; and an identification module for processing a target image with the deep convolutional neural network model to identify the target.
According to another aspect of the present invention, a processor is also proposed, the processor being configured to run a program which, when running, executes any one of the above neural network terminal operating methods for target identification.
The neural network terminal operating method for target identification of the invention is based on an analysis of the computational characteristics of deep convolutional neural networks: the compute-intensive convolutional layer computations of the network model are built in the PL, while the lighter pooling, fully-connected and activation layer computations are built in the PS; the PS also performs network computation scheduling, and the convolutional layer parameters are preprocessed before porting to reduce the convolutional computation load in the PL after porting. Unlike the related-art approach of fully realizing the entire deep convolutional network in the PL after pruning-based parameter compression, the invention solves the technical problem that, because the parameter count of deep convolutional neural networks is large and the BRAM resources in the PL are scarce, hardware processors cannot meet the service requirements of deep convolutional neural networks in target identification, achieving the technical effect of higher target-identification precision and faster speed while the target's appearance changes in motion.
Detailed description of the invention
Fig. 1 is a flowchart of the neural network terminal operating method for target identification according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the acceleration design of a convolutional layer computing module according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the neural network terminal operating system for target identification according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the neural network terminal operating device for target identification according to an embodiment of the present invention.
Specific embodiment
To help those skilled in the art better understand the solution of the present invention, the invention is described below with reference to the accompanying drawings and embodiments.

According to an embodiment of the present invention, a method embodiment of neural network terminal operating for target identification is provided. It should be noted that, although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in an order different from the one given here.
Fig. 1 is a flowchart of the neural network terminal operating method for target identification according to an embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps:

Step S101: obtain a trained deep convolutional neural network model;
Step S102: store the model parameters of the deep convolutional neural network in an external DDR (Double Data Rate SDRAM) memory, wherein the model parameters include fully-connected layer parameters and preprocessed convolutional layer parameters;
Step S103: store the model framework of the deep convolutional neural network in a system-on-chip FPGA (Field Programmable Gate Array), wherein the convolutional layers in the model framework are assigned to the programmable logic module PL, and the pooling layers, fully-connected layers and activation layers are assigned to the processing system module PS (Processing System);
Step S104: process a target image with the deep convolutional neural network model to identify the target.
It should be noted that, for a deep convolutional neural network implementation, the percentage of total prediction time consumed by each layer can be obtained beforehand by simulation on candidate processor terminals (e.g. CPU, GPU, FPGA, ASIC). The convolutional layers account for the highest share of total computation time, while the fully-connected, pooling and activation layers account for less, so optimization of a deep convolutional neural network should mainly target the convolutional layers.
Based on this analysis of the computational characteristics of deep convolutional neural networks, the above steps allow the embodiment of the invention to build the compute-intensive convolutional layer computations of the network model in the PL and the lighter pooling, fully-connected and activation layer computations in the PS; the embodiment of the invention also uses the PS for network computation scheduling and preprocesses the convolutional layer parameters before porting, to reduce the convolutional computation load in the PL after porting.

Unlike the related-art approach of fully realizing the entire deep convolutional network in the PL after pruning-based parameter compression, the embodiment of the invention solves the technical problem that, because the parameter count of deep convolutional neural networks is large and the BRAM resources in the PL are scarce, hardware processors cannot meet the service requirements of deep convolutional neural networks in target identification, achieving the technical effect of higher target-identification precision and faster speed while the target's appearance changes in motion.
Optionally, processing the target image with the deep convolutional neural network model to identify the target may comprise: extracting the convolutional layer parameters to the PL and, based on the convolutional layers and the target image, performing the convolutional layer computation in the PL to obtain the feature data of the target image; and extracting the fully-connected layer parameters to the PS and, based on the pooling layers, fully-connected layers, activation layers and the feature data, computing in the PS to obtain and output the recognition result of the target. That is, performing the convolutional layer computation in the PL and the pooling, fully-connected and activation layer computations in the PS solves the porting problem brought by the expansion of deep convolutional neural networks, and further accelerates identification.
Further preferably, extracting the convolutional layer parameters to the PL and performing the convolutional layer computation in the PL to obtain the feature data of the target image comprises: designing convolutional layer computing modules, each of which performs a convolution kernel computation, wherein there are multiple convolutional layers, each convolutional layer of the target image contains multiple convolution kernels, and there are multiple convolutional layer computing modules; assigning different convolutional layer computing modules to compute in parallel according to the PL; and obtaining the feature data of the target image from the target image and the multiple convolutional layer computing modules.
Infrared imaging is widely used in military seekers, where the demand for target identification is especially prominent. For infrared targets, the resolution of the seeker's infrared image is itself very low, so the identification demand is large. When running the neural network terminal for identifying small infrared targets, because small infrared images are small in size and single-channel, the input target image is first standardized before entering the processor terminal. Since the munition approaches the target from far to near, the size of the target in the field of view (the number of pixels it occupies) varies, so a module size is determined according to realistic target-size characteristics, the input candidate-region target is scaled to the specified size, and the scaled image is then fed into the convolutional neural network for target identification.
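The standardization step above can be sketched with a nearest-neighbour resize of a single-channel candidate-region patch to a fixed module input size. The size 8 below is a placeholder; the patent does not state the actual module dimensions:

```python
def scale_to_module_size(patch, size):
    """Nearest-neighbour resize of a 2-D list to size x size (illustrative)."""
    h, w = len(patch), len(patch[0])
    return [[patch[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]

small = [[r * 4 + c for c in range(4)] for r in range(4)]   # 4x4 test patch
resized = scale_to_module_size(small, size=8)               # upscale to 8x8
```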
Then, combining the design sizes of the trained convolution kernels in the model parameters, the corresponding convolutional layer computing modules are designed. For small infrared targets, 3 to 5 convolutional layers in a deep convolutional neural network can extract object features well. Considering the reusability of computing modules within the same convolutional layer, at least one convolutional layer computing module must be designed for each convolutional layer, and module categories are designed for the different input image sizes of the different convolutional layers. Meanwhile, according to the processing characteristics of the processor resources, multiple instances of the same category of convolutional layer computing module can be designed, so that when computing a given convolutional layer, multiple convolution kernels can be processed in multi-stage pipelined parallel fashion at the algorithm level, avoiding idle waste of logic resources.
To further accelerate computation at the algorithm level, designing a convolutional layer computing module preferably comprises: computing a two-dimensional fast Fourier transform 2DFFT (2D Fast Fourier Transformation) of the target image to obtain its frequency-domain image data; performing a complex multiplication of the frequency-domain image data with the convolutional layer parameters; and processing the product with a two-dimensional inverse fast Fourier transform 2DIFFT (2D Inverse Fast Fourier Transform), extracting the real part of the result, and outputting it.
Fig. 2 is a schematic diagram of the acceleration design of a convolutional layer computing module according to an embodiment of the present invention. As shown in Fig. 2, the embodiment first preprocesses the convolutional layer parameters before the network model is ported. After porting, when a target image is input, the convolutional layer computing module performs a 2DFFT on the target image, performs a complex multiplication with the preprocessed convolutional layer parameters, applies a 2DIFFT to the product, and extracts the real part of the final result as the output of each convolutional layer computing module.
Here the preprocessing of the convolutional layer parameters is: performing a 2DFFT on the convolutional layer parameters. Preprocessing the convolutional layer parameters before porting reduces the convolutional computation load in the PL after porting, further speeding up the computation of the whole network model. Moreover, accelerating the convolutional layer computation through the FFT not only reduces the computation load and occupies fewer logic resources in the PL, but also speeds up the computation and reduces the computation time.
Let the original image be IMG(x, y) (x = 0, 1, ..., M-1; y = 0, 1, ..., N-1), and let the preprocessed convolutional layer parameter complex matrix be Kernel(x, y) (x = 0, 1, ..., 2M-1; y = 0, 1, ..., N-1), where x and y denote the horizontal and vertical coordinates in the two-dimensional image. The computation proceeds as follows:

Step 1: for the original image IMG(x, y), take the data of each row in turn and apply a 1DFFT;
Step 2: for the result of step 1, take the data of each column in turn and apply a 1DFFT again;
Step 3: multiply the result IMG2(x, y) of step 2 element-wise with Kernel(x, y) as a complex multiplication: IMG_K(x, y) = IMG2(x, y) · Kernel(x, y);
Step 4: for the result of step 3, take the data of each row in turn and apply a 1DIFFT;
Step 5: for the result of step 4, take the data of each column in turn and apply a 1DIFFT again;
Step 6: take the real part of the step-5 output IMG4(x, y) as the output.

Here K1, P1, K2 and P2 are parameters after the frequency-domain transform.
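The displayed formulas for steps 1, 2, 4 and 5 are not reproduced in this text; the standard separable row/column DFT forms they describe are as follows (textbook definitions, which may differ in normalization and symbol names from the patent's originals):

$$\mathrm{IMG}_1(x, v) = \sum_{y=0}^{N-1} \mathrm{IMG}(x, y)\, e^{-j 2\pi v y / N}, \qquad \mathrm{IMG}_2(u, v) = \sum_{x=0}^{M-1} \mathrm{IMG}_1(x, v)\, e^{-j 2\pi u x / M},$$

and for the inverse path applied to the product IMG_K(u, v):

$$\mathrm{IMG}_3(u, y) = \frac{1}{N} \sum_{v=0}^{N-1} \mathrm{IMG\_K}(u, v)\, e^{j 2\pi v y / N}, \qquad \mathrm{IMG}_4(x, y) = \frac{1}{M} \sum_{u=0}^{M-1} \mathrm{IMG}_3(u, y)\, e^{j 2\pi u x / M}.$$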
Preferably, each time a convolutional layer computing module in the PL finishes a computation, the PS is notified through the interrupt system to schedule the next computation. During the computation of each convolutional layer computing module, the embodiment of the invention can allocate convolutional layer computing modules to parallel computation according to the resource capacity of the processor chip. Fig. 3 is a schematic diagram of the neural network terminal operating system for target identification according to an embodiment of the present invention. As shown in Fig. 3, each convolutional layer computing module is controlled and scheduled by the PS over an AXI_Lite interface. Whenever a convolutional layer computing module completes a computation, i.e. after the convolution kernels in that module finish computing, the PL notifies the PS through the interrupt system; the PS completes a new scheduling round and, through the GP ports, directs a waiting IP to start a new round of multi-stage pipelined computation. Each convolutional layer computing module comprises a CNN acceleration part and a CNN control part; ANN is the fully-connected layer computing module, whose computation can also be ported into the PL depending on the chip's logic-resource usage; the HP ports are the high-speed bus interfaces for communication between the PS and the PL.
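The interrupt-driven scheduling loop can be modelled in miniature as follows. This is a toy simulation, not firmware: the "interrupt" and module objects are simulated, and all names are hypothetical:

```python
from collections import deque

def ps_scheduler(conv_jobs, num_modules=2):
    """Toy PS-side loop: each PL module 'interrupts' when its kernel computation
    finishes, and the PS dispatches the next pending job to the freed module."""
    pending = deque(conv_jobs)
    free_modules = deque(range(num_modules))
    running = {}                               # module id -> job in flight
    completed = []
    while pending or running:
        while pending and free_modules:        # GP-port dispatch (simulated)
            m = free_modules.popleft()
            running[m] = pending.popleft()
        m, job = next(iter(running.items()))   # simulated interrupt from module m
        completed.append(job)
        del running[m]
        free_modules.append(m)                 # module freed for the next round
    return completed

print(ps_scheduler(["conv1_k0", "conv1_k1", "conv2_k0"]))
```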
To improve the generalization ability of the deep convolutional neural network when extracting typical features, preferably, in the network structure, the parameter precision of the first convolutional layer may be INT32, that of the second may be INT16, that of the third may be INT8, and that of the fourth and subsequent convolutional layers is INT4. The embodiment of the invention thus computes with successively decreasing parameter precision (INT32 → INT16 → INT8 → INT4); if further convolutional layers must be added, they are extended at the minimum precision of INT4. Converting the floating-point parameters to fixed point not only reduces parameter storage, occupying fewer storage resources and less output-transmission latency, but fixed-point computation also occupies fewer logic resources than floating-point computation and reduces computation delay.
Because the intermediate results of a small single-channel infrared image need only a small cache during FFT-accelerated computation, the result of the accelerated convolutional layer computation can, according to the actual situation, either be stored in BRAM or buffered in external memory through the HP1 and HP2 interfaces. When the result data volume is large, for example in high-definition image applications, the intermediate results in the computation are too large to be stored in the BRAM resources and must be dumped to external memory; the flexible design of this storage scheme therefore satisfies practical application requirements.
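The storage decision just described reduces to a size check against the on-chip budget. In this sketch the BRAM capacity is a placeholder value, not a figure from the patent:

```python
BRAM_CAPACITY_BYTES = 512 * 1024   # placeholder on-chip budget (illustrative)

def choose_buffer(height, width, bytes_per_elem=4):
    """Keep an intermediate result in BRAM when it fits; otherwise buffer it
    in external DDR over the HP ports, mirroring the scheme described above."""
    size = height * width * bytes_per_elem
    return "BRAM" if size <= BRAM_CAPACITY_BYTES else "external DDR"

print(choose_buffer(64, 64))       # small infrared intermediate result
print(choose_buffer(1920, 1080))   # high-definition intermediate result
```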
Further, the processor chip in the embodiment of the present invention may be the XILINX XC7Z030-2FBG484I.
According to an embodiment of the present invention, a device embodiment of neural network terminal operating for target identification is also provided. Fig. 4 is a schematic diagram of the neural network terminal operating device for target identification according to an embodiment of the present invention. As shown in Fig. 4, the device comprises an obtaining module 41, a first access module 42, a second access module 43 and an identification module 44, wherein:

the obtaining module 41 is configured to obtain a trained deep convolutional neural network model;

the first access module 42, connected to the obtaining module 41, is configured to store the model parameters of the deep convolutional neural network in an external DDR memory, wherein the model parameters include fully-connected layer parameters and preprocessed convolutional layer parameters;

the second access module 43, connected to the first access module 42, is configured to store the model framework of the deep convolutional neural network in a system-on-chip FPGA, wherein the convolutional layers in the model framework are assigned to the programmable logic module PL, and the pooling layers, fully-connected layers and activation layers in the model framework are assigned to the processing system module PS;

the identification module 44, connected to the second access module 43, is configured to process a target image with the deep convolutional neural network model to identify the target.
According to an embodiment of the present invention, a processor is also provided, the processor being configured to run a program which, when running, executes any one of the above neural network terminal operating methods for target identification.

Content not described in detail in the present description belongs to techniques well known to those skilled in the art.
Claims (10)
1. A neural network terminal operating method for target identification, characterized by comprising:
obtaining a trained deep convolutional neural network model;
storing the model parameters of the deep convolutional neural network in an external DDR memory, wherein the model parameters include fully-connected layer parameters and preprocessed convolutional layer parameters;
storing the model framework of the deep convolutional neural network in a system-on-chip FPGA, wherein the convolutional layers in the model framework are assigned to the programmable logic module PL, and the pooling layers, fully-connected layers and activation layers in the model framework are assigned to the processing system module PS; and
processing a target image with the deep convolutional neural network model to identify the target.
2. The method according to claim 1, characterized in that processing the target image with the deep convolutional neural network model to identify the target comprises:
extracting the convolutional layer parameters to the PL and, based on the convolutional layers and the target image, performing the convolutional layer computation in the PL to obtain the feature data of the target image; and
extracting the fully-connected layer parameters to the PS and, based on the pooling layers, the fully-connected layers, the activation layers and the feature data, computing in the PS to obtain and output the recognition result of the target.
3. The method according to claim 2, characterized in that extracting the convolutional layer parameters to the PL and, based on the convolutional layers and the target image, performing the convolutional layer computation in the PL to obtain the feature data of the target image comprises:
designing convolutional layer computing modules, each of which performs a convolution kernel computation, wherein there are multiple convolutional layers, each convolutional layer of the target image contains multiple convolution kernels, and there are multiple convolutional layer computing modules;
assigning different convolutional layer computing modules to compute in parallel according to the PL; and
obtaining the feature data of the target image from the target image and the multiple convolutional layer computing modules.
4. The method according to claim 3, characterized in that designing a convolutional layer computing module comprises:
computing a two-dimensional fast Fourier transform 2DFFT of the target image to obtain the frequency-domain image data of the target image;
performing a complex multiplication of the frequency-domain image data with the convolutional layer parameters; and
processing the product with a two-dimensional inverse fast Fourier transform 2DIFFT, extracting the real part of the result, and outputting it.
5. The method according to claim 4, characterized in that
computing the 2DFFT of the target image comprises:
for the original image IMG(x, y), taking the data of each row in turn and applying a 1DFFT; and
for the result of the previous step, taking the data of each column in turn and applying a 1DFFT again;
and processing the complex-multiplication product with the 2DIFFT comprises:
for the product IMG_K(x, y), taking the data of each row in turn and applying a 1DIFFT; and
for the result of the previous step, taking the data of each column in turn and applying a 1DIFFT again;
wherein K1, P1, K2 and P2 are parameters after the frequency-domain transform.
6. The method according to claim 1, characterized in that the preprocessing is: performing a 2DFFT computation on the convolutional layer parameters.
7. The method according to claim 3, characterized in that each time a convolutional layer computing module in the PL finishes a computation, the PS is notified through the interrupt system to schedule the next computation.
8. The method according to claim 3, characterized in that the parameter precision of the first convolutional layer is INT32, that of the second convolutional layer is INT16, that of the third convolutional layer is INT8, and that of the fourth and subsequent convolutional layers is INT4.
9. A neural network terminal operating device for target identification, characterized by comprising:
an obtaining module for obtaining a trained deep convolutional neural network model;
a first access module for storing the model parameters of the deep convolutional neural network in an external DDR memory, wherein the model parameters include fully-connected layer parameters and preprocessed convolutional layer parameters;
a second access module for storing the model framework of the deep convolutional neural network in a system-on-chip FPGA, wherein the convolutional layers in the model framework are assigned to the programmable logic module PL, and the pooling layers, fully-connected layers and activation layers in the model framework are assigned to the processing system module PS; and
an identification module for processing a target image with the deep convolutional neural network model to identify the target.
10. A processor, characterized in that the processor is configured to run a program which, when running, executes the neural network terminal operating method for target identification according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811609115.5A CN109740619B (en) | 2018-12-27 | 2018-12-27 | Neural network terminal operation method and device for target recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109740619A (en) | 2019-05-10 |
CN109740619B CN109740619B (en) | 2021-07-13 |
Family ID: 66360068
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811609115.5A Active CN109740619B (en) | 2018-12-27 | 2018-12-27 | Neural network terminal operation method and device for target recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109740619B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106875012A (en) * | 2017-02-09 | 2017-06-20 | 武汉魅瞳科技有限公司 | A kind of streamlined acceleration system of the depth convolutional neural networks based on FPGA |
CN107239829A (en) * | 2016-08-12 | 2017-10-10 | 北京深鉴科技有限公司 | A kind of method of optimized artificial neural network |
CN107341761A (en) * | 2017-07-12 | 2017-11-10 | 成都品果科技有限公司 | A kind of calculating of deep neural network performs method and system |
CN107657316A (en) * | 2016-08-12 | 2018-02-02 | 北京深鉴科技有限公司 | The cooperative system of general processor and neural network processor designs |
CN207458128U (en) * | 2017-09-07 | 2018-06-05 | 哈尔滨理工大学 | A kind of convolutional neural networks accelerator based on FPGA in vision application |
CN108229670A (en) * | 2018-01-05 | 2018-06-29 | 中国科学技术大学苏州研究院 | Deep neural network based on FPGA accelerates platform |
CN108764466A (en) * | 2018-03-07 | 2018-11-06 | 东南大学 | Convolutional neural networks hardware based on field programmable gate array and its accelerated method |
CN108959895A (en) * | 2018-08-16 | 2018-12-07 | 广东工业大学 | A kind of EEG signals EEG personal identification method based on convolutional neural networks |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110806640A (en) * | 2019-10-28 | 2020-02-18 | 西北工业大学 | Photonic integrated visual feature imaging chip |
CN111126309A (en) * | 2019-12-26 | 2020-05-08 | 长沙海格北斗信息技术有限公司 | Convolutional neural network architecture method based on FPGA and face recognition method thereof |
CN111582446A (en) * | 2020-04-28 | 2020-08-25 | 北京达佳互联信息技术有限公司 | System for neural network pruning and neural network pruning processing method |
CN111582446B (en) * | 2020-04-28 | 2022-12-06 | 北京达佳互联信息技术有限公司 | System for neural network pruning and neural network pruning processing method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20180157969A1 (en) | Apparatus and Method for Achieving Accelerator of Sparse Convolutional Neural Network | |
CN106875011B (en) | Hardware architecture of binary weight convolution neural network accelerator and calculation flow thereof | |
CN106844294B (en) | Convolution algorithm chip and communication equipment | |
CN106022468B (en) | the design method of artificial neural network processor integrated circuit and the integrated circuit | |
CN111242289B (en) | Convolutional neural network acceleration system and method with expandable scale | |
US10394929B2 (en) | Adaptive execution engine for convolution computing systems | |
CN109740619A (en) | Neural network terminal operating method and device for target identification | |
CN108665059A (en) | Convolutional neural networks acceleration system based on field programmable gate array | |
CN111144561B (en) | Neural network model determining method and device | |
CN107341544A (en) | A kind of reconfigurable accelerator and its implementation based on divisible array | |
CN111667051A (en) | Neural network accelerator suitable for edge equipment and neural network acceleration calculation method | |
CN108764466A (en) | Convolutional neural networks hardware based on field programmable gate array and its accelerated method | |
CN106951395A (en) | Towards the parallel convolution operations method and device of compression convolutional neural networks | |
CN106203617A (en) | A kind of acceleration processing unit based on convolutional neural networks and array structure | |
WO2019136764A1 (en) | Convolutor and artificial intelligent processing device applied thereto | |
WO2021051987A1 (en) | Method and apparatus for training neural network model | |
CN112163601B (en) | Image classification method, system, computer device and storage medium | |
CN109063824B (en) | Deep three-dimensional convolutional neural network creation method and device, storage medium and processor | |
CN112633490B (en) | Data processing device, method and related product for executing neural network model | |
CN109117187A (en) | Convolutional neural networks accelerated method and relevant device | |
CN107766292A (en) | A kind of Processing with Neural Network method and processing system | |
CN110147252A (en) | A kind of parallel calculating method and device of convolutional neural networks | |
CN110210278A (en) | A kind of video object detection method, device and storage medium | |
CN109583586A (en) | A kind of convolution kernel processing method and processing device | |
CN111210019A (en) | Neural network inference method based on software and hardware cooperative acceleration |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |