CN106529517A

CN106529517A - Image processing method and image processing device

Info

Publication number: CN106529517A
Application number: CN201611255019.6A
Authority: CN
Inventors: 曹宇辉; 梁喆; 张宇翔; 温和; 周舒畅; 周昕宇
Original assignee: Beijing Megvii Technology Co Ltd; Beijing Aperture Science and Technology Ltd
Current assignee: Beijing Kuangshi Technology Co Ltd; Beijing Megvii Technology Co Ltd; Beijing Aperture Science and Technology Ltd
Priority date: 2016-12-30
Filing date: 2016-12-30
Publication date: 2017-03-22
Anticipated expiration: 2036-12-30
Also published as: CN106529517B

Abstract

The invention provides an image processing method and an image processing device that use a field programmable gate array (FPGA) to implement a neural network algorithm for image processing. The image processing method comprises: performing, by a first convolution calculation unit, a first layer of convolution calculation on input image data to generate first-layer feature data; storing the first-layer feature data in a storage unit; and reading, by a second convolution calculation unit, the first-layer feature data from the storage unit, executing a preset number of layers of convolution calculation, and generating a convolution calculation result for the input image data, wherein the corresponding calculation result is stored in the storage unit at the end of each layer of convolution calculation, and the first convolution calculation unit and the second convolution calculation unit are configured by the field programmable gate array.

Description

Image processing method and image processing device

Technical field

The present disclosure relates to the field of image processing and, more specifically, to an image processing method and an image processing device that use a field programmable gate array (FPGA) to implement a neural network algorithm for image processing.

Background

Target detection is a basic research topic in the field of computer image processing and has broad application prospects in areas such as face recognition, security monitoring, and dynamic tracking. Target detection refers to detecting and recognizing a specific target (such as a face) in any single frame or sequence of frames, and returning the position and size of the target, for example by outputting a bounding box that encloses the target. A neural network is a tool for large-scale, multi-parameter optimization. Given a large amount of training data, a neural network can learn hidden features in the data that are difficult to summarize, and can thereby complete many complex tasks, such as face detection, image classification, object detection, motion tracking, and natural language translation. Neural networks are widely used in the artificial intelligence community. At present, convolutional neural networks (CNNs) are the most widely used approach in target detection tasks such as pedestrian detection. Existing image processing methods usually implement a specific target detection function on a single chip. Because the parallel computing capability of such chips is often limited, they cannot accommodate existing face detection algorithms implemented with neural networks, so the number of faces that can be captured is limited and the algorithm is inefficient.

As a general-purpose chip, the field programmable gate array performs parallel computation by mapping an algorithm onto hardware, and therefore offers high data throughput, low power consumption for an equal amount of computation, and low price. At present, schemes that implement CNN algorithms on a field programmable gate array have two main shortcomings. First, they rely heavily on digital signal processor (DSP) resources for parallel computation; because the number of DSPs inside a field programmable gate array is very limited (typically a few hundred), the shortage of DSP resources limits the amount of parallel computation per unit period. Second, most of them are written in C or a similar language with a high-level synthesis tool, then optimized with a low-level hardware language, and use 8-bit fixed-point or floating-point numbers to represent parameters, ultimately implementing a general-purpose CNN algorithm. Although such an architecture can reconfigure the CNN network, it wastes field programmable gate array resources and cannot fully exploit the computing capability of the field programmable gate array.

It is therefore desirable to provide an image processing method and an image processing device that make full use of the resources of a field programmable gate array to implement a neural network algorithm for target detection.

Summary of the invention

The present invention is proposed in view of the above problems. The invention provides an image processing method and an image processing device that use a field programmable gate array to implement a neural network algorithm for image processing.

According to one embodiment of the present disclosure, an image processing method is provided, including: performing, by a first convolution calculation unit, a first-layer convolution calculation on input image data to generate first-layer feature data; storing the first-layer feature data in a storage unit; and reading, by a second convolution calculation unit, the first-layer feature data from the storage unit and performing a predetermined number of layers of convolution calculation to generate a convolution calculation result for the input image data, wherein the corresponding calculation result is stored in the storage unit after each layer of convolution calculation ends; wherein the first convolution calculation unit and the second convolution calculation unit are configured by a field programmable gate array.

In addition, in the image processing method according to one embodiment of the present disclosure, performing the predetermined number of layers of convolution calculation includes: reading, by the second convolution calculation unit, the feature data of the next layer to be calculated and predetermined convolution calculation parameters from the storage unit, and performing an intermediate-layer convolution calculation to generate intermediate-layer feature data; and incrementing a count value of the intermediate layers and determining whether the count value has reached a predetermined value, wherein, if the count value has not reached the predetermined value, the intermediate-layer feature data is stored in the storage unit and the process returns to performing the intermediate-layer convolution calculation by the second convolution calculation unit; and, if the count value has reached the predetermined value, the intermediate-layer feature data is output as the convolution calculation result.
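The counted loop described above can be pictured with a small software model. The following is only a sketch under stated assumptions: the dictionary `storage`, the callable `layer_fn`, and the function name `run_intermediate_layers` are hypothetical stand-ins for the storage unit, the intermediate-layer convolution calculation, and the control flow, and none of them appear in the disclosure.

```python
from typing import Callable, Dict, List

Feature = List[int]  # hypothetical stand-in for one layer of feature data


def run_intermediate_layers(
    storage: Dict[str, Feature],
    layer_fn: Callable[[Feature, int], Feature],
    predetermined_layers: int,
) -> Feature:
    """Repeat the intermediate-layer calculation until the count value
    reaches the predetermined value, storing every non-final result."""
    count = 0
    while True:
        features = storage["features"]      # read the next layer's input from storage
        result = layer_fn(features, count)  # placeholder for one intermediate-layer convolution;
                                            # the index lets it pick that layer's parameters
        count += 1                          # increment the count value
        if count >= predetermined_layers:
            return result                   # count reached: output as the convolution result
        storage["features"] = result        # count not reached: store and repeat


# Usage with a toy "layer" that adds 1 to every value, repeated 17 times.
storage = {"features": [1, 2, 3]}
out = run_intermediate_layers(storage, lambda f, i: [x + 1 for x in f], 17)
```

In this model the last value written to `storage` is the output of the sixteenth pass, because the final pass is returned rather than stored, mirroring the two branches in the text.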

In addition, in the image processing method according to one embodiment of the present disclosure, the input image data has a first data width and the first-layer feature data has a second data width, the second data width being smaller than the first data width.

In addition, in the image processing method according to one embodiment of the present disclosure, performing the intermediate-layer convolution calculation further includes: setting the number of input channels and the number of output channels of the second convolution calculation unit according to the structure of the neural network implemented by the field programmable gate array; and, when the number of input channels to be processed is greater than the number of input channels of the second convolution calculation unit and/or the number of output channels to be processed is greater than the number of output channels of the second convolution calculation unit, controlling the execution cycle of each intermediate-layer convolution calculation and the storage cycle of the intermediate-layer feature data to the storage unit according to the number of input channels to be processed, the number of input channels of the second convolution calculation unit, the number of output channels to be processed, and the number of output channels of the second convolution calculation unit.

In addition, in the image processing method according to one embodiment of the present disclosure, the intermediate-layer convolution calculation includes unpooling processing, convolution calculation processing, and pooling processing, and performing the intermediate-layer convolution calculation further includes: for each intermediate-layer convolution calculation, selectively performing the unpooling processing before the convolution calculation processing; and, for each intermediate-layer convolution calculation, selectively performing the pooling processing after the convolution calculation processing.

In addition, in the image processing method according to one embodiment of the present disclosure, the intermediate-layer convolution calculation further includes combination processing and selection processing, and performing the intermediate-layer convolution calculation further includes: for each intermediate-layer convolution calculation, performing the combination processing before the convolution calculation processing, to combine the feature data that has been read for the next layer to be calculated; and, for each intermediate-layer convolution calculation, performing the selection processing after the convolution calculation processing, to select the data channels of the intermediate-layer feature data to be stored in the storage unit.
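The stage ordering around the convolution calculation processing can be sketched in software as follows. This is a minimal illustration, not the disclosed hardware: the 2x nearest-neighbour unpooling, the 2x2 max pooling, the relative order of combination and unpooling, and all function names are assumptions chosen only to make the example runnable, and the convolution itself is passed in as a placeholder callable.

```python
from typing import Callable, List, Optional, Sequence

Channel = List[List[int]]  # one 2-D feature map
Features = List[Channel]   # a list of channels


def unpool2x(ch: Channel) -> Channel:
    """Nearest-neighbour 2x upsampling (an assumed form of unpooling)."""
    out = []
    for row in ch:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def maxpool2x(ch: Channel) -> Channel:
    """2x2 max pooling with stride 2 (an assumed form of pooling)."""
    return [
        [max(ch[r][c], ch[r][c + 1], ch[r + 1][c], ch[r + 1][c + 1])
         for c in range(0, len(ch[0]) - 1, 2)]
        for r in range(0, len(ch) - 1, 2)
    ]


def intermediate_layer(
    inputs: Sequence[Features],
    conv: Callable[[Features], Features],
    do_unpool: bool,
    do_pool: bool,
    keep_channels: Optional[Sequence[int]] = None,
) -> Features:
    feats = [ch for group in inputs for ch in group]  # combination: merge the read feature data
    if do_unpool:                                     # optional, before the convolution
        feats = [unpool2x(ch) for ch in feats]
    feats = conv(feats)                               # convolution calculation (placeholder)
    if do_pool:                                       # optional, after the convolution
        feats = [maxpool2x(ch) for ch in feats]
    if keep_channels is not None:                     # selection: choose channels to store
        feats = [feats[i] for i in keep_channels]
    return feats


# Usage: two single-channel inputs, an identity "convolution", keep channel 1 only.
a = [[[1, 2], [3, 4]]]
b = [[[5, 6], [7, 8]]]
stored = intermediate_layer([a, b], lambda f: f, True, True, [1])
```

The two boolean flags correspond to "selectively performing" the unpooling and pooling stages for a given layer.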

In addition, in the image processing method according to one embodiment of the present disclosure, when the bandwidth of the storage unit is insufficient, the convolution calculation performed by the second convolution calculation unit is automatically paused and its execution context is preserved.

By way of example, the above image processing method is implemented by a camera, and the camera includes the first convolution calculation unit, the second convolution calculation unit, and the storage unit.

According to another embodiment of the present disclosure, an image processing device is provided, including: a first convolution calculation unit for performing a first-layer convolution calculation on input image data to generate first-layer feature data; a storage unit for storing the first-layer feature data; a second convolution calculation unit for reading the first-layer feature data from the storage unit and performing a predetermined number of layers of convolution calculation to generate a convolution calculation result for the input image data; and a control unit for controlling the first convolution calculation unit, the storage unit, and the second convolution calculation unit; wherein the storage unit is further used to store the calculation result of each layer of convolution calculation performed by the second convolution calculation unit, and the first convolution calculation unit and the second convolution calculation unit are configured by a field programmable gate array.

In addition, in the image processing device according to another embodiment of the present disclosure, the control unit controls the second convolution calculation unit to read the feature data of the next layer to be calculated and predetermined convolution calculation parameters from the storage unit and to perform an intermediate-layer convolution calculation to generate intermediate-layer feature data; and the control unit increments a count value of the intermediate layers and determines whether the count value has reached a predetermined value, wherein, if the count value has not reached the predetermined value, the intermediate-layer feature data is stored in the storage unit and the control unit again controls the second convolution calculation unit to perform the intermediate-layer convolution calculation; and, if the count value has reached the predetermined value, the intermediate-layer feature data is output as the convolution calculation result.

In addition, in the image processing device according to another embodiment of the present disclosure, the input image data has a first data width and the first-layer feature data has a second data width, the second data width being smaller than the first data width.

In addition, in the image processing device according to another embodiment of the present disclosure, the control unit sets the number of input channels and the number of output channels of the second convolution calculation unit according to the structure of the neural network implemented by the field programmable gate array; and, when the number of input channels to be processed is greater than the number of input channels of the second convolution calculation unit and/or the number of output channels to be processed is greater than the number of output channels of the second convolution calculation unit, the control unit controls the execution cycle of each intermediate-layer convolution calculation and the storage cycle of the intermediate-layer feature data to the storage unit according to the number of input channels to be processed, the number of input channels of the second convolution calculation unit, the number of output channels to be processed, and the number of output channels of the second convolution calculation unit.

In addition, in the image processing device according to another embodiment of the present disclosure, the second convolution calculation unit includes an unpooling processing subunit, a convolution calculation processing subunit, and a pooling processing subunit. For each intermediate-layer convolution calculation, before the convolution calculation processing, the control unit controls the unpooling processing subunit to selectively perform the unpooling processing; and, for each intermediate-layer convolution calculation, after the convolution calculation processing, the control unit controls the pooling processing subunit to selectively perform the pooling processing.

In addition, in the image processing device according to another embodiment of the present disclosure, the second convolution calculation unit further includes a combination processing subunit and a selection processing subunit. For each intermediate-layer convolution calculation, before the convolution calculation processing, the control unit controls the combination processing subunit to perform the combination processing, to combine the feature data that has been read for the next layer to be calculated; and, for each intermediate-layer convolution calculation, after the convolution calculation processing, the control unit controls the selection processing subunit to perform the selection processing, to select the data channels of the intermediate-layer feature data to be stored in the storage unit.

In addition, in the image processing device according to another embodiment of the present disclosure, the control unit is further used to automatically pause the convolution calculation performed by the second convolution calculation unit, and preserve its execution context, when the bandwidth of the storage unit is insufficient.

In addition, in the image processing device according to another embodiment of the present disclosure, the control unit is an ARM processor inside the field programmable gate array, and/or the first convolution calculation unit includes at least a portion of the look-up tables in the field programmable gate array, and/or the second convolution calculation unit includes at least a portion of the look-up tables in the field programmable gate array, and/or the storage unit is a memory external to the field programmable gate array.

In a specific example, the image processing device is a camera.

The image processing method and image processing device according to embodiments of the present disclosure implement a low-bit CNN algorithm with a field programmable gate array, making full use of the resources of the field programmable gate array to improve computing capability. The convolution unit so implemented adds a statically configurable channel count and data bit width, and a dynamically configurable number and pattern of loop iterations; together with a control unit whose parameters are adjustable, it allows CNN models of different architectures to be implemented quickly.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and are intended to provide further explanation of the claimed technology.

Description of the drawings

The above and other objects, features, and advantages of the present disclosure will become more apparent from the more detailed description of its embodiments given in conjunction with the accompanying drawings. The drawings provide a further understanding of the embodiments of the present disclosure and constitute a part of the specification; together with the embodiments they serve to explain the disclosure and do not limit it. In the drawings, the same reference numerals generally denote the same components or steps.

Fig. 1 is a block diagram illustrating the image processing device according to an embodiment of the present disclosure.

Fig. 2 is a flowchart illustrating the image processing method according to an embodiment of the present disclosure.

Fig. 3 is a flowchart further illustrating the image processing method according to an embodiment of the present disclosure.

Fig. 4 is a block diagram further illustrating the image processing device according to an embodiment of the present disclosure.

Fig. 5 is a flowchart further illustrating the intermediate-layer convolution calculation processing in the image processing method according to an embodiment of the present disclosure.

Fig. 6 is a schematic diagram illustrating the image processing system according to an embodiment of the present disclosure.

Detailed description of the embodiments

To make the objects, technical solutions, and advantages of the present disclosure clearer, example embodiments according to the present disclosure are described in detail below with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the present disclosure, and it should be understood that the present disclosure is not limited by the example embodiments described herein. All other embodiments that a person skilled in the art obtains without creative effort on the basis of the embodiments described in the present disclosure shall fall within the scope of protection of the present disclosure. Each embodiment of the present disclosure is described in detail below with reference to the drawings.

First, an overview of the image processing device and its image processing method according to an embodiment of the present disclosure is given with reference to Fig. 1 and Fig. 2.

Fig. 1 is a block diagram illustrating the image processing device according to an embodiment of the present disclosure. The image processing device 10 shown in Fig. 1 may be configured in a camera that performs video monitoring of a specific scene. Alternatively, the image processing device 10 is itself a camera and, in addition to the components disclosed in Fig. 1, may include other internal components such as a lens and an image sensor. Alternatively, the image processing device 10 shown in Fig. 1 may be configured in a server that performs target recognition and image processing on video data provided by a camera performing video monitoring of a specific scene. Alternatively, the image processing device 10 shown in Fig. 1 may be a dedicated image processing device for target recognition arranged between the camera and the server. In the embodiments of the present disclosure, the targets detected by the image processing device 10 include, but are not limited to, faces, pedestrians, and vehicles; no limitation is imposed here. In some embodiments, the description takes a face as the example of the detected target.

Specifically, the image processing device 10 according to an embodiment of the present disclosure includes a first convolution calculation unit 101, a storage unit 102, a second convolution calculation unit 103, and a control unit 104. The first convolution calculation unit 101 and the second convolution calculation unit 103 may be configured by at least a portion of the look-up tables (LUTs) in the field programmable gate array. The storage unit 102 may be configured by a memory external to the field programmable gate array. The control unit 104 may be configured by an ARM processor inside the field programmable gate array. It is easily understood that the above configuration is not limiting, and other ways of configuring these units with the resources of the field programmable gate array are also included.

The first convolution calculation unit 101 is used to perform the first-layer convolution calculation on the input image data and generate the first-layer feature data. As described above, in one embodiment of the present disclosure, the first convolution calculation unit 101 is configured by at least a portion of the LUTs in the field programmable gate array. The input image data is video data acquired by an image sensor such as a charge-coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) device. For example, the input image data is video data in 1080P, 30 fps or 720P, 60 fps format. In addition, in one embodiment of the present disclosure, a low-bit convolutional neural network (BCNN) algorithm is implemented with the field programmable gate array, in which the feature maps and the weights used in convolution are 2 bits wide. Because the input image data is usually 8-bit RGB three-color data, and the video input is an image stream arriving at a constant rate, the first convolution calculation unit 101 performs the first-layer convolution calculation on the input image data and obtains the 2-bit data required by the intermediate convolution layers as the first-layer feature data. In one embodiment of the present disclosure, the input image data has a first data width and the first-layer feature data has a second data width, the second data width being smaller than the first data width. Ensuring that the second data width is smaller than the first data width ensures that what the first-layer convolution calculation performs is a convolution operation based on the low-bit convolutional neural network, so as to improve ...
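The practical effect of the narrower second data width can be illustrated by how 2-bit values pack into memory. The sketch below is an illustration only: in the disclosure the 2-bit feature data is produced by the first-layer convolution calculation, whereas `quantize_to_2bit` here simply keeps the top two bits of an 8-bit value as a hypothetical stand-in, and the packing layout (four values per byte, lowest pair first) is likewise an assumption.

```python
def quantize_to_2bit(value_8bit: int) -> int:
    """Keep the two most significant bits of an 8-bit value.
    Stand-in only; not the first-layer convolution of the disclosure."""
    return (value_8bit >> 6) & 0b11


def pack_2bit(values):
    """Pack 2-bit values four per byte, least significant pair first."""
    packed = bytearray()
    for i in range(0, len(values), 4):
        byte = 0
        for shift, v in enumerate(values[i:i + 4]):
            byte |= (v & 0b11) << (2 * shift)
        packed.append(byte)
    return bytes(packed)


def unpack_2bit(packed, count):
    """Recover `count` 2-bit values from the packed bytes."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(count)]


# Usage: five 8-bit samples become five 2-bit values stored in two bytes.
samples = [0, 64, 128, 255, 200]
codes = [quantize_to_2bit(s) for s in samples]
packed = pack_2bit(codes)
restored = unpack_2bit(packed, len(codes))
```

Under this layout each stored value takes a quarter of the space of an 8-bit value, which is one way to see why a smaller second data width eases the load on the storage unit.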

The storage unit 102 is used to store the first-layer feature data. As described above, in one embodiment of the present disclosure, the storage unit 102 may be configured by a memory external to the field programmable gate array; for example, the storage unit 102 may be a DDR3 memory external to the field programmable gate array.

The second convolution calculation unit 103 is used to read the first-layer feature data from the storage unit 102 and perform the predetermined number of layers of convolution calculation to generate the convolution calculation result for the input image data. As described above, in one embodiment of the present disclosure, the second convolution calculation unit 103 is configured by at least a portion of the LUTs in the field programmable gate array. Where the low-bit convolutional neural network algorithm is implemented with the field programmable gate array, there are 18 layers of convolution calculation in total, of which the 17 layers other than the first-layer convolution calculation performed by the first convolution calculation unit 101 are all performed by the second convolution calculation unit 103. Each time the second convolution calculation unit 103 finishes one layer of convolution calculation, it stores the calculation result in the storage unit 102 as intermediate-layer feature data. When the second convolution calculation unit 103 performs the next layer of convolution calculation, it reads from the storage unit 102 the feature data of the next layer to be calculated and the predetermined convolution calculation parameters, which include, but are not limited to, parameters such as the convolution weight values. In one embodiment of the present disclosure, the predetermined convolution calculation parameters are stored in the storage unit 102 in advance after a training process. After the second convolution calculation unit 103 has performed the predetermined number of layers of convolution calculation, a target heat map of the input image data is obtained, which indicates the probability that a target exists at each pixel of the input image. The final-layer convolution calculation result (for example, the target heat map) obtained by the second convolution calculation unit 103 can be output via an output unit (not shown).

The control unit 104 is used to control the first convolution calculation unit 101, the storage unit 102, and the second convolution calculation unit 103. As described above, in one embodiment of the present disclosure, the control unit 104 may be configured by an ARM processor inside the field programmable gate array.

Specifically, in one embodiment of the present disclosure, the control unit 104 can control the time at which the first convolution calculation unit 101 starts the first-layer convolution calculation on the input image data, and can obtain the time at which it stops. The control unit 104 can also control the number of repetitions, the start and stop times, and the number of convolution channels with which the second convolution calculation unit 103 performs the predetermined number of layers of convolution calculation. The control unit 104 can further control the output format and the start and stop times of the final-layer convolution calculation result. The control unit 104 is also used to automatically pause the convolution calculation performed by the second convolution calculation unit 103, and preserve its execution context, when the bandwidth of the storage unit 102 is insufficient. Automatically pausing and preserving the context prevents data in the storage unit from overflowing and ensures that the convolution operations proceed in order.
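The pause-and-preserve behaviour can be modelled as a loop that advances its position only when the storage unit has bandwidth available. The following is a toy simulation under stated assumptions: the `ConvContext` fields, the notion of "tiles" within a layer, and the `bandwidth_ok` callback are all hypothetical and are used only to show that a stall leaves the saved position untouched so that work resumes exactly where it stopped.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ConvContext:
    """State that must survive a pause (hypothetical fields)."""
    layer: int = 0
    tile: int = 0


def run_with_stall(
    ctx: ConvContext,
    layers: int,
    tiles_per_layer: int,
    bandwidth_ok: Callable[[int], bool],
    max_ticks: int,
) -> List[str]:
    """Advance one tile per tick; stall without touching ctx when bandwidth is short."""
    log = []
    for tick in range(max_ticks):
        if ctx.layer >= layers:
            break                              # all layers finished
        if not bandwidth_ok(tick):
            log.append("stall")                # pause: context is left exactly as it was
            continue
        log.append(f"L{ctx.layer}T{ctx.tile}")  # do one unit of work
        ctx.tile += 1
        if ctx.tile == tiles_per_layer:        # layer complete, move to the next one
            ctx.tile = 0
            ctx.layer += 1
    return log


# Usage: bandwidth is unavailable at ticks 1 and 2.
ctx = ConvContext()
log = run_with_stall(ctx, layers=2, tiles_per_layer=2,
                     bandwidth_ok=lambda t: t not in (1, 2), max_ticks=10)
```

In the resulting log the work item after the two stalls is the second tile of layer 0, that is, nothing is skipped or repeated.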

In addition, the control unit 104 can further read the final-layer convolution calculation result from the storage unit 102 and carry out the next step of calculation. Because the amount of data after convolution is very small, this calculation can be done entirely serially by the control unit 104. Moreover, the calculation by the control unit 104, the convolution calculations of the first convolution calculation unit 101 and the second convolution calculation unit 103, the input of image data, and the output of convolution calculation results can all proceed in parallel to save time.

More specifically, regarding control of the number of convolution channels, the control unit 104 can set the number of input channels and the number of output channels of the second convolution calculation unit 103 according to the structure of the neural network implemented by the field programmable gate array; and, when the number of input channels to be processed is greater than the number of input channels of the second convolution calculation unit 103 and/or the number of output channels to be processed is greater than the number of output channels of the second convolution calculation unit 103, the control unit 104 controls the execution cycle of each intermediate-layer convolution calculation and the storage cycle of the intermediate-layer feature data to the storage unit 102 according to the number of input channels to be processed, the number of input channels of the second convolution calculation unit 103, the number of output channels to be processed, and the number of output channels of the second convolution calculation unit 103. For example, in one embodiment of the present disclosure, the channel counts used by the BCNN algorithm include several values such as 32, 64, 128, and 256. To save field programmable gate array resources, the channel count of the convolution calculation module is statically set to 32 input channels and 16 output channels. That is, in each clock cycle the data of 32 input channels is calculated and converted into 16 output channels. Other channel counts are handled by the control unit 104 dynamically configuring the number of execution steps for each channel. For example, with 32 input channels and 32 output channels, two clock cycles are used: the control unit 104 dynamically adjusts the input hold time to two beats, so that the input data is the same in the first and second clock cycles, but the input in the first clock cycle is operated on with the coefficients of the low 16 output channels and the input in the second clock cycle is operated on with the coefficients of the high 16 output channels. With 64 input channels and 32 output channels, the 64 inputs are split into two groups of 32 and the calculation completes in four clock cycles: the first two clock cycles, as described above, calculate the 32 output results for the low 32 input channels, and the following two clock cycles calculate the 32 output results for the high 32 input channels. The control unit 104 also sets different numbers of execution steps for different convolution layers according to the network structure, and determines when to read and write the storage unit 102 according to the execution time of each layer. As soon as the second convolution calculation unit 103 has finished calculating one piece of data, it writes that result to the storage unit 102 and at the same time reads the next data to be calculated; there is no need to wait until all calculation results are complete before reading and writing. The control of the number of convolution channels in this embodiment makes it possible to handle convolution calculations with various numbers of input and output channels while the structure of the neural network model stays fixed. Compared with using a separate neural network model for each combination of input and output channel counts, this saves the time and cost of training different neural network models, which is a significant advantage.
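The two worked examples above (32 in / 32 out in two clock cycles, 64 in / 32 out in four) follow from tiling the required channels over the fixed 32-input, 16-output unit. A minimal sketch of that schedule is given below; the function name `schedule` and the representation of each step as a pair of channel ranges are assumptions made for illustration, and the accumulation of partial results across input groups is not spelled out in the text and is therefore not modelled.

```python
from math import ceil
from typing import List, Tuple

HW_IN, HW_OUT = 32, 16  # static channel configuration given in the text


def schedule(in_ch: int, out_ch: int) -> List[Tuple[range, range]]:
    """Return one (input-channel group, output-channel group) pair per clock cycle.
    Input groups form the outer loop, so each input group is held while
    its output groups are produced one after another."""
    steps = []
    for i in range(ceil(in_ch / HW_IN)):
        in_group = range(i * HW_IN, min((i + 1) * HW_IN, in_ch))
        for o in range(ceil(out_ch / HW_OUT)):
            out_group = range(o * HW_OUT, min((o + 1) * HW_OUT, out_ch))
            steps.append((in_group, out_group))
    return steps


# 32 in / 32 out: same inputs held for two cycles, low then high 16 outputs.
two_cycles = schedule(32, 32)
# 64 in / 32 out: low 32 inputs for two cycles, then high 32 inputs for two cycles.
four_cycles = schedule(64, 32)
```

The number of cycles is therefore ceil(in/32) x ceil(out/16), which gives 2 and 4 for the two cases in the text.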

Fig. 2 is a flowchart illustrating the image processing method according to an embodiment of the present disclosure. The image processing method 20 shown in Fig. 2 is performed by the image processing device 10 shown in Fig. 1 and includes the following steps.

In step S201, the first convolution calculation unit performs the first-layer convolution calculation on the input image data and generates the first-layer feature data. In one embodiment of the present disclosure, the first convolution calculation unit 101, configured by at least a portion of the LUTs in the field programmable gate array, performs the first-layer convolution calculation on video data that serves as the input image data and is acquired by an image sensor such as a charge-coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) device, and obtains the 2-bit data required by the intermediate convolution layers as the first-layer feature data. The process then proceeds to step S202.

In step S202, the first-layer feature data is stored in the storage unit 102. In one embodiment of the present disclosure, the storage unit 102, configured as a memory external to the field programmable gate array, stores the first-layer feature data. Processing then proceeds to step S203.

In step S203, the second convolution calculation unit reads the first-layer feature data from the storage unit, performs a predetermined number of layers of convolution calculation, and generates a convolution calculation result for the input image data. In one embodiment of the present disclosure, the second convolution calculation unit 103, configured from at least a portion of the LUTs in the field programmable gate array, performs 17 layers of convolution calculation to obtain a target heat map of the input image data.

Steps S201 to S203 above are performed under the control of the control unit 104, which is configured by an ARM processor inside the field programmable gate array.

With the image processing device and image processing method according to the embodiments of the present disclosure described with reference to Fig. 1 and Fig. 2, a low-bit CNN algorithm is implemented on a field programmable gate array, making full use of the resources of the field programmable gate array to improve computing capability. The convolution unit thus implemented provides a statically configurable number of channels and data bit width and a dynamically configurable number and mode of loop iterations; together with a control unit whose parameters are adjustable, it allows CNN models of different architectures to be implemented quickly.

In a specific example, the image processing device according to the present disclosure is a camera. In addition to components such as the first convolution calculation unit, the storage unit, the second convolution calculation unit, and the control unit, the image processing device may also include other components such as a lens and an image sensor, and the image sensor may be used to form the input data. Implementing the low-bit CNN algorithm with a field programmable gate array in the camera improves the image processing capability of the camera itself, so that some image processing operations (such as face detection and face image cropping) can be completed locally in the camera. Compared with the prior-art approach of relying on a server for such image processing, this reduces the computing load on the server.

Hereinafter, the image processing device and image processing method according to the embodiments of the present disclosure are described in further detail with reference to Fig. 3 to Fig. 5.

Fig. 3 is a flowchart further illustrating the image processing method according to an embodiment of the present disclosure. The image processing method 30 shown in Fig. 3 includes the following steps.

Steps S301 and S302 shown in Fig. 3 are the same as steps S201 and S202 described with reference to Fig. 2, respectively, and their description is not repeated here.

After the first-layer feature data is stored in the storage unit in step S302, processing proceeds to step S303. In step S303, the second convolution calculation unit reads the next layer's feature data to be calculated and predetermined convolution calculation parameters from the storage unit, performs an intermediate-layer convolution calculation, and generates intermediate-layer feature data. In one embodiment of the present disclosure, the next layer's feature data to be calculated that is read by the second convolution calculation unit is not necessarily the calculation result of the immediately preceding layer of convolution calculation. The predetermined convolution calculation parameters are parameters, such as convolution weight values, that are obtained through a training process and stored in the storage unit 102 in advance.

In addition, the intermediate-layer convolution calculation performed in step S303 also includes: setting the number of input channels and the number of output channels of the second convolution calculation unit 103 according to the structure of the neural network implemented by the field programmable gate array; and, when the number of input channels to be processed is greater than the number of input channels of the second convolution calculation unit 103 and/or the number of output channels to be processed is greater than the number of output channels of the second convolution calculation unit 103, controlling the execution cycle of each intermediate-layer convolution calculation and the storage cycle of the intermediate-layer feature data to the storage unit 102 according to the number of input channels to be processed, the number of input channels of the second convolution calculation unit 103, the number of output channels to be processed, and the number of output channels of the second convolution calculation unit 103. For example, in one embodiment of the present disclosure, the channel counts of the BCNN algorithm used include values such as 32, 64, 128, and 256. To save field programmable gate array resources, the channel count of the convolution calculation module is statically set to 32 input channels and 16 output channels; that is, in each clock cycle the data of 32 input channels is calculated and converted into 16 output channels. For other channel counts, the control unit 104 dynamically configures the number of execution steps. For example, for 32 input channels and 32 output channels, two clock cycles are used: the control unit 104 dynamically adjusts the input hold time to two cycles, so that the input data of the first clock cycle and the second clock cycle is the same, but the input of the first clock cycle is operated on with the coefficients of the lower 16 output channels and the input of the second clock cycle is operated on with the coefficients of the upper 16 output channels. For 64 input channels and 32 output channels, the 64 input channels are divided into two groups of 32 and the calculation is completed in four clock cycles: the first two clock cycles, as described above, compute the 32 output results for the lower 32 input channels, and the following two clock cycles compute the 32 output results for the upper 32 input channels. Processing then proceeds to step S304.

In step S304, the count value of the intermediate layers is incremented. Processing then proceeds to step S305.

In step S305, it is determined whether the current intermediate-layer count value has reached a predetermined value. For example, in one embodiment of the present disclosure in which the BCNN algorithm is implemented, the predetermined value is 17.

If a negative result is obtained in step S305, that is, the current intermediate-layer count value has not yet reached the predetermined value, processing proceeds to step S306. In step S306, the current intermediate-layer feature data is stored in the storage unit. The intermediate-layer feature data stored in the storage unit 102 is read in a subsequent intermediate-layer convolution calculation step for further convolution calculation. Processing then returns to step S303 to perform the next intermediate-layer convolution calculation.

Conversely, if a positive result is obtained in step S305, that is, the current intermediate-layer count value has reached the predetermined value, processing proceeds to step S307. In step S307, the finally calculated intermediate-layer feature data is output as the result of the convolution calculation.
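The loop formed by steps S303 to S307 can be sketched as follows, purely for illustration. The predetermined value of 17 comes from the text; the dictionary standing in for the storage unit, its keys `"features"` and `"params"`, and the callable `conv_layer` are hypothetical stand-ins rather than the disclosed hardware.

```python
PREDETERMINED_LAYERS = 17  # value given in the text for the BCNN example

def run_intermediate_layers(storage, conv_layer, num_layers=PREDETERMINED_LAYERS):
    """Run the S303..S307 loop against a dict that stands in for the storage unit."""
    count = 0
    while True:
        features = storage["features"]            # S303: read data to be calculated
        params = storage["params"][count]         # S303: read pre-stored parameters
        features = conv_layer(features, params)   # S303: intermediate-layer convolution
        count += 1                                # S304: increment the layer count
        if count < num_layers:                    # S305: predetermined value not reached
            storage["features"] = features        # S306: store the intermediate result
            continue                              # return to S303
        return features                           # S307: output the final result
```

For example, with a stand-in layer `lambda f, p: f + 1`, an initial `storage["features"]` of 0, and 17 parameter entries, the function returns 17 and the last value written back to the storage stand-in is 16.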

In one embodiment of the present disclosure, the result of the convolution calculation is a target heat map of the input image data. In another embodiment of the present disclosure, the result of the convolution calculation can be read by the ARM processor that configures the control unit 104 and processed further. Because the amount of data after convolution is very small, this serial calculation can be carried out entirely by the control unit 104. Moreover, the calculation in the control unit 104 can run in parallel with the convolution calculations of the first convolution calculation unit 101 and the second convolution calculation unit 103, as well as with the input of image data and the output of convolution calculation results, which saves time. In addition, in yet another embodiment of the present disclosure, an output image in which the target position information obtained by the image processing method 30 is superimposed on the original input image data can be output for display on a display.
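The text does not state how the control unit turns the target heat map into position information. As one possible illustration only, the following sketch thresholds a small heat map and returns a single bounding box around all cells at or above the threshold; the threshold value, the single-target simplification, and the name `heatmap_to_box` are assumptions.

```python
def heatmap_to_box(heatmap, threshold=0.5):
    """Return (x_min, y_min, x_max, y_max) of cells >= threshold, or None.

    Hypothetical post-processing: treats every above-threshold cell as
    belonging to one target, which is a deliberate simplification.
    """
    rows = [r for r, row in enumerate(heatmap)
            if any(v >= threshold for v in row)]
    cols = [c for row in heatmap
            for c, v in enumerate(row) if v >= threshold]
    if not rows:
        return None  # no cell reached the threshold
    return (min(cols), min(rows), max(cols), max(rows))
```

A serial scan of this kind is cheap because the heat map is small, which is consistent with the statement above that the post-convolution data volume is small enough for the control unit to process by itself.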

Fig. 4 is a block diagram further illustrating the image processing device according to an embodiment of the present disclosure. The first convolution calculation unit 101, the storage unit 102, and the control unit 104 shown in Fig. 4 are the same as the first convolution calculation unit 101, the storage unit 102, and the control unit 104 described with reference to Fig. 1, respectively.

Compared with Fig. 1 above, Fig. 4 further shows the internal structure of the second convolution calculation unit 103. As shown in Fig. 4, the second convolution calculation unit 103 includes an unpooling processing subunit 1031, a combination processing subunit 1032, a convolution calculation processing subunit 1033, a pooling processing subunit 1034, and a selection processing subunit 1035. For each intermediate-layer convolution calculation, before the convolution calculation processing, the control unit 104 controls the unpooling processing subunit 1031 to selectively perform unpooling processing; and after the convolution calculation processing, the control unit 104 controls the pooling processing subunit 1034 to selectively perform pooling processing. In one embodiment of the present disclosure, the unpooling processing subunit 1031 and the pooling processing subunit 1034 use 2x2 windows, and whether each is bypassed is dynamically controlled by the control unit 104. In addition, for each intermediate-layer convolution calculation, before the convolution calculation processing, the control unit 104 controls the combination processing subunit 1032 to perform combination processing so as to combine the read next-layer feature data to be calculated; and after the convolution calculation processing, the control unit 104 controls the selection processing subunit 1035 to perform selection processing so as to select the data channels of the intermediate-layer feature data to be stored in the storage unit 102. That is, the combination processing subunit 1032 can recombine different feature maps and output a new feature map; its number of data input paths is selectable and, in one embodiment of the present disclosure, is 2. The selection processing subunit 1035 selects the data channels to be stored back into the storage unit 102; its number of data input paths is selectable and, in one embodiment of the present disclosure, is 2. In addition, in one embodiment of the present disclosure, the convolution calculation processing subunit 1033 uses convolution with a 3x3 window, and the feature values and parameter values input to the convolution calculation processing subunit 1033 are synchronized to facilitate calculation. Throughout the data flow of the second convolution calculation unit 103, when the bandwidth of the storage unit 102 is insufficient, the current calculation is automatically paused and its state is preserved, which maximizes the bandwidth utilization of the storage unit 102 and improves calculation speed.
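The pause-and-preserve behavior described for insufficient storage bandwidth can be illustrated with a simple cycle-level model. This is a sketch under stated assumptions, not the disclosed circuit: the per-cycle boolean `bandwidth_ok`, the running sum used as the preserved state, and the name `run_with_stalls` are all hypothetical.

```python
def run_with_stalls(work_items, bandwidth_ok):
    """Process work_items one per cycle, stalling when bandwidth is unavailable.

    work_items:   values to accumulate (stand-in for data to be calculated)
    bandwidth_ok: iterable of booleans, one per clock cycle
    Returns (accumulated_result, items_done, cycles_used).
    """
    acc = 0     # preserved state: partial result
    index = 0   # preserved state: position in the data stream
    cycles = 0
    for ok in bandwidth_ok:
        if index == len(work_items):
            break            # all data processed
        cycles += 1
        if not ok:
            continue         # stall: acc and index are held unchanged
        acc += work_items[index]
        index += 1
    return acc, index, cycles
```

In this model a stalled cycle costs time but loses no work: `run_with_stalls([1, 2, 3], [True, False, False, True, True, True])` returns `(6, 3, 5)`, the same result as the stall-free run with two extra cycles.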

With the design structure of the second convolution calculation unit 103 shown in Fig. 4, configuring different convolution layers becomes very easy. For example, if a certain layer of convolution calculation does not need the pooling function, a parameter is dynamically adjusted in the control unit 104 so that the pooling processing subunit 1034 is bypassed and its function is turned off; the modification is completed without changing the hardware architecture of the field programmable gate array. If a certain convolution layer needs two parallel convolution calculations with 3x3 windows, then, provided the resources of the field programmable gate array allow, it is only necessary to duplicate one convolution calculation processing subunit 1033, merge it into the network, and dynamically configure the parameters.

As shown in Fig. 5, the processing performed by the second convolution calculation unit 103 starting from step S503 is preceded by steps S301 and S302 described with reference to Fig. 3, and their description is not repeated here. In step S503, the second convolution calculation unit reads the next layer's feature data to be calculated and the predetermined convolution calculation parameters from the storage unit. Processing then proceeds to step S504.

In step S504, the number of input channels and the number of output channels of the second convolution calculation unit are set according to the structure of the neural network implemented by the field programmable gate array. As described above, for example, in one embodiment of the present disclosure, the channel counts of the BCNN algorithm used include values such as 32, 64, 128, and 256. To save field programmable gate array resources, the channel count of the convolution calculation module is statically set to 32 input channels and 16 output channels; that is, in each clock cycle the data of 32 input channels is calculated and converted into 16 output channels. For other channel counts, the control unit 104 dynamically configures the number of execution steps. Processing then proceeds to step S505.

In step S505, the execution cycle of each intermediate-layer convolution calculation and the storage cycle of the intermediate-layer feature data to the storage unit are controlled. In one embodiment of the present disclosure, when the number of input channels to be processed is greater than the number of input channels of the second convolution calculation unit 103 and/or the number of output channels to be processed is greater than the number of output channels of the second convolution calculation unit 103, the execution cycle of each intermediate-layer convolution calculation and the storage cycle of the intermediate-layer feature data to the storage unit 102 are controlled according to the number of input channels to be processed, the number of input channels of the second convolution calculation unit 103, the number of output channels to be processed, and the number of output channels of the second convolution calculation unit 103. Processing then proceeds to step S506.
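To illustrate why the execution cycle and the storage cycle depend on these four channel counts, the following sketch accumulates a channel-wise weighted sum in groups of 32 inputs and 16 outputs and counts the cycles used. The group widths come from the text; reducing the convolution to a single spatial position (a weighted sum over input channels), the integer data, and the name `tiled_pointwise_conv` are simplifying assumptions.

```python
def tiled_pointwise_conv(x, w, hw_in=32, hw_out=16):
    """Compute out[o] = sum_i w[o][i] * x[i] in hardware-sized groups.

    x: list of need_in input-channel values at one spatial position
    w: need_out rows, each holding need_in weights
    Returns (outputs, cycles). An output value is complete, and so ready
    to be stored, only after every input group has contributed to it.
    """
    need_in, need_out = len(x), len(w)
    acc = [0] * need_out
    cycles = 0
    for i0 in range(0, need_in, hw_in):           # one group of input channels
        i1 = min(i0 + hw_in, need_in)
        for o0 in range(0, need_out, hw_out):     # one group of output channels
            cycles += 1                           # one clock cycle per group pair
            for o in range(o0, min(o0 + hw_out, need_out)):
                acc[o] += sum(w[o][i] * x[i] for i in range(i0, i1))
    return acc, cycles
```

With 64 input channels and 32 output channels this takes four cycles and gives the same outputs as computing each sum in one pass, which is the property the grouped schedule relies on.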

In step S506, unpooling processing is selectively performed. In one embodiment of the present disclosure, if a certain layer of convolution calculation does not need the unpooling function, a parameter is dynamically adjusted in the control unit 104 so that the unpooling processing subunit 1031 is bypassed and its function is turned off. Processing then proceeds to step S507.

In step S507, combination processing is performed to combine the read next-layer feature data to be calculated. In one embodiment of the present disclosure, the combination processing subunit 1032 can recombine different feature maps and output a new feature map; its number of data input paths is selectable and, in one embodiment of the present disclosure, is 2. Processing then proceeds to step S508.

In step S508, the intermediate-layer convolution calculation is performed to generate intermediate-layer feature data. Processing then proceeds to step S509.

In step S509, pooling processing is selectively performed. In one embodiment of the present disclosure, if a certain layer of convolution calculation does not need the pooling function, a parameter is dynamically adjusted in the control unit 104 so that the pooling processing subunit 1034 is bypassed and its function is turned off. Processing then proceeds to step S510.
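For readers unfamiliar with the 2x2-window operations selectively performed in steps S506 and S509, a minimal sketch on a single feature map follows. The text specifies only the 2x2 window; the choice of max pooling, the choice of nearest-neighbor replication for unpooling, and the function names are assumptions.

```python
def max_pool_2x2(fmap):
    """Halve each dimension by taking the maximum of every 2x2 block (assumed max pooling)."""
    return [[max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]) - 1, 2)]
            for r in range(0, len(fmap) - 1, 2)]

def unpool_2x2(fmap):
    """Double each dimension by replicating every value into a 2x2 block (assumed scheme)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))                     # duplicate the row
    return out
```

Under these assumptions, pooling a 4x4 map yields a 2x2 map, unpooling a 2x2 map yields a 4x4 map, and `max_pool_2x2(unpool_2x2(m))` returns `m` unchanged.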

In step S510, selection processing is performed to select the data channels of the intermediate-layer feature data to be stored in the storage unit. In one embodiment of the present disclosure, the selection processing subunit 1035 selects the data channels to be stored back into the storage unit 102; its number of data input paths is selectable and, in one embodiment of the present disclosure, is 2. Through steps S503 to S510, the second convolution calculation unit 103 completes one intermediate-layer convolution calculation. Processing then proceeds to step S304, so that, according to whether the intermediate-layer count value has reached the predetermined value, it is determined either to return to step S503 to continue with the next intermediate-layer convolution calculation, or that the final layer of convolution calculation has been completed and the calculation result is to be output.
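The order of the processing stages in steps S506 to S510, with unpooling and pooling bypassed under control of per-layer parameters, can be sketched as follows. The stage order and the two bypassable stages come from the text; the `LayerConfig` fields, the `stages` dictionary of callables, and the returned trace are hypothetical devices used only to make the ordering visible.

```python
from dataclasses import dataclass

@dataclass
class LayerConfig:
    """Hypothetical per-layer parameters set by the control unit."""
    use_unpool: bool = False  # False means the unpooling stage is bypassed
    use_pool: bool = False    # False means the pooling stage is bypassed

def run_layer(features, cfg, stages):
    """Apply the stages in the S506..S510 order; return (features, trace)."""
    trace = []

    def apply(name, enabled=True):
        nonlocal features
        if enabled:                       # a disabled stage passes data through
            features = stages[name](features)
            trace.append(name)

    apply("unpool", cfg.use_unpool)       # S506: selective unpooling
    apply("combine")                      # S507: combination processing
    apply("conv")                         # S508: intermediate-layer convolution
    apply("pool", cfg.use_pool)           # S509: selective pooling
    apply("select")                       # S510: selection of channels to store
    return features, trace
```

Changing a layer from pooled to unpooled is then a matter of changing one field of its configuration, which mirrors the point made above that no change to the hardware architecture is needed.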

Fig. 6 is a schematic diagram illustrating an image processing system according to an embodiment of the present disclosure. As shown in Fig. 6, the image processing system 600 according to an embodiment of the present disclosure includes a camera 5, an image processing device 6, a server 7, and a display 8. It is easy to understand that the configuration shown in Fig. 6 is merely illustrative; the image processing device 6 may, for example, be arranged in the camera 5 or in the server 7.

An optical sensor in the camera 5 acquires raw image data of a monitored scene and supplies it to the image processing device 6 as input image data.

The image processing device 6 includes at least a field programmable gate array 70 and a memory 80 located outside the field programmable gate array 70. The field programmable gate array 70 further includes an ARM processor 701, a first look-up table resource 702, and a second look-up table resource 703. It should be understood that the first look-up table resource 702, the memory 80, the second look-up table resource 703, and the ARM processor 701 shown in Fig. 6 are specific hardware implementations of the first convolution calculation unit 101, the storage unit 102, the second convolution calculation unit 103, and the control unit 104 described with reference to Fig. 1, respectively, and they respectively perform the steps of the image processing method described with reference to Fig. 2, Fig. 3, and Fig. 5.

As needed, the image processing device 6 may supply the target detection result, obtained by performing convolution calculation on the input image data, to the server 7 for further processing or to the display 8 for display.

In one embodiment of the present disclosure, the image processing device 6 may be further configured with an encoding unit for generating encoded image data corresponding to the target based on the raw image data and the target detection result. The image processing device 6 transmits the finally obtained encoded image data of the target (including a JPEG image containing only the target and/or H.264 or H.265 video annotated with the target) over a wired or wireless network to the back-end server 7 for at least one process such as face attribute analysis, face recognition, face beautification, or face cartoonization, or transmits it to the display 8 for display.

The image processing method, image processing device, and image processing system according to embodiments of the present disclosure have been described above with reference to Fig. 1 to Fig. 6. The image processing method and image processing device according to embodiments of the present disclosure implement a low-bit CNN algorithm using the look-up table (LUT) resources in a field programmable gate array, making full use of the resources of the field programmable gate array to improve computing capability. The convolution unit thus implemented provides a statically configurable number of channels and data bit width and a dynamically configurable number and mode of loop iterations; together with a control unit whose parameters are adjustable, it allows CNN models of different architectures to be implemented quickly.

The basic principles of the present disclosure have been described above in connection with specific embodiments. It should be noted, however, that the merits, advantages, effects, and the like mentioned in the present disclosure are merely examples and not limitations, and must not be regarded as prerequisites for every embodiment of the present disclosure. In addition, the specific details disclosed above are provided only for purposes of illustration and ease of understanding, not limitation; they do not restrict the present disclosure to being implemented with those specific details.

The block diagrams of components, apparatuses, devices, and systems in the present disclosure are merely illustrative examples and are not intended to require or imply that they must be connected, arranged, or configured in the manner shown in the block diagrams. As those skilled in the art will recognize, these components, apparatuses, devices, and systems may be connected, arranged, or configured in any manner. Words such as "including", "comprising", and "having" are open-ended terms that mean "including but not limited to" and may be used interchangeably with it. The words "or" and "and" as used herein mean "and/or" and may be used interchangeably with it, unless the context clearly indicates otherwise. The words "such as" as used herein mean the phrase "such as, but not limited to" and may be used interchangeably with it.

In addition, as used herein, "or" used in a list of items beginning with "at least one of" indicates a disjunctive list, so that a list such as "at least one of A, B, or C" means A or B or C, or AB or AC or BC, or ABC (that is, A and B and C). Furthermore, the word "example" does not mean that the described example is preferred or better than other examples.

It should also be noted that, in the systems and methods of the present disclosure, the components or steps may be decomposed and/or recombined. Such decompositions and/or recombinations should be regarded as equivalents of the present disclosure.

Various changes, substitutions, and alterations may be made to the techniques described herein without departing from the techniques taught as defined by the appended claims. Furthermore, the scope of the claims of the present disclosure is not limited to the specific aspects of the processes, machines, manufacture, compositions of matter, means, methods, and acts described above. Processes, machines, manufacture, compositions of matter, means, methods, or acts, presently existing or later to be developed, that perform substantially the same function or achieve substantially the same result as the corresponding aspects described herein may be utilized. Accordingly, the appended claims include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or acts.

The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other aspects without departing from the scope of the present disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the present disclosure to the forms disclosed herein. Although a number of example aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, changes, additions, and sub-combinations thereof.

Claims

1. An image processing method, comprising:

performing, by a first convolution calculation unit, a first layer of convolution calculation on input image data to generate first-layer feature data;

storing the first-layer feature data in a storage unit; and

reading, by a second convolution calculation unit, the first-layer feature data from the storage unit, performing a predetermined number of layers of convolution calculation, and generating a convolution calculation result for the input image data, wherein the corresponding calculation result is stored in the storage unit after each layer of convolution calculation ends;

wherein the first convolution calculation unit and the second convolution calculation unit are configured by a field programmable gate array.

2. The image processing method of claim 1, wherein performing the predetermined number of layers of convolution calculation comprises:

reading, by the second convolution calculation unit, next-layer feature data to be calculated and predetermined convolution calculation parameters from the storage unit, and performing an intermediate-layer convolution calculation to generate intermediate-layer feature data; and

incrementing a count value of the intermediate layers and determining whether the count value reaches a predetermined value, wherein

in a case where the count value does not reach the predetermined value, the intermediate-layer feature data is stored in the storage unit, and processing returns to performing the intermediate-layer convolution calculation by the second convolution calculation unit; and

in a case where the count value reaches the predetermined value, the intermediate-layer feature data is output as the convolution calculation result.

3. The image processing method of claim 1 or 2, wherein the input image data has a first data width and the first-layer feature data has a second data width, the second data width being smaller than the first data width.

4. The image processing method of claim 2, wherein performing the intermediate-layer convolution calculation further comprises:

setting the number of input channels and the number of output channels of the second convolution calculation unit according to the structure of the neural network implemented by the field programmable gate array; and

when the number of input channels to be processed is greater than the number of input channels of the second convolution calculation unit and/or the number of output channels to be processed is greater than the number of output channels of the second convolution calculation unit, controlling the execution cycle of each intermediate-layer convolution calculation and the storage cycle of the intermediate-layer feature data to the storage unit according to the number of input channels to be processed, the number of input channels of the second convolution calculation unit, the number of output channels to be processed, and the number of output channels of the second convolution calculation unit.

5. The image processing method of claim 2, wherein the intermediate-layer convolution calculation includes unpooling processing, convolution calculation processing, and pooling processing, and performing the intermediate-layer convolution calculation further comprises:

for each intermediate-layer convolution calculation, selectively performing the unpooling processing before the convolution calculation processing; and

for each intermediate-layer convolution calculation, selectively performing the pooling processing after the convolution calculation processing.

6. The image processing method of claim 2 or 5, wherein the intermediate-layer convolution calculation further includes combination processing and selection processing, and performing the intermediate-layer convolution calculation further comprises:

for each intermediate-layer convolution calculation, performing the combination processing before the convolution calculation processing, to combine the read next-layer feature data to be calculated; and

for each intermediate-layer convolution calculation, performing the selection processing after the convolution calculation processing, to select the data channels of the intermediate-layer feature data to be stored in the storage unit.

7. The image processing method of claim 1, wherein, when the bandwidth of the storage unit is insufficient, the convolution calculation performed by the second convolution calculation unit is automatically paused and its state is preserved.

8. The image processing method of claim 1, wherein the method is implemented by a camera, and the camera includes the first convolution calculation unit, the second convolution calculation unit, and the storage unit.

9. An image processing device, comprising:

a first convolution calculation unit for performing a first layer of convolution calculation on input image data to generate first-layer feature data;

a storage unit for storing the first-layer feature data;

a second convolution calculation unit for reading the first-layer feature data from the storage unit, performing a predetermined number of layers of convolution calculation, and generating a convolution calculation result for the input image data; and

a control unit for controlling the first convolution calculation unit, the storage unit, and the second convolution calculation unit;

wherein the storage unit is further used to store the calculation result of each layer of convolution calculation performed by the second convolution calculation unit, and the first convolution calculation unit and the second convolution calculation unit are configured by a field programmable gate array.

10. The image processing device of claim 9, wherein the control unit controls the second convolution calculation unit to read next-layer feature data to be calculated and predetermined convolution calculation parameters from the storage unit and to perform an intermediate-layer convolution calculation to generate intermediate-layer feature data; and

the control unit increments a count value of the intermediate layers and determines whether the count value reaches a predetermined value, wherein

in a case where the count value does not reach the predetermined value, the intermediate-layer feature data is stored in the storage unit, and control returns to the second convolution calculation unit performing the intermediate-layer convolution calculation;

11. The image processing device of claim 9 or 10, wherein the input image data has a first data width and the first-layer feature data has a second data width, the second data width being smaller than the first data width.

12. The image processing device of claim 10, wherein the control unit sets the number of input channels and the number of output channels of the second convolution calculation unit according to the structure of the neural network implemented by the field programmable gate array; and

13. The image processing device of claim 10, wherein the second convolution calculation unit includes an unpooling processing subunit, a convolution calculation processing subunit, and a pooling processing subunit,

for each intermediate-layer convolution calculation, before the convolution calculation processing, the control unit controls the unpooling processing subunit to selectively perform the unpooling processing; and

for each intermediate-layer convolution calculation, after the convolution calculation processing, the control unit controls the pooling processing subunit to selectively perform the pooling processing.

14. The image processing device of claim 10 or 13, wherein the second convolution calculation unit further includes a combination processing subunit and a selection processing subunit,

for each intermediate-layer convolution calculation, before the convolution calculation processing, the control unit controls the combination processing subunit to perform the combination processing, to combine the read next-layer feature data to be calculated; and

for each intermediate-layer convolution calculation, after the convolution calculation processing, the control unit controls the selection processing subunit to perform the selection processing, to select the data channels of the intermediate-layer feature data to be stored in the storage unit.

15. The image processing device of claim 9, wherein the control unit is further used to automatically pause the convolution calculation performed by the second convolution calculation unit and preserve its state when the bandwidth of the storage unit is insufficient.

16. The image processing device of claim 9, wherein the control unit is an ARM processor inside the field programmable gate array, and/or

the first convolution calculation unit includes at least a portion of the look-up tables in the field programmable gate array, and/or

the second convolution calculation unit includes at least a portion of the look-up tables in the field programmable gate array, and/or

the storage unit is a memory external to the field programmable gate array.

17. The image processing device of claim 9, wherein the image processing device is a camera.