CN115456860A - Image enhancement method and device based on FPGA, helmet, equipment and medium

Info

Publication number: CN115456860A
Application number: CN202211400612.0A
Authority: CN
Inventors: 夏春秋; 陈世淼
Original assignee: Shenzhen Vision Technology Co Ltd
Current assignee: Shenzhen Vision Technology Co Ltd
Priority date: 2022-11-09
Filing date: 2022-11-09
Publication date: 2022-12-09
Anticipated expiration: 2042-11-09
Also published as: CN115456860B

Abstract

The invention provides an image enhancement method and device based on an FPGA (field programmable gate array), a helmet, equipment and a medium. The image enhancement method based on the FPGA comprises the following steps: acquiring a target image; acquiring an image enhancement neural network model, and enhancing the target image through the image enhancement neural network model to obtain an enhanced target image; wherein the enhancement processing comprises: performing convolution loop processing based on the FPGA according to the target image and the image enhancement neural network model until the number of loops of the convolution loop processing equals the number of convolution layers in the image enhancement neural network model, so as to obtain a target feature map; and enhancing the target image according to the target feature map to obtain the enhanced target image. The invention helps to improve the speed of image enhancement processing.

Description

Image enhancement method and device based on FPGA, helmet, equipment and medium

Technical Field

The invention relates to the field of information data processing, in particular to an image enhancement method and device based on an FPGA (field programmable gate array), a helmet, equipment and a medium.

Background

The FPGA has the characteristics of low power consumption, high parallel computation degree, flexible programming, short development period and the like, and is widely applied to small-sized equipment.

Convolutional Neural Networks (CNN) based on deep learning are increasingly widely used in fields such as target detection, face recognition, automatic driving and character recognition. With the development of deep learning technology, the demand for real-time image processing in complex scenes keeps growing.

Disclosure of Invention

The invention provides an image enhancement method, an image enhancement device, a helmet, equipment and a medium based on an FPGA (field programmable gate array), which are beneficial to improving the processing speed of image enhancement.

In a first aspect, the present invention provides an image enhancement method based on an FPGA, including:

acquiring a target image; and acquiring an image enhancement neural network model, and enhancing the target image through the image enhancement neural network model to obtain an enhanced target image.

The enhancement processing includes: performing convolution loop processing based on the FPGA according to the target image and the image enhancement neural network model until the number of loops of the convolution loop processing equals the number of convolution layers in the image enhancement neural network model, so as to obtain a target feature map; enhancing the target image according to the target feature map to obtain an enhanced target image;

wherein the FPGA includes: a serial configuration module, a parallel computation module and a register module, wherein the serial configuration module is connected with the parallel computation module, the serial configuration module is connected with the register module, and the parallel computation module is connected with the register module;

the convolution cycle processing based on the FPGA comprises the following steps: responding to a starting signal or an interrupt request signal of the previous cycle through the serial configuration module, acquiring a convolution kernel of the current cycle from the image enhancement neural network model, acquiring a feature map of the current cycle according to the target image or the target feature map of the previous cycle, and configuring the register module according to the convolution kernel of the current cycle and the feature map of the current cycle; reading the convolution kernel of the current cycle and the feature map of the current cycle from the register module through the parallel computing module, performing convolution computation and nonlinear activation according to the convolution kernel of the current cycle and the feature map of the current cycle, obtaining a target feature map of the current cycle and outputting an interrupt request signal of the current cycle.

In one embodiment of the present invention, the image enhancement method based on an FPGA, wherein the performing convolution calculation according to the convolution kernel of the current loop and the feature map of the current loop includes:

binarizing the weight value of the convolution kernel in the image enhancement neural network model, and obtaining a binarized convolution kernel according to a scaling factor and the binarized weight value, wherein the scaling factor is the average value of the absolute values of the weight values in the convolution kernel;

and performing convolution calculation according to the binarized convolution kernel and the feature map of the current cycle.
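The binarization described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the patented implementation: the function name `binarize_kernel`, the example values, and the choice to map a zero weight to +1 are all assumptions made here.

```python
import numpy as np

def binarize_kernel(kernel):
    """Return (binarized kernel, scaling factor) for one convolution kernel."""
    # Scaling factor: mean of the absolute values of the weights in the kernel.
    alpha = np.mean(np.abs(kernel))
    # Binarized weights are +1 or -1 (assumption: a zero weight maps to +1).
    signs = np.where(kernel >= 0, 1.0, -1.0)
    return alpha * signs, alpha

kernel = np.array([[0.5, -1.5], [2.0, -4.0]])
binarized, alpha = binarize_kernel(kernel)

# One output value of the convolution: with binary weights the multiply
# reduces to add/subtract, followed by a single scaling by alpha.
patch = np.array([[1.0, 2.0], [3.0, 4.0]])
response = alpha * np.sum(np.where(kernel >= 0, patch, -patch))
```

With binary weights, each multiply inside the window becomes a sign selection, and only one true multiplication by the scaling factor remains per output value.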

In one embodiment, the method for enhancing an image based on an FPGA, wherein the obtaining a convolution kernel of a current cycle from the image-enhancing neural network model includes:

counting the interrupt request signals through the serial configuration module to obtain interrupt times;

and acquiring a corresponding convolution layer from the image enhancement neural network model according to the interruption times, and determining the convolution kernel of the current cycle according to the corresponding convolution layer.
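The interrupt-counting control flow can be modeled in software as follows. This is a hedged sketch only: the class `SerialConfigurator`, the helper `run_layers` and the `compute_layer` callback are hypothetical names that stand in for the serial configuration module and the parallel computing module, and the "kernels" in the usage example are plain numbers.

```python
class SerialConfigurator:
    """Toy model of the serial configuration module's layer selection."""

    def __init__(self, layer_kernels):
        self.layer_kernels = layer_kernels  # one entry per convolution layer
        self.interrupt_count = 0

    def on_interrupt(self):
        # Called once for each interrupt request signal that is received.
        self.interrupt_count += 1

    def done(self):
        # The loop ends when the count equals the number of convolution layers.
        return self.interrupt_count >= len(self.layer_kernels)

    def current_kernel(self):
        # The number of interrupts so far selects the layer to run next.
        return self.layer_kernels[self.interrupt_count]


def run_layers(config, feature_map, compute_layer):
    while not config.done():
        kernel = config.current_kernel()
        feature_map = compute_layer(kernel, feature_map)  # parallel part
        config.on_interrupt()  # the compute side signals layer completion
    return feature_map


# Usage with stand-in values: each "layer" just multiplies by its kernel.
config = SerialConfigurator([2, 3, 5])
result = run_layers(config, 1, lambda kernel, x: kernel * x)
```

The start signal corresponds to the first pass through the loop, where the interrupt count is still zero and the first layer is selected.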

In one embodiment of the present invention, the image enhancement method based on FPGA, wherein the configuring the register module according to the convolution kernel of the current loop and the feature map of the current loop, includes:

writing the weight values in the convolution kernel of the current cycle into the register module in sequence according to a preset weight value order, wherein the preset weight value order is: starting from an initial weight value, ordering the weight values by convolution kernel serial number, channel serial number, column serial number and row serial number; when the convolution kernel serial number of the weight value is smaller than the maximum convolution kernel serial number, incrementing the convolution kernel serial number by 1; when the convolution kernel serial number reaches the maximum convolution kernel serial number, incrementing the channel serial number by 1 and resetting the convolution kernel serial number; when the channel serial number reaches the maximum channel serial number, incrementing the column serial number by 1 and resetting the channel serial number and the convolution kernel serial number; and when the column serial number reaches the maximum column serial number, incrementing the row serial number by 1 and resetting the column serial number, the channel serial number and the convolution kernel serial number;

writing the feature values in the feature map of the current cycle into the register module according to a preset feature value order, wherein the preset feature value order is: starting from an initial feature value, ordering the feature values by channel serial number, column serial number, feature map serial number and row serial number; when the channel serial number of the feature value is smaller than the maximum channel serial number, incrementing the channel serial number by 1; when the channel serial number reaches the maximum channel serial number, incrementing the column serial number by 1 and resetting the channel serial number; when the column serial number reaches the maximum column serial number, incrementing the feature map serial number by 1 and resetting the column serial number and the channel serial number; and when the feature map serial number reaches the maximum feature map serial number, incrementing the row serial number by 1 and resetting the feature map serial number, the column serial number and the channel serial number.
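The two write orders amount to nested counters in which the first-named serial number changes fastest. A minimal NumPy sketch follows, assuming (this layout is not stated in the source) that weights are held as an array of shape (kernel, channel, row, column) and feature values as (feature map, channel, row, column); the function names are hypothetical.

```python
import numpy as np

def order_weights(weights):
    """weights has the assumed shape (kernel, channel, row, column)."""
    # Reorder the axes to (row, column, channel, kernel). Flattening in C
    # order then makes the kernel index change fastest and the row slowest.
    return weights.transpose(2, 3, 1, 0).ravel()

def order_features(features):
    """features has the assumed shape (feature_map, channel, row, column)."""
    # Reorder the axes to (row, feature_map, column, channel), so the channel
    # index changes fastest, then column, then feature map, then row.
    return features.transpose(2, 0, 3, 1).ravel()

# Two kernels, two channels, one row, two columns; values encode the index.
weights = np.arange(8).reshape(2, 2, 1, 2)
features = np.arange(4).reshape(1, 2, 1, 2)
weight_stream = order_weights(weights).tolist()
feature_stream = order_features(features).tolist()
```

In this layout every feature value (column, channel) is followed in the weight stream by one weight per kernel for the same (column, channel) position, which is what lets one feature value be reused across all kernels.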

In one embodiment of the method for enhancing an image based on an FPGA, the reading the convolution kernel of the current loop and the feature map of the current loop from the register module, and performing convolution calculation according to the convolution kernel of the current loop and the feature map of the current loop includes:

a storage step: storing the weight values in the convolution kernel of the current cycle and the feature values in the feature map of the current cycle in a register unit to be written, which belongs to the register module, wherein the register module comprises: the register unit to be written, a weight value circular shift register unit and a feature value circular shift register unit; the register unit to be written is connected with the weight value circular shift register unit, and the register unit to be written is connected with the feature value circular shift register unit;

a reading step: sequentially reading a row of weight values in the convolution kernel of the current cycle and writing the row of weight values into the weight value circular shift register unit, and sequentially reading the feature values of the feature map of the current cycle and writing the feature values into the feature value circular shift register unit, wherein the number of feature values in the feature value circular shift register unit is the product of the width of the convolution kernel of the current cycle and the number of channels of the convolution kernel of the current cycle, and the feature values in the feature value circular shift register unit and the weight values in the weight value circular shift register unit have a convolution correspondence;

a calculation step: performing one or more shift cycles on the weight values in the weight value circular shift register unit; when the number of shift cycles of the weight value circular shift register unit is n + 1, performing one shift cycle on the feature values in the feature value circular shift register unit; when the number of shift cycles of the weight value circular shift register unit is m + 1, acquiring new feature values from the register unit to be written and writing the new feature values into the feature value circular shift register unit; when the number of times new feature values have been acquired equals the difference between the width of the feature map and the width of the convolution kernel, divided by the stride, plus 1, clearing the data in the weight value circular shift register unit and returning to the reading step; wherein n is an integer multiple of the number of convolution kernels of the current cycle, m is an integer multiple of the number of weight values in the weight value circular shift register unit, and the number of new feature values is determined by the convolution stride; when the new feature values are written into the feature value circular shift register unit, the selector first disconnects the connection between the input end and the output end of the register element, after which the new feature values are written and the same number of original feature values are deleted; the feature value circular shift register unit comprises the selector and the register element, the input end of the selector comprises a first input end and a second input end, the output end of the register element comprises a first output end and a second output end, the output end of the selector is connected with the input end of the register element, and the first input end of the selector is connected with the first output end of the register element;

before each shift cycle is performed on the weight values in the weight value circular shift register unit, reading the weight value at the target position in the weight value circular shift register unit to obtain a read weight value, reading the feature value at the target position in the feature value circular shift register unit to obtain a read feature value, multiplying the read weight value by the read feature value to obtain a product result, storing the product result, and accumulating the product results according to the convolution principle.
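The interplay of the two circular shift registers can be simulated in software. The sketch below is a simplified model under stated assumptions, not the hardware design: it handles a single kernel row, models each register as a `collections.deque`, treats position 0 as the "target position", and uses the hypothetical function name `conv_row_shift_registers`. The selector logic is reduced to a pop-and-append on the feature register.

```python
from collections import deque
import numpy as np

def conv_row_shift_registers(weight_row, feature_row, stride=1):
    """weight_row: (kernels, channels, width); feature_row: (channels, width)."""
    n_k, n_c, k_w = weight_row.shape
    f_w = feature_row.shape[1]
    # Weight register: kernel index fastest, then channel, then column.
    w_reg = deque(weight_row[k, c, x]
                  for x in range(k_w) for c in range(n_c) for k in range(n_k))
    # Register to be written: feature values, channel fastest, then column.
    pending = deque(feature_row[c, x] for x in range(f_w) for c in range(n_c))
    # Feature register holds kernel width * channel count values.
    f_reg = deque(pending.popleft() for _ in range(k_w * n_c))
    n_out = (f_w - k_w) // stride + 1
    out = np.zeros((n_k, n_out))
    for pos in range(n_out):
        for shift in range(len(w_reg)):
            # Read both target positions, multiply, accumulate per kernel.
            out[shift % n_k, pos] += w_reg[0] * f_reg[0]
            w_reg.rotate(-1)               # one weight shift per product
            if (shift + 1) % n_k == 0:     # all kernels used this feature
                f_reg.rotate(-1)
        if pos + 1 < n_out:
            # Slide the window: drop the oldest values, load new ones.
            for _ in range(stride * n_c):
                f_reg.popleft()
                f_reg.append(pending.popleft())
    return out

w = np.arange(12).reshape(2, 2, 3)
f = np.arange(12).reshape(2, 6)
row_result = conv_row_shift_registers(w, f, stride=1)
```

After a full revolution of the weight register, both registers are back at their starting alignment, so sliding the window only requires replacing `stride * channels` feature values rather than reloading the whole register.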

In one embodiment, the image enhancement method based on the FPGA further includes:

configuring, by the serial configuration module, the enhanced target image in the register module;

reading the enhanced target image in the register module through the parallel computing module, and generating digital image information;

and displaying the enhanced target image according to the digital image information.

In one embodiment of the image enhancement method based on an FPGA, the performing convolution calculation according to the convolution kernel of the current loop and the feature map of the current loop includes:

performing multiplication processing on the convolution kernel of the current cycle and the feature map of the current cycle through a multiplication processing unit array, wherein the multiplication processing unit array is a unit array used for performing multiplication calculation in the parallel computation module;

splitting the characteristic diagram according to the size of the multiplication processing unit array and the size of the characteristic diagram of the current cycle to obtain the split characteristic diagram;

performing batch convolution calculation according to the split characteristic diagram to obtain a plurality of batch convolution results;

and obtaining the convolution calculation result of the current cycle according to the plurality of batches of convolution results.
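One way to picture the split-and-batch scheme is to cut the feature map into row bands that fit the processing array and to convolve the bands one at a time. The sketch below assumes a single-channel feature map, splitting along rows only, and a hypothetical `rows_per_batch` parameter standing in for the size of the multiplication processing unit array; the source does not specify the split direction.

```python
import numpy as np

def conv2d_valid(feature, kernel):
    """Plain sliding-window convolution (CNN style, no kernel flip)."""
    kh, kw = kernel.shape
    oh, ow = feature.shape[0] - kh + 1, feature.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for r in range(oh):
        for c in range(ow):
            out[r, c] = np.sum(feature[r:r + kh, c:c + kw] * kernel)
    return out

def conv2d_in_batches(feature, kernel, rows_per_batch):
    """Split the feature map into row bands and convolve band by band."""
    kh = kernel.shape[0]
    out_rows = feature.shape[0] - kh + 1
    parts = []
    for start in range(0, out_rows, rows_per_batch):
        stop = min(start + rows_per_batch, out_rows)
        # Each band carries kh - 1 extra rows so windows at its edge are whole.
        band = feature[start:stop + kh - 1, :]
        parts.append(conv2d_valid(band, kernel))
    # Stitch the per-batch results into the result for the whole layer.
    return np.vstack(parts)

feature = np.arange(30, dtype=float).reshape(6, 5)
kernel = np.array([[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]])
batched = conv2d_in_batches(feature, kernel, rows_per_batch=3)
```

The overlap rows are the price of splitting: they are read twice, but each batch can then be computed independently of the others.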

In one embodiment, the performing convolution calculation and nonlinear activation according to the convolution kernel of the current cycle and the feature map of the current cycle to obtain the target feature map of the current cycle includes:

performing convolution calculation and nonlinear activation according to the convolution kernel of the current cycle and the feature map of the current cycle to obtain a target feature map before pooling of the current cycle;

obtaining a plurality of first feature values in the pre-pooling target feature map of the current cycle, and converting the plurality of first feature values into one or more first cycle feature values according to a preset calculation rule, wherein the distance between the plurality of first feature values is smaller than a first preset distance, and the number of the first cycle feature values is smaller than the number of the first feature values;

obtaining a plurality of second feature values in the pre-pooling target feature map of the current cycle, and converting the plurality of second feature values into one or more second cycle feature values according to the preset calculation rule, wherein the distance between the plurality of second feature values is smaller than a second preset distance, the distance between the plurality of first feature values and the plurality of second feature values is smaller than a third preset distance, and the number of the second cycle feature values is smaller than the number of the second feature values;

converting the first cycle feature values and the second cycle feature values into one or more third cycle feature values according to the preset calculation rule, wherein the number of the third cycle feature values is less than the sum of the number of the first feature values and the number of the second feature values;

and obtaining the pooled target feature map of the current cycle according to the third cycle feature values.
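As an illustration of this staged pooling, the sketch below assumes that the preset calculation rule is the maximum and that the neighbourhoods are the two rows of a 2x2 window; neither choice is fixed by the text above, and the function name is hypothetical. Two horizontally adjacent values in the upper row give a first cycle value, the two values directly below give a second cycle value, and the two are then merged into a third cycle value.

```python
import numpy as np

def max_pool_2x2_two_stage(feature):
    """2x2 max pooling computed in two stages, row pairs first."""
    h, w = feature.shape
    if h % 2 or w % 2:
        raise ValueError("this sketch expects even height and width")
    # Stage 1a: reduce each adjacent pair in the upper row of every window.
    first = np.maximum(feature[0::2, 0::2], feature[0::2, 1::2])
    # Stage 1b: reduce each adjacent pair in the lower row of every window.
    second = np.maximum(feature[1::2, 0::2], feature[1::2, 1::2])
    # Stage 2: merge the two intermediate values into one pooled value.
    return np.maximum(first, second)

feature = np.array([[1, 5, 2, 0],
                    [3, 4, 8, 6],
                    [7, 2, 1, 1],
                    [0, 9, 3, 2]])
pooled = max_pool_2x2_two_stage(feature)
```

Splitting the reduction this way means the first stage can start as soon as one row of the pre-pooling feature map is available, without waiting for the whole window.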

In a second aspect, an FPGA device is provided, comprising: a serial configuration module, a parallel computation module and a register module, wherein the serial configuration module is connected with the parallel computation module, the serial configuration module is connected with the register module, and the parallel computation module is connected with the register module;

the serial configuration module is used for responding to a start signal or an interrupt request signal of the previous cycle, acquiring a convolution kernel of the current cycle from the image enhancement neural network model, acquiring a feature map of the current cycle according to the target image or the target feature map of the previous cycle, and configuring the register module according to the convolution kernel of the current cycle and the feature map of the current cycle;

the parallel computing module is used for reading the convolution kernel of the current cycle and the feature map of the current cycle from the register module, performing convolution computation and nonlinear activation according to the convolution kernel of the current cycle and the feature map of the current cycle, obtaining a target feature map of the current cycle and outputting an interrupt request signal of the current cycle;

and the register module is used for storing the convolution kernel of the current cycle and the characteristic diagram of the current cycle.

In a third aspect, there is provided a helmet comprising:

the camera is used for acquiring a target image;

the enhancement processor is used for acquiring an image enhancement neural network model, and enhancing the target image through the image enhancement neural network model to obtain an enhanced target image;

the display screen is used for displaying the enhanced target image;

wherein the enhancement processor comprises: an FPGA;

the FPGA comprises: a serial configuration module, a parallel computation module and a register module, wherein the serial configuration module is connected with the parallel computation module, the serial configuration module is connected with the register module, and the parallel computation module is connected with the register module;

the enhancement processing includes:

performing convolution loop processing based on the FPGA according to the target image and the image enhancement neural network model until the number of loops of the convolution loop processing equals the number of convolution layers in the image enhancement neural network model, so as to obtain a target feature map;

enhancing the target image according to the target feature map to obtain an enhanced target image;

the convolution cycle processing based on the FPGA comprises the following steps:

responding to a starting signal or an interrupt request signal of the previous cycle through the serial configuration module, acquiring a convolution kernel of the current cycle from the image enhancement neural network model, acquiring a feature map of the current cycle according to the target image or the target feature map of the previous cycle, and configuring the register module according to the convolution kernel of the current cycle and the feature map of the current cycle;

reading the convolution kernel of the current cycle and the feature map of the current cycle from the register module through the parallel computing module, performing convolution computation and nonlinear activation according to the convolution kernel of the current cycle and the feature map of the current cycle, obtaining a target feature map of the current cycle and outputting an interrupt request signal of the current cycle.

In a fourth aspect, an electronic device is provided, comprising a memory and a processor, the memory storing a computer program operable on the processor, wherein the processor when executing the program implements the steps of the FPGA-based image enhancement method as described above, wherein the processor comprises: an FPGA;

the FPGA comprises: a serial configuration module, a parallel computation module and a register module, wherein the serial configuration module is connected with the parallel computation module, the serial configuration module is connected with the register module, and the parallel computation module is connected with the register module.

In a fifth aspect, a storage medium is provided, having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the FPGA-based image enhancement method as described above, wherein the processor comprises an FPGA.

The FPGA-based image enhancement method divides the convolution processing of the image enhancement neural network model into two parts: configuration of the register module, which is controlled by the serial configuration module, and convolution calculation, which is carried out by the parallel computation module. Because the serial configuration module is good at control and the parallel computation module is good at parallel calculation, this division helps to increase the speed at which the image enhancement neural network model processes the target image.

Drawings

Various additional advantages and benefits of the present invention will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. It is obvious that the drawings described below are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.

The invention will be further described with reference to the accompanying drawings and examples, in which:

FIG. 1 is a flow diagram of an image enhancement method for an FPGA of one embodiment of the present invention;

FIG. 2 is a schematic flow chart of an enhancement process in an image enhancement method of an FPGA according to an embodiment of the present invention;

FIG. 3 is a diagram illustrating a process of ordering weight values and feature values according to an embodiment of the present invention;

FIG. 4 is a diagram illustrating the reading of weight values and feature values during convolution calculation according to one embodiment of the present invention;

FIG. 5 is a schematic diagram illustrating the reading of the weight values and the feature values after the weight values are circularly shifted in the convolution calculation process according to an embodiment of the present invention;

FIG. 6 is a diagram illustrating the reading of feature values and weight values after cyclic shift of both the weight values and the feature values during convolution calculation according to an embodiment of the present invention;

FIG. 7 is a diagram illustrating the reading of the penultimate weight value and the reading of the feature value in a row of a convolution kernel during convolution calculation according to an embodiment of the present invention;

FIG. 8 is a diagram illustrating the reading of the last weight value and the reading of feature values in a row of a convolution kernel in the convolution calculation process according to an embodiment of the present invention;

FIG. 9 is a diagram illustrating the writing and reading of weight values and feature values when the convolution kernel slides during convolution calculation according to an embodiment of the present invention;

FIG. 10 is a diagram illustrating the writing and reading of weight values and feature values before the convolution kernel slides to the next row in a convolution calculation process according to the present invention;

FIG. 11 is a schematic structural diagram of an FPGA device according to an embodiment of the present invention;

FIG. 12 is a schematic view of the structure of a helmet according to an embodiment of the invention;

FIG. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

It should be noted that the embodiments and features of the embodiments in the present application can be combined with each other without conflict, and the present invention is further described in detail with reference to the drawings and specific embodiments.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.

Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.

In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.

Example one

In some application scenarios, for example at night, a user needs to carry an observation device to observe a target area, which places high demands on both the portability of the observation device and the real-time performance of image processing. An FPGA is a small processor that can be built into wearable equipment to process information data, making it convenient for the user to carry. To meet the user's real-time requirement for image processing, this embodiment provides an FPGA-based image enhancement method that helps to increase the image processing speed of the FPGA.

FIG. 1 is a schematic flowchart of an image enhancement method of an FPGA in this embodiment, and FIG. 2 is a schematic flowchart of enhancement processing in the image enhancement method of the FPGA. Referring to FIG. 1 and FIG. 2, the image enhancement method based on FPGA includes: step 10 and step 20.

Step 10: acquiring a target image.

The target image is an image before image enhancement, for example, an image directly captured by a camera.

Step 20: acquiring an image enhancement neural network model, and enhancing the target image through the image enhancement neural network model to obtain an enhanced target image.

The image enhancement neural network model is a deep-learning-based convolutional neural network model, such as a YOLOv2 neural network model. The target image is input into the image enhancement neural network model for enhancement processing, and the model outputs the enhanced target image.

Optionally, the image enhancement neural network model is a convolutional neural network model for color restoration, which can restore color to an infrared image or a black-and-white image shot at night. Such a color restoration model is particularly suitable for enhancing infrared or black-and-white images shot at night: for example, it can restore such an image to the corresponding scene as it would appear under visible light in the daytime. Alternatively, the image enhancement neural network model is a convolutional neural network model for image sharpening, which can sharpen blurred images.

The enhancement processing comprises: step 21 and step 22.

Step 21: performing convolution loop processing based on the FPGA according to the target image and the image enhancement neural network model until the number of loops of the convolution loop processing equals the number of convolution layers in the image enhancement neural network model, so as to obtain a target feature map.

The convolution loop processing is implemented on the FPGA. Because the FPGA is small and consumes little power, this helps to apply the image enhancement method to portable equipment. The FPGA includes: a serial configuration module, a parallel computation module and a register module, wherein the serial configuration module is connected with the parallel computation module, the serial configuration module is connected with the register module, and the parallel computation module is connected with the register module.

The serial configuration module is a module on the FPGA that mainly processes data serially and is good at flow control and at computing complex algorithms; for example, the serial configuration module can be a Cortex-M3 soft-core IP that is placed and routed onto an FPGA development board. The parallel computing module is a module on the FPGA that mainly uses pipeline parallelism and data parallelism and is suited to large numbers of convolution multiply-add operations; for example, a parallel computing module based on the native computing architecture of the FPGA is used. The register module is an element on the FPGA with a fast storage function, through which data can be stored and read. In other words, compared with the parallel computing module, the serial configuration module handles more of the serial data processing and/or is better at flow control and complex algorithmic computation.

During the convolution loop, convolution processing is performed for each convolution layer of the image enhancement neural network model in sequence. When the number of loops of the convolution loop processing equals the number of convolution layers in the image enhancement neural network model, all convolution layers have completed convolution processing and the subsequent processing steps can be carried out.

Step 22, enhancing the target image according to the target feature map to obtain an enhanced target image.

The target feature map is obtained by extracting and collecting specific information in the convolution process of the image enhancement neural network model, so that the target feature map contains relevant information of image enhancement, such as color information and the like, and the target image can be enhanced according to the target feature map.

Convolution cyclic processing based on FPGA includes: step 211 and step 212.

Step 211, responding, through the serial configuration module, to the start signal or to the interrupt request signal of the previous cycle, acquiring the convolution kernel of the current cycle from the image enhancement neural network model, acquiring the feature map of the current cycle according to the target image or the target feature map of the previous cycle, and configuring the register module according to the convolution kernel of the current cycle and the feature map of the current cycle.

The register module is configured by making use of the serial configuration module's strength in process control and the calculation of complex algorithms. The start signal is a signal generated when convolution starts, and the interrupt request signal is a request signal sent by the parallel computing module after it completes the convolution calculation task of one convolution layer.

Step 212, reading, through the parallel computing module, the convolution kernel of the current cycle and the feature map of the current cycle from the register module, performing convolution calculation and nonlinear activation according to the convolution kernel of the current cycle and the feature map of the current cycle, obtaining the target feature map of the current cycle, and outputting the interrupt request signal of the current cycle.

The convolution calculation according to the convolution kernel of the current cycle and the feature map of the current cycle is performed by making use of the parallel computing module's strength in parallel computation. In addition to convolution calculation and nonlinear activation, step 212 may also complete operations such as max pooling and padding to obtain the target feature map. The target feature map is the feature map obtained after one round of convolution loop processing and contains a plurality of feature values produced by the current convolution cycle.

Step 211 and step 212 are repeated so that convolution is performed over the convolution layers in the image enhancement neural network model in sequence, until the convolution processing of all the convolution layers in the image enhancement neural network model is finished and the final target feature map is output.
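The alternation of step 211 and step 212 can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the hardware implementation: each layer is reduced to a single-channel 2-D kernel, ReLU stands in for the unspecified nonlinear activation, and every name (`RegisterModule`, `serial_configure`, `parallel_compute`, `convolution_loop`) is hypothetical.

```python
# Illustrative model of the step 211 / step 212 loop. All names are
# hypothetical; single-channel kernels and ReLU are simplifying assumptions.

def relu(x):
    """Assumed nonlinear activation (the source does not name one)."""
    return x if x > 0 else 0.0


def conv2d_valid(feature_map, kernel):
    """Plain 'valid' 2-D convolution (cross-correlation) on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(feature_map) - kh + 1
    out_w = len(feature_map[0]) - kw + 1
    return [
        [
            sum(
                feature_map[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


class RegisterModule:
    """Stands in for the register module: holds one kernel and one feature map."""

    def __init__(self):
        self.kernel = None
        self.feature_map = None


def serial_configure(registers, layers, cycle, feature_map):
    """Step 211: select the kernel of the current cycle and configure the registers."""
    registers.kernel = layers[cycle]
    registers.feature_map = feature_map


def parallel_compute(registers):
    """Step 212: convolution plus activation; returns (target feature map, irq)."""
    conv = conv2d_valid(registers.feature_map, registers.kernel)
    activated = [[relu(v) for v in row] for row in conv]
    return activated, True  # True models the interrupt request signal


def convolution_loop(target_image, layers):
    """Repeat steps 211 and 212 until the loop count equals the layer count."""
    registers = RegisterModule()
    feature_map = target_image  # the first cycle starts from the target image
    cycles = 0
    while cycles < len(layers):
        serial_configure(registers, layers, cycles, feature_map)
        feature_map, irq = parallel_compute(registers)
        if irq:
            cycles += 1  # one interrupt request per finished layer
    return feature_map, cycles
```

In this sketch the loop ends exactly when the number of completed cycles equals the number of layers, mirroring the termination condition described above; the returned feature map plays the role of the final target feature map.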

Optionally, in step 20, the serial configuration module enhances the target image according to the target feature map, so as to obtain an enhanced target image.

By utilizing the characteristic that the serial configuration module is good at the calculation of complex algorithms, the serial configuration module executes the image enhancement algorithm so as to enhance the target image according to the target characteristic diagram, and the image enhancement speed can be improved.

In the embodiment, the convolution processing process of the image enhancement neural network model is divided into two parts, wherein the configuration part of the register module is realized by the control of the serial configuration module, the convolution calculation is realized by the calculation of the parallel calculation module, and the characteristics that the serial configuration module is good at the control and the parallel calculation module is good at the parallel calculation can be utilized, so that the processing speed of the image enhancement neural network model on the target image is improved.

It should be noted that the FPGA is a processor, and other electronic components having functions or effects similar to those of the FPGA all belong to the protection scope of the present application. Similarly, a register is a device with a storage function, and other electronic elements with functions or effects similar to those of the register belong to the protection scope of the present application.

In one embodiment, the image enhancement method based on the FPGA further includes: step 30, step 40 and step 50.

And step 30, configuring the enhanced target image in a register module through a serial configuration module.

The enhanced target image is written into the register module by utilizing the characteristic that the serial configuration module is good at process control, so that the display information configuration of the register module is completed, and the speed of the register configuration is improved.

Optionally, the enhanced target image is configured in a display register element in the register module, wherein the display register element is used for storing image information to be displayed.

And step 40, reading the enhanced target image in the register module through the parallel computing module, and generating digital image information.

Because the parallel computing module is good at pipeline parallel processing, it can quickly read and process the enhanced target image information in the register module to form digital image information that a display screen can read, such as HDMI (High-Definition Multimedia Interface) information.

Step 50, displaying the enhanced target image according to the digital image information.

The enhanced target image is displayed by a display device, such as a display screen.

In a specific embodiment, the serial configuration module configures the register module with the information required for the calculation of the first convolution layer in the image enhancement neural network model, such as target image information and convolution kernel information, and outputs a convolution calculation start signal. The parallel computing module detects the convolution calculation start signal, starts to read the feature values and weight values from the register module, completes convolution calculation, nonlinear activation, max pooling, padding and reordering, and then writes the result into the register where the feature values are located (namely the feature value cyclic shift register unit). The serial configuration module then completes the configuration of the second convolution layer information in the image enhancement neural network model, and the parallel computing module repeats the convolution calculation and the other processes. The remaining convolution layers in the image enhancement neural network model are processed in turn in the same way as the first and second convolution layers until all the convolution layers in the image enhancement neural network model have been calculated. At that point, the serial configuration module reads information such as the target feature map calculated by the parallel computing module, performs the enhancement algorithm, and finally writes the enhanced image into the register module of the display module, and the enhanced image is displayed on the HDMI display screen.

Therefore, the serial configuration module is mainly responsible for starting the parallel computing module, configuring the relevant register module before the convolution calculation of each layer starts, reading the target feature map data for enhancement algorithm processing when the calculation of all layers is finished, and finally configuring the register module of the display module. The parallel computing module is mainly responsible for tasks such as data movement, convolution layer calculation, nonlinear activation, max pooling and padding. This embodiment exploits the respective advantages of the serial configuration module and the parallel computing module, which helps improve the running speed of the convolutional neural network.

Referring to fig. 2, the flow of the enhancement processing in the FPGA-based image enhancement method is as follows: the serial configuration module responds to the start signal and then configures the register module with the weight values and feature values required by the current round of convolution calculation; a start signal is generated when the configuration of the register module is completed; the parallel computing module waits for the start signal and, when it is received, reads data such as weight values and feature values from the register module; the parallel computing module performs computing tasks such as convolution, activation, pooling and padding according to the weight values and feature values; the parallel computing module judges whether the current computing task is completed and, if so, triggers an interrupt request signal, and the serial configuration module responds to the interrupt request signal and enters an interrupt subprogram; if the interrupt request signal is the one issued after the convolution calculation of the last convolution layer is completed, the interrupt subprogram enters the flow of the enhancement algorithm; the enhancement algorithm enhances the target image according to the output target feature map to obtain an enhanced image; the serial configuration module configures a display register according to the enhanced image; and the parallel computing module reads the enhanced image from the display register, generates an HDMI signal and performs HDMI display.

When the parallel computing module judges whether the current computing task is completed and finds that it is not, the flow returns to the data reading step, and the remaining weight values and feature values in the register module continue to be read until all the weight values and feature values in the register module have gone through the convolution calculation task. After the interrupt subprogram is entered, if the interrupt request signal is not the one issued after the convolution calculation of the last convolution layer, that is, the convolution calculation of all convolution layers is not yet complete, the previous weight values and feature values in the register module are cleared, and the weight values and feature values of the next convolution layer are read and written into the register module, so that the register module is configured.

In one embodiment, the image enhancement method based on the FPGA, wherein performing convolution calculation according to a convolution kernel of a current cycle and a feature map of the current cycle, includes: step 221 and step 222.

Step 221, binarizing the weight values of the convolution kernel in the image enhancement neural network model, and obtaining a binarized convolution kernel according to a scaling factor and the binarized weight values, wherein the scaling factor is the average of the absolute values of the weight values in the convolution kernel.

In the convolution process, the weight values of the convolution kernel can be represented in binarized form together with the scaling factor, for example by representing each weight value as -1 or 1.

Optionally, the binarized convolution kernel is obtained according to a product of the scaling factor and the binarized weight value. The scaling factor is the average value of the absolute values of the weighted values in the convolution kernel.

For example, let W = αB, where W is the binarized convolution kernel, α is the scaling factor, and B is the binarized weight value.

And step 222, performing convolution calculation according to the binarized convolution kernel and the feature map of the current cycle.

Binarizing the weight values of the convolution kernel and performing the convolution calculation with the binarized weight values reduces the complexity of the data calculation, reduces the dependence on the storage capacity of the FPGA, improves the efficiency of the convolution calculation, and achieves a good image enhancement effect.
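Steps 221 and 222 can be illustrated with a small sketch. It assumes a kernel given as a flat list of weights and maps a weight of exactly zero to +1, a choice the description does not specify; the function names `binarize_kernel` and `binary_dot` are hypothetical.

```python
def binarize_kernel(weights):
    """Return (alpha, signs, approx) for a flat list of kernel weights.

    alpha  : scaling factor, the mean of the absolute weight values
    signs  : binarized weights B, each -1 or +1 (zero maps to +1 by assumption)
    approx : binarized kernel W = alpha * B
    """
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    approx = [alpha * b for b in signs]
    return alpha, signs, approx


def binary_dot(alpha, signs, features):
    """Multiply-accumulate with a binarized kernel.

    Because every weight is +1 or -1, the inner loop needs only additions
    and subtractions; the scaling factor is applied once at the end.
    """
    acc = 0.0
    for b, x in zip(signs, features):
        acc = acc + x if b == 1 else acc - x
    return alpha * acc
```

For the weights `[0.5, -1.5, 1.0, -1.0]` the scaling factor is 1.0 and the binarized weights are `[1, -1, 1, -1]`. Only one sign bit per weight plus a single scaling factor per kernel needs to be stored, which is consistent with the reduced storage dependence stated above.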

In one embodiment, the method for enhancing an image based on an FPGA, in which a convolution kernel of a current cycle is obtained from an image enhancement neural network model, includes: step 231 and step 232.

Step 231, counting the interrupt request signals through the serial configuration module to obtain the number of interrupts.

The number of interruptions can be used to determine the number of current convolution cycles. Specifically, the serial configuration module responds to the interrupt request signal and counts the interrupt request signal to obtain the number of interrupts. Alternatively, the interrupt request signals are counted by an interrupt counter in the serial configuration module.

Step 232, acquiring the corresponding convolution layer from the image enhancement neural network model according to the number of interrupts, and determining the convolution kernel of the current cycle according to the corresponding convolution layer.

The convolution loop processes the convolution layers of the image enhancement neural network model one after another, so the number of convolution cycles completed can be determined from the number of interrupts; this identifies the convolution layer corresponding to the current convolution cycle, from which one or more convolution kernels of the current convolution cycle are obtained.

Specifically, while the serial configuration module configures the register module in response to the interrupt request signal, a global variable i (the number of interrupts) may be set; i is incremented by 1 each time an interrupt request signal initiated by the interrupt source is processed, the value of i then decides which convolution processing to jump to, and the convolution information of the corresponding convolution layer is thus configured in the register module according to the number of interrupts.

For example, each time the parallel computing module finishes the calculation of one convolution layer for an input feature map, it sends an interrupt request signal to the serial configuration module; before the calculation of the second convolution layer starts, before the calculation of the third convolution layer starts, and so on until before the calculation of the last convolution layer starts, the serial configuration module needs to configure the convolution information of the register module according to the interrupt request signal. Because the interrupt is raised by the parallel computing module sending an interrupt request to the serial configuration module (the interrupt number of the parallel computing module is defined as 16, and its priority is 0), the serial configuration module must determine which calculation the monitored interrupt request signal follows before it can branch to the corresponding interrupt handling subprogram. To this end a global variable i can be set; i is incremented by 1 each time an interrupt request signal initiated by the parallel computing module is processed, and the value of i then determines which handler to jump to, that is, which routine for configuring the register module to enter.
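The interrupt-counting dispatch can be modelled as below. This is a software sketch only: the class name `SerialConfigModule`, its methods and the returned strings are hypothetical, and the counter `i` is kept as an attribute rather than a true global variable.

```python
class SerialConfigModule:
    """Hypothetical model of the serial configuration module's interrupt logic."""

    def __init__(self, layer_kernels):
        self.layer_kernels = layer_kernels  # one entry per convolution layer
        self.i = 0                          # number of interrupts handled so far
        self.configured = []                # log of layer indices written to registers

    def start(self):
        """Start signal: configure the register module for the first layer."""
        self._configure(0)

    def on_interrupt(self):
        """Interrupt subprogram: count the request, then dispatch on i."""
        self.i += 1
        if self.i < len(self.layer_kernels):
            # Layers remain: configure the registers for layer number i.
            self._configure(self.i)
            return "configure"
        # The interrupt followed the last layer: go to the enhancement algorithm.
        return "enhance"

    def _configure(self, layer_index):
        """Stand-in for writing kernel and feature data into the register module."""
        self.configured.append(layer_index)
```

With three layers, the three interrupts return `"configure"`, `"configure"` and `"enhance"` in that order, so the value of `i` alone is enough to tell which layer just finished.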

In one embodiment, the image enhancement method based on the FPGA, in which the register module is configured according to the convolution kernel of the current loop and the feature map of the current loop, includes: step 241 and step 242.

Step 241, writing the weight values in the convolution kernel of the current cycle into the register module in sequence according to a preset weight value order, wherein the preset weight value order is as follows: starting from the initial weight value, the weight values are sorted by convolution kernel serial number, channel serial number, column serial number and row serial number; when the convolution kernel serial number of the weight value is smaller than the maximum convolution kernel serial number, the convolution kernel serial number is increased by 1; when the convolution kernel serial number has reached the maximum convolution kernel serial number, the channel serial number is increased by 1 and the convolution kernel serial number is reset; when the channel serial number has reached the maximum channel serial number, the column serial number is increased by 1 and the channel serial number and the convolution kernel serial number are reset; and when the column serial number has reached the maximum column serial number, the row serial number is increased by 1 and the column serial number, the channel serial number and the convolution kernel serial number are reset.

The weight values are arranged in a fixed order and placed in a cyclic shift register unit, so that they can be read in sequence during the convolution calculation and the efficiency of the convolution calculation is fully exploited. Fig. 3 illustrates the ordering of the weight values and feature values for two 2 × 2 × 2 convolution kernels and two 3 × 3 × 2 input feature maps, wherein the direction of the dotted line indicates the sorting order and the rightmost end of the arranged data (the lower strip) is the first datum. The four 2 × 2 squares on the left represent the two 2 × 2 × 2 convolution kernels, namely a first convolution kernel (No. 1 convolution kernel) and a second convolution kernel (No. 2 convolution kernel), each with two channels, namely a first convolution kernel channel (ch.1) and a second convolution kernel channel (ch.2); the two 1 × 8 strips below them represent the sorted weight values (row.1 and row.2 on the left side of fig. 3). The four 3 × 3 squares in the middle represent the two 3 × 3 × 2 input feature maps, namely a first feature map (No. 1 feature map) and a second feature map (No. 2 feature map), each with two channels, namely a first feature map channel (feature map ch.1) and a second feature map channel (feature map ch.2); the three strips below them represent the sorted feature values (row.1, row.2 and row.3 in fig. 3). The four 2 × 2 squares on the right represent the target feature maps obtained by the convolution calculation, comprising two target feature maps with two channels each, namely a first target feature map channel (target feature map ch.1) and a second target feature map channel (target feature map ch.2).

The weight values are sorted according to the priority of the convolution kernel, the channel and the data in the row. That is, the weighted values are sorted according to the serial number of the convolution kernel, the serial number of the channel, the serial number of the column and the serial number of the row.

The sequence number of the convolution kernel, the sequence number of the channel, the sequence number of the column and the sequence number of the row are the sequence of the convolution kernel, the channel, the column of the convolution kernel and the row of the convolution kernel according to the sequence of the convolution calculation. For example, in one convolution cycle, there are 2 convolution kernels, and according to the sequence of convolution calculation, the serial numbers of the convolution kernels are determined to be the convolution kernel number 1 and the convolution kernel number 2.

It will be appreciated that the convolution kernel serial number, the channel serial number, the column serial number and the row serial number together determine a unique weight value. A weight value may be denoted W_mnop, where m represents the convolution kernel serial number, n represents the channel serial number of the weight value, o represents the column serial number of the weight value, and p represents the row serial number of the weight value. In sorting, m is increased first; when m reaches its maximum, n is increased by 1 and m is reset; when n reaches its maximum, o is increased by 1 and m and n are reset; and when o reaches its maximum, p is increased by 1 and m, n and o are reset.

Specifically, the weighted values are sequentially sorted according to the priority of the convolution kernel, the channel and the data in the row, and the method comprises the following steps: writing the weight values into the register module according to the convolution kernel sequence number, writing the weight values into the register module according to the channel sequence number, writing the weight values into the register module according to the column sequence number and writing the weight values into the register module according to the row sequence number.

Writing the weight values into the register module according to the convolution kernel sequence number: keeping the channel sequence number, the column sequence number and the row sequence number of the weight value unchanged, the weight values are written in order of convolution kernel sequence number until the maximum convolution kernel sequence number is reached.

Writing the weight values into the register module according to the channel sequence number: keeping the column sequence number and the row sequence number of the weight value unchanged, when the convolution kernel sequence number reaches the maximum, the channel sequence number is increased by 1 and the convolution kernel sequence number is reset to its initial minimum value; the weight values are then written by repeating the step of writing according to the convolution kernel sequence number, and this is repeated until the maximum channel sequence number is reached.

Writing the weight values into the register module according to the column sequence number: keeping the row sequence number unchanged, when the channel sequence number reaches the maximum, the column sequence number is increased by 1 and the convolution kernel sequence number and the channel sequence number are each reset to their initial minimum values; the weight values are then written by repeating the step of writing according to the channel sequence number, and this is repeated until the maximum column sequence number is reached, which completes the ordering of the weight values of one row in the convolution kernel.

Writing the weight values into the register module according to the row sequence number: when the column sequence number reaches the maximum, the row sequence number is increased by 1 and the convolution kernel sequence number, the channel sequence number and the column sequence number are each reset to their initial minimum values; the weight values are then written by repeating the step of writing according to the column sequence number, and this is repeated until the maximum row sequence number is reached, which completes the ordering of all the weight values.
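The four nested writing steps amount to a four-level loop in which the convolution kernel sequence number varies fastest and the row sequence number slowest. A minimal sketch, assuming 1-based sequence numbers and a hypothetical function name `weight_write_order`:

```python
def weight_write_order(num_kernels, num_channels, num_cols, num_rows):
    """List the indices (m, n, o, p) of W_mnop in the order they are written.

    m: convolution kernel, n: channel, o: column, p: row (all 1-based).
    m varies fastest, then n, then o, and p varies slowest.
    """
    order = []
    for p in range(1, num_rows + 1):                  # row: slowest
        for o in range(1, num_cols + 1):              # column
            for n in range(1, num_channels + 1):      # channel
                for m in range(1, num_kernels + 1):   # kernel: fastest
                    order.append((m, n, o, p))
    return order
```

For two 2 × 2 × 2 kernels, `weight_write_order(2, 2, 2, 2)` yields 16 entries, beginning `(1, 1, 1, 1)`, `(2, 1, 1, 1)`, `(1, 2, 1, 1)`, `(2, 2, 1, 1)`, `(1, 1, 2, 1)`; the first eight entries cover one kernel row and the remaining eight cover the second.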

Step 242, writing the feature values in the feature map of the current cycle into the register module according to a preset feature value order, wherein the preset feature value order is as follows: starting from the initial feature value, the feature values are sorted by channel serial number, column serial number, feature map serial number and row serial number; when the channel serial number of the feature value is smaller than the maximum channel serial number, the channel serial number is increased by 1; when the channel serial number has reached the maximum channel serial number, the column serial number is increased by 1 and the channel serial number is reset; when the column serial number has reached the maximum column serial number, the feature map serial number is increased by 1 and the column serial number and the channel serial number are reset; and when the feature map serial number has reached the maximum feature map serial number, the row serial number is increased by 1 and the feature map serial number, the column serial number and the channel serial number are reset.

The feature values are arranged in a fixed order and placed in a cyclic shift register unit, so that they can be read in sequence during the convolution calculation and the efficiency of the convolution calculation is fully exploited. The feature values are sorted with the priority: channel, data within a row (sorted by column), feature map. That is, the feature values are sorted by channel serial number, column serial number, feature map serial number and row serial number.

The feature map serial number, the channel serial number, the column serial number and the row serial number order the feature maps, the channels, the columns of a feature map and the rows of a feature map according to the order of the convolution calculation. For example, in one convolution cycle there are 2 feature maps, and according to the order of the convolution calculation their serial numbers are determined to be feature map No. 1 and feature map No. 2.

It will be appreciated that the channel serial number, the column serial number, the feature map serial number and the row serial number together determine a unique feature value. A feature value may be denoted I_abcd, where a represents the channel serial number of the feature value, b represents the column serial number of the feature value, c represents the feature map serial number of the feature value, and d represents the row serial number of the feature value. In sorting, a is increased first; when a reaches its maximum, b is increased by 1 and a is reset; when b reaches its maximum, c is increased by 1 and a and b are reset; and when c reaches its maximum, d is increased by 1 and a, b and c are reset.

Specifically, the feature values are sequentially sorted according to the priority of the channels, the data in the rows and the feature maps, and the method comprises the following steps: the characteristic value is written into the register module according to the channel serial number, the characteristic value is written into the register module according to the column serial number, the characteristic value is written into the register module according to the characteristic diagram serial number, and the characteristic value is written into the register module according to the row serial number.

Writing the feature values into the register module according to the channel serial number: keeping the column serial number, the feature map serial number and the row serial number of the feature value unchanged, the feature values are written in order of channel serial number until the maximum channel serial number is reached.

Writing the feature values into the register module according to the column serial number: keeping the feature map serial number and the row serial number of the feature value unchanged, when the channel serial number reaches the maximum, the column serial number is increased by 1 and the channel serial number is reset to its initial minimum value; the feature values are then written by repeating the step of writing according to the channel serial number, and this is repeated until the maximum column serial number is reached.

Writing the feature values into the register module according to the feature map serial number: keeping the row serial number unchanged, when the column serial number reaches the maximum, the feature map serial number is increased by 1 and the channel serial number and the column serial number are each reset to their initial minimum values; the feature values are then written by repeating the step of writing according to the column serial number, and this is repeated until the maximum feature map serial number is reached, which completes the ordering of the feature values of one row in the feature maps.

Writing the feature values into the register module according to the row serial number: when the feature map serial number reaches the maximum, the row serial number is increased by 1 and the feature map serial number, the channel serial number and the column serial number are each reset to their initial minimum values; the feature values are then written by repeating the step of writing according to the feature map serial number, and this is repeated until the maximum row serial number is reached, which completes the ordering of all the feature values.
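Equivalently, the position of each feature value in the written stream can be computed directly from its four serial numbers. The sketch below assumes 1-based serial numbers and a 0-based stream position; the function name `feature_stream_index` is hypothetical.

```python
def feature_stream_index(a, b, c, d, num_channels, num_cols, num_maps):
    """0-based position of feature value I_abcd in the written stream.

    a: channel, b: column, c: feature map, d: row (all 1-based).
    The channel varies fastest, then the column, then the feature map,
    and the row varies slowest, so the position is a mixed-radix number.
    """
    return (a - 1) + num_channels * (
        (b - 1) + num_cols * ((c - 1) + num_maps * (d - 1))
    )
```

For two 3 × 3 × 2 feature maps (2 channels, 3 columns, 2 feature maps, 3 rows) the stream holds 36 values: I_1111 is at position 0, I_2111 at position 1, I_1211 at position 2, I_1121 (first value of the second feature map) at position 6, and I_1112 (first value of the second row) at position 12.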

Steps 241 and 242 write the feature values of the feature map and the weight values of the convolution kernel into the register module in order. In the process of configuring the register module, the register module adopts a first-in first-out write-read mode, so that the feature values and the weight values are written sequentially.

In one embodiment of the image enhancement method based on an FPGA, reading a convolution kernel of a current cycle and a feature map of the current cycle from a register module, and performing convolution calculation according to the convolution kernel of the current cycle and the feature map of the current cycle, includes: a storing step 251, a reading step 252 and a calculating step 253.

A storing step 251: storing the weight value in the convolution kernel of the current cycle and the characteristic value in the characteristic diagram of the current cycle in a register unit to be written in a register module, wherein the register module comprises: the device comprises a register unit to be written, a weight value cyclic shift register unit and a characteristic value cyclic shift register unit, wherein the register unit to be written is connected with the weight value cyclic shift register unit, and the register unit to be written is connected with the characteristic value cyclic shift register unit.

In this embodiment, the weight values in the convolution kernel of the current cycle and the feature values in the feature map of the current cycle held in the register module are divided into data to be written and data to be read. The data to be read is the data that is being cyclically shifted and read during the convolution calculation, and it is further divided into a feature value part and a weight value part.

Based on the division of the weight value in the convolution kernel of the current cycle and the characteristic value in the characteristic diagram of the current cycle, the register module is divided into a register unit to be written, a weight value cyclic shift register unit and a characteristic value cyclic shift register unit. The register unit to be written is used for storing partial data to be written, namely a weight value to be written and a characteristic value to be written in the current cycle, and the weight value and the characteristic value which are not subjected to shift cycle. The weight value cyclic shift register unit is used for storing the weight values which are being shifted and read circularly, shifting and reading the weight values circularly during convolution calculation, writing the corresponding weight values before the convolution calculation is started, and clearing the corresponding weight values after the convolution calculation is finished. The characteristic value cyclic shift register unit is used for storing characteristic values which are being shifted and read circularly, shifting and reading the characteristic values circularly during convolution calculation, writing corresponding characteristic values before the convolution calculation is started, and clearing corresponding characteristic values after the convolution calculation is finished.

Before convolution calculation, the weight values in the convolution kernels and the feature values in the feature map of the current cycle are stored in the register unit to be written in sequence in advance to prepare for convolution calculation.

Optionally, the number of the register units to be written in is one, the weight values in the convolution kernel of the current cycle in the register unit to be written in and the feature values in the feature map of the current cycle are sorted at intervals according to the reading sequence, for example, the weight values in the convolution kernel of the current cycle and the feature values in the feature map of the current cycle are sorted at intervals according to the line sequence number.

Optionally, the register units to be written in are multiple, and specifically include a weight value register unit to be written in and a feature value register unit to be written in, which are respectively used for storing the current cyclic weight value to be written in and storing the feature value to be written in.

A reading step 252: sequentially reading a row of weight values in the convolution kernel of the current cycle and writing the weight values into a weight value cyclic shift register unit, sequentially reading the characteristic values of the characteristic diagram of the current cycle and writing the characteristic values into a characteristic value cyclic shift register unit, wherein the number of the characteristic values in the characteristic value cyclic shift register unit is the product of the width of the convolution kernel of the current cycle and the number of the convolution kernel channels of the current cycle, and the characteristic values in the characteristic value cyclic shift register unit and the weight values in the weight value cyclic shift register unit have a convolution corresponding relation.

In the convolution calculation process, the number of weight values for shift cycling is determined according to the rows of the convolution kernels. Specifically, a whole line of weight values are written into the weight value circular shift register unit in sequence, and then the convolution calculation process of the line of weight values is realized. The whole-line weight value is the weight value with the same line serial number in the current cycle, and comprises a plurality of weight values with the same line serial number, different channel serial numbers, different convolution kernel serial numbers and different column serial numbers in each convolution kernel.

Optionally, the number of the weight value circular shift register units is multiple, and the weight values of each row in the convolution kernel of the current cycle are respectively stored.

In the convolution calculation process, the number of characteristic values of the shift cycle is determined according to the width of the convolution kernel and the number of convolution kernel channels. Specifically, the feature values subjected to shift cycling are feature values corresponding to (covered by) one convolution kernel one-line weight value in the feature map in the convolution calculation process, so that the number of the feature values subjected to shift cycling is the number of the feature values in the feature map corresponding to one convolution kernel one-line weight value in the convolution calculation process, that is, the number of the feature values in the feature value cyclic shift register unit is the product of the convolution kernel width of the current cycle and the convolution kernel channel number of the current cycle, and the feature values in the feature value cyclic shift register unit and the weight values in the weight value cyclic shift register unit have a convolution correspondence relationship. The convolution corresponding relation is determined according to a convolution calculation principle so as to multiply the weight value and the corresponding characteristic value and further complete convolution calculation.
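The register occupancy described above can be checked with a few lines of Python. This is only a sketch: the function name and argument names are hypothetical, and the weight count assumes that a "whole line" of weights spans every column, every channel and every convolution kernel of the current cycle, as the definition of the whole-line weight value suggests.

```python
def shift_register_sizes(kernel_width, channels, num_kernels):
    """Return (feature_entries, weight_entries) for one convolution-kernel row.

    Hypothetical helper, not taken from the patent text.
    """
    # Feature values covered by one row of one kernel: width x channels.
    feature_entries = kernel_width * channels
    # One row of weights across all kernels of the cycle (assumed reading of
    # the "whole-line weight value"): width x channels x kernels.
    weight_entries = kernel_width * channels * num_kernels
    return feature_entries, weight_entries


# Two 2 x 2 x 2 kernels, matching the size used in the later figure example.
print(shift_register_sizes(kernel_width=2, channels=2, num_kernels=2))
```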

Optionally, there are a plurality of characteristic value cyclic shift register units, which respectively store the characteristic values corresponding to each row of the convolution kernel of the current cycle.

A calculation step 253: performing shift cycle on the weight value in the weight value cyclic shift register unit one or more times; when the shift cycle number of the weight value cyclic shift register unit is n+1, performing shift cycle once on the characteristic value in the characteristic value cyclic shift register unit; when the shift cycle number of the weight value cyclic shift register unit is m+1, acquiring a new characteristic value from the register unit to be written and writing the new characteristic value into the characteristic value cyclic shift register unit; and when the number of times of acquiring the new characteristic value equals the difference between the width of the characteristic diagram and the width of the convolution kernel, divided by the step length, plus 1, emptying the data in the weight value cyclic shift register unit and returning to the reading step 252. Here n is an integral multiple of the number of convolution kernels of the current cycle, and m is an integral multiple of the number of weight values in the weight value cyclic shift register unit. The characteristic value cyclic shift register unit comprises a selector and a register element, wherein the input end of the selector comprises a first input end and a second input end, the output end of the register element comprises a first output end and a second output end, the output end of the selector is connected with the input end of the register element, and the first input end of the selector is connected with the first output end of the register element.

The control of the convolution calculation is realized by a counter, which may be arranged in, but not limited to, a multiplication processing unit. The counters comprise a shifting cycle number counter of the weight value cyclic shift register unit, a number counter for acquiring a new characteristic value, a counting number counter and the like. The counters in the multiplication processing unit cooperate with each other to control the calculation process, for example, read and write of the weight value and the feature value and cyclic shift of the shift register module.

When the number of shift cycles of the weight value circular shift register unit is n +1, n is an integral multiple of the number of convolution kernels of the current cycle, at this time, each convolution kernel of the current cycle already completes product calculation of the current characteristic value channel, and product calculation of the next characteristic value channel needs to be performed, so that one shift cycle is performed on the characteristic value in the characteristic value circular shift register unit. In other words, when the number of shift cycles of the weight value cyclic shift register unit is n +1, the channel number of the weight value at the target position in the weight value cyclic shift register unit is changed, and the channel number of the feature value at the target position in the feature value cyclic shift register unit needs to be changed.

When the shift cycle number of the weight value cyclic shift register unit is m +1, m is an integral multiple of the number of the weight values in the weight value cyclic shift register unit, at this time, each convolution kernel of the current cycle has already finished the product calculation of each characteristic value channel in the current weight value cyclic shift register unit, and the sliding of a convolution kernel window is required, so that a new characteristic value is added and an original characteristic value is removed in the sliding process of the convolution kernel window, and a new characteristic value is obtained from the register unit to be written and written into the characteristic value cyclic shift register unit.

The number of new feature values is determined according to the convolution step length and the feature map size: as the convolution kernel window slides, the feature values it covers change, and this change determines the new feature values to be added and the original feature values to be removed. Specifically, the number of new feature values is the product of the convolution step length and the number of feature map channels.

When the number of times of obtaining the new feature value equals the difference between the width of the feature map and the width of the convolution kernel, divided by the step length, plus 1, the convolution kernel window has finished sliding across a whole line of the feature map and the corresponding product calculation is complete. The product calculation for the next line of the feature map should then be performed, and the original data in the weight value cyclic shift register unit and the feature value cyclic shift register unit is no longer needed. The original weight values in the weight value cyclic shift register unit and the original feature values in the feature value cyclic shift register unit are therefore cleared, and the process returns to the reading step 252, in which the weight values of the next line are read from the convolution kernel and the corresponding feature values are read from the feature map. In this way, according to steps 251 and 252, the loop is performed continuously until the convolution calculation of the convolution layer of the current loop is completed.

Optionally, rounding up is performed on the difference between the width of the feature map and the width of the convolution kernel divided by the step size.
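The two counts used to control the row loop can be written out as a short Python sketch. The function and parameter names are hypothetical. The first value follows the statement that the number of new feature values is the step length times the number of channels. The second is the threshold at which a row is treated as finished, using the optional rounding-up variant.

```python
import math


def row_schedule(feature_width, kernel_width, stride, channels):
    """Return (new_values_per_slide, row_done_count) as described in the text.

    Hypothetical helper; the rounding-up of the quotient is the optional
    variant mentioned in the description.
    """
    # New feature values fetched each time the kernel window slides.
    new_values_per_slide = stride * channels
    # (feature width - kernel width) / stride, rounded up, plus 1.
    row_done_count = math.ceil((feature_width - kernel_width) / stride) + 1
    return new_values_per_slide, row_done_count


# A 2-wide, 2-channel kernel with stride 1 on a 4-wide feature map.
print(row_schedule(feature_width=4, kernel_width=2, stride=1, channels=2))
```

When the difference is an exact multiple of the step length, rounding has no effect and the threshold equals the usual number of window positions in one row.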

When the number of reads from the register unit to be written equals the total number of weight values and feature values of the current loop, all weight values and feature values in the register unit to be written have been read, that is, the convolution calculation of the convolution layer of the current loop is complete. The parallel computing module then generates an interrupt request signal, the serial configuration module responds to the interrupt request signal and sends out a control signal, and the register module, in response to the control signal, discards the current convolution information it holds and writes in the convolution information of the next convolution layer. It will be appreciated that the interrupt request signal indicates that a round of computation is complete, so that the current data in the register module must be discarded and the data to be computed next must be written in.

Optionally, the target position of the circular shift register unit is its read position, usually the first position of the first-in first-out register element. The writing and reading methods described above help to improve the speed of convolution calculation.

The circular shift register unit has a circular shift function and a first-in first-out shift function. Specifically, when the product calculation is performed to read the weight value and the feature value, the circular shift register unit has a circular shift function, and can read out the weight value or the feature value of the target position, and input the weight value or the feature value from the input end of the circular shift register unit again, so as to keep the number and the sequence of the weight values or the feature values in the circular shift register unit unchanged. When a new weight value and a new characteristic value are added and an original weight value and an original characteristic value are removed, the circular shift register unit has a first-in first-out writing and reading function, can write the new weight value or the new characteristic value from the input end, and sequentially removes the original weight value or the original characteristic value from the output end.

The characteristic value cyclic shift register unit comprises a selector and a register element, wherein the selector is used for controlling the transmission direction of data in the register element. The selector has an input end and an output end, and the input end of the selector comprises a first input end and a second input end. The register element has an input end and an output end, and the output end of the register element comprises a first output end and a second output end. The output end of the selector is connected with the input end of the register element, so that data is transmitted to the register element through the selector and the transmission is controlled by the selector. When the selector connects its first input end to its output end and disconnects its second input end from its output end, the data output by the register element is rewritten into the register element, and the written data and the read data form a closed loop, which is equivalent to a circular shift register unit. When the selector disconnects its first input end from its output end and connects its second input end to its output end, the data output by the register element is no longer rewritten into the register element; new data enters through the second input end, and the data is written and read out in a first-in first-out manner.

Optionally, a second input end of the selector is connected to the register unit to be written, and a second output end of the register element is connected to the parallel computing module.

Specifically, the register module comprises a plurality of register units, and each register unit comprises a selector and a register element, wherein the selector is a two-to-one multiplexer. The control signal of the selector is C. When C is 0, the selector connects the input end of the register unit with the input end of the register module, and input data can be written into the register unit. When C is 1, the selector connects the first input end of the selector with the input end of the register unit to form a closed data loop, and the data read from the register unit is directly used as the data written into the register unit; that is, the register unit is equivalent to a circular shift register unit, and the total amount of data in the register unit remains unchanged.
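The behaviour of one register unit under the control signal C can be modelled in software as follows. This is a minimal sketch rather than the hardware design: the class name, the method names and the use of a Python `deque` are assumptions made for illustration, and index 0 is taken as the read (target) position.

```python
from collections import deque


class ShiftRegisterUnit:
    """Toy software model of a selector plus register element (hypothetical)."""

    def __init__(self, values):
        self.cells = deque(values)  # cells[0] is the read position

    def read(self):
        """Return the value at the read position without shifting."""
        return self.cells[0]

    def shift(self, c, new_value=None):
        """Shift once and return the value that left the read position.

        c == 1: the output is looped back to the input (circular shift).
        c == 0: the output is dropped and new_value is written (FIFO).
        """
        out = self.cells.popleft()
        self.cells.append(out if c == 1 else new_value)
        return out


reg = ShiftRegisterUnit([1, 2, 3])
reg.shift(c=1)               # circular: contents become 2, 3, 1
reg.shift(c=0, new_value=9)  # FIFO: 2 leaves, 9 enters
print(list(reg.cells))
```

In both modes the number of stored values stays the same, which matches the statement that the total amount of data in the register unit is unchanged during circular shifting.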

Optionally, the register unit of the register module has a bit width of 16 bits and a depth of 256.

When the new characteristic value is written into the characteristic value circular shift register unit, the selector is used to disconnect the input end and the output end of the register element, so that the characteristic value circular shift register unit is changed into a first-in first-out register unit, the new characteristic value can be written in, and the original characteristic values which are written in first and have the same quantity as the new characteristic value can be deleted.

The embodiment utilizes the control signal of the register module to control the storage flow, thereby being beneficial to reducing the repeated reading of data from the outside to the inside, reducing the workload of the serial configuration module and improving the speed of image enhancement processing.

In the calculating step 253, before shift cycle is performed on the weight value in the weight value cyclic shift register unit each time, the weight value at the target position in the weight value cyclic shift register unit is read to obtain a read weight value, the feature value at the target position in the feature value cyclic shift register unit is read to obtain a read feature value, the read weight value and the read feature value are multiplied to obtain a multiplication result, the multiplication result is stored, and the multiplication result is accumulated according to a convolution principle.

The multiplication of the read weight value and the read characteristic value is performed by a multiplication processing unit. Specifically, the multiplication result is stored in a memory. The memory is located in the register module or the parallel computing module, which helps to improve the speed of convolution computation. The memory may be a local RAM in the multiplication processing unit.

According to its control signal, the multiplication processing unit takes in the feature data and the weight data used for the convolution calculation, performs the product calculation, and outputs the product result. In fig. 4 to fig. 10, the leftmost column represents the feature values waiting in the register unit to be written; the middle column contains three rows of data: the top row represents the weight values in the weight value circular shift register unit, the middle row represents the feature values in the feature value circular shift register unit, and the third row represents the product of the rightmost data in the first and second rows; the rightmost column represents the accumulated sums of the partial product results stored in the memory.

Specifically, referring to fig. 3 and fig. 4, before the multiplication processing unit works, the weight values and the feature values are written in order into the weight value circular shift register unit and the feature value circular shift register unit, respectively. The multiplication processing unit multiplies the first feature value in the feature value circular shift register unit (the value 2 at the first row, first column and first channel of the first feature map) by the first weight value in the weight value circular shift register unit (the value 3 at the first row, first column and first channel of the first convolution kernel), obtains the product result 6, and stores it at address N of the local RAM, as shown in fig. 4. Next, the weight values in the weight value circular shift register unit are cyclically shifted once while the feature values in the feature value circular shift register unit remain unchanged. The data output by the two circular shift register units are now the feature value 2 at the first row, first column and first channel of the first feature map and the weight value 1 at the first row, first column and first channel of the second convolution kernel; they are multiplied to obtain the product result 2, which is stored at address N+1 of the local RAM, as shown in fig. 5.

Referring to fig. 6, when the second channel of the feature map is calculated, both the weight value circular shift register unit and the feature value circular shift register unit perform a cyclic shift. The feature value 1 at the first row, first column and second channel of the first feature map is multiplied by the weight value 0 at the first row, first column and second channel of the first convolution kernel to obtain the product result 0, which is accumulated into address N of the local RAM.

Referring to fig. 7 and fig. 8, the feature value circular shift register and the weight value circular shift register continue to shift cyclically until the product results for all feature values in the feature value circular shift register have been calculated, and the product results are accumulated separately for each convolution kernel to obtain the corresponding accumulated sums. As shown in fig. 4, the partial sum stored at address N is the accumulated sum of the products of the first row of the first convolution kernel and the corresponding first row of the first feature map when the first convolution kernel performs the first convolution operation with the first feature map; the partial sum stored at address N+1 is the accumulated sum of the products of the first row of the second convolution kernel and the corresponding first row of the first feature map when the second convolution kernel performs the first convolution operation with the first feature map, as shown in fig. 7 and fig. 8.

Referring to fig. 9, when all the feature values in the feature value circular shift register have been calculated, new feature values are written into the feature value circular shift register and part of the original feature values are removed, and the feature values in the feature value circular shift register are then calculated again. The new feature values and the removed original feature values are determined according to the size and the step length of the convolution kernel window. For example, since the convolution step length is 1 and the size of the convolution kernel is 2 × 2 × 2, when the first row of weight values of the convolution kernel slides over the first row of feature values of the corresponding feature map, two feature values need to be updated relative to the previous calculation during one slide of the convolution window, as shown in fig. 9.

Referring to fig. 10, when all the feature values in one row of the feature map have been calculated, the original feature values in the feature value circular shift register are cleared, the feature values of the next row of the feature map are written, and the convolution calculation is performed on the feature values of the next row. For example, when the number of times of reading the new feature value reaches the difference between the feature map width and the convolution kernel width, the product results and all accumulated sums corresponding to the feature values in the first row of the current feature map have been calculated; the data in the feature value circular shift register module is then emptied, and the feature values in the second row of the feature map are written.

And when the product results of all rows of the characteristic diagram and all the accumulation sums are calculated, clearing the original characteristic value in the characteristic value circulating register, writing the characteristic value of the next characteristic diagram, and performing convolution calculation on the characteristic value of the next characteristic diagram according to a characteristic diagram convolution calculation method to obtain the product results and the corresponding accumulation sums.

From the above analysis, the output target feature map is calculated in the priority order of channel, intra-row data, and feature map. By repeating the convolution loop, the products of one row of weight values and one row of feature values and the corresponding accumulated sums are completed, and multi-feature-map, multi-convolution-kernel and multi-channel convolution operations can thus be realized.
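The walkthrough of fig. 4 to fig. 10 can be mimicked in software for a single kernel row. The following Python sketch is an interpretation, not the patented circuit: the function name, the data layout and the register ordering (column, then channel, with the kernel index changing fastest) are assumptions inferred from the order of products in the walkthrough, and the `partial[window][kernel]` layout stands in for the local RAM addresses. Only the first three products (6, 2 and 0) come from the description; the remaining input values are made up for illustration.

```python
from collections import deque


def convolve_row(feature_row, kernel_rows, stride=1):
    """Accumulate one kernel row over one feature-map row with two rotating queues.

    feature_row[col][ch] and kernel_rows[k][col][ch] are hypothetical layouts.
    Returns partial[window][k], the per-kernel partial sums for each window.
    """
    num_kernels = len(kernel_rows)
    kernel_width = len(kernel_rows[0])
    channels = len(kernel_rows[0][0])
    # Weight register: column, then channel, then kernel (kernel changes fastest).
    weights = deque(kernel_rows[k][col][ch]
                    for col in range(kernel_width)
                    for ch in range(channels)
                    for k in range(num_kernels))
    # "Register unit to be written": remaining feature values in read order.
    pending = deque(value for column in feature_row for value in column)
    features = deque(pending.popleft() for _ in range(kernel_width * channels))
    windows = (len(feature_row) - kernel_width) // stride + 1
    partial = [[0] * num_kernels for _ in range(windows)]
    for w in range(windows):
        for shift_count in range(len(weights)):
            k = shift_count % num_kernels
            # Multiply the values at the read positions and accumulate.
            partial[w][k] += weights[0] * features[0]
            weights.rotate(-1)            # weights shift on every product
            if k == num_kernels - 1:
                features.rotate(-1)       # features shift once per kernel group
        if w < windows - 1:
            # Window slides: FIFO in stride x channels new feature values.
            for _ in range(stride * channels):
                features.popleft()
                features.append(pending.popleft())
    return partial


feature_row = [[2, 1], [0, 3], [4, 1], [1, 2]]        # 4 columns, 2 channels
kernel_rows = [[[3, 0], [1, 2]], [[1, 1], [0, 2]]]    # 2 kernels, 2 columns, 2 channels
print(convolve_row(feature_row, kernel_rows))
```

After a full pass over the weight register the feature register has rotated back to its starting order, so the first-in first-out update that follows removes exactly the oldest column of feature values.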

Optionally, the parallel computing module comprises a multiplication processing unit array composed of a plurality of multiplication processing units. The multiplication processing unit array is a three-dimensional array based on channels, columns and rows. When the multiplication processing units finish the calculation, firstly, all the product results are accumulated in the channel direction of the multiplication processing units respectively to obtain a first accumulated sum, and secondly, the first accumulated sum accumulated by the multiplication processing units in the same row is accumulated again to obtain a second accumulated result.
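The two-stage accumulation can be illustrated with nested lists. This is a sketch under an assumed layout `products[row][column][channel]`; the function name is hypothetical and the numbers are arbitrary.

```python
def accumulate(products):
    """Return (first_sums, second_sums) for products[row][column][channel].

    Hypothetical helper: first_sums adds along the channel direction,
    second_sums adds the first sums of the units in the same row.
    """
    first_sums = [[sum(channel_values) for channel_values in row]
                  for row in products]
    second_sums = [sum(row) for row in first_sums]
    return first_sums, second_sums


products = [[[1, 2], [3, 4]],
            [[5, 6], [7, 8]]]
print(accumulate(products))
```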

The parallel computing module also comprises an activation unit, and the multiplication processing array is connected with the activation unit. And inputting the convolution result into an activation unit, wherein the activation unit consists of a comparator and an activation selector, one end of the activation unit is connected with the convolution result output end, the other end of the activation unit is fixedly connected with 0, the convolution result is compared with 0, if the convolution result is greater than 0, the convolution result is output, otherwise, 0 is output, and the nonlinear activation process of the data is completed.
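The comparator-plus-selector activation described above corresponds to the rectified linear function. A minimal Python sketch, with a hypothetical function name, is:

```python
def activate(conv_result):
    """Model of the activation unit: output the input if it exceeds 0, else 0."""
    # Comparator: compare the convolution result with the constant 0.
    greater_than_zero = conv_result > 0
    # Activation selector: choose between the result and 0.
    return conv_result if greater_than_zero else 0


print([activate(v) for v in (5, -3, 0)])
```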

Optionally, the multiplication processing unit array is configured with a size of 3 × 14 × 64, and the calculation bit width is set to 8 bits. The convolution calculation is accordingly limited to: a maximum convolution kernel of 3 × 3, a maximum of 64 channels, and a maximum of 16 rows processed simultaneously. The weight value data is transmitted to the multiplication processing unit array through one channel, 512 bits (8 bits × 64) at a time, namely the data of all channels at one two-dimensional point. The feature value data is transmitted 8192 bits (8 bits × 64 × 16) at a time, namely all the image data required by the multiplication processing unit array for one convolution.
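The transfer widths quoted above follow directly from the bit width, the channel limit and the row limit. A small Python check, with a hypothetical function name and the defaults taken from this optional configuration, is:

```python
def transfer_bits(bit_width=8, channels=64, rows=16):
    """Return (weight_bits, feature_bits) moved per transfer."""
    # Weights: all channels of one two-dimensional point.
    weight_bits = bit_width * channels
    # Feature values: all channels for every row processed at once.
    feature_bits = bit_width * channels * rows
    return weight_bits, feature_bits


print(transfer_bits())
```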

In one embodiment, the image enhancement method based on the FPGA performs convolution calculation according to a convolution kernel of a current cycle and a feature map of the current cycle, and includes: step 261, step 262 and step 263.

Step 261, splitting the characteristic diagram according to the array size of the multiplication processing unit and the size of the characteristic diagram of the current cycle to obtain the split characteristic diagram.

Step 262, carrying out batch convolution calculation according to the split characteristic diagram to obtain a plurality of batch convolution results.

Step 263, obtaining the convolution calculation result of the current cycle according to the convolution results of a plurality of batches.

In the convolution calculation process, splitting the characteristic diagram according to the array size of the multiplication processing unit and the size of the characteristic diagram; and carrying out batch convolution calculation according to the split characteristic diagram to obtain a target characteristic diagram (final convolution result). The limitation of the size of the array of the multiplication processing units can cause the limitation of the maximum channel number and the maximum row number in one convolution operation. If the maximum number of channels of the convolution is larger than 64 or the number of lines of the output feature graph is larger than 16, splitting the large convolution into small convolutions needs to be considered, and performing batch convolution calculation according to split convolution kernels to obtain a final convolution result. For example, assuming that the number of channels of convolution is 128, the calculation is divided into 2 times in the channel direction, the convolution results of the first 64 channels are calculated once, the convolution results of the last 64 channels are calculated once, the two are finally accumulated to obtain the final convolution result of the current cycle, and then the nonlinear activation is performed, so that the target feature map of the current cycle can be obtained.
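The channel-direction split in the example can be sketched in Python as follows. The function names are hypothetical, and a single output point is reduced to a dot product over channels purely for illustration; the limit of 64 channels per batch is the one stated for the optional array configuration.

```python
def split_channels(num_channels, max_channels=64):
    """Return (start, stop) channel ranges, each at most max_channels wide."""
    return [(start, min(start + max_channels, num_channels))
            for start in range(0, num_channels, max_channels)]


def batched_dot(features, weights, max_channels=64):
    """Accumulate per-batch partial sums for one output point (illustrative)."""
    total = 0
    for start, stop in split_channels(len(features), max_channels):
        # One batch: multiply-accumulate over at most max_channels channels.
        total += sum(f * w for f, w in zip(features[start:stop],
                                           weights[start:stop]))
    return total


print(split_channels(128))
print(batched_dot(list(range(128)), [1] * 128))
```

The nonlinear activation is applied only after the batch results have been accumulated, since activating each partial sum separately would in general give a different result.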

Optionally, the calculation amounts of the batch convolution calculations are equal or differ by no more than 5%, or the data amounts of the split feature maps are equal or differ by no more than 5%.

In one embodiment, the image enhancement method based on the FPGA performs convolution calculation and nonlinear activation according to a convolution kernel of a current cycle and a feature map of the current cycle to obtain a target feature map of the current cycle, and includes: step 271, step 272, step 273, step 274 and step 275.

Step 271, performing convolution calculation and nonlinear activation according to the convolution kernel of the current cycle and the feature map of the current cycle to obtain a target feature map before pooling of the current cycle.

Step 272, obtaining a plurality of first feature values in the target feature map before pooling of the current cycle, and converting the plurality of first feature values into one or more first period feature values according to a preset calculation rule, wherein the distance between the plurality of first feature values is smaller than a first preset distance, and the number of the first period feature values is smaller than the number of the first feature values.

Step 273, obtaining a plurality of second feature values in the target feature map before pooling of the current cycle, and converting the plurality of second feature values into one or more second period feature values according to the preset calculation rule, wherein the distance between the plurality of second feature values is smaller than a second preset distance, the distance between the plurality of first feature values and the plurality of second feature values is smaller than a third preset distance, and the number of the second period feature values is smaller than the number of the second feature values.

Step 274, converting the first period feature values and the second period feature values into one or more third period feature values according to the preset calculation rule, wherein the number of the third period feature values is smaller than the sum of the number of the first feature values and the number of the second feature values.

Step 275, obtaining the target feature map after pooling of the current cycle according to the third period feature values.

It should be noted that the first feature values and the second feature values are feature values in the target feature map, usually adjacent ones or ones that are less than a preset distance apart, and may be understood as the feature values in a certain target region of the feature map. For example, two adjacent feature values in the feature map are taken as the first feature values and two other adjacent feature values are taken as the second feature values, where the first feature values and the second feature values are also adjacent to each other.

The preset calculation rule may be based on an average value or a maximum value; that is, either the average of the feature values is calculated or their maximum is obtained by comparing them. Computing the average or the maximum within a certain region of the feature map after the convolution operation extracts the features at different positions of the feature map.

Inserting a pooling layer between successive convolution layers pools the feature map, so that the target feature map is reduced in size before it enters the next convolution cycle, which reduces overfitting.

This embodiment optimizes the pooling process: it not only compresses the size of the target feature map but also increases the speed of the pooling processing.

For example, the pooling operation on the target feature map is completed in three periods. In the first period, the larger of the first group of feature values (two feature values) is obtained as the first period feature value and stored in a first register. In the second period, the larger of the second group of feature values (the other two feature values) is obtained as the second period feature value and stored in a second register. In the third period, the data of the first two periods (the first period feature value and the second period feature value) are compared and the result is output, while the first register and the second register receive new data for storage; this completes the pooling of one group of feature values, and the first period of the next pooling operation begins. The first register and the second register are located in the register module.
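The three-period example above can be illustrated with a minimal software sketch. It assumes 2x2 max pooling with a stride of 2, where the top pair of each window is the first group and the bottom pair is the second group; the function names and the choice of window are illustrative assumptions, not part of the described hardware.

```python
def pool_group_three_periods(first_group, second_group):
    """Pool one group of feature values in three periods (max rule assumed)."""
    # Period 1: keep the larger value of the first group in register 1.
    register_1 = max(first_group)
    # Period 2: keep the larger value of the second group in register 2.
    register_2 = max(second_group)
    # Period 3: compare both registers and output the result.
    return max(register_1, register_2)


def max_pool_2x2(feature_map):
    """Apply the three-period pooling to every 2x2 window (stride 2 assumed)."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - 1, 2):
        pooled_row = []
        for c in range(0, cols - 1, 2):
            first_group = (feature_map[r][c], feature_map[r][c + 1])
            second_group = (feature_map[r + 1][c], feature_map[r + 1][c + 1])
            pooled_row.append(pool_group_three_periods(first_group, second_group))
        pooled.append(pooled_row)
    return pooled
```

In hardware the registers would accept the next group while period 3 is still producing output; this sequential sketch only reproduces the arithmetic, not the overlap.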

Example two

Fig. 11 is a schematic structural diagram of an FPGA device 60 in this embodiment. As shown in Fig. 11, the FPGA device 60 includes: a serial configuration module 601, a parallel computing module 602 and a register module 603, wherein the serial configuration module 601 is connected with the parallel computing module 602, the serial configuration module 601 is connected with the register module 603, and the parallel computing module 602 is connected with the register module 603.

The serial configuration module 601 is configured to respond to a start signal or an interrupt request signal of the previous cycle, obtain a convolution kernel of the current cycle from the image enhancement neural network model, obtain a feature map of the current cycle according to a target image or a target feature map of the previous cycle, and configure the register module 603 according to the convolution kernel of the current cycle and the feature map of the current cycle.

The parallel computing module 602 is configured to read the convolution kernel of the current cycle and the feature map of the current cycle from the register module 603, perform convolution calculation and nonlinear activation according to the convolution kernel of the current cycle and the feature map of the current cycle, obtain a target feature map of the current cycle, and output an interrupt request signal of the current cycle.

The register module 603 is configured to store the convolution kernel of the current cycle and the feature map of the current cycle.

The FPGA device 60 provided in this embodiment may be used to perform enhancement processing on a target image by means of an image enhancement neural network model. The convolution processing of the image enhancement neural network model is divided into two parts: the configuration of the register module 603 is carried out under the control of the serial configuration module 601, and the convolution calculation is carried out by the parallel computing module 602. This exploits the fact that the serial configuration module 601 excels at control and the parallel computing module 602 excels at parallel calculation, which helps improve the speed at which the image enhancement neural network model processes the target image.

In one embodiment, the parallel computing module 602 is further configured to binarize the weight values of a convolution kernel in the image enhancement neural network model and obtain a binarized convolution kernel according to a scaling factor and the binarized weight values, where the scaling factor is the average of the absolute values of the weight values in the convolution kernel; and to perform convolution calculation according to the binarized convolution kernel and the feature map of the current cycle.
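The binarization described above can be sketched as follows. The scaling factor (mean of the absolute weight values) comes from the text; mapping each weight to +1 or -1 by its sign, and treating zero as +1, are assumptions made for illustration, as is the function name.

```python
def binarize_kernel(weights):
    """Return (scale, signs, binarized kernel) for a 2-D list of weights."""
    flat = [w for row in weights for w in row]
    # Scaling factor: average of the absolute values of the weights.
    scale = sum(abs(w) for w in flat) / len(flat)
    # Assumed binarization rule: sign of the weight, with zero mapped to +1.
    signs = [[1 if w >= 0 else -1 for w in row] for row in weights]
    # Binarized kernel: scaling factor times the binarized weight.
    binarized = [[scale * s for s in row] for row in signs]
    return scale, signs, binarized
```

With this form, each multiplication in the convolution reduces to a sign change, and the scaling factor can be applied once per accumulated result.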

In one embodiment, the serial configuration module 601 is further configured to count the interrupt request signals to obtain the number of interrupts; and to acquire the corresponding convolution layer from the image enhancement neural network model according to the number of interrupts and determine the convolution kernel of the current cycle according to the corresponding convolution layer.
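A minimal sketch of this interrupt-counting control flow is given below. It models the serial configuration module as a small class and the parallel computation as a caller-supplied function; the class name, method names and the `compute_layer` callback are hypothetical and only illustrate how the interrupt count could select the layer and end the loop.

```python
class SerialConfigurator:
    """Selects the convolution kernel of the current cycle by counting interrupts."""

    def __init__(self, layer_kernels):
        self.layer_kernels = layer_kernels  # one kernel (set) per convolution layer
        self.interrupt_count = 0

    def on_interrupt(self):
        # Each interrupt request marks the end of one convolution cycle.
        self.interrupt_count += 1

    def current_kernel(self):
        # No kernel is left once the count equals the number of layers.
        if self.interrupt_count >= len(self.layer_kernels):
            return None
        return self.layer_kernels[self.interrupt_count]


def run_convolution_loop(layer_kernels, compute_layer, image):
    """Loop until the number of cycles equals the number of convolution layers."""
    config = SerialConfigurator(layer_kernels)
    feature_map = image
    kernel = config.current_kernel()  # response to the start signal
    while kernel is not None:
        feature_map = compute_layer(kernel, feature_map)  # parallel computation
        config.on_interrupt()  # interrupt request of the current cycle
        kernel = config.current_kernel()
    return feature_map, config.interrupt_count
```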

In one embodiment, the serial configuration module 601 is further configured to sequentially write the weight values in the convolution kernel of the current cycle into the register module 603 according to a preset weight value sequence, where the preset weight value sequence is as follows: the weight values are sorted from an initial weight value according to a convolution kernel serial number, a channel serial number, a column serial number and a row serial number. The convolution kernel serial number of the weight value is incremented by 1 while it is smaller than the maximum convolution kernel serial number of the weight value. When the convolution kernel serial number has increased to the maximum convolution kernel serial number, the channel serial number of the weight value is incremented by 1 and the convolution kernel serial number is reset. When the channel serial number has increased to the maximum channel serial number, the column serial number of the weight value is incremented by 1 and the channel serial number and the convolution kernel serial number are reset. When the column serial number has increased to the maximum column serial number, the row serial number of the weight value is incremented by 1 and the column serial number, the channel serial number and the convolution kernel serial number are reset.
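The preset weight value sequence behaves like a nested counter in which the convolution kernel serial number changes fastest and the row serial number changes slowest. A minimal sketch follows, assuming zero-based serial numbers and assuming that the two "line"/"row" serial numbers in the text denote column and row respectively; the function name is hypothetical.

```python
def weight_write_order(num_kernels, num_channels, num_columns, num_rows):
    """List (kernel, channel, column, row) indices in the assumed write order."""
    order = []
    for row in range(num_rows):  # slowest-changing serial number
        for column in range(num_columns):
            for channel in range(num_channels):
                for kernel in range(num_kernels):  # fastest-changing serial number
                    order.append((kernel, channel, column, row))
    return order
```

For two kernels with two channels and a 1x1 window, `weight_write_order(2, 2, 1, 1)` yields `(0, 0, 0, 0)`, `(1, 0, 0, 0)`, `(0, 1, 0, 0)`, `(1, 1, 0, 0)`: both kernels' weights for channel 0 are written before any weight for channel 1.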

In one embodiment, the serial configuration module 601 is further configured to write the feature values in the feature map of the current cycle into the register module 603 according to a preset feature value sequence, where the preset feature value sequence is as follows: the feature values are sorted from an initial feature value according to the channel serial number, the column serial number, the convolution kernel serial number and the row serial number. The channel serial number of the feature value is incremented by 1 while it is smaller than the maximum channel serial number of the feature value. When the channel serial number has increased to the maximum channel serial number, the column serial number of the feature value is incremented by 1 and the channel serial number is reset. When the column serial number has increased to the maximum column serial number, the feature map serial number of the feature value is incremented by 1 and the column serial number and the channel serial number are reset. When the feature map serial number has increased to the maximum feature map serial number, the row serial number of the feature value is incremented by 1 and the feature map serial number, the column serial number and the channel serial number are reset.

In one embodiment, the serial configuration module 601 is further configured to store the weight values in the convolution kernel of the current cycle and the feature values in the feature map of the current cycle in a register unit to be written in the register module 603, where the register module 603 includes: the register unit to be written, a weight value cyclic shift register unit and a feature value cyclic shift register unit, wherein the register unit to be written is connected with the weight value cyclic shift register unit, and the register unit to be written is connected with the feature value cyclic shift register unit.

In one embodiment, the register module 603 is further configured to sequentially read a row of weight values in the convolution kernel of the current cycle and write the row of weight values into the weight value cyclic shift register unit, and to sequentially read feature values of the feature map of the current cycle and write them into the feature value cyclic shift register unit, where the number of feature values in the feature value cyclic shift register unit is the product of the width of the convolution kernel of the current cycle and the number of channels of the convolution kernel of the current cycle, and the feature values in the feature value cyclic shift register unit and the weight values in the weight value cyclic shift register unit have a convolution correspondence.

In one embodiment, the register module 603 is further configured to perform one or more shift cycles on the weight values in the weight value cyclic shift register unit. When the number of shift cycles of the weight value cyclic shift register unit is n+1, the feature values in the feature value cyclic shift register unit undergo one shift cycle. When the number of shift cycles of the weight value cyclic shift register unit is m+1, new feature values are obtained from the register unit to be written and written into the feature value cyclic shift register unit. When the number of times new feature values have been obtained equals the difference between the width of the feature map and the width of the convolution kernel, divided by the step length, plus 1, the data in the weight value cyclic shift register unit is cleared and the process returns to the reading step. Here, n is an integral multiple of the number of convolution kernels of the current cycle, m is an integral multiple of the number of weight values in the weight value cyclic shift register unit, and the number of new feature values is determined according to the convolution step length. When the new feature values are written into the feature value cyclic shift register unit, the input end and the output end of the register element are disconnected by a selector, the new feature values are then written into the register element, and original feature values equal in number to the new feature values are deleted.

In one embodiment, the parallel computing module 602 is further configured to, before each shift cycle of the weight values in the weight value cyclic shift register unit, read the weight value at the target position in the weight value cyclic shift register unit to obtain a read weight value, read the feature value at the target position in the feature value cyclic shift register unit to obtain a read feature value, multiply the read weight value by the read feature value to obtain a product result, store the product result, and accumulate the product results according to the convolution principle.
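The interplay of the two cyclic shift registers can be simulated in software. The sketch below is one possible reading of the scheme, under stated assumptions: it handles a single kernel row, the weight register stores weights with the kernel index changing fastest, the feature register is shifted once after every K weight shifts (K being the number of kernels), and after a full rotation the oldest `stride * channels` feature values are replaced by new ones. The function name and data layout are hypothetical.

```python
from collections import deque


def row_conv_shift_registers(weights, features, stride=1):
    """Simulate one kernel row: weights[col][ch][k], features[col][ch] -> out[pos][k]."""
    width, channels, kernels = len(weights), len(weights[0]), len(weights[0][0])
    # Weight cyclic shift register: kernel index fastest, then channel, then column.
    wreg = deque(weights[col][ch][k]
                 for col in range(width) for ch in range(channels) for k in range(kernels))
    # Register unit to be written: feature values waiting to enter the feature register.
    pending = deque(features[col][ch]
                    for col in range(len(features)) for ch in range(channels))
    # Feature cyclic shift register holds (kernel width * channels) values.
    freg = deque(pending.popleft() for _ in range(width * channels))
    positions = (len(features) - width) // stride + 1
    outputs = []
    for pos in range(positions):
        acc = [0] * kernels
        for cycle in range(width * channels * kernels):
            # Read the target position (index 0) of both registers, multiply, accumulate.
            acc[cycle % kernels] += wreg[0] * freg[0]
            wreg.rotate(-1)  # one shift cycle of the weight register
            if (cycle + 1) % kernels == 0:
                freg.rotate(-1)  # feature register shifts after every K weight shifts
        outputs.append(acc)
        if pos + 1 < positions:
            # Replace the oldest values with new ones; the count depends on the stride.
            for _ in range(stride * channels):
                freg.popleft()
                freg.append(pending.popleft())
    return outputs
```

For example, with two single-channel kernels whose row weights are `[1, 3]` and `[2, 4]`, and a feature row `[1, 2, 3]`, the call `row_conv_shift_registers([[[1, 2]], [[3, 4]]], [[1], [2], [3]])` gives `[[7, 10], [11, 16]]`, which matches a direct sliding dot product.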

In one embodiment, the serial configuration module 601 is further configured to configure the enhanced target image in the register module 603.

In one embodiment, the parallel computing module 602 is further configured to read the enhanced target image in the register module 603 and generate digitized image information.

The FPGA device 60 provided by this embodiment can enhance the target image according to the image enhancement neural network model. The convolution processing of the image enhancement neural network model is divided into two parts: the configuration of the register module 603 is carried out under the control of the serial configuration module 601, and the convolution calculation is carried out by the parallel computing module 602. This exploits the fact that the serial configuration module 601 excels at control and the parallel computing module 602 excels at parallel calculation, so the speed at which the image enhancement neural network model processes the target image can be improved.

The FPGA device 60 of the present embodiment is the device corresponding to the FPGA-based image enhancement method. For the operating principle of the FPGA device 60, refer to the related description of the FPGA-based image enhancement method, which is not repeated here.

Example three

Fig. 12 is a schematic structural view of a helmet according to the present embodiment. As shown in Fig. 12, the helmet includes: a camera 702, an enhancement processor 703 and a display screen 704.

The camera 702 is configured to acquire a target image.

The enhancement processor 703 is configured to obtain an image enhancement neural network model and perform enhancement processing on the target image through the image enhancement neural network model to obtain an enhanced target image.

The display screen 704 is configured to display the enhanced target image.

The enhancement processor 703 includes an FPGA, and the FPGA includes: a serial configuration module, a parallel computing module and a register module, wherein the serial configuration module is connected with the parallel computing module, the serial configuration module is connected with the register module, and the parallel computing module is connected with the register module.

The enhancement processing includes: performing FPGA-based convolution loop processing according to the target image and the image enhancement neural network model until the number of loops of the convolution loop processing is equal to the number of convolution layers in the image enhancement neural network model, to obtain a target feature map; and enhancing the target image according to the target feature map to obtain the enhanced target image.

Convolution cyclic processing based on FPGA includes: responding to a starting signal or an interrupt request signal of the previous cycle through a serial configuration module, acquiring a convolution kernel of the current cycle from an image enhancement neural network model, acquiring a feature map of the current cycle according to a target image or a target feature map of the previous cycle, and configuring a register module according to the convolution kernel of the current cycle and the feature map of the current cycle; reading the convolution kernel of the current cycle and the feature map of the current cycle from the register module through the parallel computing module, performing convolution computation and nonlinear activation according to the convolution kernel of the current cycle and the feature map of the current cycle, obtaining a target feature map of the current cycle and outputting an interrupt request signal of the current cycle.

The helmet of this embodiment is a wearable device that is convenient for the user to carry. Optionally, the helmet comprises a helmet body 701, and the helmet body 701 is connected with the display screen 704. The enhancement processor 703 is disposed in the helmet; for example, the enhancement processor 703 is fixed on the helmet body 701 or on the display screen 704. The camera 702 is arranged in the forehead area of the helmet with its lens facing the preset front of the helmet, and the camera 702 is connected with the enhancement processor 703.

The helmet can rapidly process the acquired target image and display the enhanced image. For example, at night, color restoration can be performed on an acquired infrared image. Specifically, the image enhancement neural network model is a convolutional neural network model with a color restoration function, which can perform color restoration on an infrared image or a black-and-white image shot at night and restore it to the corresponding scene picture under daytime visible light conditions. The user can therefore carry the helmet to observe a target area and obtain the color-restored image of the target area in real time.

The helmet is a product or system that applies the FPGA-based image enhancement method. For the operating principle of the helmet, refer to the description of the FPGA-based image enhancement method, which is not repeated here.

Example four

Fig. 13 is a schematic structural diagram of an electronic device according to the present invention. The electronic device comprises a memory and a processor, the memory stores a computer program that can run on the processor, and when the processor executes the program, the steps of the above FPGA-based image enhancement method are implemented, wherein the processor comprises an FPGA.

The FPGA includes: a serial configuration module, a parallel computing module and a register module, wherein the serial configuration module is connected with the parallel computing module, the serial configuration module is connected with the register module, and the parallel computing module is connected with the register module.

The electronic device includes a processor 802 and a memory 801 communicatively coupled to each other via a system bus 803. It is noted that only an electronic device having components 801-803 is shown, but it is understood that not all of the illustrated components are required to be implemented, and that more or fewer components can alternatively be implemented. As will be understood by those skilled in the art, the electronic device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.

The electronic device may be a desktop computer, a notebook, a palmtop computer, a cloud server, or another computing device. The device can interact with a user through a keyboard, a mouse, a remote controller, a touch pad, a voice control device or the like.

The memory 801 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 801 may be an internal storage unit of the device, such as a hard disk or a memory of the device. In other embodiments, the memory 801 may be an external storage device of the device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), etc. provided on the device. Of course, the memory 801 may also include both internal and external storage units of the device. In this embodiment, the memory 801 is generally used for storing an operating system and various types of application software installed in the device, such as computer readable instructions of the FPGA-based image enhancement method. In addition, the memory 801 can also be used to temporarily store various types of data that have been output or are to be output.

In some embodiments, the processor 802 may be an FPGA processor, a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 802 is typically used to control the overall operation of the device. In this embodiment, the processor 802 is configured to execute computer readable instructions stored in the memory 801 or to process data, for example to execute the computer readable instructions of the FPGA-based image enhancement method.

Example five

The present invention provides a storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the above FPGA-based image enhancement method, wherein the processor comprises an FPGA.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application or portions thereof that contribute to the prior art may be embodied in the form of a software product, where the computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, and an optical disk), and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method of the embodiments of the present application.

It should be understood that the above-described embodiments are merely some, and not all, of the embodiments of the present application, and that the drawings illustrate preferred embodiments of the present application without limiting the scope of the claims. The present application may be embodied in many different forms; these embodiments are provided so that the disclosure of the present application will be understood thoroughly. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or replace some of their features with equivalents. All equivalent structures made by using the contents of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise fall within the protection scope of the present application.

Claims

1. An image enhancement method based on FPGA is characterized by comprising the following steps:

acquiring a target image;

acquiring an image enhancement neural network model, and enhancing the target image through the image enhancement neural network model to obtain an enhanced target image;

the enhancement processing includes:

and reading the convolution kernel of the current cycle and the feature map of the current cycle from the register module through the parallel computing module, performing convolution computation and nonlinear activation according to the convolution kernel of the current cycle and the feature map of the current cycle to obtain a target feature map of the current cycle and output an interrupt request signal of the current cycle.

2. The FPGA-based image enhancement method of claim 1, wherein the performing convolution calculation according to the convolution kernel of the current loop and the feature map of the current loop comprises:

3. The FPGA-based image enhancement method of claim 1, wherein the obtaining the convolution kernel of the current cycle from the image enhancement neural network model comprises:

4. The FPGA-based image enhancement method of claim 1, wherein the configuring the register module according to the convolution kernel of the current loop and the feature map of the current loop comprises:

sequentially writing the weight values in the convolution kernels of the current cycle into the register module according to a preset weight value sequence, wherein the preset weight value sequence is as follows: sequencing the weight values from an initial weight value according to a convolution kernel serial number, a channel serial number, a column serial number and a row serial number, wherein the convolution kernel serial number of the weight value is incremented by 1 when the convolution kernel serial number of the weight value is smaller than the maximum convolution kernel serial number of the weight value; when the convolution kernel serial number of the weight value has increased to the maximum convolution kernel serial number of the weight value, the channel serial number of the weight value is incremented by 1 and the convolution kernel serial number of the weight value is reset; when the channel serial number of the weight value has increased to the maximum channel serial number of the weight value, the column serial number of the weight value is incremented by 1 and the channel serial number of the weight value and the convolution kernel serial number of the weight value are reset; and when the column serial number of the weight value has increased to the maximum column serial number of the weight value, the row serial number of the weight value is incremented by 1 and the column serial number of the weight value, the channel serial number of the weight value and the convolution kernel serial number of the weight value are reset;

writing the characteristic values in the feature map of the current cycle into the register module according to a preset characteristic value sequence, wherein the preset characteristic value sequence is as follows: sorting the characteristic values from the initial characteristic value according to a channel serial number, a column serial number, a convolution kernel serial number and a row serial number, wherein the channel serial number of the characteristic value is incremented by 1 when the channel serial number of the characteristic value is smaller than the maximum channel serial number of the characteristic value; when the channel serial number of the characteristic value has increased to the maximum channel serial number of the characteristic value, the column serial number of the characteristic value is incremented by 1 and the channel serial number of the characteristic value is reset; when the column serial number of the characteristic value has increased to the maximum column serial number of the characteristic value, the feature map serial number of the characteristic value is incremented by 1 and the column serial number of the characteristic value and the channel serial number of the characteristic value are reset; and when the feature map serial number of the characteristic value has increased to the maximum feature map serial number of the characteristic value, the row serial number of the characteristic value is incremented by 1 and the feature map serial number of the characteristic value, the column serial number of the characteristic value and the channel serial number of the characteristic value are reset.

5. The FPGA-based image enhancement method of claim 4, wherein the reading the convolution kernel of the current loop and the feature map of the current loop from the register module, and performing convolution calculation according to the convolution kernel of the current loop and the feature map of the current loop comprises:

a storage step: storing the weight value in the convolution kernel of the current cycle and the feature value in the feature map of the current cycle in a register unit to be written into the register module, wherein the register module comprises: the register unit to be written in is connected with the weight value cyclic shift register unit, and the register unit to be written in is connected with the characteristic value cyclic shift register unit;

a calculation step: performing one or more shift cycles on the weight values in the weight value cyclic shift register unit; when the shift cycle number of the weight value cyclic shift register unit is n+1, performing one shift cycle on the characteristic values in the characteristic value cyclic shift register unit; when the shift cycle number of the weight value cyclic shift register unit is m+1, acquiring a new characteristic value from the register unit to be written and writing the new characteristic value into the characteristic value cyclic shift register unit; when the number of times of obtaining the new characteristic value is a value obtained by dividing the difference between the width of the feature map and the width of the convolution kernel by the step length and then adding 1, emptying the data in the weight value cyclic shift register unit and returning to the reading step, wherein n is an integer multiple of the number of convolution kernels of the current cycle, m is an integer multiple of the number of weight values in the weight value cyclic shift register unit, and the number of the new characteristic values is determined according to a convolution step length; and when writing the new characteristic value into the characteristic value cyclic shift register unit, disconnecting the input end and the output end of the register element by the selector, then writing the new characteristic value and deleting original characteristic values equal in number to the new characteristic values, wherein the characteristic value cyclic shift register unit comprises the selector and the register element, the input end of the selector comprises a first input end and a second input end, the output end of the register element comprises a first output end and a second output end, the output end of the selector is connected with the input end of the register element, and the first input end of the selector is connected with the first output end of the register element;

6. The FPGA-based image enhancement method of claim 1, further comprising:

7. The FPGA-based image enhancement method of claim 1, wherein the performing convolution calculation according to the convolution kernel of the current loop and the feature map of the current loop comprises:

splitting the feature map according to the size of a multiplication processing unit array and the size of the feature map of the current cycle to obtain the split feature map, wherein the multiplication processing unit array is a unit array used for multiplication calculation in the parallel calculation module;

carrying out batch convolution calculation according to the split characteristic diagram to obtain a plurality of batch convolution results;

8. The image enhancement method based on the FPGA of claim 1, wherein the performing convolution calculation and nonlinear activation according to the convolution kernel of the current loop and the feature map of the current loop to obtain the target feature map of the current loop comprises:

obtaining a plurality of second characteristic values in the target characteristic map before the current cyclic pooling, and converting the plurality of second characteristic values into one or more second periodic characteristic values according to a preset calculation rule, wherein the distance between the plurality of second characteristic values is smaller than a second preset distance, the distance between the plurality of first characteristic values and the plurality of second characteristic values is smaller than a third preset distance, and the number of the second periodic characteristic values is smaller than the number of the second characteristic values;

converting the first periodic characteristic value and the second periodic characteristic value into one or more third periodic characteristic values according to a preset calculation rule, wherein the number of the third periodic characteristic values is less than the sum of the number of the first characteristic values and the number of the second characteristic values;

9. An FPGA device, comprising: a serial configuration module, a parallel computation module and a register module, wherein the serial configuration module is connected with the parallel computation module, the serial configuration module is connected with the register module, and the parallel computation module is connected with the register module;

the serial configuration module is configured to, in response to a start signal or an interrupt request signal of the previous cycle, acquire a convolution kernel of the current cycle from the image enhancement neural network model, acquire a feature map of the current cycle according to a target image or a target feature map of the previous cycle, and configure the register module according to the convolution kernel of the current cycle and the feature map of the current cycle;
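The interaction between the serial configuration module and the register module can be modelled in software as a simple control loop. The sketch below is a behavioural illustration only, not the claimed circuit. It assumes that the loop stops when the loop count equals the number of convolution layers, as stated in the abstract, and it replaces the real convolution with a scalar multiply followed by ReLU so that the control flow stays visible. All names (`RegisterModule`, `serial_configure`, `parallel_compute`, `run_convolution_loop`) are hypothetical.

```python
class RegisterModule:
    """Holds the configuration for the current cycle."""

    def __init__(self):
        self.kernel = None
        self.feature_map = None


def serial_configure(registers, kernels, loop_index, target_image, previous_result):
    """Load the kernel and feature map of the current cycle into the registers.

    The first cycle uses the target image; later cycles use the target
    feature map produced by the previous cycle.
    """
    registers.kernel = kernels[loop_index]
    registers.feature_map = target_image if loop_index == 0 else previous_result


def parallel_compute(registers):
    """Stand-in for convolution plus nonlinear activation.

    Assumption: the "kernel" is a scalar and the activation is ReLU.
    """
    return [[max(0, v * registers.kernel) for v in row] for row in registers.feature_map]


def run_convolution_loop(target_image, kernels):
    registers = RegisterModule()
    result = None
    signal = "start"  # the first cycle is triggered by the start signal
    trace = []
    loop_count = 0
    while loop_count < len(kernels):  # one cycle per convolution layer
        trace.append(signal)
        serial_configure(registers, kernels, loop_count, target_image, result)
        result = parallel_compute(registers)
        signal = "interrupt"  # completion of a cycle triggers the next one
        loop_count += 1
    return result, trace


if __name__ == "__main__":
    print(run_convolution_loop([[1, -2], [3, 4]], kernels=[2, 3]))
```

The trace records which signal triggered each cycle: the start signal for the first, and the interrupt request raised at the end of the previous cycle for every later one.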

10. A helmet, comprising:

a camera configured to acquire a target image;

a display screen configured to display the enhanced target image;

wherein the enhancement processor comprises: an FPGA;

the enhancement processing includes:

the FPGA-based convolution loop processing comprises:

responding to a start signal or an interrupt request signal of the previous cycle through the serial configuration module, acquiring a convolution kernel of the current cycle from the image enhancement neural network model, acquiring a feature map of the current cycle according to the target image or the target feature map of the previous cycle, and configuring the register module according to the convolution kernel of the current cycle and the feature map of the current cycle;

11. An electronic device comprising a memory and a processor, the memory storing a computer program operable on the processor, wherein the processor, when executing the program, performs the steps of the FPGA-based image enhancement method of any one of claims 1 to 8, wherein the processor comprises: an FPGA;

12. A storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the FPGA-based image enhancement method of any one of claims 1 to 8, wherein the processor comprises: an FPGA;