CN116051356A - Rapid style migration method based on image and FPGA system - Google Patents

Rapid style migration method based on image and FPGA system

Info

Publication number
CN116051356A
CN116051356A
Authority
CN
China
Prior art keywords
module
data
attention
convolution
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310049200.5A
Other languages
Chinese (zh)
Inventor
陈盼盼
孙莉
张国和
郑培清
秦玉
侯俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Siyuan Integrated Circuit And Intelligent Technology Research Institute Co ltd
Original Assignee
Jiangsu Siyuan Integrated Circuit And Intelligent Technology Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Siyuan Integrated Circuit And Intelligent Technology Research Institute Co ltd filed Critical Jiangsu Siyuan Integrated Circuit And Intelligent Technology Research Institute Co ltd
Priority to CN202310049200.5A
Publication of CN116051356A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 General purpose image data processing
    • G06T 1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 General purpose image data processing
    • G06T 1/60 Memory management
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Complex Calculations (AREA)

Abstract

The invention relates to the technical field of image processing, and in particular to a rapid image style migration method and an FPGA system. The method comprises optimizing a style migration network through an improved residual network structure and a dual attention mechanism, and realizing the hardware implementation of the improved style migration network through software processing on an ARM CPU combined with programmable hardware logic processing on an FPGA. The improved residual network significantly reduces the parameters of the rapid image style network; the added dual attention structure improves the clarity of the content structure and the expressive power of the style features, thereby improving the quality of the processed image. In addition, because the network is deployed on FPGA hardware and, unlike traditional online image processing, does not depend on a network link, the inference speed of the model is not affected by transmission signal quality, and the privacy of user data can be guaranteed.

Description

Rapid style migration method based on image and FPGA system
Technical Field
The invention relates to the technical field of image processing, in particular to a rapid style migration method based on images and an FPGA system.
Background
The image style migration technique is an image processing method that renders semantic content in different styles: it converts the artistic style of an image while preserving the structure of the original content, so that the generated output combines the content of one image with the texture and aesthetic characteristics of a style image. However, existing image style migration techniques suffer from unclear content structure and inconsistencies in the color, texture, and shape of the processed images.
Neural networks are an artificial-intelligence machine learning technique. Deep convolutional neural networks in particular have received a great deal of attention and have achieved remarkable results in speech recognition, natural language processing, and intelligent image processing, especially image recognition. However, a typical network model reaches on the order of one billion operations and hundreds of megabytes of parameters; for resource-constrained, power-sensitive embedded devices, this enormous computation and parameter volume places stringent demands on any implementation of a convolutional neural network.
There are currently three main hardware accelerator platforms: Graphics Processing Units (GPUs), Application-Specific Integrated Circuits (ASICs), and Field-Programmable Gate Arrays (FPGAs). GPUs are widely used for neural networks but have high power consumption. ASICs and FPGAs both offer high performance at low power; an ASIC can provide a dedicated architecture tailored to a specific neural network, but at high development and manufacturing cost. The FPGA, as a programmable logic array, is flexible and inexpensive to develop, and serves as a functional simulation and verification platform before an AI chip is taped out, making it well suited to accelerator design research.
An FPGA (Field Programmable Gate Array) is a configurable logic circuit with configurable logic blocks and user input/output interface components; developers can build processing architectures realizing different functions by configuring the relevant switch states inside the FPGA. Fully exploiting the FPGA's advantages of high-performance parallel computation, ultra-low power consumption, and low cost to research high-performance architectures for convolutional neural networks is one of the inevitable trends in the field of artificial intelligence.
Disclosure of Invention
Aiming at the shortcomings of existing algorithms, the invention improves the residual network to significantly reduce the parameters of the rapid image style network; adds a dual attention network structure that improves the clarity of the content structure and the expressive power of the style features, thereby improving the quality of the processed image; and deploys the network on FPGA hardware, so that, unlike traditional online image processing that depends on a network link, the inference speed of the model is not affected by transmission signal quality and the privacy of user data can be guaranteed.
The technical scheme adopted by the invention is as follows: a rapid style migration method based on images comprises the following steps:
step one, optimizing the style migration network through an improved residual network structure and a dual attention mechanism;
further, improving the residual network reduces 256 dimensions to 64 dimensions under one 1x1 convolution layer first by the middle 3x3 convolution layer, and then increases the dimensions to 256 by the 1x1 convolution layer, thereby improving the residual network and reducing the calculation amount while maintaining the accuracy.
Further, the improved dual attention mechanism comprises a position attention module and a channel attention module. The position attention module selectively aggregates the features at each position by computing a weighted sum of the fused features over the whole space, so that similar features are related to each other regardless of distance; the channel attention module selectively emphasizes mutually related channel maps by integrating the related features among all fused channel maps. Finally, the outputs of the two attention modules are added.
Further, the feature map formula of the position attention module is:
K_s = \theta_1 \sum_i A_{ji}^{s} f_i^{3} + f_{cs}

where K_s is the position attention feature map; \theta_1 is a weight coefficient; f_i^{3} is the i-th row feature of the position attention; A_{ji}^{s} is the position attention mask; f_{cs} is the preliminary fusion feature map.
Further, the feature map formula of the channel attention module is:

K_c = \theta_2 \sum_i A_{ji}^{c} f_i^{3t} + f_{cs}

where \theta_2 is a weight coefficient; A_{ji}^{c} is the channel attention mask; f_i^{3t} is the i-th row feature of the channel attention; f_{cs} is the preliminary fusion feature map.
Further, the outputs of the two attention modules are added to obtain the fusion feature map:

F_{cs} = K_s + K_c

where K_s is the feature map of the position attention module and K_c is the feature map of the channel attention module.
And step two, realizing the hardware implementation of the improved style migration network through software processing on an ARM CPU and programmable hardware logic processing on an FPGA.
Further, the FPGA system for rapid image style migration comprises a register module, a data interface module, a convolution module, a pooling module, and a control module. The registers hold memory addresses and network layer parameters; the data interface module performs data-width conversion between the accelerator and the external interface; the convolution module performs the accelerated operation of the convolution and fully connected layers; the pooling module performs the pooling operation; and the control module schedules the other modules.
Further, the convolution module comprises a DMA module, an on-chip cache module, a logic control module, a convolution calculation module, and a read-write address module. The DMA module is responsible for efficient data transfer between the acceleration circuit and external memory; the on-chip cache module stores feature and weight data, and the read-write address module calculates the feature and weight addresses in BRAM; the convolution calculation module is responsible for the multiply-accumulate computation; the logic control module schedules the other modules. The AXI_lite bus carries signalling between the PS and PL ends; the register data corresponding to each convolution layer are written into the hardware accelerator's on-chip buffer, and the PL-end hardware is scheduled to compute the convolution layers.
Further, the control method of the convolution module comprises the following steps:
a multiply-accumulate tree structure is adopted; data are computed in blocks and then stored in the buffer unit;
N input channels and N output channels are computed simultaneously with a convolution kernel of size NxN; in each clock cycle the corresponding feature data are read and copied N times, the weight data for the N input channels are read from the on-chip buffer, and the calculation result is obtained after the multiply-accumulate tree;
and the feature and weight addresses are written into the read-write address module in real time and handed to the DMA, which controls the DDR; the DMA stores the features and weights it reads into BRAM (block RAM); the BRAM read-out address is calculated at the same time, padding (zero filling) is applied, and the data are then passed to the convolution calculation module.
Furthermore, the pooling module uses the AXI bus to transfer input and output data; the pooling parameters are configured in registers over the AXI_lite bus, data are read from the DDR into the pooling module, and the results are then written back to the DDR.
The invention has the beneficial effects that:
1. The problems of blurred generated-image structure and unclear edges caused by insufficient feature fusion are solved, and the expressive power of important image features is improved;
2. The total parameter count of the unit structure is reduced from 1,179,648 to 69,632, about 94%, reducing the amount of computation while maintaining accuracy;
3. The FPGA implementation has extremely small area, high speed, and low power consumption, making it suitable for resource-limited devices, with good application prospects. Realized in FPGA hardware, the method removes the traditional online style migration network's dependence on a network connection, can complete style migration in real time offline, and protects user privacy and information security in certain application scenarios.
Drawings
FIG. 1 is an image style migration network of the present invention;
FIG. 2 is a diagram of a residual network structure before and after modification;
FIG. 3 is a hardware system architecture of the present invention;
FIG. 4 is a flow chart of the data flow control of the present invention;
FIG. 5 is a block diagram of a convolution module of the present invention;
FIG. 6 is a block diagram of a fully connected module of the present invention;
FIG. 7 is a dual attention model of the present invention;
FIG. 8 is a diagram of an image rapid style migration network incorporating dual attention mechanisms of the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and examples, which are simplified schematic illustrations showing only the basic structure of the invention and thus showing only those constructions that are relevant to the invention.
The weight parameters of the pre-trained image style migration algorithm are transferred to the weight data buffer module; the convolution module performs the layer-by-layer computation of the whole convolutional neural network from the buffered weight, bias, and image data, finally obtaining the content loss between the generated image and the content image and the style loss between the generated image and the style image, and the computation continues so as to reduce these losses.
The losses are back-propagated to the image generation network, which is optimized until a qualified image style conversion model is obtained.
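A minimal sketch of this training step in PyTorch follows (gen_net is assumed to be the image generation network and loss_net a fixed, pre-trained VGG wrapper returning a list of feature maps; the choice of content layer and the style weight are assumptions, while the content/style loss structure follows the text):

```python
import torch
import torch.nn.functional as F

def gram(f):
    """Gram matrix: style statistics of a (N, C, H, W) feature map."""
    n, c, h, w = f.shape
    f = f.flatten(2)
    return (f @ f.transpose(1, 2)) / (c * h * w)

def train_step(gen_net, loss_net, content, style, opt, style_w=1e5):
    generated = gen_net(content)
    g_feats = loss_net(generated)  # feature maps from the fixed loss network
    c_feats = loss_net(content)
    s_feats = loss_net(style)
    content_loss = F.mse_loss(g_feats[2], c_feats[2])  # one content layer (assumed)
    style_loss = sum(F.mse_loss(gram(g), gram(s))
                     for g, s in zip(g_feats, s_feats))
    loss = content_loss + style_w * style_loss
    opt.zero_grad()
    loss.backward()  # back-propagate the loss to the image generation network
    opt.step()
    return loss.item()
```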
(1) Image style migration model compression:
As shown in FIG. 1, the style migration network mainly comprises an image generation network and a loss network; the loss network is a convolutional neural network whose parameters are obtained by pre-training and remain fixed throughout the conversion process.
The image generation network mainly comprises downsampling convolution layers, a residual network, and upsampling convolution layers. The image generation network performs the forward and backward style conversion of the picture, while the loss network constrains the feature data of the image during training. First, part of a pre-trained VGG network performs the downsampling operation, shrinking the feature map while deepening the network and reducing the amount of computation. The residual network normalizes the content and style images and matches their feature-map statistics to realize image normalization; adaptive feature-space mapping then generates the target picture, which is restored to an image by upsampling. The normalization layer parameters γ and β are continuously updated until the network converges, so that the adaptive normalization layer realizes conversion to an arbitrary style.
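The statistic-matching normalization step can be sketched as follows (an AdaIN-style formulation is assumed here; the patent describes feature-map statistic matching but does not spell out the exact variant):

```python
import torch

def adaptive_instance_norm(content, style, eps=1e-5):
    """Match the per-channel mean/std of content features to style features.
    content, style: (N, C, H, W) feature maps from the encoder."""
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    # Whiten the content statistics, then re-color with the style statistics
    return s_std * (content - c_mean) / c_std + s_mean
```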
A model trained by the image style migration algorithm performs well, but a single style model is about 20 MB, with high storage and computation costs that pose great challenges for the storage capacity and computing power of a mobile terminal and make the model difficult to deploy to a hardware platform. The parameters of the whole network model are concentrated mainly in the residual layers: a single residual layer accounts for about 17.6% of the parameters of the whole network model structure, and the five residual layers together account for 88%. The network of the rapid style migration model is therefore compressed by improving the computation of the residual network in the image generation network, replacing the two 3x3 convolution layers with 1x1 and 3x3 convolution layers; the improved network is shown on the right side of FIG. 2, and the existing residual network structure on the left side of FIG. 2. The parameter formula is:
P=N*C*3*3 (1)
in formula (1), P is the total number of parameters, N is the number of convolution kernels, i.e. the number of output channels, and C is the number of input channels. If the number of input channels is 256, the convolution kernel size is 3x3, and the number of output channels is 256, the total number of parameters of the whole unit structure is 3x3x256x256x2 = 1,179,648. The whole structure is optimized with the improved residual block, which reduces the number of input channels feeding the 3x3 convolution kernel: the new structure first reduces 256 dimensions to 64 with a 1x1 convolution layer, then restores them to 256 dimensions with another 1x1 convolution layer, reducing the amount of computation while maintaining accuracy. The number of parameters becomes 1x1x256x64 + 3x3x64x64 + 1x1x64x256 = 69,632, a reduction of about 94% per unit.
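The parameter counts above can be verified with a few lines of arithmetic (a sketch; bias terms are ignored, as in formula (1)):

```python
# Original unit: two 3x3 convolutions, 256 -> 256 channels each
original = 3 * 3 * 256 * 256 * 2  # = 1,179,648

# Improved unit: 1x1 (256->64), 3x3 (64->64), 1x1 (64->256)
improved = 1 * 1 * 256 * 64 + 3 * 3 * 64 * 64 + 1 * 1 * 64 * 256  # = 69,632

print(original, improved, 1 - improved / original)  # ~0.941, i.e. ~94% fewer
```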
(2) A rapid style migration network based on dual attention mechanisms;
After the content and style features are fused by adaptive normalization, the picture is input into the dual attention module, which effectively combines the picture's spatial and channel information. As shown in FIG. 7, the position attention module selectively aggregates the features at each position by computing a weighted sum of the fused features over the whole space, so that similar features are related to each other regardless of distance; the channel attention module selectively emphasizes mutually related channel maps by integrating the related features among all fused channel maps. Finally the outputs of the two attention modules are added, further improving the picture quality.
Adaptive normalization yields the preliminary fusion feature f_{cs}. First, f_{cs} is input into convolution layers with kernel size 1x1 for compression, giving the matrices f^1, f^2, and f^3. Then f^1 is transposed and matrix-multiplied with f^2 to obtain the association-strength matrix between any two point features, which is normalized with a softmax operation to obtain the attention map A_{ji}^{s} of each position to every other position, where i and j index image pixel positions.
The position attention feature map K_s is calculated as in formula (2): A_{ji}^{s} is first transposed and matrix-multiplied with f_i^{3}, and the result is then combined point-wise with the original features at the corresponding positions:

K_s = \theta_1 \sum_i A_{ji}^{s} f_i^{3} + f_{cs}    (2)

where K_s is the position attention feature map; \theta_1 is a weight coefficient, initialized to 0; f_i^{3} is the i-th row feature of the position attention; A_{ji}^{s} is the position attention mask; f_{cs} is the preliminary fusion feature map.
The channel attention module is similar to the position attention module: to obtain its feature attention, it applies a dimension transformation and matrix multiplication to any two channel features, obtaining the association strength between any two channels; the channel attention feature map K_c is calculated as in formula (3):

K_c = \theta_2 \sum_i A_{ji}^{c} f_i^{3t} + f_{cs}    (3)

where \theta_2 is a weight coefficient, initialized to 0 and gradually learning a larger weight during training; A_{ji}^{c} is the channel attention mask; and f_i^{3t} is obtained by inputting f_{cs} into a convolution layer with kernel size 1x1 for compression.

The feature maps output by the two modules are added at corresponding points to obtain the fusion feature map F_{cs}, as in formula (4), strengthening the weight of the important feature points:

F_{cs} = K_s + K_c    (4)

where K_s is the feature map of the position attention module and K_c is the feature map of the channel attention module.
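For illustration, a compact PyTorch sketch of the two attention branches and their fusion follows (a DANet-style formulation consistent with formulas (2) to (4); the layer names, the 1x1 compression width, and the simplified channel branch without its own 1x1 compression are assumptions):

```python
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.f1 = nn.Conv2d(channels, channels // 8, 1)  # 1x1 compression for f^1
        self.f2 = nn.Conv2d(channels, channels // 8, 1)  # 1x1 compression for f^2
        self.f3 = nn.Conv2d(channels, channels, 1)       # 1x1 projection for f^3
        self.theta1 = nn.Parameter(torch.zeros(1))       # position weight, init 0
        self.theta2 = nn.Parameter(torch.zeros(1))       # channel weight, init 0

    def forward(self, f_cs):
        n, c, h, w = f_cs.shape
        # Position attention: K_s = theta1 * (attention-weighted f^3) + f_cs
        q = self.f1(f_cs).flatten(2).transpose(1, 2)         # (N, HW, C/8)
        k = self.f2(f_cs).flatten(2)                         # (N, C/8, HW)
        a_s = torch.softmax(q @ k, dim=-1)                   # (N, HW, HW) mask A^s
        v = self.f3(f_cs).flatten(2)                         # (N, C, HW)
        k_s = self.theta1 * (v @ a_s.transpose(1, 2)).view(n, c, h, w) + f_cs
        # Channel attention: association strength between any two channels
        f = f_cs.flatten(2)                                  # (N, C, HW)
        a_c = torch.softmax(f @ f.transpose(1, 2), dim=-1)   # (N, C, C) mask A^c
        k_c = self.theta2 * (a_c @ f).view(n, c, h, w) + f_cs
        return k_s + k_c                                     # F_cs, formula (4)
```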
(3) Building a hardware circuit;
the system structure mainly comprises two parts: the ARM CPU-based software processing system and the FPGA-based programmable hardware logic circuit exert the performance advantages of large-scale parallel computation of the FPGA, and the ARM check network is used for flexible configuration; firstly, carrying out functional division on software and hardware according to the calculated amount and complexity, deploying a task with intensive calculation to an FPGA end, deploying a control task with a core to an ARM end, carrying out hardware control instruction sending and register configuration on a PS end, preparing and preprocessing data, analyzing a network model and extracting parameters, inputting the control parameters by a PL end, and simultaneously completing calculation, thereby improving the operation efficiency and reducing the power consumption; after weight data and bias data generated by pre-training a convolutional neural network to be built are obtained, the PS end reads in the trained model and test data, and the PL end realizes forward propagation calculation.
FIG. 3 shows the overall hardware architecture of the system: the AXI4 bus is responsible for data transfer, the AXI_lite bus for signalling, and the ARM, as the external control unit, controls the internal registers. The FPGA system mainly comprises a register module, a data interface module, a convolution module, a pooling module, and a control module; the registers hold memory addresses and network layer parameters, the data interface module performs data-width conversion between the accelerator and the external interface, the convolution module performs the accelerated operation of the convolution and fully connected layers, the pooling module performs the pooling operation, and the control module schedules the other modules.
FIG. 5 shows the convolution module, comprising a DMA module, an on-chip cache module, a logic control module, a convolution calculation module, and a read-write address module. The DMA module is responsible for efficient data transfer between the acceleration circuit and external memory; the on-chip cache module stores feature and weight data, and the read-write address module calculates the feature and weight addresses in BRAM; the convolution calculation module is responsible for the multiply-accumulate computation, and the logic control module schedules the other modules. The AXI_lite bus carries signalling between the PS and PL ends; the register data corresponding to each convolution layer are written into the hardware accelerator's on-chip buffer, and the PL-end hardware is scheduled to compute the convolution layers.
FIG. 4 is a flow chart of data flow control for a convolution module, comprising:
the convolution module adopts a multiply-accumulate tree structure; data are computed in blocks and then stored in the buffer unit, giving high parallelism, good performance, and improved resource utilization;
the convolution computation processes 4 input channels and 4 output channels simultaneously with a 3*3 convolution kernel: in each clock cycle 36 feature data are read and copied 4 times, 4x4x3x3 = 144 weight data are read from the on-chip cache, and the calculation result is produced after the multiply-accumulate tree;
the method comprises the steps of calculating characteristic weights in real time, writing the characteristic weights into a read-write address module, giving DMA control DDR, storing the read characteristics and weights into BRAM after calculation by the DMA, storing the results into the BRAM, calculating a read-out address of the BRAM, carrying out padding zero-filling, and giving data to a convolution calculation module; and the whole design pipeline operates, internal data is regularly and movably transmitted, and accurate calculation of the data is ensured.
FIG. 6 shows the fully connected module architecture. The overall design is pipelined, with the control and computation modules using a handshake protocol to guarantee normal data transfer; data from the convolution module are sent to the external DDR over AXI4 through the data interface module; the input feature map and the weights are brought into the on-chip cache module by DMA. The fully connected layers have few features but many weights: the first fully connected layer alone has about 98M weight parameters, so the features are cached entirely in BRAM while the weights are read from DDR and computed against the features in BRAM.
The pooling module uses the AXI bus to transfer input and output data, and the AXI_lite bus to configure the pooling parameters through registers; data are read from the DDR into the pooling module and the results are written back to the DDR, which reduces the module's buffering requirements and its resource usage. The circuit of this design first pools the larger dimension horizontally and then pools vertically, storing adjacent data in the on-chip cache to strengthen data locality.
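The horizontal-then-vertical pooling order can be sketched as two one-dimensional passes (a behavioural model only; 2x2 max pooling and an evenly divisible feature map are assumptions):

```python
import numpy as np

def separable_max_pool(x, k=2):
    """Max-pool rows first (horizontal pass), then columns (vertical pass),
    as in the circuit above. x: (H, W) feature map, H and W divisible by k."""
    h, w = x.shape
    rows = x.reshape(h, w // k, k).max(axis=2)          # horizontal pass
    return rows.reshape(h // k, k, w // k).max(axis=1)  # vertical pass

# Check against ordinary 2x2 block pooling
x = np.arange(16, dtype=float).reshape(4, 4)
assert np.array_equal(separable_max_pool(x), x.reshape(2, 2, 2, 2).max(axis=(1, 3)))
```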
The logic control module schedules the operation of each module according to the start signals in the registers; the start signals conv_valid and pool_valid respectively control the start of the convolution module and the pooling module. After the pooling module starts working, it sends the processed data to the DDR; the control module judges from the returned completion signal whether pooling has finished, and when it has, pulls the pool_fin signal high and passes it to a register. When the convolution module starts operating, the dat_run and wt_run signals separately control feature-map reuse and weight reuse: after one convolution completes, if dat_run and wt_run are valid, the address is updated, the weights are reused, and the next computation reads the updated feature-map data while keeping the weights from the previous pass; a counter tracks the number of updates and issues a completion signal when all convolution computations are finished.
The data path adopts a hierarchical, double-buffered design, reducing accesses to off-chip memory and improving working efficiency; data transfer between internal modules uses stage-wise data reuse, reducing internal data accesses. The input and output channels are computed in parallel and mapped onto the multiply-add array, which fuses a multiply-add tree with a two-dimensional systolic multiply-add array; a 6-stage pipeline structure is introduced, balancing resource utilization and acceleration performance.
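The double-buffer (ping-pong) scheme can be sketched as follows (a behavioural illustration only; in hardware the load and compute phases run concurrently, while this sequential Python model shows just the buffer alternation):

```python
def run_tiles(tiles, load, compute):
    """Ping-pong over two buffers: while tile i is consumed from one buffer,
    tile i+1 is (in hardware, concurrently) loaded into the other."""
    buf = [None, None]
    out = []
    for i, t in enumerate(tiles):
        buf[i % 2] = load(t)                       # fill the idle buffer
        if i > 0:
            out.append(compute(buf[(i - 1) % 2]))  # consume the previously filled one
    if tiles:
        out.append(compute(buf[(len(tiles) - 1) % 2]))
    return out

# Usage: double the tile value on "load", add one on "compute"
assert run_tiles([0, 1, 2], lambda t: t * 2, lambda b: b + 1) == [1, 3, 5]
```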
Compared with the prior art, the invention, realized in FPGA hardware, has the advantages of extremely small area, high speed, and low power consumption; it is suitable for resource-limited devices and has good application prospects.
Taking the preferred embodiments described above as illustration, persons skilled in the relevant art can make various changes and modifications without departing from the technical idea of the present invention. The technical scope of the present invention is not limited to the description, but must be determined by the scope of the claims.

Claims (10)

1. A rapid style migration method based on images, characterized by comprising the following steps:
step one, optimizing the style migration network through an improved residual network structure and a dual attention mechanism;
and step two, realizing the hardware implementation of the improved style migration network through software processing on an ARM CPU and programmable hardware logic processing on an FPGA.
2. The rapid style migration method based on images according to claim 1, wherein: the improved residual network first reduces 256 dimensions to 64 dimensions with a 1x1 convolution layer, passes the result through the middle 3x3 convolution layer, and then restores the dimensions to 256 with another 1x1 convolution layer, reducing the amount of computation while maintaining accuracy.
3. The rapid style migration method based on images according to claim 1, wherein: the improved dual attention mechanism comprises a position attention module and a channel attention module; the position attention module selectively aggregates the features at each position by computing a weighted sum of the fused features over the whole space, so that similar features are related to each other regardless of distance; the channel attention module selectively emphasizes mutually related channel maps by integrating the related features among all fused channel maps; finally the outputs of the two attention modules are added.
4. The rapid style migration method according to claim 3, wherein the feature map formula of the position attention module is:

K_s = \theta_1 \sum_i A_{ji}^{s} f_i^{3} + f_{cs}

where K_s is the position attention feature map; \theta_1 is a weight coefficient; f_i^{3} is the i-th row feature of the position attention; A_{ji}^{s} is the position attention mask; f_{cs} is the preliminary fusion feature map.
5. The rapid style migration method according to claim 3, wherein the feature map formula of the channel attention module is:

K_c = \theta_2 \sum_i A_{ji}^{c} f_i^{3t} + f_{cs}

where \theta_2 is a weight coefficient; A_{ji}^{c} is the channel attention mask; f_i^{3t} is the i-th row feature of the channel attention; f_{cs} is the preliminary fusion feature map.
6. The rapid style migration method according to claim 3, wherein the outputs of the two attention modules are added to obtain the fusion feature map:

F_{cs} = K_s + K_c

where K_s is the feature map of the position attention module and K_c is the feature map of the channel attention module.
7. An FPGA system for rapid image style migration, comprising: a register module, a data interface module, a convolution module, a pooling module, and a control module, wherein the registers hold memory addresses and network layer parameters, the data interface module performs data-width conversion between the accelerator and the external interface, the convolution module performs the accelerated operation of the convolution and fully connected layers, the pooling module performs the pooling operation, and the control module schedules the other modules.
8. The FPGA system of claim 7, wherein the convolution module comprises: a DMA module, an on-chip cache module, a logic control module, a convolution calculation module, and a read-write address module, wherein the DMA module is responsible for efficient data transfer between the acceleration circuit and external memory; the on-chip cache module stores feature and weight data, and the read-write address module calculates the feature and weight addresses in BRAM; the convolution calculation module is responsible for the multiply-accumulate computation; the logic control module schedules the other modules; and the AXI_lite bus carries signalling between the PS and PL ends, the register data corresponding to each convolution layer are written into the hardware accelerator's on-chip buffer, and the PL-end hardware is scheduled to compute the convolution layers.
9. The FPGA system of claim 8, wherein the control method of the convolution module comprises:
a multiply-accumulate tree structure is adopted; data are computed in blocks and then stored in the buffer unit;
N input channels and N output channels are computed simultaneously with a convolution kernel of size NxN; in each clock cycle the corresponding feature data are read and copied N times, the weight data for the N input channels are read from the on-chip buffer, and the calculation result is obtained after the multiply-accumulate tree;
and the feature and weight addresses are written into the read-write address module in real time and handed to the DMA, which controls the DDR; the DMA stores the features and weights it reads into BRAM (block RAM); the BRAM read-out address is calculated at the same time, padding (zero filling) is applied, and the data are then passed to the convolution calculation module.
10. The FPGA system of claim 7, wherein the pooling module uses the AXI bus to transfer input and output data, the pooling parameters are configured through registers over the AXI_lite bus, data are read from the DDR into the pooling module, and the results are then written back to the DDR.
CN202310049200.5A 2023-02-01 2023-02-01 Rapid style migration method based on image and FPGA system Pending CN116051356A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310049200.5A CN116051356A (en) 2023-02-01 2023-02-01 Rapid style migration method based on image and FPGA system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310049200.5A CN116051356A (en) 2023-02-01 2023-02-01 Rapid style migration method based on image and FPGA system

Publications (1)

Publication Number Publication Date
CN116051356A true CN116051356A (en) 2023-05-02

Family

ID=86123487

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310049200.5A Pending CN116051356A (en) 2023-02-01 2023-02-01 Rapid style migration method based on image and FPGA system

Country Status (1)

Country Link
CN (1) CN116051356A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117621145A (en) * 2023-12-01 2024-03-01 安徽大学 Fruit maturity detects flexible arm system based on FPGA


Similar Documents

Publication Publication Date Title
CN109284817B (en) Deep separable convolutional neural network processing architecture/method/system and medium
US11501415B2 (en) Method and system for high-resolution image inpainting
WO2020073211A1 (en) Operation accelerator, processing method, and related device
US11593658B2 (en) Processing method and device
CN108764466B (en) Convolution neural network hardware based on field programmable gate array and acceleration method thereof
CN107169563B (en) Processing system and method applied to two-value weight convolutional network
CN108665063B (en) Bidirectional parallel processing convolution acceleration system for BNN hardware accelerator
EP3407266A1 (en) Artificial neural network calculating device and method for sparse connection
CN111414994B (en) FPGA-based Yolov3 network computing acceleration system and acceleration method thereof
CN113051216B (en) MobileNet-SSD target detection device and method based on FPGA acceleration
CN111210019B (en) Neural network inference method based on software and hardware cooperative acceleration
US20200265300A1 (en) Processing method and device, operation method and device
CN113792621B (en) FPGA-based target detection accelerator design method
CN116051356A (en) Rapid style migration method based on image and FPGA system
CN115423081A (en) Neural network accelerator based on CNN _ LSTM algorithm of FPGA
Shahshahani et al. Memory optimization techniques for fpga based cnn implementations
CN113392973A (en) AI chip neural network acceleration method based on FPGA
CN113301221B (en) Image processing method of depth network camera and terminal
Zhao et al. A 307-fps 351.7-GOPs/W deep learning FPGA accelerator for real-time scene text recognition
Yu et al. Optimizing FPGA-based convolutional encoder-decoder architecture for semantic segmentation
CN116011534A (en) FPGA-based general convolutional neural network accelerator implementation method
CN116246110A (en) Image classification method based on improved capsule network
CN115170381A (en) Visual SLAM acceleration system and method based on deep learning
CN112001492B (en) Mixed running water type acceleration architecture and acceleration method for binary weight DenseNet model
Bai et al. An OpenCL-based FPGA accelerator with the Winograd’s minimal filtering algorithm for convolution neuron networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination