CN113706366B - Image feature data extraction method, system and related device - Google Patents


Info

Publication number
CN113706366B
Authority
CN
China
Prior art keywords
data
image characteristic
characteristic data
register
ddr
Prior art date
Legal status
Active
Application number
CN202110873716.2A
Other languages
Chinese (zh)
Other versions
CN113706366A (en)
Inventor
蒋东东
董刚
赵雅倩
Current Assignee
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd
Priority to CN202110873716.2A
Publication of CN113706366A
Application granted
Publication of CN113706366B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098 Register arrangements
    • G06F9/3012 Organisation of register space, e.g. banked or distributed register file
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3867 Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a method for extracting image feature data, comprising: acquiring image feature data and determining the corresponding number of parallel channels; taking the number of parallel channels as the depth of the image feature data and increasing the height of the image feature data accordingly; using a preset number of RAMs of an FPGA as a first-level cache of the DDR for data reuse; and configuring registers corresponding to the convolution kernel at the back end of the first-level cache and using the registers to output the image feature data row by row. The method improves DDR read/write efficiency as far as possible, keeps the pipeline output uninterrupted, reduces the DDR read load, and meets the input requirements of a high-bandwidth convolution unit at the back end. The application also provides a system for extracting image feature data, a computer-readable storage medium, and an electronic device, which have the same beneficial effects.

Description

Image feature data extraction method, system and related device
Technical Field
The present invention relates to the field of data processing, and in particular, to a method, a system, and a related device for extracting image feature data.
Background
Currently, the convolution data extraction process of a CNN (convolutional neural network) is mainly implemented in the following two ways:
1. The image feature data are cached in DDR (double data rate synchronous dynamic random access memory) external to the FPGA (field-programmable gate array), and only a small 3×3 block of data is read for each convolution. Reading many small ranges from DDR relieves pressure on the FPGA's storage resources, reduces routing difficulty, and raises the rate of the back-end pipelined convolution multiplications.
However, a 3×3 block is equivalent to three 3×1 segments, so reading one 3×3 block requires sending three read and refresh commands to the DDR, with an address jump between them. Such short reads and writes at non-contiguous addresses greatly reduce DDR read/write efficiency, typically to below 10%. Even if some optimized algorithms can pipeline the convolution over 3×11 blocks, reading only 11×3 data each time still cannot fully release the read/write capability of the DDR, which becomes the bottleneck of the system's computing speed.
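To make the address-jump problem concrete, the sketch below counts how many separate contiguous address runs are needed to fetch one k×k window versus one whole row from a single-channel image stored row-major. It is a simplification under stated assumptions: addresses are per element, the row width is 224, and each contiguous run is treated as one DDR burst; real DDR efficiency also depends on burst length, row activation, and the memory controller. The function names are hypothetical.

```python
def contiguous_runs(addresses):
    """Count maximal runs of consecutive addresses (each run ~ one DDR burst)."""
    runs = 0
    prev = None
    for a in addresses:
        if prev is None or a != prev + 1:
            runs += 1  # address jump: a new burst must be issued
        prev = a
    return runs


def window_addresses(row, col, width, k=3):
    """Element addresses of a k x k window in a row-major image of the given width."""
    return [(row + i) * width + (col + j) for i in range(k) for j in range(k)]


def row_addresses(row, width):
    """Element addresses of one whole image row."""
    return [row * width + j for j in range(width)]


W = 224
print(contiguous_runs(window_addresses(0, 0, W)))  # 3 runs of 3 elements each
print(contiguous_runs(row_addresses(0, W)))        # 1 run of 224 elements
```

Every k×k window costs k short, separated runs, while a whole row is a single long run; this is the gap the rearranged storage format described later is meant to close.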
2. All data are read into the FPGA, so a 3×3 block at any position can be read in one cycle and used for the back-end convolution. The drawback is that the FPGA's internal RAM resources are expensive and scarce, and their total capacity rarely reaches 5 MB. One input channel is generally smaller than 512×512×8 bit, so at most 20 input channels can be stored, and ping-pong buffering of 16 channels would occupy an excessive share of the FPGA's RAM. Moreover, because the FPGA's RAM blocks are distributed evenly across the device, large-area serial routing is required, which causes routing congestion, makes the design extremely difficult to implement, and lowers efficiency. The more input channels there are, the less suitable this approach becomes, and it does not scale.
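The capacity figures in the second approach can be checked with a few lines of arithmetic. This sketch assumes "5 MB" means 5 MiB of on-chip RAM and that one input channel occupies 512×512 values of 8 bits; the constant names are illustrative.

```python
# Worked arithmetic for the "read everything on-chip" approach.
BYTES_PER_CHANNEL = 512 * 512 * 1   # one 512 x 512 input channel, 8 bits per value
ON_CHIP_BYTES = 5 * 1024 * 1024     # assumption: "5 MB" read as 5 MiB of on-chip RAM

max_channels = ON_CHIP_BYTES // BYTES_PER_CHANNEL
pingpong_16 = 2 * 16 * BYTES_PER_CHANNEL  # two banks (ping + pong) of 16 channels each

print(max_channels)                 # 20
print(pingpong_16 / (1024 * 1024))  # 8.0 (MiB), which exceeds the 5 MiB budget
```

Under these assumptions the 20-channel limit follows exactly, and ping-pong buffering of 16 channels would need 8 MiB, well beyond what the device offers.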
Disclosure of Invention
The invention aims to provide a method, a system, a computer-readable storage medium, and an electronic device for extracting image feature data that can feed a high-speed pipeline performing multidimensional convolution at the back end.
To solve the above technical problems, the application provides a method for extracting image feature data, with the following technical scheme:
acquiring image feature data and determining the corresponding number of parallel channels;
taking the number of parallel channels as the depth of the image feature data, and increasing the height of the image feature data;
using a preset number of RAMs of the FPGA as a first-level cache of the DDR for data reuse, where each RAM stores one row of horizontal data;
configuring registers corresponding to the convolution kernel at the back end of the first-level cache, and using the registers to output the image feature data row by row, where each output of image feature data takes one clock cycle and, from the second clock cycle onward, the image feature data of the previous clock cycle are reused.
Optionally, before using the preset number of RAMs of the FPGA as the first-level cache of the DDR for data reuse, the method further includes:
determining the preset number according to the size of the convolution kernel, where the preset number is greater than the size of the convolution kernel.
Optionally, after configuring the registers corresponding to the convolution kernel at the back end of the first-level cache, the method further includes:
adding the corresponding padding based on the registers and the image feature data.
Optionally, when the registers are used to output the image feature data row by row, the method further includes, before each row change of the registers:
resetting the register values and reusing the repeated data in the RAM.
The application also provides a system for extracting image feature data, comprising:
a data acquisition module, configured to acquire image feature data and determine the corresponding number of parallel channels;
a data format changing module, configured to take the number of parallel channels as the depth of the image feature data and increase the height of the image feature data;
a data multiplexing module, configured to use a preset number of RAMs of the FPGA as a first-level cache of the DDR for data reuse, where each RAM stores one row of horizontal data;
a data extraction module, configured to configure registers corresponding to the convolution kernel at the back end of the first-level cache and use the registers to output the image feature data row by row, where each output of image feature data takes one clock cycle and, from the second clock cycle onward, the image feature data of the previous clock cycle are reused.
Optionally, the system further comprises:
a quantity determining module, configured to determine the preset number according to the size of the convolution kernel, where the preset number is greater than the size of the convolution kernel.
Optionally, the system further comprises:
an extraction preparation module, configured to add the corresponding padding based on the registers and the image feature data.
Optionally, the system further comprises:
a reset module, configured to reset the register values before each row change when the registers are used to output the image feature data row by row, and to reuse the repeated data in the RAM.
The present application also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method as described above.
The application also provides an electronic device comprising a memory storing a computer program and a processor that, when calling the computer program in the memory, implements the steps of the method described above.
The application provides a method for extracting image feature data, comprising the following steps: acquiring image feature data and determining the corresponding number of parallel channels; taking the number of parallel channels as the depth of the image feature data, and increasing the height of the image feature data; using a preset number of RAMs of the FPGA as a first-level cache of the DDR for data reuse, where each RAM stores one row of horizontal data; and configuring registers corresponding to the convolution kernel at the back end of the first-level cache and using the registers to output the image feature data row by row, where each output of image feature data takes one clock cycle and, from the second clock cycle onward, the image feature data of the previous clock cycle are reused.
The method adjusts the format of the image feature data by register shifting and improves DDR read/write efficiency as far as possible. By combining the RAM-based first-level cache with the register shift array unit, it packs and reuses the image feature data automatically and keeps the pipeline output continuous, while reducing the DDR read load and meeting the input requirements of a high-bandwidth convolution unit at the back end.
The application further provides a system for extracting image feature data, a computer-readable storage medium, and an electronic device, which have the same beneficial effects and are not described again here.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings may be obtained according to the provided drawings without inventive effort to a person skilled in the art.
FIG. 1 is a schematic diagram of a three-dimensional convolution calculation process provided herein;
FIG. 2 is a flowchart of a method for extracting image feature data according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an original format of image feature data according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a rearranged format of image feature data according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a first-level cache structure according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of the register output process in the first clock cycle according to an embodiment of the present application;
FIG. 7 is a schematic diagram of the register output process in the second clock cycle according to an embodiment of the present application;
FIG. 8 is a schematic diagram of the register output process after a row change according to an embodiment of the present disclosure;
FIG. 9 is a schematic structural diagram of a system for extracting image feature data according to an embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
The three-dimensional convolution performed in a CNN is as follows. Referring to FIG. 1, which is a schematic diagram of a three-dimensional convolution calculation process provided in the present application, consider a color image of size 6×6×3, where 3 is the number of color channels; the image can be viewed as a stack of three 6×6 images. To detect edges or other features of the image, it is convolved with a three-dimensional filter of size 3×3×3; the filter also has three layers, corresponding to the red, green, and blue channels. In the image size, the first 6 is the height, the second 6 is the width, and 3 is the number of channels. The filter likewise has a height, a width, and a number of channels, and the number of channels of the image must equal that of the filter. With stride 1 and no padding, this convolution produces a 4×4×1 output image. Convolution kernels of size 3×3, 5×5, 7×7, and 1×1 are common and follow the same computing principle, so the following description mainly uses a 3×3 kernel as the example, but the application is also compatible with 5×5, 7×7, and 1×1 kernels.
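The FIG. 1 example can be reproduced with a small pure-Python sketch of a valid (stride 1, no padding) three-dimensional convolution. As in common CNN practice it computes cross-correlation (the filter is not flipped). The nested-list layout `image[h][w][c]` and the function name are illustrative choices, not taken from the patent. With an all-ones image and filter, every output equals 3×3×3 = 27.

```python
def conv3d_valid(image, kernel):
    """Valid convolution (stride 1, no padding) of an H x W x C image with a k x k x C filter.

    image[h][w][c] and kernel[i][j][c] are nested lists; the result is an
    (H - k + 1) x (W - k + 1) single-channel map.
    """
    H, W, C = len(image), len(image[0]), len(image[0][0])
    k = len(kernel)
    if len(kernel[0][0]) != C:
        raise ValueError("filter channels must equal image channels")
    out = []
    for r in range(H - k + 1):
        row = []
        for s in range(W - k + 1):
            acc = 0
            # Sum over the k x k window and all C channels at once.
            for i in range(k):
                for j in range(k):
                    for c in range(C):
                        acc += image[r + i][s + j][c] * kernel[i][j][c]
            row.append(acc)
        out.append(row)
    return out


image = [[[1] * 3 for _ in range(6)] for _ in range(6)]   # 6 x 6 x 3, all ones
kernel = [[[1] * 3 for _ in range(3)] for _ in range(3)]  # 3 x 3 x 3, all ones
result = conv3d_valid(image, kernel)
print(len(result), len(result[0]))  # 4 4
print(result[0][0])                 # 27
```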
Referring to FIG. 2, which is a flowchart of the image feature data extraction method provided in an embodiment of the present application, the specific technical scheme is as follows:
S101: acquiring image feature data and determining the corresponding number of parallel channels;
S102: taking the number of parallel channels as the depth of the image feature data, and increasing the height of the image feature data;
S103: using a preset number of RAMs of the FPGA as a first-level cache of the DDR for data reuse;
S104: configuring registers corresponding to the convolution kernel at the back end of the first-level cache, and using the registers to output the image feature data row by row.
First, the image feature data are acquired and the number of parallel channels is determined; this number depends on the depth of the image feature data. Since the depth of image feature data in CNN computation is generally an integer multiple of 64, a power of 2 can be taken as the number of parallel channels. Step S102 converts image feature data of large depth into data of smaller depth but greater height, which helps raise the bandwidth at which the image feature data are read from DDR.
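Steps S101 and S102 can be sketched as follows. Two points are assumptions for illustration: the parallel channel count is chosen as the largest power of two, up to a cap, that divides the depth, and channel groups are stacked group-major (all rows of channels 0 to p-1, then all rows of the next group), consistent with computing the first group of input channels first. Both function names are hypothetical.

```python
def parallel_channels(depth, max_parallel=8):
    """Largest power of two not above max_parallel that divides depth (assumed rule).

    max_parallel should itself be a power of two; the loop always stops at 1.
    """
    p = max_parallel
    while depth % p:
        p //= 2
    return p


def restack_depth_to_height(fmap, p):
    """Regroup an H x W x C feature map (fmap[h][w][c]) into (C // p * H) x W x p.

    Channels are split into groups of p; group g is stacked below group g - 1,
    so the depth shrinks to p and the height grows by a factor of C // p.
    """
    H, W, C = len(fmap), len(fmap[0]), len(fmap[0][0])
    if C % p:
        raise ValueError("depth must be a multiple of the parallel channel count")
    return [
        [[fmap[h][w][g * p + k] for k in range(p)] for w in range(W)]
        for g in range(C // p)
        for h in range(H)
    ]


fmap = [[[100 * h + 10 * w + c for c in range(4)] for w in range(2)] for h in range(2)]
out = restack_depth_to_height(fmap, 2)
print(len(out), len(out[0]), len(out[0][0]))  # 4 2 2
print(out[2][1])  # [12, 13]: channels 2-3 of pixel (h=0, w=1)
print(parallel_channels(64), parallel_channels(3))  # 8 1
```

A 2×2×4 map becomes 4×2×2: the depth equals the parallel channel count and the height doubles, matching the "smaller depth, greater height" description.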
The preset number is not limited herein. Before the preset number of FPGA RAMs is used as the first-level cache of the DDR for data reuse, the preset number may be determined according to the size of the convolution kernel, and it is greater than the kernel size. Since 7×7, 5×5, 3×3, and 1×1 kernels are typically used, employing 8 RAMs as the first-level cache of the DDR is sufficient to support all of these common kernels. In addition, not every RAM is necessarily in use at once; the RAMs serve to reuse data. The width of the RAM is not limited in this embodiment and may be set according to the image feature data and the computing resources of the FPGA used. Note that each RAM holds one row of horizontal data. Here, DDR refers to double data rate synchronous dynamic random access memory.
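One way to see why 8 RAMs cover all the common kernels is the rule sketched below: a k×k kernel reads k rows at a time, plus one spare row that is filled from DDR in the meantime. The "k + 1" rule is an assumption inferred from the 3×3 example later in the description (4 RAMs, one of them a redundant ping-pong RAM); it is consistent with the requirement that the preset number exceed the kernel size. The function name is hypothetical.

```python
SUPPORTED_KERNELS = (1, 3, 5, 7)
NUM_RAMS = 8  # the preset number used in the description


def line_rams_needed(k):
    """Row RAMs a k x k kernel needs under the assumed rule: k rows being read
    plus one spare row being written (ping-pong margin)."""
    return k + 1


for k in SUPPORTED_KERNELS:
    need = line_rams_needed(k)
    print(f"{k}x{k}: {need} RAMs, fits in {NUM_RAMS}: {need <= NUM_RAMS}")
```

Under this rule the 7×7 kernel uses all 8 RAMs and the smaller kernels leave some unused, which matches the remark that not every RAM is necessarily in use.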
Next, registers corresponding to the convolution kernel are designed at the back end of the first-level cache: a 3×3 kernel needs 9 registers, and a 5×5 kernel needs 25. The registers output the image feature data, reuse data, and automatically add the corresponding padding based on the registers and the image feature data. Here, padding refers to the zero values added around the border of the image feature data so that windows can also be formed around edge pixels; in FIG. 6, the registers holding the five 0s among the 9 registers on the left are the added padding.
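A minimal sketch of what the register array holds when padding is applied, assuming zero padding of 1 around a single-channel image. Positions outside the image read as 0, just as registers keep their reset value of 0. For k = 3 and pad = 1 the window returned for (r, c) is the one centred on that pixel, and the top-left corner window reproduces the five padded registers described for FIG. 6. A k×k kernel needs k·k registers (9 for 3×3, 25 for 5×5). `padded_window` is a hypothetical name.

```python
def padded_window(img, r, c, k=3, pad=1):
    """Return the k x k window whose top-left corner is at (r - pad, c - pad).

    img is a list of rows; positions outside the image read as 0 (zero padding),
    mirroring registers that keep their reset value of 0.
    """
    H, W = len(img), len(img[0])
    win = []
    for i in range(k):
        y = r - pad + i
        row = []
        for j in range(k):
            x = c - pad + j
            row.append(img[y][x] if 0 <= y < H and 0 <= x < W else 0)
        win.append(row)
    return win


img = [[1 + 4 * y + x for x in range(4)] for y in range(4)]  # values 1..16, no zeros
corner = padded_window(img, 0, 0)
print(corner)                                      # [[0, 0, 0], [0, 1, 2], [0, 5, 6]]
print(sum(v == 0 for row in corner for v in row))  # 5 padded registers
```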
The registers then output the image feature data row by row, which keeps the pipeline output uninterrupted.
In the embodiments of the application, the format of the image feature data is adjusted by register shifting, which improves DDR read/write efficiency as far as possible. Combining the RAM-based first-level cache with the register shift array unit achieves automatic packing and reuse of the image feature data and uninterrupted pipeline output, reduces the DDR read load, and meets the input requirements of a high-bandwidth convolution unit at the back end.
To better describe the above embodiment, the procedure is illustrated with the following example:
Referring to FIG. 3: in CNN computation, the depth of the image feature data is an integer multiple of 64 (except for the original image, which has 3 layers), and the width and height are equal, with a maximum of 224. If the depth direction is divided into units of 8 (the input-channel parallelism; 8-channel parallel computation is used as the example here), the original format of the feature data is as shown in FIG. 3.
To maximize the DDR read/write bandwidth when reading the image feature data, the storage format of the data in the DDR needs to be rearranged, and the invention first computes all the image feature data of the first 8 input channels. Note that the figure of 8 input channels can be adjusted freely and is generally set according to the depth of the image feature data. The data format rearranged in the DDR is shown in FIG. 4.
One DDR address may hold the data of n×8 channels, where n is determined by the DDR data bit width and must be an integer. Consequently, when the data are extracted, the addresses are all sequential, and the DDR can operate at its maximum read/write bandwidth.
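The effect of the rearrangement on addressing can be sketched as follows. Only the 8-channel grouping comes from the description; the 8-bit values, the 512-bit (64-byte) DDR word per address, and a layout in which each 8-channel group is stored as complete rows one after another are assumptions for illustration. `byte_offset` is a hypothetical name.

```python
P = 8            # parallel channels per pixel, one byte each (8-bit data)
WORD_BYTES = 64  # assumption: 512-bit DDR data bus, i.e. 64 bytes per address


def byte_offset(g, h, w, H, W, p=P):
    """Byte offset of pixel (h, w) of channel group g in the rearranged layout:
    groups are stored one after another, each as H rows of W pixels of p bytes."""
    return ((g * H + h) * W + w) * p


H = W = 224
n = WORD_BYTES // P  # pixels per DDR address; each address holds n x 8 channel values
offsets = [byte_offset(g, h, w, H, W) for g in range(2) for h in range(H) for w in range(W)]
addresses = [o // WORD_BYTES for o in offsets]

print(n)  # 8
# Reading in storage order visits each address once, in increasing order, with no jumps.
print(all(b - a in (0, 1) for a, b in zip(addresses, addresses[1:])))  # True
```

Because the read order matches the storage order, every DDR word is fetched exactly once and consecutively, which is what allows long bursts at full bandwidth.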
This embodiment takes a 3×3 kernel as the example, so only the first 4 of the 8 RAMs are used, as shown in FIG. 5; the one RAM beyond the three rows the kernel needs serves as a redundant ping-pong buffer that keeps data flowing. The RAMs are used to reuse previously read data. Each RAM is 8 B wide, corresponding to 8 channels of data (adjustable according to the computing and cache resources of the FPGA used), and 256 entries deep (greater than 224). Each RAM stores one whole row of data along the W direction.
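The four-RAM first-level cache can be modelled as a small ring of row buffers. In this sketch, image row y is stored in RAM y mod (k + 1), so while k rows feed the current windows the spare RAM can be filled with the next row from DDR. The modulo mapping, the class and method names, and storing one value per entry (rather than an 8-byte, 8-channel word) are all assumptions for illustration.

```python
class LineBufferRing:
    """First-level cache sketch: k + 1 row RAMs, each holding one whole image row.

    Image row y lives in RAM y % (k + 1); writing a new row overwrites the RAM
    holding the oldest row, which the windows no longer need.
    """

    def __init__(self, k=3, depth=256):
        self.n = k + 1
        self.depth = depth
        self.rams = [[0] * depth for _ in range(self.n)]
        self.row_in_ram = [None] * self.n  # which image row each RAM currently holds

    def write_row(self, y, row):
        if len(row) > self.depth:
            raise ValueError("row is wider than the RAM depth")
        ram = y % self.n
        self.rams[ram][:len(row)] = row
        self.row_in_ram[ram] = y

    def read(self, y, x):
        ram = y % self.n
        if self.row_in_ram[ram] != y:
            raise LookupError(f"row {y} is not cached (RAM {ram} holds row {self.row_in_ram[ram]})")
        return self.rams[ram][x]


buf = LineBufferRing(k=3, depth=256)
for y in range(4):  # rows 0..3 fill all four RAMs
    buf.write_row(y, [y * 1000 + x for x in range(224)])
buf.write_row(4, [4000 + x for x in range(224)])  # row 4 reuses RAM 0 (row 0 no longer needed)
print(buf.read(3, 10), buf.row_in_ram)  # 3010 [4, 1, 2, 3]
```

Rows already in the ring are read again for the next output row instead of being fetched from DDR a second time, which is the vertical part of the data reuse.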
At the back end of the first-level cache, 9 registers are designed for the 3×3 kernel example (7×7 and 5×5 kernels are handled similarly) to reuse data and add padding automatically for the data of one input channel. The 9 registers are initialized to 0. To reuse data, apply padding = 1, and output data, as in the 3×3 convolution of FIG. 6, the first output window needs padding added at the upper-left corner of the image feature data. Once the first 2 rows of image feature data are buffered in the first-level RAM cache, output can begin: the first 2 values of each of the first 2 rows are written to the corresponding positions among the 9 registers, and together with the registers' initial zeros serving as padding, the first valid 3×3 window is output. Meanwhile, the first-level cache continues buffering the third and fourth rows, so data keep streaming through the cache.
After the first window is output, producing the 3×3 window on the left of FIG. 7 in the second clock cycle requires shifting the 9 registers as a whole by one column (to the right, as described for the figure) while receiving the third value of each row from the first-level cache; this outputs the new window while reusing the previous data. Each output of image feature data takes one clock cycle, and from the second clock cycle onward the image feature data of the previous clock cycle are reused. Repeating these steps completes the packing, selective reuse, and output of the first two rows of image feature data; the row then changes, the 9 register values are reset, and the first and second rows of data held in the RAM cache are reused.
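The per-cycle behaviour of the register array can be sketched in software and checked against a direct window read. This is a model of the technique under stated assumptions, not the patent's circuit: `pixel()` reads straight from the image and stands in for the first-level RAM cache; new columns enter on the right of each register row (the shift direction in FIG. 7 is a drawing convention); the left padding column is shifted in as explicit zeros; and the registers are cleared at every row change. Function names are hypothetical.

```python
def stream_windows(img, k=3, pad=1):
    """Emit every k x k window (stride 1, zero padding) the way a k x k
    shift-register array would, one window per clock cycle.

    Returns (windows, values_shifted_in). pixel() stands in for the first-level
    RAM cache; positions outside the image read as 0 (the padding).
    """
    H, W = len(img), len(img[0])

    def pixel(y, x):
        return img[y][x] if 0 <= y < H and 0 <= x < W else 0

    windows, shifted_in = [], 0
    for r in range(H + 2 * pad - k + 1):    # one output row of windows
        regs = [[0] * k for _ in range(k)]  # reset registers at each row change
        for j in range(k - 1):              # prime the first k - 1 columns
            for i in range(k):
                regs[i] = regs[i][1:] + [pixel(r - pad + i, j - pad)]
            shifted_in += k
        for c in range(W + 2 * pad - k + 1):  # one clock cycle per window
            x = c - pad + k - 1               # image column entering this cycle
            for i in range(k):
                # Shift each register row by one column and load one new value;
                # the other k - 1 values of the row are reused from the last cycle.
                regs[i] = regs[i][1:] + [pixel(r - pad + i, x)]
            shifted_in += k
            windows.append([row[:] for row in regs])
    return windows, shifted_in


def direct_window(img, r, c, k=3, pad=1):
    """Reference: read the same window directly, without any register reuse."""
    H, W = len(img), len(img[0])
    return [[img[y][x] if 0 <= y < H and 0 <= x < W else 0
             for x in range(c - pad, c - pad + k)]
            for y in range(r - pad, r - pad + k)]


img = [[1 + 4 * y + x for x in range(4)] for y in range(4)]
wins, shifted_in = stream_windows(img)
expected = [direct_window(img, r, c) for r in range(4) for c in range(4)]
print(wins == expected, len(wins))  # True 16
print(shifted_in, 9 * len(wins))    # 72 144
```

For this 4×4 image the register array shifts in 72 values (padding zeros included) to produce 16 windows, versus 144 reads if every window were fetched independently; after priming, each cycle loads only one new column of 3 values and reuses the other 6.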
In this way, continuous data output and maximum data reuse are ensured, and the reuse, packing, and high-speed continuous output of the entire image feature data are completed in sequence.
With the method and system, address jumps are avoided during DDR reads and DDR read/write efficiency is improved as far as possible. The first-level cache occupies less than 3% of the RAM resources of a VU7 (an FPGA device), and because all required data are reused, the mismatch between the back-end computing capacity and the DDR output bandwidth is effectively resolved. In addition, because the data of a subset of input channels are computed first, all filter kernels can be reused, which reduces the bandwidth required to transmit the filter kernel data.
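The size of the first-level cache follows directly from the RAM dimensions given for the 3×3 example (8 RAMs, 8 B wide, 256 deep). This arithmetic does not reproduce the 3% VU7 figure, which depends on the device's block-RAM capacity and on how the synthesis tool maps these RAMs onto block-RAM primitives. The comparison channel size of 512×512×8 bit is the assumption used for the background approach.

```python
# First-level cache size implied by the 3x3 example: 8 RAMs, 8 bytes wide, 256 deep.
NUM_RAMS = 8
WIDTH_BYTES = 8  # 8 channels x 8-bit values per entry
DEPTH = 256      # entries per RAM, enough for a 224-pixel row

cache_bytes = NUM_RAMS * WIDTH_BYTES * DEPTH
print(cache_bytes)  # 16384 bytes (16 KiB)

# For comparison: one 512 x 512 x 8-bit channel held fully on chip (background approach 2).
channel_bytes = 512 * 512
print(channel_bytes // cache_bytes)  # 16: one such channel is 16x the whole cache
```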
The following describes the image feature data extraction system provided in the embodiments of the present application; the system described below and the method described above may be cross-referenced.
Referring to FIG. 9, which is a schematic structural diagram of the image feature data extraction system provided in an embodiment of the present application, the present application further provides a system for extracting image feature data, including:
a data acquisition module 100, configured to acquire image feature data and determine the corresponding number of parallel channels;
a data format changing module 200, configured to take the number of parallel channels as the depth of the image feature data and increase the height of the image feature data;
a data multiplexing module 300, configured to use a preset number of RAMs of the FPGA as a first-level cache of the DDR for data reuse, where each RAM stores one row of horizontal data;
a data extraction module 400, configured to configure registers corresponding to the convolution kernel at the back end of the first-level cache and use the registers to output the image feature data row by row, where each output of image feature data takes one clock cycle and, from the second clock cycle onward, the image feature data of the previous clock cycle are reused.
Based on the above embodiment, as a preferred embodiment, the system further comprises:
a quantity determining module, configured to determine the preset number according to the size of the convolution kernel, where the preset number is greater than the size of the convolution kernel.
Based on the above embodiment, as a preferred embodiment, the system further comprises:
an extraction preparation module, configured to add the corresponding padding based on the registers and the image feature data.
Based on the above embodiment, as a preferred embodiment, the system further comprises:
a reset module, configured to reset the register values before each row change when the registers are used to output the image feature data row by row, and to reuse the repeated data in the RAM.
The present application also provides a computer-readable storage medium storing a computer program which, when executed, implements the steps provided by the above embodiments. The storage medium may include a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The application also provides an electronic device, which may include a memory and a processor, where the memory stores a computer program, and the processor may implement the steps provided in the foregoing embodiments when calling the computer program in the memory. Of course the electronic device may also include various network interfaces, power supplies, etc.
The embodiments in this description are described progressively, each focusing on its differences from the others; for identical or similar parts, the embodiments may be cross-referenced. Because the system provided by the embodiments corresponds to the method, its description is relatively brief; for relevant details, refer to the description of the method.
Specific examples are used herein to explain the principles and embodiments of the present application, and the above description of the examples is intended only to help in understanding the method of the present application and its core idea. Those skilled in the art may make various improvements and modifications to the present application without departing from its principles, and such improvements and modifications also fall within the scope of the claims of the present application.
It should also be noted that in this specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

Claims (6)

1. An extraction method of image feature data, characterized by comprising the following steps:
acquiring image feature data and determining the corresponding number of parallel channels;
taking the number of parallel channels as the depth of the image feature data, and increasing the height of the image feature data;
using a preset number of RAMs of an FPGA as a first-level cache of a DDR for data reuse, each RAM storing one row of horizontal data; wherein the storage format of the data in the DDR is rearranged so that, when the data are extracted from the DDR, the addresses in the DDR are sequential and the DDR operates at its maximum read/write bandwidth;
configuring registers corresponding to the convolution kernel at the back end of the first-level cache, and using the registers to output the image feature data row by row; wherein each output of image feature data takes one clock cycle, and from the second clock cycle onward the image feature data of the previous clock cycle are reused;
before using the preset number of RAMs of the FPGA as the first-level cache of the DDR for data reuse, the method further comprises:
determining the preset number according to the size of the convolution kernel, wherein the preset number is greater than the size of the convolution kernel;
when the registers are used to output the image feature data row by row, the method further comprises, before each row change of the registers:
resetting the values of the registers and reusing the repeated data in the RAM.
2. The extraction method according to claim 1, further comprising, after configuring the register corresponding to the convolution kernel at the back end of the first-level cache:
adding corresponding padding based on the register and the image feature data.
3. A system for extracting image feature data, comprising:
a data acquisition module, configured to acquire image feature data and determine the corresponding number of parallel channels;
a data format changing module, configured to take the number of parallel channels as the depth of the image feature data and to increase the height of the image feature data;
a data reuse module, configured to use a preset number of RAMs of an FPGA as a first-level cache of the DDR for data reuse, each RAM storing one row of horizontal data; wherein the storage format of the data in the DDR is rearranged, the addresses in the DDR are arranged sequentially, and the DDR operates at its maximum read-write bandwidth when the data is read from it;
a data extraction module, configured to configure, at the back end of the first-level cache, a register corresponding to the convolution kernel, and to output the image feature data row by row using the register; wherein the time taken to output the image feature data once is taken as one clock period, and, starting from the second clock period, the image feature data of the previous clock period is reused;
wherein the extraction system further comprises:
a quantity determining module, configured to determine the preset number according to the size of the convolution kernel, the preset number being larger than the size of the convolution kernel;
and a reset module, configured to, when the register is used to output the image feature data row by row, reset the value of the register before each line feed and reuse the repeated data in the RAMs.
4. The extraction system according to claim 3, further comprising:
an extraction preparation module, configured to add corresponding padding based on the register and the image feature data.
5. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the image feature data extraction method according to claim 1 or 2.
6. An electronic device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when calling the computer program in the memory, implements the steps of the image feature data extraction method according to claim 1 or 2.
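The claims describe the datapath only in prose. The Python sketch below is a hypothetical software model of it, built on our own assumptions rather than on anything the patent states. It reads the "number of parallel channels as depth" step as packing each pixel into a word of `parallel` channel values and stacking the C/parallel channel groups vertically. It fixes the preset number of line-buffer RAMs at k + 1, whereas the claims only require it to exceed the kernel size. It also assumes stride 1, applies zero padding to a whole channel group up front, and lets one inner-loop iteration stand in for one clock period. The function names `rearrange_for_ddr` and `extract_windows` are illustrative and do not come from the source.

```python
from __future__ import annotations

from collections.abc import Iterator

# Hypothetical model of the claimed datapath. Assumptions (not from the patent):
# plain lists stand in for DDR and FPGA RAM, stride is 1, padding is zeros,
# and one inner-loop iteration represents one clock period.


def rearrange_for_ddr(fmap: list, parallel: int) -> list:
    """Turn a C x H x W feature map into (H * C // parallel) rows of W pixels.

    Each pixel becomes a word of `parallel` channel values (the new depth) and
    the channel groups are stacked vertically (the increased height). Writing
    the rows out in order gives sequential DDR addresses:
    address = (row * W + x) * parallel + p.
    """
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    if channels % parallel:
        raise ValueError("channel count must be a multiple of parallel")
    rows = []
    for g in range(channels // parallel):  # one channel group per block of H rows
        for y in range(height):
            rows.append([[fmap[g * parallel + p][y][x] for p in range(parallel)]
                         for x in range(width)])
    return rows


def extract_windows(group: list, k: int, pad: int = 0) -> Iterator[tuple[int, int, list]]:
    """Yield (top, left, k x k window) for one H x W x P channel group, stride 1.

    Models k + 1 line-buffer RAMs (one image row each) feeding a k x k register
    window; the spare RAM is refilled with the next row while k RAMs are read.
    """
    zero = [0] * len(group[0][0])
    if pad:  # zero padding around the whole group (assumed placement)
        blank = [zero] * (len(group[0]) + 2 * pad)
        group = ([blank] * pad
                 + [[zero] * pad + row + [zero] * pad for row in group]
                 + [blank] * pad)
    height, width = len(group), len(group[0])
    num_rams = k + 1  # assumed "preset number"; the claims only require > k
    rams: list = [None] * num_rams
    for r in range(min(k, height)):  # prefill the first k rows
        rams[r] = group[r]
    for top in range(height - k + 1):
        # k - 1 of these rows are reused from the previous output row
        rows = [rams[(top + i) % num_rams] for i in range(k)]
        if top + k < height:  # refill the spare RAM with the next row
            rams[(top + k) % num_rams] = group[top + k]
        regs = [[zero] * k for _ in range(k)]  # reset the registers before each line feed
        for col in range(width):  # one iteration = one clock period
            for i in range(k):
                # keep k - 1 pixels from the previous cycle, read one new pixel
                regs[i] = regs[i][1:] + [rows[i][col]]
            if col >= k - 1:
                yield top, col - k + 1, [list(r) for r in regs]
```

In this model each yielded window would be one k x k x P input to a multiply-accumulate stage. Per cycle only k new pixels enter the register window and k(k - 1) are carried over from the previous cycle, which is one way to realize the reuse the claims describe. The model checks data movement only; it says nothing about DDR burst timing or FPGA resource usage.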
CN202110873716.2A 2021-07-30 2021-07-30 Image feature data extraction method, system and related device Active CN113706366B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110873716.2A CN113706366B (en) 2021-07-30 2021-07-30 Image feature data extraction method, system and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110873716.2A CN113706366B (en) 2021-07-30 2021-07-30 Image feature data extraction method, system and related device

Publications (2)

Publication Number Publication Date
CN113706366A CN113706366A (en) 2021-11-26
CN113706366B CN113706366B (en) 2024-02-27

Family

ID=78651142

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110873716.2A Active CN113706366B (en) 2021-07-30 2021-07-30 Image feature data extraction method, system and related device

Country Status (1)

Country Link
CN (1) CN113706366B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109032781A (en) * 2018-07-13 2018-12-18 Chongqing University of Posts and Telecommunications FPGA parallel system for a convolutional neural network algorithm
CN110084739A (en) * 2019-03-28 2019-08-02 Southeast University FPGA parallel acceleration system for a CNN-based image quality enhancement algorithm
CN110738317A (en) * 2019-10-17 2020-01-31 Shanghai Advanced Research Institute, Chinese Academy of Sciences FPGA-based deformable convolution network operation method, device and system
CN111008040A (en) * 2019-11-27 2020-04-14 Xiamen SigmaStar Technology Co., Ltd. Cache device and cache method, computing device and computing method
CN111199273A (en) * 2019-12-31 2020-05-26 Shenzhen Intellifusion Technologies Co., Ltd. Convolution calculation method, device, equipment and storage medium
CN111506343A (en) * 2020-03-05 2020-08-07 Peking University Shenzhen Graduate School Deep learning convolution operation implementation method based on a systolic array hardware architecture
CN111583095A (en) * 2020-05-22 2020-08-25 Inspur Electronic Information Industry Co., Ltd. Image data storage method, image data processing system and related device
CN112464150A (en) * 2020-11-06 2021-03-09 Suzhou Inspur Intelligent Technology Co., Ltd. Method, device and medium for realizing data convolution operation based on FPGA

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US20190138830A1 (en) * 2015-01-09 2019-05-09 Irvine Sensors Corp. Methods and Devices for Cognitive-based Image Data Analytics in Real Time Comprising Convolutional Neural Network

Non-Patent Citations (2)

Title
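A buffer address scheduling method for a general-purpose CNN accelerator; Wu Lei, Wei Zihan, Zhang Weigong, Wang Jing, Gao Lan; Microelectronics & Computer, (07); full text *
Design of an FPGA-based CNN acceleration SoC system; Zhao Shuo, Fan Jun, He Hu; Computer Engineering and Design, (04); full text *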

Also Published As

Publication number Publication date
CN113706366A (en) 2021-11-26

Similar Documents

Publication Publication Date Title
CN106875011B (en) Hardware architecture of binary weight convolution neural network accelerator and calculation flow thereof
CN111445012B (en) FPGA-based packet convolution hardware accelerator and method thereof
US20210158068A1 (en) Operation Circuit of Convolutional Neural Network
WO2018196863A1 (en) Convolution acceleration and calculation processing methods and apparatuses, electronic device and storage medium
US10769749B2 (en) Processor, information processing apparatus, and operation method of processor
US20230196500A1 (en) Image data storage method, image data processing method and system, and related apparatus
CN110852944B (en) Multi-frame self-adaptive fusion video super-resolution method based on deep learning
CN111242277A (en) Convolutional neural network accelerator supporting sparse pruning and based on FPGA design
CN110148143A (en) A method of the image segmentation based on FPGA and simultaneous display
CN110647978B (en) System and method for extracting convolution window in convolution neural network
CN109858622B (en) Data handling circuit and method for deep learning neural network
CN113706366B (en) Image feature data extraction method, system and related device
CN111931909A (en) Light-weight convolutional neural network reconfigurable deployment method based on FPGA
CN109089120B (en) Analysis-aided encoding
CN107894957B (en) Convolutional neural network-oriented memory data access and zero insertion method and device
CN111191774B (en) Simplified convolutional neural network-oriented low-cost accelerator architecture and processing method thereof
US10140681B2 (en) Caching method of graphic processing unit
CN111191780B (en) Averaging pooling accumulation circuit, device and method
CN113933111B (en) Up-sampling device and method for realizing image size amplification
US11587203B2 (en) Method for optimizing hardware structure of convolutional neural networks
CN110674934B (en) Neural network pooling layer and operation method thereof
US11362672B2 (en) Inline decompression
CN110532219B (en) FPGA-based ping-pong data storage removing method
CN113658049A (en) Image transposition method, equipment and computer readable storage medium
CN102685480B (en) Video filtering method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant