CN108509382B - Method for realizing quick convolution operation of super-long sequence based on FPGA - Google Patents
- Publication number
- CN108509382B (application number CN201810276062.3A)
- Authority
- CN
- China
- Prior art keywords
- burst
- fpga
- data
- convolution operation
- fast convolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/15—Correlation function computation including computation of convolution operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7803—System on board, i.e. computer system on one or more PCB, e.g. motherboards, daughterboards or blades
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a method for realizing fast convolution of ultra-long sequences based on an FPGA (field programmable gate array). Two signals are first acquired by an AD sampling module, and the two channels of acquired data are stored sequentially in two static random access memories (SRAMs) outside the FPGA. The data is then read out of the two SRAMs burst by burst, in opposite order, for the fast convolution operation, while the large volume of result data is stored in a DDR2 memory outside the FPGA, thereby realizing FPGA-based fast convolution of the two signals.
Description
[ technical field ]
The invention belongs to the field of high-speed real-time digital signal processing for spatial positioning. It performs high-speed, large-depth sampling and convolution on two signals in order to resolve a tiny phase difference between them. The AD sampling module is the dual-channel 12-bit AD acquisition module AN926, and the FPGA is an Altera Cyclone IV device, model EP4CE115F29C7N. The two SRAMs that store the data are read in burst mode, with the burst length increasing from one burst to the next, which greatly improves the operation speed, so that the fast convolution of the two signals is completed in a short time.
[ background of the invention ]
In spatial positioning, the phase difference between two signals is an important parameter: accurate positioning can be derived from it, and convolution is an important method for finding it.
When the phase difference between the two signals is extremely small, as with optical signals, it can be resolved only by convolving after sampling at a high rate and to a large depth. At present, convolution of two discrete digital signals is mainly implemented in software, which executes ultra-long sequence convolution slowly and cannot meet the real-time requirements of high-speed signal processing. An FPGA offers rich hardware resources and parallel operation, which can be exploited to greatly increase the operation speed and make real-time spatial positioning possible.
[ summary of the invention ]
The method utilizes FPGA to realize the fast convolution operation of two paths of signals, and the technical scheme comprises the following aspects:
1. AD module data acquisition
The two analog signals are sampled by the dual-channel 12-bit AD acquisition module AN926 at a sampling frequency of 50 MHz for 5.243 ms, which yields two groups of 262144 samples each.
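The sampling figures above can be cross-checked with a few lines of arithmetic. This is only a sanity check; the constant names are illustrative and do not come from the patent.

```python
# Sanity check of the acquisition figures quoted in the text.
SAMPLE_RATE_HZ = 50_000_000      # 50 MHz sampling frequency
SAMPLES_PER_CHANNEL = 262_144    # samples stored per channel (2**18)

# Time needed to capture one full record at the stated sampling rate.
capture_time_s = SAMPLES_PER_CHANNEL / SAMPLE_RATE_HZ

print(f"capture time = {capture_time_s * 1e3:.3f} ms")  # prints "capture time = 5.243 ms"
```

The record length is a power of two, which exactly fills the 256K address space of each SRAM described in the next step.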
2. Writing data to the off-chip SRAM
The external static random access memory consists of two SRAMs, model IS61LV25616AL, connected to the FPGA through GPIO ports. Each SRAM provides 256K × 16 bit of storage, which meets the storage requirement.
Because the SRAM data bus is 16 bits wide and the AD data to be stored is only 12 bits wide, data lines D12, D13, D14 and D15 of the SRAM are tied directly to ground in the external hardware design.
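In software terms, grounding D12 to D15 means every stored word has its upper four bits at zero. The sketch below models that, assuming the samples are treated as unsigned 12-bit values (the patent does not state the AD output coding); the function names are hypothetical.

```python
ADC_BITS = 12
ADC_MASK = (1 << ADC_BITS) - 1   # 0x0FFF: data lines D0..D11


def to_sram_word(sample: int) -> int:
    """Return the 16-bit word stored for one 12-bit sample (D12..D15 = 0)."""
    if not 0 <= sample <= ADC_MASK:
        raise ValueError("sample does not fit in 12 bits")
    return sample & ADC_MASK


def from_sram_word(word: int) -> int:
    """Recover the 12-bit sample, ignoring the four unused upper bits."""
    return word & ADC_MASK
```

For example, `from_sram_word(0xFABC)` returns `0xABC`: whatever appears on the upper four lines is discarded.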
Data is written into the external SRAM by control logic inside the FPGA that drives a tri-state bus. When the data acquired by the AD module needs to be written into the SRAM, the FPGA grants the AD module use of the bus; when writing is finished, the FPGA has the AD module release the bus and hands bus control to the fast convolution module.
3. Reading data from SRAM
How data is read from the SRAM is a critical part of the method, because it must fit the fast convolution scheme. The external SRAMs are read in burst mode. The burst length starts at 256 samples and increases by 256 samples with every read burst. The two SRAMs are read in exactly opposite order: in the first burst, SRAM_A is read in the order 0, 1, 2, 3 … 255 and SRAM_B in the order 255, 254, 253 … 2, 1, 0; in the second burst, SRAM_A is read in the order 0, 1, 2, 3 … 511 and SRAM_B in the order 511, 510, 509 … 2, 1, 0.
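The read pattern described above can be sketched as a small address generator. The function name `burst_addresses` and its 1-based burst index are illustrative choices, not part of the patent.

```python
from typing import Iterator, Tuple

BURST_STEP = 256  # the burst length grows by this amount every burst (from the text)


def burst_addresses(burst_index: int, step: int = BURST_STEP) -> Iterator[Tuple[int, int]]:
    """Yield (addr_a, addr_b) for each clock of one read burst.

    burst_index is 1-based: burst 1 reads 256 locations, burst 2 reads 512, ...
    SRAM_A is read upward from 0 while SRAM_B is read downward to 0.
    """
    length = burst_index * step
    for t in range(length):
        yield t, length - 1 - t
```

For instance, `list(burst_addresses(1))` starts with `(0, 255)` and ends with `(255, 0)`, matching the first burst in the text.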
4. Fast convolution operation
The data read from the SRAMs is sent to the fast convolution module. In the first burst, the burst length is 256: the data is fed sequentially, in parallel, into 256 multipliers, and each multiplication result is held by a latch. Each latch counts independently; when all latches have counted to 256, that is, when the first burst ends, the results are output simultaneously, giving the 256 convolution results computed by the first burst. The second burst, of length 512, follows immediately: the data read is again sent to the fast convolution module, and the results are output when all latch counts reach 512. Proceeding in this way, the complete convolution result is obtained after a total of 1024 bursts.
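A behavioural model may help make the burst scheme concrete. The sketch below is one possible reading, not the patented circuit: it assumes each multiplier sees the SRAM_A stream delayed by its own lane index (a hypothetical delay line) together with the current SRAM_B sample, so that each lane accumulates one output sample per burst. All names are illustrative, and `lanes` stands in for the 256 multipliers.

```python
from typing import List, Sequence


def burst_convolve(a: Sequence[int], b: Sequence[int], lanes: int = 256) -> List[int]:
    """Model the burst scheme; returns y[n] = sum(a[i] * b[n - i]) for n < len(a)."""
    n = len(a)
    if len(b) != n or n % lanes != 0:
        raise ValueError("sequences must have equal length, a multiple of lanes")
    result: List[int] = []
    for burst in range(1, n // lanes + 1):
        length = burst * lanes
        acc = [0] * lanes      # one accumulating latch per multiplier
        delay = [0] * lanes    # assumed delay line: delay[j] holds a[t - j]
        for t in range(length):            # one clock per sample pair
            delay = [a[t]] + delay[:-1]    # SRAM_A is read forward
            b_sample = b[length - 1 - t]   # SRAM_B is read in reverse
            for j in range(lanes):         # these run in parallel in hardware
                acc[j] += delay[j] * b_sample
        # Lane j now holds y[length - 1 - j]; emit in ascending output order.
        result.extend(reversed(acc))
    return result


def direct_convolve(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Reference: sample-by-sample multiply-accumulate over the same outputs."""
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(len(a))]
```

With small inputs, for example `burst_convolve([1, 2, 3, 4], [4, 3, 2, 1], lanes=2)`, the model agrees with `direct_convolve` on the same sequences. Each burst costs as many clocks as its length and produces `lanes` outputs, which is where the cycle counts quoted later come from.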
5. Storing the operation results in the off-chip DDR2
DDR2 SDRAM offers large capacity and high read/write speed, and it is read and written in burst mode. After the fast convolution module finishes a read burst and outputs its results, the DDR2 enters a write burst of length 256, and the results are written into the DDR2 in latch order. The DDR2 finishes writing and releases the bus before the next read burst begins.
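One way to picture the write side is as a mapping from read-burst number to a block of 256 result locations. The sketch assumes results are stored contiguously from a base address, which the patent does not specify; `ddr2_write_range` and `base` are hypothetical names.

```python
RESULTS_PER_BURST = 256  # each read burst produces 256 convolution results


def ddr2_write_range(burst_index: int, base: int = 0) -> range:
    """Result locations written after read burst `burst_index` (1-based).

    Assumes contiguous storage starting at `base` (an illustrative choice).
    """
    start = base + (burst_index - 1) * RESULTS_PER_BURST
    return range(start, start + RESULTS_PER_BURST)
```

Under this assumption the last of the 1024 bursts fills locations 261888 to 262143, so the result record has the same length as each input record.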
[ advantages and advantageous effects of the invention ]
The invention exploits the rich hardware resources and parallel computation of the FPGA to implement fast convolution of ultra-long sequences in a hardware circuit, ensuring real-time high-speed signal processing while meeting the required measurement precision. A conventional software convolution that reads data from memory one sample at a time and multiply-accumulates needs 1 + 2 + 3 + 4 + … + 262143 + 262144 ≈ 3.436 × 10^10 clock cycles to convolve two sequences of length 262144; at a 100 MHz clock this takes about 5 min 43 s. The present method needs only 256 + 512 + 768 + … + 261888 + 262144 ≈ 1.3435 × 10^8 clock cycles, about 1.34 s at a 100 MHz clock, so the operation is about 255 times faster.
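The cycle counts and the speed-up follow from two arithmetic series, which can be reproduced as below. The variable names are illustrative; the figures are those given in the text.

```python
N = 262_144             # samples per channel
LANES = 256             # parallel multipliers
CLOCK_HZ = 100_000_000  # 100 MHz clock

# Software: 1 + 2 + ... + N multiply-accumulate steps, one per clock.
software_cycles = N * (N + 1) // 2

# Burst method: 256 + 512 + ... + N clocks over N / 256 bursts.
bursts = N // LANES
hardware_cycles = LANES * bursts * (bursts + 1) // 2

software_seconds = software_cycles / CLOCK_HZ
hardware_seconds = hardware_cycles / CLOCK_HZ
speedup = software_cycles / hardware_cycles

print(software_cycles, hardware_cycles)
print(f"{software_seconds:.1f} s vs {hardware_seconds:.2f} s, speed-up {speedup:.2f}x")
```

The ratio simplifies to (N + 1) / (bursts + 1) = 262145 / 1025, a little under 256, consistent with the "about 255 times" stated in the text.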
[ description of the drawings ]
FIG. 1 is a schematic block diagram of the system;
FIG. 2 is a functional block diagram of the fast convolution module;
FIG. 3 is the first sinusoidal signal;
FIG. 4 is the second sinusoidal signal;
FIG. 5 shows the small phase difference between the two signals;
FIG. 6 is the result of the fast convolution operation;
FIG. 7 is a flow chart of the fast convolution method.
[ detailed description ]
In order to more clearly illustrate the embodiments of the present invention, the present invention will be further described below with reference to the following drawings.
As shown in FIGS. 3 to 5, convolution is performed on two sinusoidal signals with a small phase difference. After acquisition by the AD module, each signal is 262144 samples long, and the two signals are stored sequentially in the SRAMs outside the FPGA. Next, 1024 burst read operations are performed in sequence, with the read length starting at 256 and increasing by 256 each time, and the data obtained by reading the two SRAMs in opposite order is sent to the fast convolution module shown in FIG. 2. At the end of each burst, 256 convolution results are obtained simultaneously and stored sequentially in the DDR2 outside the FPGA. After 1024 bursts the fast convolution is complete. The result is shown in FIG. 6: although the phase difference between the two signals is extremely small, the peak can still be distinguished, slightly to the left of center.
Claims (1)
1. A method for realizing ultra-long sequence fast convolution operation based on an FPGA, characterized in that: two channels of data acquired by the AD module are stored in two independent SRAMs outside the FPGA through a tri-state bus; the two SRAMs are then each read in opposite order within a burst length, the read data is sent to a fast convolution module, and the operation results are simultaneously stored in a DDR2 outside the FPGA; wherein reading the two SRAMs in opposite order within a burst length specifically means that the two SRAMs are read in completely opposite order, the burst length starts at 256 samples, and the read length increases by 256 samples with each read burst; and wherein sending the read data to the fast convolution module specifically means that, during the first burst, the burst length is set to 256, the data is input sequentially and in parallel into 256 multipliers, the multiplication results are latched by latches, each latch counts independently in sequence, and when all latches have counted to 256, that is, when the first burst ends, the results are output simultaneously, giving the 256 convolution results computed by the first burst; the second burst, of length 512, is then entered immediately, and in this way all convolution results are obtained after 1024 bursts.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810276062.3A CN108509382B (en) | 2018-03-27 | 2018-03-27 | Method for realizing quick convolution operation of super-long sequence based on FPGA |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810276062.3A CN108509382B (en) | 2018-03-27 | 2018-03-27 | Method for realizing quick convolution operation of super-long sequence based on FPGA |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108509382A (en) | 2018-09-07 |
CN108509382B (en) | 2022-06-07 |
Family
ID=63377951
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810276062.3A Active CN108509382B (en) | 2018-03-27 | 2018-03-27 | Method for realizing quick convolution operation of super-long sequence based on FPGA |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108509382B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101652757A (en) * | 2007-04-13 | 2010-02-17 | Rgb网络有限公司 | Sdram convolutional interleaver with two paths |
US8799750B1 (en) * | 2011-05-09 | 2014-08-05 | Xilinx, Inc. | Convolutional interleaver for bursty memory access |
CN104461934A (en) * | 2014-11-07 | 2015-03-25 | 北京海尔集成电路设计有限公司 | Time-domain deconvolution interweaving device and method suitable for DDR memorizer |
CN105282083A (en) * | 2015-11-03 | 2016-01-27 | 西安烽火电子科技有限责任公司 | Burst-mode broadband data processing device and method based on FPGA chip |
CN107403221A (en) * | 2016-05-03 | 2017-11-28 | 想象技术有限公司 | The hardware of convolutional neural networks is realized |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1089440A1 (en) * | 1999-09-28 | 2001-04-04 | TELEFONAKTIEBOLAGET L M ERICSSON (publ) | Interleaver and method for interleaving a data bit sequence |
CN1148881C (en) * | 2002-01-30 | 2004-05-05 | 信息产业部电信传输研究所 | Realising method for parallel cascade convolution code hardware decoder |
US10572824B2 (en) * | 2003-05-23 | 2020-02-25 | Ip Reservoir, Llc | System and method for low latency multi-functional pipeline with correlation logic and selectively activated/deactivated pipelined data processing engines |
US20060236045A1 (en) * | 2005-04-13 | 2006-10-19 | Analog Devices, Inc. | Apparatus for deinterleaving interleaved data using direct memory access |
US7743287B2 (en) * | 2006-10-18 | 2010-06-22 | Trellisware Technologies, Inc. | Using SAM in error correcting code encoder and decoder implementations |
CN101237240B (en) * | 2008-02-26 | 2011-07-20 | 北京海尔集成电路设计有限公司 | A method and device for realizing cirrocumulus interweaving/de-interweaving based on external memory |
US11074492B2 (en) * | 2015-10-07 | 2021-07-27 | Altera Corporation | Method and apparatus for performing different types of convolution operations with the same processing elements |
- 2018-03-27: CN application CN201810276062.3A, patent CN108509382B, status Active
Non-Patent Citations (1)
Title |
---|
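Application of FPGA in Real-Time SAR Imaging Systems; Zhao Bo; China Master's Theses Full-text Database, Information Science and Technology; 2010-11-05 (No. 11); abstract, main text pp. 37-38, Section 4.2 * |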
Also Published As
Publication number | Publication date |
---|---|
CN108509382A (en) | 2018-09-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110364203B (en) | Storage system supporting internal calculation of storage and calculation method | |
CN101231877B (en) | N-port memory and method for accessing n-port memory M memory address | |
KR100945968B1 (en) | A semiconductor memory | |
CN104077492A (en) | Sample data interpolation method based on FPGA | |
CN209842608U (en) | DDR3 memory control based on FPGA FIFO module | |
CN108038068A (en) | One kind reads method of data synchronization and system based on DDR | |
CN111694029A (en) | Hardware implementation method for generating B1C signal pseudo-random noise code | |
CN103678729A (en) | High-speed A/D sampling data real-time storage method achieved based on FPGA | |
CN108509382B (en) | Method for realizing quick convolution operation of super-long sequence based on FPGA | |
CN103592489A (en) | Method for designing deep storage of digital oscilloscope | |
CN103309981A (en) | ADC (analog-to-digital converter) data organization system with high storage efficiency and ADC data organization method | |
CN103794244B (en) | A kind of phase transition storage reading circuit based on SPI interface and method | |
CN106158012A (en) | Sequential processing method, on-chip SRAM and the FPGA of FPGA on-chip SRAM | |
CN103065672B (en) | A kind of asynchronous static RAM based on synchronized SRAM IP | |
CN112711393B (en) | Real-time multichannel accumulation method based on FPGA | |
CN102231140B (en) | Method for obtaining data envelopments based on double-port random access memory (DPRAM) | |
CN108665937B (en) | Storage component testing method and device | |
US7684257B1 (en) | Area efficient and fast static random access memory circuit and method | |
CN105528305B (en) | A kind of short cycle storage method based on DDR2 SDRAM | |
CN114758686A (en) | Position determination method, reading method, starting method, device, equipment and medium | |
JP5499131B2 (en) | Dual port memory and method thereof | |
JP2841456B2 (en) | Data transfer method and data buffer device | |
CN113555051B (en) | SAR imaging data transposition processing system based on DDR SDRAM | |
US4796225A (en) | Programmable dynamic shift register with variable shift control | |
CN202661997U (en) | High-speed dual-channel signal acquisition and cache circuit with synchronization performance |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||