CN108762719B - Parallel generalized inner product reconstruction controller - Google Patents


Info

Publication number
CN108762719B
Authority
CN
China
Prior art keywords
address
intermediate result
inner product
bank
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810497969.2A
Other languages
Chinese (zh)
Other versions
CN108762719A (en)
Inventor
李丽
祁鹏展
鲍贤亮
宋文清
李伟
何书专
潘红兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2018-05-21
Filing date
2018-05-21
Publication date
2023-06-06
Application filed by Nanjing University
Priority to CN201810497969.2A
Publication of CN108762719A
Application granted
Publication of CN108762719B
Active (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/06 Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
    • G06F12/0646 Configuration or reconfiguration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7867 Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/483 Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Nonlinear Science (AREA)
  • Logic Circuits (AREA)
  • Complex Calculations (AREA)

Abstract

The parallel generalized inner product reconstruction controller of the invention comprises: an intermediate result calculation module, which receives the source data, calculates an intermediate result vector Y_L from it, and generates the bank storage address of Y_L; each time the calculation of one intermediate result vector Y_L is completed, it generates a completion signal and sends it to the final result calculation module as that module's start signal; a final result calculation module, which feeds the data it reads into a complex multiply-accumulator to calculate the L-th element Z_L of the result matrix Z_{1×N}, and generates the bank storage address of Z_L; and a data storage address processing module, which selects data according to the ping-pong operation selection signal and generates the correct bank address signals. The beneficial effects are a short calculation time and a high storage resource utilization rate, so the controller can meet the high real-time requirement for obtaining test statistics during non-uniform detection in many signal detection scenarios.

Description

Parallel generalized inner product reconstruction controller
Technical Field
The invention belongs to the technical field of non-uniform detection, and particularly relates to a parallel generalized inner product reconstruction controller.
Background
Space-time adaptive processing (STAP) is a technique for detecting moving targets. Conventional STAP algorithms must estimate the clutter covariance matrix, and when secondary data are used for this estimate, the data must be independent and identically distributed (IID) to limit the performance loss.
In practice, the received echoes are contaminated not only by natural clutter but also by man-made non-uniform interference, so the IID condition is often not satisfied.
To deal with interfering targets in the samples, Melvin first proposed the non-homogeneity detector (NHD), which suppresses their effect on clutter covariance matrix estimation by rejecting the samples that contain them. The basic idea of NHD is to define a test statistic that separates the two kinds of samples, exploiting the difference in statistical properties between samples contaminated by an interfering target and the remaining samples.
Regarding the choice of NHD test statistic, Gerlach et al. of the U.S. Naval Research Laboratory proposed two criteria: the generalized inner product (GIP) and the adaptive power residue (APR). Let X_L denote the L-th of the initial samples; its autocorrelation matrix is expressed as:
R_L = E[X_L X_L^H] = T
where T is the noise covariance matrix. Let
\hat{R} = \frac{1}{L} \sum_{l=1}^{L} X_l X_l^H
denote the sample covariance matrix formed from L samples. The GIP value corresponding to each sample can then be expressed as:
\mathrm{GIP}_L = X_L^H \hat{R}^{-1} X_L
Based on the GIP value of each sample, samples containing interfering targets can be identified and rejected effectively.
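The GIP test statistic described in this section can be evaluated directly in software, which is useful as a reference when checking a hardware implementation. The Python sketch below is illustrative only; it assumes the samples are given as a list of complex column vectors and that the sample covariance is formed from all of them. The helper names are hypothetical, and the matrix inverse uses plain Gauss-Jordan elimination rather than anything described in the patent.

```python
# Reference software computation of GIP_L = X_L^H R_hat^{-1} X_L.
# Assumptions (not from the patent): samples are a list of complex column
# vectors, and R_hat is formed from all of them.

def sample_covariance(samples):
    """R_hat = (1/K) * sum_k x_k x_k^H for K sample vectors of length M."""
    k = len(samples)
    m = len(samples[0])
    r = [[0j] * m for _ in range(m)]
    for x in samples:
        for i in range(m):
            for j in range(m):
                r[i][j] += x[i] * x[j].conjugate() / k
    return r


def invert(a):
    """Inverse of a square complex matrix by Gauss-Jordan elimination with partial pivoting."""
    m = len(a)
    aug = [list(row) + [1 + 0j if i == j else 0j for j in range(m)]
           for i, row in enumerate(a)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("sample covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]          # normalize the pivot row
        for r in range(m):
            if r != col:                              # eliminate this column elsewhere
                f = aug[r][col]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]


def gip(x, r_inv):
    """x^H r_inv x; real for a Hermitian r_inv, so the real part is returned."""
    m = len(x)
    acc = 0j
    for i in range(m):
        for j in range(m):
            acc += x[i].conjugate() * r_inv[i][j] * x[j]
    return acc.real
```

For the three samples [1, 0], [0, 1] and [1, 1], each GIP value is 2. A convenient sanity check: when R_hat is built from the same K samples, the GIP values always sum to K·M, because the sum equals the trace of R_hat^{-1}·(K·R_hat).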
The clutter suppression capability of GIP-based non-uniform detection depends on the number of samples: the more samples, the closer the estimated clutter covariance matrix is to the true one, and the stronger the clutter suppression. Software implementations of GIP non-uniform detection, however, suffer from low precision and excessive run time when processing large numbers of samples, and therefore cannot meet the high real-time requirements of practical non-uniform detection.
Disclosure of Invention
The invention aims to overcome these shortcomings by providing a parallel generalized inner product reconstruction controller that better meets the high real-time and large-point-count requirements of practical applications. It is realized by the following technical scheme:
the parallel generalized inner product reconstruction controller comprises:
an intermediate result calculation module, which receives the source data, calculates an intermediate result vector Y_L from it, and generates the bank storage address of Y_L; each time the calculation of one intermediate result vector Y_L is completed, it generates a completion signal and sends it to the final result calculation module as that module's start signal;
a final result calculation module, whose address generator continuously generates the addresses of the elements of column X_L of matrix X and of the corresponding intermediate result vector Y_L; the data read from these addresses enter a complex multiply-accumulator to give the L-th element Z_L of the result matrix Z_{1×N}, and the module generates the bank storage address of Z_L;
and a data storage address processing module, which selects data according to the ping-pong operation selection signal, processes the signals that the intermediate result calculation module and the final result calculation module direct at the same bank, and generates the correct bank address signals.
In a further design of the hardware implementation, Y_L is computed by multiplying X_L with a square matrix T whose number of rows and columns equals the number of columns of matrix X, with the multiply-accumulate operations for the columns of T computed in several parallel lanes.
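Splitting the work into an intermediate stage and a final stage amounts to evaluating the quadratic form X_L^H T X_L in two passes: first the row vector Y_L = X_L^H T, then the inner product of Y_L with X_L. The Python sketch below (function names hypothetical) assumes each column X_L has M elements, T is M×M, and the conjugate is applied to X_L in the first stage; the patent text does not say where the conjugate is taken. It checks the two-stage result against the direct double sum.

```python
# Two-stage evaluation of Z_L = X_L^H T X_L, mirroring the intermediate/final split.
# Assumption: conjugation is applied to X_L in the first stage; the patent text
# does not state where (or whether) the conjugate is taken.

def intermediate_vector(x, t):
    """Y_L[j] = sum_i conj(x[i]) * T[i][j], i.e. X_L^H times column j of T."""
    m = len(x)
    return [sum(x[i].conjugate() * t[i][j] for i in range(m)) for j in range(m)]


def final_element(y, x):
    """Z_L = sum_j Y_L[j] * X_L[j]."""
    return sum(yj * xj for yj, xj in zip(y, x))


def quadratic_form(x, t):
    """Direct double sum x^H T x, used only to check the two-stage result."""
    m = len(x)
    return sum(x[i].conjugate() * t[i][j] * x[j] for i in range(m) for j in range(m))
```

With the Hermitian matrix T = [[2, 1j], [-1j, 3]] and X_L = [1+1j, 2-1j], the intermediate vector is [3-4j, 7+4j] and both paths give 25. Only Y_L (M values) has to be kept between the stages, which is what makes the per-column pipelining possible.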
In a further design, the intermediate result calculation module uses a four-way parallel implementation.
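One way to picture the four-way parallel intermediate stage is to process the columns of T in groups of four, with each complex multiply-accumulator lane producing one element of Y_L per group. The sketch below is an assumption-laden illustration: the lane-to-column assignment (lane k takes column 4g+k in group g) and the idle-lane handling when M is not a multiple of 4 are not specified in the patent, and the function names are hypothetical.

```python
# Four-way partitioning of the Y_L computation across complex MAC lanes.
# Assumptions (illustrative, not from the patent): lane k handles column 4g+k of T
# in group g, and lanes with no column left in the final group stay idle.

LANES = 4


def lane_groups(m, lanes=LANES):
    """List of groups; entry k of each group is the T column for lane k, or None if idle."""
    return [[c if c < m else None for c in range(start, start + lanes)]
            for start in range(0, m, lanes)]


def intermediate_four_way(x, t, lanes=LANES):
    """Compute Y_L group by group; within a group the lanes would run concurrently in hardware."""
    m = len(x)
    y = [0j] * m
    for group in lane_groups(m, lanes):
        for col in group:             # sequential here, parallel in hardware
            if col is None:
                continue
            acc = 0j                  # one accumulator per lane
            for i in range(m):        # one multiply-accumulate per element of X_L
                acc += x[i].conjugate() * t[i][col]
            y[col] = acc
    return y
```

For M = 6 the groups are [0, 1, 2, 3] and [4, 5, None, None], so the second pass leaves two lanes idle. Every lane reads the same element of X_L in a given cycle, so X_L can be broadcast while the four T columns are read in parallel.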
In a further design, the source data of the intermediate result calculation module are stored as follows: matrix T is stored column by column in bank0-bank3 and, once these are full, continues column by column in bank4-bank7; matrix X is stored column by column in bank8-bank11.
In a further design, the intermediate results of the intermediate result calculation module are stored as follows: odd-indexed vectors are stored in bank12 and even-indexed vectors in bank13.
In a further design, the intermediate result calculation module proceeds as follows: in one operation, the address generator first generates the addresses of the elements of column X_L of X and of four columns of matrix T; the corresponding matrix elements are fetched and fed into the complex multiply-accumulators to obtain the intermediate result Y_L; the address generator then generates the intermediate result storage address, and the intermediate result is stored in the bank.
In a further design, the final result calculation module proceeds as follows: when it receives the intermediate result completion signal, the address generator continuously generates the addresses of the elements of column X_L of matrix X and of the corresponding intermediate result vector Y_L; these elements are fed into a complex multiply-accumulator to obtain the final result Z_L, after which the address generator generates the final result storage address and the final result is stored in the bank.
In a further design, the complex multipliers are all pipelined single-precision floating-point units with a latency of 4 clock cycles, and the memory access latency is set to 6 cycles.
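A pipelined adder with a 4-cycle latency cannot feed its own result back every cycle, so a multiply-accumulator that must accept one product per cycle needs some way around the read-after-write hazard. The patent does not describe its accumulation scheme; the Python sketch below shows one common technique, interleaving several partial sums and combining them at the end, purely as an illustration. The function name and the pairwise reduction are assumptions.

```python
# Accumulating at one product per cycle with a pipelined adder of latency 4.
# This is a common technique shown for illustration; the patent does not describe
# how its complex multiply-accumulators handle the adder latency.

ADD_LATENCY = 4  # adder latency in clock cycles, as stated for the floating-point units


def interleaved_accumulate(products, latency=ADD_LATENCY):
    """Sum `products`, one issued per cycle, without read-after-write stalls.

    The product issued in cycle c is added to partial sum c % latency. That
    partial sum is next needed in cycle c + latency, exactly when the previous
    addition leaves the pipeline, so the adder never has to wait."""
    partial = [0j] * latency
    ready_at = [0] * latency             # cycle when each partial sum becomes valid
    for cycle, p in enumerate(products):
        slot = cycle % latency
        if ready_at[slot] > cycle:       # would be a stall; cannot happen with this mapping
            raise RuntimeError("read-after-write hazard")
        partial[slot] += p
        ready_at[slot] = cycle + latency
    # After the stream ends, combine the partial sums pairwise (a small adder tree).
    while len(partial) > 1:
        if len(partial) % 2:
            partial.append(0j)
        partial = [partial[i] + partial[i + 1] for i in range(0, len(partial), 2)]
    return partial[0]
```

With a single accumulator, each addition would have to wait for the previous result, costing roughly `latency` cycles per product instead of one; the interleaved form pays only a short reduction at the end of each inner product.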
In a further design, there are five complex multiply-accumulators: four compute the intermediate results in four-way parallel, and the fifth computes the final results concurrently.
In a further design, each complex multiply-accumulator consists of one complex multiplier and three complex adders, and its area synthesized with DC (Design Compiler) in a 40 nm CMOS process is 19993.56 μm².
Advantages of the Invention
The parallel generalized inner product reconstruction controller of the invention computes each intermediate result and then immediately computes the corresponding final result element, so the time spent computing Z_{L-1} is hidden within the time spent computing Y_L. This gives a short computation time and a high storage resource utilization rate, allowing the controller to meet the high real-time requirement for obtaining test statistics during non-uniform detection in many signal detection scenarios.
Drawings
FIG. 1 is a schematic diagram of an architecture of a parallel generalized inner product reconstruction controller.
FIG. 2 is a schematic diagram of parallel generalized inner product data storage.
FIG. 3 is a schematic diagram of a parallel generalized inner product algorithm calculation flow.
Detailed Description
The invention is described in detail below with reference to the drawings and specific embodiments.
As shown in FIG. 1, the parallel generalized inner product reconstruction controller of this embodiment uses four-way parallel operation and consists mainly of three sub-modules: an intermediate result calculation module, which computes the intermediate results; a final result calculation module, which computes the final results; and a data storage address processing module, which handles the bank addresses and related signals.
The intermediate result calculation module computes the intermediate result vector Y_L in a fully pipelined manner: it generates the addresses of the elements of column X_L, performs inner-product multiply-accumulate operations between the elements of X_L and each column of the square matrix T_{M×M} to obtain Y_L, and generates the bank storage address of Y_L. Each time one Y_L is complete, it sends a completion signal to the final result calculation module as the start signal for that module's next calculation.
The final result calculation module uses an address generator to continuously generate the addresses of the elements of column X_L of matrix X and of the corresponding intermediate result vector Y_L; the data read from these addresses enter a complex multiply-accumulator to give the L-th element Z_L of the result matrix Z_{1×N}, and the module generates the bank storage address of Z_L.
The data storage address processing module selects data according to the ping-pong operation selection signal, processes the signals that the intermediate result calculation module and the final result calculation module direct at the same bank, and generates the correct bank addresses and related signals.
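The ping-pong selection can be illustrated with the odd/even bank assignment for the intermediate vectors given in this document (bank12 for odd L, bank13 for even L). While Y_L is being written, Y_{L-1} is being read to form Z_{L-1}, so the two modules always address different banks. In the Python sketch below the function names are hypothetical, and the ownership logic is an assumption; the patent does not spell out how the selection signal is encoded.

```python
# Ping-pong use of bank12/bank13 for the intermediate vectors.
# Bank numbers come from the description; function names are hypothetical.

BANK_Y_ODD = 12   # holds Y_1, Y_3, ... (each overwrites the previous one)
BANK_Y_EVEN = 13  # holds Y_2, Y_4, ...


def y_bank(k):
    """Bank that holds intermediate vector Y_k (k counts from 1)."""
    return BANK_Y_ODD if k % 2 == 1 else BANK_Y_EVEN


def bank_owner(bank, k):
    """Module driving `bank` while Y_k is being computed (and Z_{k-1} alongside it)."""
    if bank == y_bank(k):
        return "intermediate"        # writes Y_k
    if k > 1 and bank == y_bank(k - 1):
        return "final"               # reads Y_{k-1} to form Z_{k-1}
    return None
```

During the first iteration only the intermediate module is active (it writes Y_1 to bank12); from the second iteration on, one bank is written while the other is read, and the roles swap every iteration. Keeping only two vectors is what allows each new Y to overwrite the one two steps earlier.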
As shown in FIG. 1, the memory comprises 15 banks: matrix T is stored in bank0-bank7, matrix X in bank8-bank11, the intermediate results Y_L in bank12 and bank13, and the final parallel generalized inner product matrix in bank14. The arithmetic unit comprises five complex multiply-accumulators: complex multiply-accumulators 0-3 compute the intermediate results in four-way parallel, and complex multiply-accumulator 4 computes the final results concurrently.
FIG. 2 shows the data storage scheme of the parallel generalized inner product. The source data are stored as follows: matrix T is stored column by column in bank0-bank3 and, once these are full, continues column by column in bank4-bank7; matrix X is stored column by column in bank8-bank11. This layout supports the four-way parallel computation of the intermediate results Y_L and simplifies the design of the corresponding DMA module. For the intermediate results, the odd-indexed vectors Y_1, Y_3, ... are stored in bank12 and the even-indexed vectors Y_2, Y_4, ... in bank13, each new vector overwriting the previous one in the same bank. The final generalized inner product matrix is stored in bank14.
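The storage scheme of FIG. 2 can be expressed as a mapping from matrix element to (bank, offset). The patent does not give the exact interleaving or the bank depth, so the Python sketch below encodes one possible reading under stated assumptions: column c of T goes to bank c mod 4 (so the four lanes read four different banks in the same cycle), bank4-bank7 take the overflow once bank0-bank3 are full, column c of X goes to bank 8 + (c mod 4), and each column occupies M consecutive words. DEPTH and the function names are hypothetical.

```python
# One possible reading of the bank layout for T and X (FIG. 2).
# Assumptions, not stated in the patent: column c of T goes to bank c % 4;
# bank4-bank7 take the overflow once bank0-bank3 are full; X column c goes to
# bank 8 + c % 4; each column occupies M consecutive words.

DEPTH = 1024  # words per bank (hypothetical)


def t_location(row, col, m, depth=DEPTH):
    """(bank, offset) of T[row][col] for an M x M matrix T."""
    lane, slot = col % 4, col // 4          # slot: earlier columns in the same lane
    per_bank = depth // m                   # whole columns that fit in one bank
    if slot < per_bank:
        return lane, slot * m + row         # bank0-bank3
    slot -= per_bank
    if slot >= per_bank:
        raise ValueError("T does not fit in bank0-bank7")
    return 4 + lane, slot * m + row         # overflow into bank4-bank7


def x_location(row, col, m, depth=DEPTH):
    """(bank, offset) of X[row][col], where column col holds sample X_{col+1}."""
    lane, slot = col % 4, col // 4
    if slot >= depth // m:
        raise ValueError("X does not fit in bank8-bank11")
    return 8 + lane, slot * m + row
```

Under this reading, neighbouring columns X_L and X_{L-1} sit in different banks, so the intermediate module reading X_L and the final module reading X_{L-1} would not collide; that is an inference from the assumed layout, not a statement from the patent.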
As shown in FIG. 3, the parallel generalized inner product algorithm computes the intermediate results as follows: in one operation, address generator 1 first generates the addresses of the elements of column X_L of X and of four columns of matrix T; the corresponding matrix elements are fetched and fed into the complex multiply-accumulators to obtain the intermediate result Y_L; address generator 2 then generates the intermediate result storage address, and the intermediate result is stored in the bank.
Similarly, the final results are computed as follows: in one operation, when the module receives the intermediate result completion signal, address generator 1 continuously generates the addresses of the elements of column X_L of matrix X and of the corresponding intermediate result vector Y_L. These elements are fed into a complex multiply-accumulator to obtain the final result Z_L; address generator 2 then generates the final result storage address, and the final result is stored in the bank.
The hardware implementation of the parallel generalized inner product algorithm of the invention comprises the following steps:
step 1) set L = 1, starting from the first column of matrix X;
step 2) compute the intermediate result Y_L.
Computing Y_L comprises the following sub-steps:
step 2-1) using the addresses produced by the address generator sub-module, fetch in turn the elements of X_L and of the four T columns (T_1, T_2, T_3, T_4) of the current group, and send them to the multiply-accumulate sub-module for complex multiply-accumulation, obtaining (Y_{L,1}, Y_{L,2}, Y_{L,3}, Y_{L,4});
step 2-2) using the addresses produced by the address generator sub-module, write (Y_{L,1}, Y_{L,2}, Y_{L,3}, Y_{L,4}) in turn into the intermediate result bank while fetching the next group of four columns of T together with X_L; repeat steps 2-1) and 2-2) until the calculation of Y_L is complete;
step 3) compute the final result Z_{L-1}: concurrently with steps 2-1) and 2-2), if Y_{L-1} has already been generated, fetch in turn the elements of X_{L-1} and Y_{L-1} using the addresses produced by the address generator, perform complex multiply-accumulation to obtain Z_{L-1}, and write it into the final result bank at the address produced by the address generator;
step 4) if L < N, set L = L + 1 and return to step 2);
step 5) fetch in turn the elements of X_N and Y_N, perform complex multiply-accumulation to obtain Z_N, and store it in the bank, completing the inner product operation.
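Steps 1) to 5) can be modelled functionally in a few lines, which is handy as a golden reference when verifying the hardware. The Python sketch below is a hypothetical software model, not the patent's implementation: it assumes X is given as a list of N column vectors of length M, T is an M×M list of lists, and X_L is conjugated in the first stage. The concurrency of step 3) with step 2) is represented only by ordering: Z_{L-1} is produced in the same loop iteration that produces Y_L.

```python
# Software model of steps 1)-5). Assumptions: X is a list of N columns X_L
# (length M each), T is M x M, and X_L is conjugated in the first stage.

def pgip(X, T, lanes=4):
    """Return [Z_1, ..., Z_N] with Z_L = X_L^H T X_L."""
    n = len(X)
    m = len(T)
    Z = [None] * n
    y_prev = None                                   # Y_{L-1}
    for L in range(1, n + 1):                       # step 1) and step 4)
        x = X[L - 1]
        # step 2): Y_L, computed in groups of `lanes` columns of T (sub-steps 2-1, 2-2)
        y = [0j] * m
        for start in range(0, m, lanes):
            for col in range(start, min(start + lanes, m)):
                y[col] = sum(x[i].conjugate() * T[i][col] for i in range(m))
        # step 3): Z_{L-1} from Y_{L-1} and X_{L-1}; concurrent with step 2) in hardware
        if y_prev is not None:
            Z[L - 2] = sum(a * b for a, b in zip(y_prev, X[L - 2]))
        y_prev = y
    # step 5): the last element Z_N is computed after the loop
    Z[n - 1] = sum(a * b for a, b in zip(y_prev, X[n - 1]))
    return Z
```

For a Hermitian T every Z_L is real, which gives a quick check on the results; the model keeps only Y_L and Y_{L-1} alive at any time, matching the two-bank intermediate storage.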
The complex multipliers and complex adders used in the parallel generalized inner product reconstruction controller of this embodiment are all pipelined single-precision floating-point units with a latency of 4 clock cycles, and the memory access latency is 6 cycles. Simulation and synthesis with EDA tools show an operating frequency of up to 1 GHz.
The controller of this embodiment has five complex multiply-accumulators in total: four compute the intermediate results in four-way parallel, and the fifth computes the final results concurrently. Each complex multiply-accumulator consists of one complex multiplier and three complex adders, and its area synthesized with DC in a 40 nm CMOS process is 19993.56 μm².
By computing each intermediate result and then immediately computing the corresponding final result element, the controller of this embodiment hides the time spent computing Z_{L-1} within the time spent computing Y_L. Compared with computing the final results in parallel only after all intermediate results are complete, this takes less computation time and uses storage resources more efficiently.
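The claim that Z_{L-1} is hidden behind Y_L can be checked with simple cycle arithmetic. The Python sketch below is an idealized model whose assumptions are not taken from the patent: each complex multiply-accumulator accepts one product per cycle, pipeline fill and the 6-cycle memory latency are ignored, and the comparison scheme computes all Y vectors first and then spreads the N final elements over four multiply-accumulators. Function names are hypothetical.

```python
import math

# Idealized cycle model (assumptions: one product per MAC per cycle; pipeline
# fill and memory latency ignored). M = length of each X_L, N = number of samples.


def y_cycles(m, lanes=4):
    """Cycles for one Y_L: ceil(M/lanes) groups of T columns, M products per group."""
    return math.ceil(m / lanes) * m


def cycles_overlapped(m, n, lanes=4):
    """Z_{L-1} (M products on the fifth MAC) runs under Y_L; only Z_N is exposed."""
    assert m <= y_cycles(m, lanes)   # Z_{L-1} always fits inside the Y_L window
    return n * y_cycles(m, lanes) + m


def cycles_two_phase(m, n, lanes=4):
    """All Y vectors first, then the N final elements spread over `lanes` MACs."""
    return n * y_cycles(m, lanes) + math.ceil(n / lanes) * m


def y_storage_words(m, n, overlapped):
    """Words needed for intermediate results: two vectors (bank12/13) vs all N vectors."""
    return 2 * m if overlapped else n * m
```

For M = 64 and N = 256 this model gives 262208 versus 266240 cycles, and 128 versus 16384 words of intermediate storage. The exact figures depend entirely on the assumptions above; the point is that Z_{L-1} needs only M products while Y_L needs about M²/4 cycles, so the final stage always fits underneath the intermediate stage.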
The controller of this embodiment offers high computation speed, a flexibly variable number of points, and high storage resource utilization. It can meet the high real-time requirement for obtaining test statistics during non-uniform detection in digital signal processing with large data volumes, such as real-time signal detection.
The present invention is not limited to the embodiment described above. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention falls within its protection scope. The protection scope of the invention is therefore defined by the claims.

Claims (10)

1. A parallel generalized inner product reconstruction controller, characterized by comprising:
an intermediate result calculation module, which receives the source data, calculates an intermediate result vector Y_L from it, and generates the bank storage address of Y_L; each time the calculation of one intermediate result vector Y_L is completed, it generates a completion signal and sends it to the final result calculation module as that module's start signal;
a final result calculation module, whose address generator continuously generates the addresses of the elements of column X_L of matrix X and of the corresponding intermediate result vector Y_L; the data read from these addresses enter a complex multiply-accumulator to calculate the final result, giving the L-th element Z_L of the result matrix Z_{1×N}, and the module generates the bank storage address of Z_L;
and a data storage address processing module, which selects data according to the ping-pong operation selection signal, processes the signals that the intermediate result calculation module and the final result calculation module direct at the same bank, and generates the correct bank address signals.
2. The parallel generalized inner product reconstruction controller according to claim 1, wherein Y_L is computed by multiplying X_L with a square matrix T whose number of rows and columns equals the number of columns of matrix X, with the multiply-accumulate operations for the columns of T computed in several parallel lanes.
3. The parallel generalized inner product reconstruction controller according to claim 2, wherein the intermediate result calculation module uses a four-way parallel implementation.
4. The parallel generalized inner product reconstruction controller according to claim 3, wherein the source data of the intermediate result calculation module are stored as follows: matrix T is stored column by column in bank0-bank3 and, once these are full, continues column by column in bank4-bank7; matrix X is stored column by column in bank8-bank11.
5. The parallel generalized inner product reconstruction controller according to claim 3, wherein the intermediate results of the intermediate result calculation module are stored as follows: odd-indexed results are stored in bank12 and even-indexed results in bank13.
6. The parallel generalized inner product reconstruction controller according to claim 1, wherein the intermediate result calculation module performs the intermediate result calculation as follows: in one operation, the address generator first generates the addresses of the elements of column X_L of X and of four columns of matrix T; the corresponding matrix elements are fetched and fed into the complex multiply-accumulators to obtain the intermediate result Y_L; the address generator then generates the intermediate result storage address, and the intermediate result is stored in the bank.
7. The parallel generalized inner product reconstruction controller according to claim 1, wherein the final result calculation module performs the final result calculation as follows: when the final result calculation module receives the intermediate result completion signal, the address generator continuously generates the addresses of the elements of column X_L of matrix X and of the corresponding intermediate result vector Y_L; these elements are fed into a complex multiply-accumulator to obtain the final result Z_L, after which the address generator generates the final result storage address and the final result is stored in the bank.
8. The parallel generalized inner product reconstruction controller according to claim 1, wherein the complex multiply-accumulator is a pipelined single-precision floating-point unit with a latency of 4 clock cycles, and its memory access latency is set to 6 cycles.
9. The parallel generalized inner product reconstruction controller according to claim 1, wherein there are five complex multiply-accumulators, four of which compute the intermediate results in four-way parallel while the fifth computes the final results concurrently.
10. The parallel generalized inner product reconstruction controller according to claim 1, wherein each complex multiply-accumulator consists of one complex multiplier and three complex adders, and its area synthesized with DC in a 40 nm CMOS process is 19993.56 μm².

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810497969.2A CN108762719B (en) 2018-05-21 2018-05-21 Parallel generalized inner product reconstruction controller

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810497969.2A CN108762719B (en) 2018-05-21 2018-05-21 Parallel generalized inner product reconstruction controller

Publications (2)

Publication Number Publication Date
CN108762719A (en) 2018-11-06
CN108762719B (en) 2023-06-06

Family

ID=64004919

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810497969.2A Active CN108762719B (en) 2018-05-21 2018-05-21 Parallel generalized inner product reconstruction controller

Country Status (1)

Country Link
CN (1) CN108762719B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111045965B (en) * 2019-10-25 2021-06-04 南京大学 Hardware implementation method for multi-channel conflict-free splitting, computer equipment and readable storage medium for operating method
CN110796193A (en) * 2019-10-29 2020-02-14 南京宁麒智能计算芯片研究院有限公司 Reconfigurable KNN algorithm-based hardware implementation system and method
CN110795687A (en) * 2019-10-29 2020-02-14 南京宁麒智能计算芯片研究院有限公司 Hierarchical segmentation system and method for autocorrelation algorithm

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5276902A (en) * 1988-11-07 1994-01-04 Fujitsu Limited Memory access system for vector data processed or to be processed by a vector processor
CN104794002A (en) * 2014-12-29 2015-07-22 南京大学 Multi-channel parallel dividing method based on specific resources and hardware architecture of multi-channel parallel dividing method based on specific resources
CN106855618A (en) * 2017-03-06 2017-06-16 西安电子科技大学 Based on the interference sample elimination method under broad sense inner product General Cell
CN106940815A (en) * 2017-02-13 2017-07-11 西安交通大学 A kind of programmable convolutional neural networks Crypto Coprocessor IP Core

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5276902A (en) * 1988-11-07 1994-01-04 Fujitsu Limited Memory access system for vector data processed or to be processed by a vector processor
CN104794002A (en) * 2014-12-29 2015-07-22 南京大学 Multi-channel parallel dividing method based on specific resources and hardware architecture of multi-channel parallel dividing method based on specific resources
CN106940815A (en) * 2017-02-13 2017-07-11 西安交通大学 A kind of programmable convolutional neural networks Crypto Coprocessor IP Core
CN106855618A (en) * 2017-03-06 2017-06-16 西安电子科技大学 Based on the interference sample elimination method under broad sense inner product General Cell

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
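High-speed implementation of a two-dimensional high-precision MUSIC algorithm; 张多利 (Zhang Duoli) et al.; Journal of Hefei University of Technology (Natural Science); 2018-03-28 (No. 03); full text *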

Also Published As

Publication number Publication date
CN108762719A (en) 2018-11-06

Similar Documents

Publication Publication Date Title
CN108762719B (en) Parallel generalized inner product reconstruction controller
CN105955706B (en) A kind of divider and division operation method
CN111684473A (en) Improving performance of neural network arrays
US20120078988A1 (en) Modified gram-schmidt core implemented in a single field programmable gate array architecture
CN102680945B (en) Doppler modulation frequency estimation method based on field programmable gate array (FPGA)
US20200026746A1 (en) Matrix and Vector Multiplication Operation Method and Apparatus
US10402196B2 (en) Multi-dimensional sliding window operation for a vector processor, including dividing a filter into a plurality of patterns for selecting data elements from a plurality of input registers and performing calculations in parallel using groups of the data elements and coefficients
WO2018027706A1 (en) Fft processor and algorithm
CN111812632B (en) FPGA-based two-dimensional ordered statistics constant false alarm detector implementation method
CN108710505A (en) A kind of expansible Sparse Matrix-Vector based on FPGA multiplies processor
CN116710912A (en) Matrix multiplier and control method thereof
US10949493B2 (en) Multi-functional computing apparatus and fast fourier transform computing apparatus
CN106775579B (en) Floating-point operation accelerator module based on configurable technology
JP7435602B2 (en) Computing equipment and computing systems
CN113592075A (en) Convolution operation device, method and chip
JP2021531572A (en) Performing successive MAC operations on a set of data using different kernels in the MAC circuit
CN113890508A (en) Hardware implementation method and hardware system for batch processing FIR algorithm
CN104574409A (en) Method and device for detecting target from image
CN104598199B (en) The data processing method and system of a kind of Montgomery modular multipliers for smart card
CN111008697B (en) Convolutional neural network accelerator implementation architecture
Sotiropoulos et al. A fast parallel matrix multiplication reconfigurable unit utilized in face recognitions systems
Kalbasi et al. A classified and comparative study of 2-D convolvers
Ehsan et al. Novel hardware algorithms for row-parallel integral image calculation
Javadi et al. An area-efficient hardware implementation for real-time window-based image filtering
CN107193784A (en) The sinc interpolation realization method and systems of the low hardware complexity of high accuracy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant