CN108762719B - Parallel generalized inner product reconstruction controller - Google Patents
- Publication number
- CN108762719B (application number CN201810497969.2A)
- Authority
- CN
- China
- Prior art keywords
- address
- intermediate result
- inner product
- bank
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/06—Addressing a physical block of locations, e.g. base addressing, module addressing, memory dedication
- G06F12/0646—Configuration or reconfiguration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7867—Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/483—Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/02—Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Mathematics (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Nonlinear Science (AREA)
- Logic Circuits (AREA)
- Complex Calculations (AREA)
Abstract
The parallel generalized inner product reconstruction controller of the invention comprises three modules. An intermediate result calculation module receives the source data, calculates the intermediate result vector $Y_L$ from it, and generates the bank addresses at which $Y_L$ is stored; each time one $Y_L$ is completed, it generates a completion signal and sends it to the final result calculation module as that module's start signal. A final result calculation module feeds the data it reads into a complex multiply-accumulator to calculate the $L$-th element $Z_L$ of the result matrix $Z_{1\times N}$, and generates the bank address at which $Z_L$ is stored. A data storage address processing module selects data according to the ping-pong selection signal and generates the correct bank address signals. The invention offers short computation time and high storage-resource utilization, and can meet the strict real-time requirement for obtaining test statistics in non-homogeneity detection across many signal detection applications.
Description
Technical Field
The invention belongs to the technical field of non-homogeneity detection and relates in particular to a parallel generalized inner product reconstruction controller.
Background
Space-time adaptive processing (STAP) is a technique for detecting moving targets. Conventional STAP algorithms must estimate the clutter covariance matrix. When secondary data are used for this estimate, they must be independent and identically distributed (i.i.d.) to keep the performance loss small.
In practical applications, the received signal echoes are contaminated not only by natural clutter but also by man-made non-homogeneous interference, so the i.i.d. condition is often not satisfied.
To deal with interference targets in the samples, Melvin first proposed the non-homogeneity detector (NHD), which suppresses their effect on clutter covariance matrix estimation by rejecting the samples that contain them. The basic idea of NHD is to define a test statistic that separates the two kinds of samples, exploiting the difference in statistical properties between samples contaminated by an interference target and the remaining samples.
Regarding the choice of NHD test statistic, Gerlach et al. of the U.S. Naval Research Laboratory proposed two criteria: the generalized inner product (GIP) and the adaptive power residue (APR). Let $X_L$ denote the $L$-th of the initial samples, whose autocorrelation matrix $E[X_L X_L^{H}]$ is expressed in terms of the noise covariance matrix T. Let $\hat{R} = \frac{1}{N}\sum_{k=1}^{N} X_k X_k^{H}$ denote the sample covariance matrix formed from the $N$ samples; the GIP value of each sample can then be expressed as $\mathrm{GIP}_L = X_L^{H}\hat{R}^{-1}X_L$. Using the GIP value of each sample, samples containing interference targets can be rejected effectively.
The clutter suppression capability of GIP non-homogeneity detection depends on the number of samples: the more samples, the closer the estimated clutter covariance matrix is to the true one and the stronger the clutter suppression. Software implementations of GIP non-homogeneity detection, however, suffer from low precision and excessively long run times when a large number of samples must be processed, and therefore cannot meet the strict real-time requirements of practical non-homogeneity detection.
Disclosure of Invention
The invention aims to overcome the shortcomings described in the background and provides a parallel generalized inner product reconstruction controller that better meets the real-time and large-point-count computation requirements of practical applications. It is realized by the following technical scheme:
the parallel generalized inner product reconstruction controller comprises:
an intermediate result calculation module, which receives the source data, calculates the intermediate result vector $Y_L$ from it, and generates the bank addresses at which $Y_L$ is stored; each time one intermediate result vector $Y_L$ is completed, it generates a completion signal and sends it to the final result calculation module as that module's start signal;
a final result calculation module, whose address generator continuously generates the addresses of the elements of column $X_L$ of matrix X and of the corresponding elements of the intermediate result vector $Y_L$; the data read from these addresses enter a complex multiply-accumulator, which yields the $L$-th element $Z_L$ of the result matrix $Z_{1\times N}$, and the module generates the bank address at which $Z_L$ is stored;
and a data storage address processing module, which selects data according to the ping-pong selection signal, handles the signals that the intermediate result calculation module and the final result calculation module direct at the same bank, and generates the correct bank address signal.
In a further refinement, $Y_L$ is computed as the product of $X_L$ and a square matrix T whose numbers of rows and columns equal the length of each column of matrix X (that is, the number of rows of X); the multiply-accumulate of $X_L$ with each column of T is computed in multiple parallel lanes.
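Assuming the hardware evaluates the Hermitian form used by the GIP statistic (the text does not state where the complex conjugate is applied), the two stages can be written element-wise as below, with $M$ the length of each column $X_L$ and $T_{i,j}$ the entry in row $i$, column $j$ of T:

```latex
\begin{aligned}
Y_{L,j} &= \sum_{i=1}^{M} X_{L,i}^{*}\,T_{i,j}, && j = 1,\dots,M,\\
Z_L &= \sum_{j=1}^{M} Y_{L,j}\,X_{L,j} = X_L^{H}\,T\,X_L, && L = 1,\dots,N.
\end{aligned}
```

Each $Y_{L,j}$ is the inner product of $X_L$ with column $j$ of T, so the columns of T can be spread over parallel lanes, while $Z_L$ needs only one further length-$M$ multiply-accumulate. If T holds the inverse sample covariance matrix $\hat{R}^{-1}$ (again an assumption), $Z_L$ is the GIP value of sample $L$.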
In a further refinement, the intermediate result calculation module uses a four-way parallel implementation.
In a further refinement, the source data of the intermediate result calculation module are stored as follows: matrix T is stored column by column in bank0-bank3 and, once those are full, continues column by column in bank4-bank7; matrix X is stored column by column in bank8-bank11.
In a further refinement, the intermediate results of the intermediate result calculation module are stored as follows: odd-indexed items in bank12 and even-indexed items in bank13.
In a further refinement, the intermediate result calculation module computes intermediate results as follows: in one operation, the address generator first generates the addresses of the elements of one column $X_L$ of X and of four columns of matrix T, the corresponding matrix elements are fetched and fed to the complex multiply-accumulators to obtain the intermediate result $Y_L$; the address generator then generates the intermediate result storage address, and the intermediate result is stored in the bank.
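The four-lane computation of $Y_L$ described above can be modeled in software roughly as follows. This is a behavioral sketch, not the patented circuit: the function name `intermediate_result`, the column-list representation of T, and the conjugation of $X_L$ (the Hermitian form used by GIP) are assumptions. Each pass of the outer loop stands for one operation in which the four multiply-accumulators work on four columns of T at once.

```python
def intermediate_result(x_col, t_cols, lanes=4):
    """Behavioral sketch of Y_L[j] = sum_i conj(X_L[i]) * T[i][j].

    x_col:  column X_L as a list of M complex numbers
    t_cols: T stored column by column, t_cols[j] is column j (length M)
    lanes:  number of parallel complex multiply-accumulators (four here)
    """
    m = len(x_col)
    y = [0j] * len(t_cols)
    # One pass per group of `lanes` columns, like the four MACs working on
    # (T1 T2 T3 T4), then on the next four columns, and so on.
    for base in range(0, len(t_cols), lanes):
        group = range(base, min(base + lanes, len(t_cols)))
        acc = {j: 0j for j in group}          # one accumulator per lane
        for i in range(m):                    # stream X_L element by element
            xi = x_col[i].conjugate()         # assumed Hermitian form, see lead-in
            for j in group:                   # concurrent in hardware, sequential here
                acc[j] += xi * t_cols[j][i]
        for j in group:                       # write the group's results to the Y bank
            y[j] = acc[j]
    return y


x = [1 + 1j, 2, 3j, -1, 0.5]
identity = [[1 if i == j else 0 for i in range(5)] for j in range(5)]
print(intermediate_result(x, identity))   # with T = I, Y_L is just conj(X_L)
```

With five columns the model makes two passes, one over four columns and one over the single remaining column, which is how a column count that is not a multiple of four would leave lanes idle.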
In a further refinement, the final result calculation module computes final results as follows: when it receives the intermediate result completion signal, the address generator continuously generates the addresses of the elements of column $X_L$ of matrix X and of the corresponding intermediate result vector $Y_L$; the data are fed to a complex multiply-accumulator to obtain the final result $Z_L$, the address generator generates the final result storage address, and the final result is stored in the bank.
In a further refinement, the complex multipliers are all pipelined single-precision floating-point units with a latency of 4 clock cycles, and the memory-access latency is set to 6 cycles.
In a further refinement, there are five complex multiply-accumulators: four compute the intermediate results in four-way parallel, and the fifth computes the final results concurrently.
In a further refinement, each complex multiply-accumulator consists of one complex multiplier and three complex adders, and its area after DC synthesis in a 40 nm CMOS process is 19993.56 μm².
Advantages of the Invention
By computing one intermediate result and then immediately computing the corresponding final-result element, the parallel generalized inner product reconstruction controller of the invention hides the time needed to compute $Z_{L-1}$ within the time spent computing $Y_L$, so computation time is short and storage-resource utilization is high. The controller can meet the strict real-time requirement for obtaining test statistics in non-homogeneity detection across many signal detection applications.
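A back-of-the-envelope timing model illustrates why overlapping $Z_{L-1}$ with $Y_L$ saves time. It is a sketch under stated assumptions: `t_y` and `t_z` are hypothetical per-column cycle counts (for example, roughly $M^2/4$ and $M$ cycles for four-lane and single-lane multiply-accumulation, ignoring pipeline fill and memory latency), and bank conflicts are not modeled.

```python
def overlapped_cycles(n, t_y, t_z):
    """Finish time when Z_{L-1} runs on the fifth MAC while Y_L runs on MACs 0-3."""
    y_done = 0   # cycle at which the current Y_L finishes
    z_free = 0   # cycle at which the final-result MAC becomes idle
    for _ in range(n):
        y_done += t_y                      # Y_L follows Y_{L-1} back to back
        z_start = max(y_done, z_free)      # Z_L starts on Y_L's completion signal
        z_free = z_start + t_z
    return z_free


def unoverlapped_cycles(n, t_y, t_z):
    """Finish time if all Y_L were computed first and the Z_L only afterwards."""
    return n * (t_y + t_z)


M, N = 64, 100                     # hypothetical problem size
t_y, t_z = M * M // 4, M           # hypothetical cycle counts, see lead-in
print(overlapped_cycles(N, t_y, t_z), unoverlapped_cycles(N, t_y, t_z))
```

When `t_z <= t_y`, the overlapped schedule finishes at `n * t_y + t_z`, so it saves `(n - 1) * t_z` cycles. It also needs storage for only two intermediate vectors instead of all N of them, which is consistent with the storage-utilization claim.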
Drawings
FIG. 1 is a schematic diagram of an architecture of a parallel generalized inner product reconstruction controller.
FIG. 2 is a schematic diagram of parallel generalized inner product data storage.
FIG. 3 is a schematic diagram of a parallel generalized inner product algorithm calculation flow.
Detailed Description
The invention is described in detail below with reference to the drawings and specific embodiments.
As shown in FIG. 1, the parallel generalized inner product reconstruction controller of this embodiment uses four-way parallel operation and consists mainly of three sub-modules: an intermediate result calculation module, a final result calculation module, and a data storage address processing module. The intermediate result calculation module computes the intermediate results, the final result calculation module computes the final results, and the data storage address processing module handles related signals such as bank addresses.
The intermediate result calculation module computes the intermediate result vector $Y_L$ in a fully pipelined manner. This includes generating the addresses of the elements of column $X_L$, performing an inner-product multiply-accumulate between the elements of $X_L$ and each column of the square matrix $T_{M\times M}$ to obtain $Y_L$, and generating the bank addresses at which $Y_L$ is stored. Each time a $Y_L$ is completed, the module sends a completion signal to the final result calculation module as the start signal for one of its computations.
The final result calculation module uses an address generator to continuously generate the addresses of the elements of column $X_L$ of matrix X and of the corresponding intermediate result vector $Y_L$; the data read from these addresses enter a complex multiply-accumulator to obtain the $L$-th element $Z_L$ of the result matrix $Z_{1\times N}$, and the module generates the bank address at which $Z_L$ is stored.
The data storage address processing module selects data according to the ping-pong selection signal, handles the signals that the intermediate result and final result calculation modules direct at the same bank, and generates signals such as the correct bank addresses.
As shown in FIG. 1, the storage unit comprises 15 banks: matrix T is stored in bank0-bank7, matrix X in bank8-bank11, the intermediate results $Y_L$ in bank12 and bank13, and the final parallel generalized inner product matrix in bank14. The arithmetic unit comprises 5 complex multiply-accumulators: complex multiply-accumulators 0-3 compute the intermediate results in four-way parallel, and complex multiply-accumulator 4 computes the final results concurrently.
A schematic diagram of the parallel generalized inner product data storage is shown in FIG. 2. The source data are stored as follows: matrix T is stored column by column in bank0-bank3 and, once those are full, continues column by column in bank4-bank7; matrix X is stored column by column in bank8-bank11. This layout supports four-way parallel computation of the intermediate result $Y_L$ and simplifies the design of the corresponding DMA module. Of the intermediate results, the odd-indexed items $Y_1, Y_3, \dots$ are stored in bank12 (each overwriting the previous one) and the even-indexed items $Y_2, Y_4, \dots$ in bank13 (likewise). The final generalized inner product matrix is stored in bank14.
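The odd/even split lets the two calculation modules share the intermediate-result storage without conflict: while $Y_L$ is written to one bank, the final result module can read $Y_{L-1}$ from the other. A minimal sketch of that bank selection follows; the function names are hypothetical, and the read side assumes the overlap schedule in which $Z_{L-1}$ is computed while $Y_L$ is produced.

```python
# Bank numbers taken from the storage map above.
Y_BANK_ODD = 12    # receives Y_1, Y_3, ...
Y_BANK_EVEN = 13   # receives Y_2, Y_4, ...


def y_write_bank(l):
    """Bank that receives Y_L (L is 1-based): odd L -> bank12, even L -> bank13."""
    return Y_BANK_ODD if l % 2 == 1 else Y_BANK_EVEN


def y_read_bank(l):
    """Bank holding Y_{L-1}, read for Z_{L-1} while Y_L is written (L >= 2)."""
    if l < 2:
        raise ValueError("Y_0 does not exist; Z is first computed once Y_1 is ready")
    return y_write_bank(l - 1)


for l in range(1, 5):
    line = f"L={l}: write Y_{l} -> bank{y_write_bank(l)}"
    if l >= 2:
        line += f", read Y_{l - 1} <- bank{y_read_bank(l)}"
    print(line)
```

In hardware this choice reduces to a single select bit (the parity of L), which corresponds to the ping-pong selection signal handled by the data storage address processing module.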
As shown in FIG. 3, the parallel generalized inner product algorithm computes the intermediate result as follows: in one operation, address generator 1 first generates the addresses of the elements of one column $X_L$ of X and of four columns of matrix T; the corresponding matrix elements are fetched and fed to the complex multiply-accumulators to obtain the intermediate result $Y_L$; address generator 2 then generates the intermediate result storage address, and the intermediate result is written to the bank.
Similarly, the final result is computed as follows: in one operation, once the module receives the intermediate result completion signal, address generator 1 continuously generates the addresses of the elements of column $X_L$ of matrix X and of the corresponding intermediate result vector $Y_L$. The data are fed to the complex multiply-accumulator to obtain the final result $Z_L$; address generator 2 then generates the final result storage address, and the final result is written to the bank.
The hardware implementation of the parallel generalized inner product algorithm of the invention comprises the following steps:
Step 1) Set $L = 1$, starting from the first column of matrix X;
Step 2) Compute the intermediate result $Y_L$.
Computing $Y_L$ proceeds as follows:
Step 2-1) Following the addresses produced by the address generator sub-module, fetch in turn the elements of $X_L$ and of four columns $(T_1, T_2, T_3, T_4)$, and send them to the multiply-accumulate sub-module for complex multiply-accumulation, obtaining $(Y_{L1}, Y_{L2}, Y_{L3}, Y_{L4})$;
Step 2-2) Write $(Y_{L1}, Y_{L2}, Y_{L3}, Y_{L4})$ in turn into the intermediate result bank at the addresses produced by the address generator sub-module, while fetching the next group of four T columns together with $X_L$; repeat steps 2-1) and 2-2) until $Y_L$ is complete;
Step 3) Compute the final result. Concurrently with steps 2-1) and 2-2), if $Y_{L-1}$ has already been generated, fetch in turn the elements of $X_{L-1}$ and $Y_{L-1}$ at the addresses produced by the address generator, perform complex multiply-accumulation to obtain $Z_{L-1}$, and write it into the final result bank at the address produced by the address generator;
Step 4) If $L < N$, set $L = L + 1$ and return to step 2);
Step 5) Fetch in turn the elements of $X_N$ and $Y_N$, perform complex multiply-accumulation to obtain $Z_N$, and store it in the bank, completing the inner product operation.
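Steps 1) to 5) can be followed in a small behavioral model that also records the order in which results are completed. It is a sketch under assumptions: the name `parallel_gip` is hypothetical, X and T are given as lists of columns, $X_L$ is conjugated (Hermitian form), and the hardware concurrency of step 3) with step 2) is represented only by the order in which results are appended.

```python
def parallel_gip(x_cols, t_cols):
    """Behavioral model of steps 1)-5); returns (Z, order of completed results).

    x_cols: matrix X as a list of N columns, each a list of M complex numbers
    t_cols: matrix T as a list of M columns, each of length M
    """
    m, n = len(t_cols), len(x_cols)
    z = [0j] * n
    order = []
    y_prev = None                      # Y_{L-1}, held in the other ping-pong bank
    for l in range(n):                 # steps 1) and 4): L runs from 1 to N
        x = x_cols[l]
        # Step 2): Y_L[j] = inner product of X_L with column j of T
        # (conjugating X_L is an assumption: Hermitian form).
        y = [sum(x[i].conjugate() * t_cols[j][i] for i in range(m)) for j in range(m)]
        order.append(f"Y{l + 1}")
        if y_prev is not None:
            # Step 3): Z_{L-1}; in hardware this overlaps the computation of Y_L.
            z[l - 1] = sum(y_prev[j] * x_cols[l - 1][j] for j in range(m))
            order.append(f"Z{l}")
        y_prev = y
    # Step 5): the last final-result element Z_N.
    z[n - 1] = sum(y_prev[j] * x_cols[n - 1][j] for j in range(m))
    order.append(f"Z{n}")
    return z, order


X = [[1, 1j], [2, 0], [1 + 1j, -1]]    # N = 3 columns of length M = 2
T = [[1, 0], [0, 1]]                   # identity, so Z_L = ||X_L||^2
z, order = parallel_gip(X, T)
print(z, order)
```

With T equal to the identity, $Z_L$ reduces to the squared norm of each column, here 2, 4 and 3, and the completion order interleaves as Y1, Y2, Z1, Y3, Z2, Z3, mirroring how $Z_{L-1}$ trails $Y_L$ by one column.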
In this embodiment, the complex multipliers and complex adders of the parallel generalized inner product reconstruction controller are all pipelined single-precision floating-point units with a latency of 4 clock cycles, and the memory-access latency is 6 cycles. Simulation and synthesis with EDA tools show that the operating frequency reaches 1 GHz.
The parallel generalized inner product reconstruction controller of this embodiment contains five complex multiply-accumulators in total: four compute the intermediate results in four-way parallel, and the fifth computes the final results concurrently. Each complex multiply-accumulator consists of one complex multiplier and three complex adders, and its area after DC synthesis in a 40 nm CMOS process is 19993.56 μm².
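The text gives each multiply-accumulator one complex multiplier and three complex adders built from 4-cycle pipelined floating-point units, but does not say how the accumulation is organized. One common way to accumulate with an adder whose result returns only after four cycles is to keep four interleaved partial sums and reduce them at the end. The sketch below models that general technique in software; it is an illustration, not a description of the patented datapath, and the function name is hypothetical.

```python
ADDER_LATENCY = 4  # cycles, the pipeline latency stated for the floating-point units


def pipelined_accumulate(products, latency=ADDER_LATENCY):
    """Sum a stream of products the way a pipelined adder can (a sketch).

    A sum leaving an adder with `latency` cycles of delay is not ready for the
    very next input, so the stream is split round-robin into `latency` partial
    sums, which are combined pairwise at the end. Returns (total, additions
    used in the final reduction).
    """
    partial = [0j] * latency
    for k, p in enumerate(products):
        partial[k % latency] += p        # each slot is updated once every `latency` inputs
    additions = 0
    while len(partial) > 1:              # pairwise reduction tree
        nxt = []
        for a in range(0, len(partial) - 1, 2):
            nxt.append(partial[a] + partial[a + 1])
            additions += 1
        if len(partial) % 2:             # odd count: carry the last partial sum forward
            nxt.append(partial[-1])
        partial = nxt
    return partial[0], additions


print(pipelined_accumulate([1, 2, 3, 4, 5, 6, 7]))
```

With four partial sums, the final reduction takes three additions (4 to 2 to 1) regardless of the stream length, so its cost is fixed per $Y_{L,j}$ or $Z_L$ rather than growing with $M$.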
By computing one intermediate result and then immediately computing the corresponding final-result element, the parallel generalized inner product reconstruction controller of this embodiment hides the time needed to compute $Z_{L-1}$ within the time spent computing $Y_L$. Compared with computing the complete set of intermediate results first and only then computing the final results, this takes less computation time and makes better use of storage resources.
The parallel generalized inner product reconstruction controller of this embodiment offers high computation speed, a flexibly configurable number of points, and high storage-resource utilization. It can meet the strict real-time requirement for obtaining test statistics in non-homogeneity detection within data-intensive digital signal processing, such as real-time signal detection applications.
The present invention is not limited to the embodiments above; any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed here falls within the protection scope of the invention. The protection scope of the invention is therefore defined by the claims.
Claims (10)
1. A parallel generalized inner product reconstruction controller, characterized by comprising:
an intermediate result calculation module, which receives the source data, calculates the intermediate result vector $Y_L$ from it, and generates the bank addresses at which $Y_L$ is stored; each time one intermediate result vector $Y_L$ is completed, it generates a completion signal and sends it to a final result calculation module as that module's start signal;
the final result calculation module, whose address generator continuously generates the addresses of the elements of column $X_L$ of matrix X and of the corresponding elements of the intermediate result vector $Y_L$; the data read from these addresses enter a complex multiply-accumulator that calculates the final result, namely the $L$-th element $Z_L$ of the result matrix $Z_{1\times N}$, and the module generates the bank address at which $Z_L$ is stored;
and a data storage address processing module, which selects data according to the ping-pong selection signal, handles the signals that the intermediate result calculation module and the final result calculation module direct at the same bank, and generates the correct bank address signal.
2. The parallel generalized inner product reconstruction controller according to claim 1, wherein: $Y_L$ is computed as the product of $X_L$ and a square matrix T whose numbers of rows and columns equal the length of each column of matrix X, and the multiply-accumulate of $X_L$ with each column of T is computed in multiple parallel lanes.
3. The parallel generalized inner product reconstruction controller according to claim 2, wherein: the intermediate result calculation module uses a four-way parallel implementation.
4. The parallel generalized inner product reconstruction controller according to claim 3, wherein: the source data of the intermediate result calculation module are stored as follows: matrix T is stored column by column in bank0-bank3 and, once those are full, continues column by column in bank4-bank7; matrix X is stored column by column in bank8-bank11.
5. The parallel generalized inner product reconstruction controller according to claim 3, wherein: the intermediate results of the intermediate result calculation module are stored as follows: odd-indexed items in bank12 and even-indexed items in bank13.
6. The parallel generalized inner product reconstruction controller according to claim 1, wherein the intermediate result calculation module computes intermediate results as follows: in one operation, the address generator first generates the addresses of the elements of one column $X_L$ of X and of four columns of matrix T, the corresponding matrix elements are fetched and fed to the complex multiply-accumulators to obtain the intermediate result $Y_L$; the address generator then generates the intermediate result storage address, and the intermediate result is stored in the bank.
7. The parallel generalized inner product reconstruction controller according to claim 1, wherein the final result calculation module computes final results as follows: when it receives the intermediate result completion signal, the address generator continuously generates the addresses of the elements of column $X_L$ of matrix X and of the corresponding intermediate result vector $Y_L$; the data are fed to a complex multiply-accumulator to obtain the final result $Z_L$, the address generator generates the final result storage address, and the final result is stored in the bank.
8. The parallel generalized inner product reconstruction controller according to claim 1, wherein: the complex multiply-accumulator is a pipelined single-precision floating-point unit with a latency of 4 clock cycles, and its memory-access latency is set to 6 cycles.
9. The parallel generalized inner product reconstruction controller according to claim 1, wherein: there are five complex multiply-accumulators, four of which compute the intermediate results in four-way parallel while the fifth computes the final results concurrently.
10. The parallel generalized inner product reconstruction controller according to claim 1, wherein: each complex multiply-accumulator consists of one complex multiplier and three complex adders, and its area after DC synthesis in a 40 nm CMOS process is 19993.56 μm².
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810497969.2A CN108762719B (en) | 2018-05-21 | 2018-05-21 | Parallel generalized inner product reconstruction controller |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810497969.2A CN108762719B (en) | 2018-05-21 | 2018-05-21 | Parallel generalized inner product reconstruction controller |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108762719A CN108762719A (en) | 2018-11-06 |
CN108762719B true CN108762719B (en) | 2023-06-06 |
Family
ID=64004919
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810497969.2A Active CN108762719B (en) | 2018-05-21 | 2018-05-21 | Parallel generalized inner product reconstruction controller |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108762719B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111045965B (en) * | 2019-10-25 | 2021-06-04 | 南京大学 | Hardware implementation method for multi-channel conflict-free splitting, computer equipment and readable storage medium for operating method |
CN110796193A (en) * | 2019-10-29 | 2020-02-14 | 南京宁麒智能计算芯片研究院有限公司 | Reconfigurable KNN algorithm-based hardware implementation system and method |
CN110795687A (en) * | 2019-10-29 | 2020-02-14 | 南京宁麒智能计算芯片研究院有限公司 | Hierarchical segmentation system and method for autocorrelation algorithm |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5276902A (en) * | 1988-11-07 | 1994-01-04 | Fujitsu Limited | Memory access system for vector data processed or to be processed by a vector processor |
CN104794002A (en) * | 2014-12-29 | 2015-07-22 | 南京大学 | Multi-channel parallel dividing method based on specific resources and hardware architecture of multi-channel parallel dividing method based on specific resources |
CN106855618A (en) * | 2017-03-06 | 2017-06-16 | 西安电子科技大学 | Based on the interference sample elimination method under broad sense inner product General Cell |
CN106940815A (en) * | 2017-02-13 | 2017-07-11 | 西安交通大学 | A kind of programmable convolutional neural networks Crypto Coprocessor IP Core |
Non-Patent Citations (1)
Title |
---|
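High-speed implementation of a two-dimensional high-precision MUSIC algorithm; Zhang Duoli et al.; Journal of Hefei University of Technology (Natural Science); 2018-03-28 (No. 03); full text * |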
Also Published As
Publication number | Publication date |
---|---|
CN108762719A (en) | 2018-11-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||