CN103995688B - Label-based out-of-order vector reduction circuit - Google Patents
- Publication number: CN103995688B (application number CN201410240877.8A)
- Authority: CN (China)
- Prior art keywords: module, tagged, data, label data, output
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention discloses a label-based out-of-order vector reduction circuit. Every datum in a vector carries a signal indicating the vector it belongs to; this signal is the datum's label. The out-of-order vector reduction circuit comprises a container module, a buffer module, a multiplexer (MUX) module and an operator module. The circuit can simultaneously perform multiple independent reductions over vectors of arbitrary length, and the data may arrive in any order. It uses only one general-purpose hardware arithmetic unit and is fully pipelined; it is flexible and scalable, with no complex internal control algorithm or logic; and its interface is simple, with operating timing identical to that of a general-purpose random-access memory.
Description
Technical field
The present invention relates to the field of vector reduction circuits, and in particular to a label-based out-of-order vector reduction circuit that can perform multiple independent vector reductions simultaneously, with data input in any order.
Background art
Vector reduction is the process of reducing a series of input values, i.e. a vector, to a single scalar value by some operation. This form of vector computation underlies a large body of scientific and engineering computing, and its core operation usually satisfies the commutative and associative laws. Common vector reductions include the sum and the product of all elements of a vector, and finding its maximum or minimum element. A characteristic of vector reduction is that it is a multi-step operation in which later steps need the results of earlier steps.
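The reliance on commutativity and associativity can be illustrated in a few lines of Python. This is an illustrative sketch, not part of the patent: `reduce_any_order` is a hypothetical helper that repeatedly combines two arbitrarily chosen partial results, so each step uses results of earlier steps, yet the final scalar equals a plain left-to-right fold. Integers are used because integer addition is exactly associative; floating-point addition is only approximately so, and its result can depend on the combination order.

```python
import operator
import random
from functools import reduce


def reduce_any_order(values, op, seed=0):
    """Combine two randomly chosen partial results per step until one remains."""
    rng = random.Random(seed)
    pool = list(values)
    while len(pool) > 1:
        i, j = rng.sample(range(len(pool)), 2)
        a, b = pool[i], pool[j]
        # Remove both operands, larger index first so the smaller index stays valid.
        for k in sorted((i, j), reverse=True):
            pool.pop(k)
        pool.append(op(a, b))  # later steps consume this partial result
    return pool[0]


vec = [3, 1, 4, 1, 5, 9, 2, 6]
print(reduce(operator.add, vec), reduce_any_order(vec, operator.add))  # 31 31
print(reduce(max, vec), reduce_any_order(vec, max))                    # 9 9
```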
In embedded applications with strict real-time requirements, vector reduction has to be implemented as a hardware circuit on chips such as FPGAs or ASICs. However, to reach higher operating frequencies, hardware units for most complex operations, such as floating-point addition and floating-point multiplication, are usually deeply pipelined, with latencies of more than one clock cycle. A vector reduction circuit built on a multi-cycle hardware arithmetic unit must schedule each computation step and store and dispatch the intermediate results sensibly, so that the arithmetic unit's pipeline is kept as full as possible and the circuit's throughput is ensured. Furthermore, for resource efficiency and overall system performance, a vector reduction circuit should use as few hardware arithmetic units as possible, should handle reductions of multiple independent vectors simultaneously, and should be fully pipelined without stalling the input data.
Some existing vector reduction circuits meet the requirements above, but when processing reductions of several vectors simultaneously they require that all data of one vector enter the circuit before the data of the next vector are input; that is, the input data must be ordered. In some applications, however, the data to be processed by the vector reduction circuit arrive out of order, with the data of several vectors interleaved. For example, when the vector reduction circuit is connected to several upstream modules through a bus arbiter that uses a probabilistic arbitration algorithm such as the Lottery algorithm, or when they form a network-on-chip, the order of the input data is essentially unpredictable, and existing vector reduction circuits cannot handle such input.
In addition, from the standpoint of practical applications, a vector reduction circuit should have a standard, friendly interface without complicated operating timing, to ensure good compatibility with upstream and downstream modules. Most existing vector reduction circuits pay little attention to this, which causes inconvenience in practice.
Summary of the invention
To address these problems of existing vector reduction circuits, the present invention proposes a label-based out-of-order vector reduction circuit that can simultaneously perform multiple independent reductions over vectors of arbitrary length, with data input in any order. The circuit is flexible and highly scalable, and contains no complex internal control algorithms or logic.
To achieve these goals, the technical solution of the invention is as follows:
A label-based out-of-order vector reduction circuit, in which every datum of a vector carries a signal indicating the vector it belongs to; this signal is the datum's label.
The circuit comprises a container module, a buffer module, a MUX module and an operator module.
Container module: in each clock cycle, the container module accepts two tagged data, examines all data in the container module, pairs data with the same label into tagged data pairs, and outputs them. The two tagged data accepted in each clock cycle are, respectively, the tagged datum from outside the out-of-order vector reduction circuit and the tagged datum output by the operator module;
Buffer module: performs buffering according to the number of tagged data pairs output by the container module;
MUX module: gates the valid tagged data pairs output from the buffer module through to the operator module;
Operator module: performs the operation on the data of each valid tagged data pair it receives, forms a tagged datum from the result and the label of that valid tagged data pair, and returns it to the container module.
The main operation of a vector reduction is typically a binary operation satisfying the commutative and associative laws (multiplication, addition, maximum, minimum, etc.). All the original data of a vector, together with all intermediate results produced during its reduction, are called a data set. For reductions of multiple vectors, the invention proceeds as follows: a container module is provided; in each clock cycle, the current input datum and the intermediate result output by the arithmetic unit are put into the container, and two data belonging to the same data set are selected from the container as a pair and fed into the arithmetic unit. If all data of all vectors have entered the circuit, no two data in the container belong to the same data set, and the arithmetic unit's pipeline is empty, then the reductions of all vectors are complete, and the data in the container are the reduction results of all vectors. Clearly, this approach does not need to care about the input order of the vectors' data.
Because the input data are a mixture of data from several vectors, each datum inherently carries a signal indicating the vector it belongs to. This signal is treated as the label of the data signal and attached to it. The container relies on labels to distinguish the data of each data set and to store and match data accordingly. Data with the same label are combined into a pair and sent to the arithmetic unit, and the result is given the same label. Thus each data set corresponds to exactly one label value, and invalid data can be given a label different from the labels of all data sets, indicating that they are invalid and belong to no data set, which ensures the correctness and accuracy of the computation.
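The processing scheme above can be modeled behaviorally in Python. The following is a hypothetical sketch, not the patented hardware: the container is a dictionary from label to the data it currently holds, the arithmetic unit is a queue of `p` pipeline slots, and in each cycle one external datum and one returned result enter the container while at most one same-label pair is issued. The function name and the parameter `p` are assumptions for illustration; bounding the container and resolving cycles that produce two pairs are handled by the refinements described later in this text.

```python
import operator
from collections import defaultdict, deque


def container_reduce(stream, op, p=4):
    """Untuned cycle-level sketch of label-based reduction with a p-stage operator."""
    container = defaultdict(list)      # label -> values currently held
    pipe = deque([None] * p)           # in-flight (label, result) or None
    inputs = deque(stream)
    while inputs or any(slot is not None for slot in pipe):
        # External datum and operator result both enter the container.
        for item in (inputs.popleft() if inputs else None, pipe.popleft()):
            if item is not None:
                container[item[0]].append(item[1])
        # Select any data set holding two data and issue them as one pair.
        ready = next((lab for lab, vals in container.items() if len(vals) >= 2), None)
        if ready is None:
            pipe.append(None)
        else:
            a, b = container[ready].pop(), container[ready].pop()
            pipe.append((ready, op(a, b)))  # result keeps the same label
    # Pipeline empty and no two data share a label: each entry is a final result.
    return {lab: vals[0] for lab, vals in container.items() if vals}


stream = [(0, 1), (1, 10), (0, 2), (1, 20), (0, 3), (2, 5), (1, 30), (0, 4)]
print(dict(sorted(container_reduce(stream, operator.add).items())))  # {0: 10, 1: 60, 2: 5}
```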
In each clock cycle, the container module accepts two tagged data, one from outside the circuit and one output by the operator module, examines all its internal data including the two current inputs, immediately pairs data with the same label into tagged data pairs, and outputs all of them.
Because data that can be paired are output as a pair immediately, the container module stores at most one datum of each data set. The storage capacity of the container module therefore equals the number of out-of-order input vectors the circuit can process simultaneously. As long as the number of vectors processed simultaneously is within the storage capacity of the container module, the circuit computes correctly and accurately regardless of the input order of their data.
Because a tagged data pair can only be formed from the two input tagged data, or from an input tagged datum and a datum previously stored in the container module, the number of tagged data pairs output by the container module in each clock cycle falls into three cases: no tagged data pair output, denoted state 0; one tagged data pair output, denoted state 1; two tagged data pairs output, denoted state 2.
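The pairing rule and the three states can be captured by a small pure function. This is a hypothetical Python sketch: the container is a dict holding at most one datum per label, each input is a `(label, value)` tuple or `None` for an invalid datum, and a pair is a `(label, a, b)` tuple. When both inputs carry the same label they are paired with each other and the stored datum is left alone, following the special case described for Embodiment 1.

```python
def container_step(store, x, r):
    """One cycle of a container that keeps at most one datum per label.

    store: dict label -> stored value (mutated in place)
    x, r:  external input and operator result, each (label, value) or None
    Returns the (label, a, b) pairs produced; their count is the state 0, 1 or 2.
    """
    if x is not None and r is not None and x[0] == r[0]:
        return [(x[0], x[1], r[1])]          # the two inputs pair with each other
    pairs = []
    for item in (x, r):
        if item is None:
            continue                          # invalid input: nothing to store or match
        label, value = item
        if label in store:                    # a datum of this data set is waiting
            pairs.append((label, value, store.pop(label)))
        else:                                 # otherwise keep this datum for later
            store[label] = value
    return pairs


store = {}
print(container_step(store, (0, 1), None))     # []  (state 0)
print(container_step(store, (1, 5), (0, 2)))   # [(0, 2, 1)]  (state 1)
print(container_step(store, (2, 7), None))     # []  (state 0)
print(container_step(store, (1, 6), (2, 8)))   # [(1, 6, 5), (2, 8, 7)]  (state 2)
```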
The buffer module performs buffering according to the number of tagged data pairs output by the container module, in order to resolve the conflict that arises when the container module outputs two tagged data pairs at once. Specifically:
When the container module is in state 0, the buffer module pops one tagged data pair;
When the container module is in state 1, the buffer module keeps the state of the previous clock cycle unchanged;
When the container module is in state 2, the buffer module pushes one of the two tagged data pairs.
When the container module is in state 0, the buffer module pops one tagged data pair; if the buffer module is empty, the label part of its output port is set to an invalid value, to ensure the correctness and accuracy of the computation.
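A minimal sketch of this buffering rule, assuming a Python `deque` stands in for the buffer, a pair is a `(label, a, b)` tuple, and `None` plays the role of the invalid label on the output port when the buffer is empty. Which of the two pairs is pushed in state 2 is an assumption here (the second one); the description only requires that one of them be pushed.

```python
from collections import deque


def buffer_step(fifo, pairs):
    """Update the buffer for one cycle, given the container's output pairs.

    Returns the popped pair in state 0 (None, i.e. invalid, if the buffer is
    empty) and None in states 1 and 2, where the buffer output is not used.
    """
    state = len(pairs)
    if state == 0:
        return fifo.popleft() if fifo else None   # pop one pair, or output invalid
    if state == 2:
        fifo.append(pairs[1])                     # push one of the two pairs
    return None                                   # state 1: buffer unchanged


fifo = deque()
buffer_step(fifo, [(0, 1, 2), (1, 3, 4)])   # state 2: pushes (1, 3, 4)
print(buffer_step(fifo, []))                # (1, 3, 4)
print(buffer_step(fifo, []))                # None (empty buffer: invalid output)
```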
The circuit of the invention places no requirement on the push/pop order of the buffer module. The minimum depth of the buffer is a finite value p - 1 that is independent of the number of vectors processed simultaneously, the length of each vector, and the input order of each vector's data, where p is the total number of pipeline stages of the container module and the operator module. As long as the buffer depth is not less than this minimum, the buffer module never overflows, for any vector length and any data input order.
The operator module consists of a general-purpose arithmetic unit and a signal delay unit. The general-purpose arithmetic unit is selected according to the specific vector reduction to be performed, and may have any number of pipeline stages; the number of delay cycles of the signal delay unit equals the number of pipeline stages of the general-purpose arithmetic unit, so that the output data of the arithmetic unit match the output label of the signal delay unit.
In each clock cycle, the operator module accepts a tagged data pair: the data part is sent to the general-purpose arithmetic unit for computation and the label part is sent to the signal delay unit to be delayed. The data output by the arithmetic unit and the label output by the signal delay unit form a tagged datum, which is returned to the container module.
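A hypothetical cycle-level sketch of the operator module in Python: the general-purpose unit is modeled as a shift register of results with `stages` stages, and the label travels through a separate delay line of the same length (the D flip-flop chain), so the label leaving the delay line always belongs to the result leaving the arithmetic pipeline. `None` as a label marks an invalid slot; the operation and the stage count are parameters, since the description leaves both open.

```python
import operator
from collections import deque


class OperatorModule:
    """Pipelined binary operator plus a label delay line of equal depth."""

    def __init__(self, op, stages):
        self.op = op
        self.data_pipe = deque([None] * stages)    # arithmetic pipeline (results)
        self.label_delay = deque([None] * stages)  # delay line for labels

    def step(self, pair):
        """Accept one (label, a, b) pair or None; return the (label, result) leaving now."""
        out_label = self.label_delay.popleft()
        out_data = self.data_pipe.popleft()
        if pair is None:
            self.label_delay.append(None)          # invalid label enters the delay line
            self.data_pipe.append(None)
        else:
            label, a, b = pair
            self.label_delay.append(label)         # the label only waits
            self.data_pipe.append(self.op(a, b))   # the data are computed
        return None if out_label is None else (out_label, out_data)


m = OperatorModule(operator.mul, 3)
outs = [m.step(pr) for pr in [(0, 2, 3), (1, 4, 5), None, None, None]]
print(outs)  # [None, None, None, (0, 6), (1, 20)]
```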
The MUX module gates the tagged data pairs output by the container module and the buffer module to the operator module. In each clock cycle, if the container module outputs 1 tagged data pair, the MUX module gates that pair to the operator module; if the container module outputs 2 tagged data pairs, the MUX module gates the pair that was not pushed into the buffer module to the operator module; if the container module outputs 0 tagged data pairs, the MUX module gates the pair popped by the buffer module to the operator module.
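The three-way selection can be sketched as follows, under hypothetical conventions (pairs as `(label, a, b)` tuples, `None` for an invalid pair). The sketch assumes that in state 2 the second pair is the one pushed into the buffer, so the MUX forwards the first; the description only requires that the forwarded pair be the one not pushed.

```python
def mux_select(pairs, popped):
    """Choose the pair sent to the operator module this cycle.

    pairs:  container output this cycle (0, 1 or 2 pairs)
    popped: pair popped from the buffer in state 0, or None if the buffer was empty
    """
    if len(pairs) == 1:
        return pairs[0]      # state 1: forward the container's only pair
    if len(pairs) == 2:
        return pairs[0]      # state 2: forward the pair that was not pushed
    return popped            # state 0: forward the buffer's pair (None = invalid)


print(mux_select([(0, 1, 2), (1, 3, 4)], None))  # (0, 1, 2)
print(mux_select([], (5, 6, 7)))                 # (5, 6, 7)
```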
According to a set of status signals provided by the container module, the MUX module gates the valid tagged data pairs output by the container module and the buffer module to the operator module, and this gating scheme ensures the correctness and accuracy of the reductions of all vectors.
In each clock cycle the container module can accept one tagged datum from outside; the buffer module never overflows as long as its depth is not less than the minimum depth; the operator module only needs to process one tagged data pair per clock cycle; and the tagged datum it outputs can enter the container module without blocking. Hence the circuit of the invention never stalls and is fully pipelined.
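Composing the pieces gives a cycle-level behavioral model of the whole datapath in which nothing ever stalls: an external datum is consumed every cycle and every operator result re-enters the container. This is a hypothetical Python sketch under simplifying assumptions: the container is modeled with zero latency, the operator has `p` stages, pairs are `(label, a, b)` tuples, and in state 2 the second pair goes to the buffer. It checks functional behavior only, not the timing of the actual circuit.

```python
import operator
from collections import deque


def simulate(stream, op, p):
    """Run (label, value) data through container, buffer, MUX and a p-stage operator.

    One datum of `stream` is consumed per cycle in the given (arbitrary) order;
    the return value maps each label to its reduction result.
    """
    store = {}                  # container: at most one datum per label
    fifo = deque()              # buffer module
    pipe = deque([None] * p)    # operator pipeline: (label, result) or None
    inputs = deque(stream)
    while inputs or fifo or any(slot is not None for slot in pipe):
        x = inputs.popleft() if inputs else None   # external input, never stalled
        r = pipe.popleft()                          # result leaving the operator
        # Container: form 0, 1 or 2 same-label pairs.
        if x is not None and r is not None and x[0] == r[0]:
            pairs = [(x[0], x[1], r[1])]            # inputs pair with each other
        else:
            pairs = []
            for item in (x, r):
                if item is None:
                    continue
                label, value = item
                if label in store:
                    pairs.append((label, value, store.pop(label)))
                else:
                    store[label] = value
        # Buffer and MUX: state 2 pushes one pair, state 0 pops one.
        if len(pairs) == 2:
            fifo.append(pairs[1])
            issue = pairs[0]
        elif len(pairs) == 1:
            issue = pairs[0]
        else:
            issue = fifo.popleft() if fifo else None
        # Operator: the result re-enters the container p cycles later, same label.
        pipe.append(None if issue is None else (issue[0], op(issue[1], issue[2])))
    return store


stream = [(0, 1), (1, 10), (0, 2), (1, 20), (0, 3), (2, 5), (1, 30), (0, 4)]
print(dict(sorted(simulate(stream, operator.add, p=4).items())))  # {0: 10, 1: 60, 2: 5}
```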
External tagged data enter the container module without blocking, and the reduction results of all vectors are stored in the container module. Because the storage locations of the container module map one-to-one onto data sets, labels can be used as addresses into the container module. Thus, for the data input and output of the circuit, the label signal is equivalent to an address signal, and the operating timing of the circuit is the same as that of a general-purpose random-access memory.
After the last tagged datum enters the circuit, the reduction results of all vectors can be read from the container module within at most … clock cycles, where p is the total number of pipeline stages inside the circuit.
The circuit of the invention has a simple interface, and its operating timing is identical to that of a general-purpose random-access memory.
Description of drawings
Fig. 1 is the circuit schematic of Embodiment 1 of the label-based out-of-order vector reduction circuit.
Fig. 2 is the circuit schematic of the container module Container.
Detailed description of embodiments
The invention is further described below with reference to the drawings, but embodiments of the invention are not limited thereto.
Embodiment 1
Figs. 1 and 2 show the circuit schematics of Embodiment 1 of the label-based out-of-order vector reduction circuit of the invention, which consists of a container module Container, a buffer module RBuff, a MUX module MUX and an operator module Operator. stb_x, tag_x and dat_x are the input ports for tagged data, dat_o is the output port for result data, and ctl_read is the control signal for reading results.
The label part of the tagged datum {stb_x, tag_x, dat_x} from outside the circuit is {stb_x, tag_x}, and the label part of the tagged datum {stb_r, tag_r, dat_r} returned by the operator module is {stb_r, tag_r}; the other tagged data and tagged data pairs in the circuit follow the same convention. The label part {stb_*, tag_*} (* is a wildcard, likewise below) of every tagged datum or tagged data pair is 1 + n bits wide. stb_* is 1 bit wide and is the most significant bit of the label signal, called the strobe signal: a value of 1 means the label is valid and 0 means it is invalid, so the validity of a label can be determined by checking the strobe signal alone. tag_* is the value of a valid label and is n bits wide, so $m = 2^n$ valid labels are available, with values from 0 to $2^n - 1$.
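The label format can be sketched as bit manipulation on Python integers. The helper names are hypothetical; the layout follows the text: a 1-bit strobe stb as the most significant bit above an n-bit tag, for 1 + n bits in total.

```python
def pack_label(stb, tag, n):
    """Pack {stb, tag} into a (1 + n)-bit integer with stb as the most significant bit."""
    assert stb in (0, 1) and 0 <= tag < (1 << n)
    return (stb << n) | tag


def unpack_label(label, n):
    """Split a packed label back into (stb, tag)."""
    return label >> n, label & ((1 << n) - 1)


def label_valid(label, n):
    """Validity depends only on the strobe bit."""
    return ((label >> n) & 1) == 1


n = 3                                   # m = 2**3 = 8 valid labels, tags 0..7
lab = pack_label(1, 5, n)
print(lab, unpack_label(lab, n), label_valid(lab, n))   # 13 (1, 5) True
print(label_valid(pack_label(0, 5, n), n))              # False
```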
The hardware arithmetic unit in the operator module Operator is selected according to the vector reduction to be performed, and may have any number of pipeline stages. The signal delay units inside the operator module and elsewhere in the circuit are all implemented with D flip-flops.
The container module consists of two submodules, the cache module Cache and the cache-status query and update module CacheStatQAU_PT, together with a label comparison circuit. In each cycle the container module accepts one tagged datum from outside the circuit and one from the operator module. Because the container module stores at most one datum of each data set, a valid tagged data pair can be formed in only 3 ways: from the two input tagged data; from the external tagged datum and the corresponding datum in the container; or from the tagged datum output by the operator module and the corresponding datum in the container. The container module provides tagged data pairs for these 3 combinations: {flg_x_matches_r, tag_x, dat_x, dat_r}, {flg_x_got_matched, tag_x, dat_x, dat_x_cache} and {flg_r_got_matched, tag_r, dat_r, dat_r_cache}, whose validity is indicated by their strobe signals flg_x_matches_r, flg_x_got_matched and flg_r_got_matched respectively. The MUX module MUX and the buffer module RBuff determine the validity of these pairs from their strobe signals and act accordingly.
The cache module Cache stores the data parts of the two tagged data currently input to the container module and, at the same time, outputs the two previously stored data used to form the current data pairs. Cache is implemented as a dual-port RAM operating in Read First mode, with read latency $p_{cc}$, typically $p_{cc} = 1$. Its capacity corresponds to the number of available valid labels, $m = 2^n$, so the circuit can perform reductions on at most m data sets simultaneously. The signal tag_* is used directly as the address of Cache, so each data set being processed has a unique storage location in Cache. When a tagged datum input to the container module is invalid, its low strobe signal drives the write enable low, so the data in Cache are not affected. For the input tagged datum {stb_x, tag_x, dat_x}, Cache uses the value of tag_x as the address of the location storing the data set whose label is tag_x, and reads out the previously stored datum dat_x_cache to form the tagged data pair {flg_x_got_matched, tag_x, dat_x, dat_x_cache}. At the same time, if the label of {stb_x, tag_x, dat_x} is valid and differs from the label of the other tagged datum {stb_r, tag_r, dat_r}, Cache overwrites dat_x_cache with dat_x after reading it. The tagged datum {stb_r, tag_r, dat_r} input to the container module is handled in the same way. Reads and writes of Cache are not affected by the cache status of each data set. Even when an input tagged datum has been paired with a datum previously stored in Cache into a valid data pair, its data part is still stored; however, because Cache operates in Read First mode and the cache-status query and update module CacheStatQAU_PT records the cache status, data are never computed twice, and data that have not yet been computed are never overwritten by new data.
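The read-first behavior and the write-enable condition can be sketched as follows. This is a hypothetical Python model, not the RAM primitive used in the embodiment: one call to `access` stands for one port access in a cycle, the read latency p_cc is not modeled, and the write-enable rule (valid label that differs from the other input's label) is taken from the text above.

```python
class ReadFirstRam:
    """Dual-port RAM sketch in Read First mode: a port returns the old contents
    of the addressed word, then writes the new value if write is enabled."""

    def __init__(self, depth):
        self.mem = [0] * depth

    def access(self, addr, we, din):
        dout = self.mem[addr]        # read first: the previously stored datum
        if we:
            self.mem[addr] = din     # then overwrite it with the current datum
        return dout


def cache_port(ram, stb, tag, dat, other_stb, other_tag):
    """One port of a Cache-like module: tag is the address; write only if this
    label is valid and differs from the other input's label."""
    we = bool(stb) and not (other_stb and other_tag == tag)
    return ram.access(tag, we, dat)  # returned value plays the role of dat_*_cache


ram = ReadFirstRam(8)
print(cache_port(ram, 1, 3, 42, 0, 0))   # 0   (nothing stored yet; 42 is written)
print(cache_port(ram, 1, 3, 7, 0, 0))    # 42  (old datum read out, then replaced by 7)
print(cache_port(ram, 1, 3, 9, 1, 3))    # 7   (same label on both inputs: no write)
print(ram.mem[3])                        # 7
```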
The label comparison circuit provides the strobe signal flg_x_matches_r of the tagged data pair {flg_x_matches_r, tag_x, dat_x, dat_r}. If the labels of the two tagged data input to the container module are both valid and equal, dat_x and dat_r form a valid data pair whose label value is tag_x; the label comparison circuit then drives flg_x_matches_r high to indicate that {flg_x_matches_r, tag_x, dat_x, dat_r} is a valid tagged data pair. Otherwise, it drives flg_x_matches_r low.
The cache-status query and update module CacheStatQAU_PT uses the labels carried by the two tagged data input to the container module to query and update the storage status of the corresponding data sets in the container module, and produces the strobe signal flg_x_got_matched of {flg_x_got_matched, tag_x, dat_x, dat_x_cache} and the strobe signal flg_r_got_matched of {flg_r_got_matched, tag_r, dat_r, dat_r_cache}. Because each data set has only two possible cache states in the container module (the container either holds a datum of that data set or does not), module CacheStatQAU_PT contains an m-bit cache status register, each bit of which records the cache status of one data set and is addressed by the label signal. For the tagged datum {stb_x, tag_x, dat_x} input to the container module, if it carries a valid label, module CacheStatQAU uses its label value tag_x to address the corresponding bit of the cache status register. If that bit is 1, Cache holds a datum of the same data set as dat_x, i.e. dat_x_cache and dat_x form a valid data pair; module CacheStatQAU then drives the strobe signal flg_x_got_matched high to indicate that {flg_x_got_matched, tag_x, dat_x, dat_x_cache} is a valid tagged data pair, and flips the bit to 0, indicating that Cache no longer holds a datum of the same data set as dat_x. If the bit is 0, Cache holds no datum of the same data set as dat_x; module CacheStatQAU then drives flg_x_got_matched low to indicate that {flg_x_got_matched, tag_x, dat_x, dat_x_cache} is an invalid tagged data pair, and flips the bit to 1, indicating that Cache now holds a datum of the same data set as dat_x. If {stb_x, tag_x, dat_x} carries an invalid label, module CacheStatQAU drives flg_x_got_matched low directly and the cache status register does not change. The tagged datum {stb_r, tag_r, dat_r} input to the container module is handled in the same way. The behavior above can be summarized by the following pseudocode for module CacheStatQAU_PT:
Here, cache_stat is the module's internal cache status register, m bits wide; '^' denotes the XOR operation and '<<' denotes the left shift operation, and together the two operations invert the corresponding bit of cache_stat. The module latency $p_{pt}$ depends on whether flg_x_got_matched and flg_r_got_matched are latched before output: it is 1 if they are, and 0 otherwise.
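The pseudocode referred to above is not included in this text; the following Python function is a hypothetical reconstruction from the behavior described, not the original. Both queries read the register value from the start of the cycle, and each valid label then inverts its bit with XOR, so when both inputs carry the same valid label the bit is inverted twice and stays unchanged, as the special case below requires. The flag flg_x_matches_r from the label comparison circuit is included for completeness.

```python
def cache_stat_step(cache_stat, stb_x, tag_x, stb_r, tag_r):
    """One cycle of a CacheStatQAU_PT-like status register (behavioral sketch).

    cache_stat: m-bit integer; bit k is 1 when the container holds a datum of data set k.
    Returns (flg_x_matches_r, flg_x_got_matched, flg_r_got_matched, new cache_stat).
    """
    # Label comparison circuit: both labels valid and equal.
    flg_x_matches_r = bool(stb_x and stb_r and tag_x == tag_r)
    # Both queries read the register value from the start of the cycle.
    flg_x_got_matched = bool(stb_x and (cache_stat >> tag_x) & 1)
    flg_r_got_matched = bool(stb_r and (cache_stat >> tag_r) & 1)
    new_stat = cache_stat
    if stb_x:
        new_stat ^= 1 << tag_x     # invert the bit addressed by tag_x
    if stb_r:
        new_stat ^= 1 << tag_r     # invert the bit addressed by tag_r
    return flg_x_matches_r, flg_x_got_matched, flg_r_got_matched, new_stat


print(cache_stat_step(0b0000, 1, 2, 0, 0))  # (False, False, False, 4)
print(cache_stat_step(0b0100, 1, 2, 0, 0))  # (False, True, False, 0)
print(cache_stat_step(0b0100, 1, 2, 1, 2))  # (True, True, True, 4)
```

With ctl_read high and {stb_x, tag_x} set to the label of a finished data set, the same rule returns flg_x_got_matched high and clears that bit, which matches the read-out procedure described later.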
The buffer module RBuff is implemented as a FIFO operating in Fall-Through mode, and the signal not_empty, which indicates whether the FIFO is non-empty, serves as the strobe signal of the tagged data pair {not_empty, dat_o} on the output port of RBuff. The MUX module MUX is a 3-to-1 selection logic circuit. The behavior of the buffer module RBuff and the MUX module MUX is described by the following pseudocode:
The case in which the labels of the two tagged data input to the container module are both valid and identical is special. In this case, only {flg_x_matches_r, tag_x, dat_x, dat_r} is a valid tagged data pair. If the container module also holds a datum of the same data set, {flg_x_got_matched, tag_x, dat_x, dat_x_cache} and {flg_r_got_matched, tag_r, dat_r, dat_r_cache} are marked valid as well. Nevertheless, as described above, dat_x and dat_r are not written into Cache and do not overwrite the original datum; the corresponding bit of the cache status register cache_stat in CacheStatQAU is inverted twice, so its value is unchanged; the buffer module RBuff takes no action; and the MUX module MUX gates the tagged data pair {flg_x_matches_r, tag_x, dat_x, dat_r}. In other words, the prototype circuit lets the two tagged data fall through directly to the operator module Operator, and the computation remains correct and accurate. This handling avoids reading and writing the same location of Cache on both ports at once, which prevents conflicts and errors, and it also reduces circuit complexity.
The ctl_read signal is used to read the result of a data set from Cache and to reset the corresponding bit of the cache status register cache_stat in CacheStatQAU_PT, reclaiming the label occupied by that data set. When no result is being read, ctl_read must be held low. When the reduction of a data set's vector is complete, the only datum of that data set left in the circuit is in the container module Container, and the corresponding bit of cache_stat is 1. To read the result of this data set, set {stb_x, tag_x} to the label value of the data set and drive ctl_read high. Because the result is the only datum of this data set in the circuit, flg_x_matches_r cannot be high, so {flg_x_got_matched, tag_x, dat_x, dat_x_cache} is output as a valid tagged data pair and the corresponding bit of cache_stat is flipped to 0. dat_x_cache is the result of the data set, and since dat_o = dat_x_cache, the result can be read at port dat_o from outside the circuit. Meanwhile, the high ctl_read prevents the MUX module MUX from gating {flg_x_got_matched, tag_x, dat_x, dat_x_cache} to the operator module Operator, so the computation of the current data sets is not disturbed and the data of this data set do not return to the container module and occupy the label again.
The internal pipeline of the circuit never stalls, so the circuit is fully pipelined. Externally, the prototype circuit has the same interface and operating timing as a random-access memory of size m: tag_x can be regarded as the input address port, dat_x as the input data port, dat_o as the output data port, and {ctl_read, stb_x} as the read/write enable port; the read latency is the latency $p_c$ of the container module Container.
The embodiments of the invention described above do not limit the scope of protection of the invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the invention shall fall within the scope of the claims of the invention.
Claims (10)
1. A label-based out-of-order vector reduction circuit, characterized in that every datum in a vector carries a signal indicating the vector it belongs to, and this signal is the label of the datum;
the circuit comprises a container module, a buffer module, a MUX module and an operator module;
container module: in each clock cycle, the container module accepts two tagged data, examines all data in the container module, pairs data with the same label into tagged data pairs, and outputs them; the two tagged data accepted by the container module in each clock cycle are, respectively, the tagged datum from outside the out-of-order vector reduction circuit and the tagged datum output by the operator module;
buffer module: performs buffering according to the number of tagged data pairs output by the container module;
MUX module: gates the valid tagged data pairs output from the buffer module through to the operator module;
operator module: performs the operation on the data of each valid tagged data pair it receives, forms a tagged datum from the result and the label of that valid tagged data pair, and returns it to the container module.
2. The circuit according to claim 1, characterized in that the storage capacity of the container module equals the number of out-of-order input vectors that the circuit can process simultaneously.
3. The circuit according to claim 1, characterized in that, in each clock cycle, the number of tagged data pairs output by the container module falls into three cases: no tagged data pair output, denoted state 0; one tagged data pair output, denoted state 1; two tagged data pairs output, denoted state 2;
when the container module is in state 0, the buffer module pops one tagged data pair;
when the container module is in state 1, the buffer module keeps the state of the previous clock cycle unchanged;
when the container module is in state 2, the buffer module pushes one of the two tagged data pairs.
4. The circuit according to claim 3, characterized in that, when the container module is in state 0, the buffer module pops one tagged data pair; when the buffer module is empty, the label part of the buffer module's output port is set to an invalid value.
5. The circuit according to claim 3, characterized in that the minimum depth of the buffer module is a finite value p - 1, where p is the total number of pipeline stages of the container module and the operator module.
6. The circuit according to claim 1, characterized in that the operator module consists of a general-purpose arithmetic unit and a signal delay unit; in each clock cycle the operator module accepts a tagged data pair, sends the data part to the general-purpose arithmetic unit for computation and the label part to the signal delay unit to be delayed, and forms a tagged datum from the data output by the arithmetic unit and the label output by the signal delay unit, which is returned to the container module.
7. The circuit according to claim 6, characterized in that the general-purpose arithmetic unit is selected according to the specific vector reduction to be performed and may have any number of pipeline stages; the number of delay cycles of the signal delay unit equals the number of pipeline stages of the general-purpose arithmetic unit, so that the output data of the general-purpose arithmetic unit match the output label of the signal delay unit.
8. The circuit according to claim 1, characterized in that the MUX module gates the tagged data pairs output by the container module and the buffer module to the operator module;
in each clock cycle, if the container module outputs 1 tagged data pair, the MUX module gates that tagged data pair to the operator module;
if the container module outputs 2 tagged data pairs, the MUX module gates the tagged data pair not pushed into the buffer module to the operator module;
if the container module outputs 0 tagged data pairs, the MUX module gates the tagged data pair popped by the buffer module to the operator module.
9. The circuit according to claim 8, characterized in that, according to a set of status signals provided by the container module, the MUX module gates the valid tagged data pairs output by the container module and the buffer module to the operator module, and the gating scheme ensures the correctness and accuracy of the reductions of all vectors.
10. The circuit according to any one of claims 1 to 9, characterized in that the operating timing of the circuit is identical to that of a general-purpose random-access memory.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410240877.8A CN103995688B (en) | 2014-05-30 | 2014-05-30 | A kind of unordered vector reduction circuit based on label |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103995688A (en) | 2014-08-20 |
CN103995688B (en) | 2016-10-12 |
Family ID: 51309866
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410240877.8A Active CN103995688B (en) | 2014-05-30 | 2014-05-30 | A kind of unordered vector reduction circuit based on label |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103995688B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10108581B1 (en) | 2017-04-03 | 2018-10-23 | Google Llc | Vector reduction processor |
CN111105042B (en) * | 2019-12-13 | 2023-07-25 | Guangdong Inspur Big Data Research Co., Ltd. | Parallel message processing method, system and related device |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9588766B2 (en) * | 2012-09-28 | 2017-03-07 | Intel Corporation | Accelerated interlane vector reduction instructions |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102004672A (en) * | 2010-11-25 | 2011-04-06 | National University of Defense Technology of the Chinese People's Liberation Army | Reduction device capable of configuring auto-increment interval of reduction target |
Non-Patent Citations (1)
Title |
---|
A novel reduction network for vector processors; Zhuang Wei et al.; Journal of Chinese Computer Systems (小型微型计算机系统); 2012-11-15; Vol. 33, No. 11; pp. 2498-2502 * |
Also Published As
Publication number | Publication date |
---|---|
CN103995688A (en) | 2014-08-20 |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
C14 | Grant of patent or utility model |
GR01 | Patent grant |