CN103995688A - Disordered vector reduction circuit based on labels - Google Patents

Disordered vector reduction circuit based on labels

Info

Publication number
CN103995688A
Authority
CN
China
Prior art keywords
module
data
tape label
label data
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410240877.8A
Other languages
Chinese (zh)
Other versions
CN103995688B (en)
Inventor
黄以华
韦铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
SYSU CMU Shunde International Joint Research Institute
Original Assignee
Sun Yat Sen University
SYSU CMU Shunde International Joint Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University, SYSU CMU Shunde International Joint Research Institute filed Critical Sun Yat Sen University

Landscapes

  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

The invention discloses an unordered vector reduction circuit based on labels. Every data item in a vector carries an additional signal indicating the vector to which it belongs; this signal is the label of the data. The circuit comprises a container module, a buffer module, a multiplexer module and an operator module. It can process multiple independent reduction operations of arbitrary length simultaneously, with data input in any order. It uses only one general-purpose hardware arithmetic unit and is fully pipelined. The circuit is flexible and scalable and contains no complex control algorithm or logic. Its interface is simple, and its operation timing is the same as that of a general-purpose random access memory.

Description

An unordered vector reduction circuit based on labels
Technical field
The present invention relates to the field of vector reduction circuits, and more specifically to a label-based unordered vector reduction circuit that can perform reduction operations on multiple independent vectors simultaneously, with data input in arbitrary order.
Background
Vector reduction is the process of reducing a series of input values, i.e. a vector, to a single scalar value by computation. This form of vector computation underlies a large body of scientific and engineering computing, and its core operation generally satisfies the commutative and associative laws. Common vector reduction operations include the sum or product of all elements of a vector and the search for the maximum or minimum element. A vector reduction is characterized by being a multi-step operation in which later steps need the results of earlier steps.
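As a software illustration of the reductions named above, the following minimal Python sketch reduces one vector with four commutative and associative operations. The vector values and variable names are assumptions chosen for the example and do not come from the patent.

```python
from functools import reduce
import operator

# Hypothetical example vector (not from the patent).
vector = [3, 1, 4, 1, 5, 9, 2, 6]

total = reduce(operator.add, vector)      # cumulative sum
product = reduce(operator.mul, vector)    # cumulative product
largest = reduce(max, vector)             # maximum element
smallest = reduce(min, vector)            # minimum element

# The same elements in a different order: because integer addition is
# commutative and associative, the reduction result does not change.
shuffled = [9, 2, 3, 6, 1, 5, 4, 1]
same_total = reduce(operator.add, shuffled)
```

Each call folds the vector step by step, and every step consumes the result of the previous one, which is the dependency that makes pipelined hardware scheduling non-trivial.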
In some embedded applications with strict real-time requirements, vector reduction must be implemented as a hardware circuit on a chip such as an FPGA or ASIC. To reach a higher operating frequency, however, hardware arithmetic units for complex operations such as floating-point addition and multiplication are usually deeply pipelined, with a latency of more than one clock cycle. A vector reduction circuit that uses a multi-cycle hardware arithmetic unit must schedule each step of the computation properly and store and dispatch the result of each step sensibly, so that the pipeline of the arithmetic unit is kept as full as possible and the throughput of the circuit is guaranteed. In addition, for resource efficiency and overall system performance, a vector reduction circuit should use as few hardware arithmetic units as possible, be able to process the reductions of multiple independent vectors simultaneously, and be fully pipelined so that input data is never blocked.
Some existing vector reduction circuits meet these requirements, but when processing the reductions of several vectors at once they all require that the data of one vector be completely input before the data of the next vector can be input; that is, they require ordered input. In some applications, however, the data to be processed arrives unordered: the data of several vectors is randomly interleaved. For example, the vector reduction circuit may be connected to several upstream modules through a bus arbiter that uses a probability-based arbitration algorithm such as the Lottery algorithm, or may be part of a network-on-chip; the order of the input data is then essentially unpredictable, and existing vector reduction circuits cannot process such input.
Moreover, from a practical standpoint, a vector reduction circuit should have a standard, friendly interface without complicated operation timing, to ensure good compatibility with upstream and downstream modules. Most existing vector reduction circuits pay little attention to this, which hinders practical use.
Summary of the invention
To address the problems of existing vector reduction circuits, the present invention proposes an unordered vector reduction circuit based on labels. It can simultaneously process the reductions of multiple independent vectors of arbitrary length, with data input in arbitrary order. The circuit is flexible and highly scalable, and contains no complex control algorithm or logic.
To achieve this, the technical solution of the present invention is as follows:
An unordered vector reduction circuit based on labels, in which every data item of a vector carries an additional signal indicating the vector to which it belongs; this signal is the label of the data.
The circuit comprises a container module, a buffer module, a multiplexer (MUX) module and an operator module.
Container module: in each clock cycle, the container module accepts two labeled data items, checks all data in the container module, matches data with the same label two by two into labeled data pairs, and outputs the pairs.
The two labeled data items accepted in each clock cycle are, respectively, a labeled data item from outside the unordered vector reduction circuit and the labeled data item output by the operator module.
Buffer module: performs buffering operations according to the number of labeled data pairs output by the container module.
MUX module: gates the valid labeled data pairs output by the container module and the buffer module to the operator module.
Operator module: performs the operation on the data of the valid labeled data pair it receives, combines the result with the label of that pair to form a labeled data item, and returns it to the container module.
The core operation of a vector reduction is normally a binary operation that satisfies the commutative and associative laws (multiplication, addition, taking the larger value, taking the smaller value, and so on). The set consisting of all original data of a vector together with all intermediate results of its reduction is called a data set. For the reduction of multiple vectors, the invention proceeds as follows. A container module is provided. In each clock cycle, the current input data and the intermediate result output by the arithmetic unit are put into the container, and two data items belonging to the same data set are selected from the container, paired, and sent to the arithmetic unit. Once the data of all vectors has been input, no two data items in the container belong to the same data set, and the pipeline of the arithmetic unit is empty, the reductions of all vectors are complete, and the data remaining in the container are the reduction results of all vectors. This way of processing clearly does not depend on the input order of the data. Because the input is a mixture of data from several vectors, each data item must carry a signal indicating the vector to which it belongs. This signal is treated as the label signal of the data signal and is attached to it. The container uses the label to distinguish the data of each data set and stores and matches data accordingly. Data with the same label are paired and sent to the arithmetic unit, and the result is given the same label. Each data set thus corresponds to a unique label value. Invalid data can be given a label different from the labels of all data sets to indicate that it is invalid and belongs to no data set, which guarantees the correctness and accuracy of the computation.
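The pairing idea described above can be sketched in software. The following Python model is a behavioral illustration only: it is not cycle-accurate, it ignores the pipeline and the buffer, and the function name `unordered_reduce`, the queue `pending` and the sample stream are assumptions made for the example rather than parts of the patented circuit.

```python
from collections import deque
import operator

def unordered_reduce(stream, op):
    """Reduce interleaved (label, value) items; return {label: result}."""
    container = {}            # holds at most one value per label (data set)
    pending = deque(stream)   # external inputs plus results returning from the operator
    while pending:
        label, value = pending.popleft()
        if label in container:
            # A partner of the same data set is stored: pair the two values,
            # apply the operation, and feed the result back with the same label.
            partner = container.pop(label)
            pending.append((label, op(partner, value)))
        else:
            # No partner yet: keep the value until one arrives.
            container[label] = value
    return container

# Data of three vectors "a", "b" and "c", arbitrarily interleaved.
stream = [("a", 1), ("b", 10), ("a", 2), ("c", 7), ("b", 20), ("a", 3), ("b", 30)]
sums = unordered_reduce(stream, operator.add)
```

When the loop ends, no two stored values share a label and nothing is in flight, which mirrors the completion condition stated in the text; the dictionary then holds one result per vector.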
In each clock cycle, the container module accepts two labeled data items, one from outside the circuit and one output by the operator module, and checks all its internal data including the two current inputs; all data with the same label are matched two by two into labeled data pairs, and all pairs are output.
Because matched data is output as a pair immediately, the container module stores at most one data item for each data set. The storage size of the container module therefore equals the number of unordered input vectors that the circuit can process simultaneously. As long as the number of vectors processed simultaneously does not exceed the storage size of the container module, the circuit guarantees the correctness and accuracy of the computation regardless of the input order of the data.
A labeled data pair can only be formed by the two labeled data items being input, or by one input labeled data item and a data item previously stored in the container module. In each clock cycle, the number of labeled data pairs output by the container module therefore falls into one of three cases: no pair is output (state 0); one pair is output (state 1); two pairs are output (state 2).
The buffer module performs buffering according to the number of labeled data pairs output by the container module, to resolve the conflict that arises when the container module outputs two pairs at the same time. Specifically:
In state 0, the buffer module pops one labeled data pair;
In state 1, the buffer module keeps its state unchanged for that clock cycle;
In state 2, the buffer module pushes one of the two labeled data pairs.
If the buffer module is empty in state 0, the label part of its output port is set to the invalid value, to guarantee the correctness and accuracy of the computation.
The circuit places no requirement on the push and pop order of the buffer module. The minimum depth of the buffer is a finite value p-1, where p is the total number of pipeline stages of the container module and the operator module; it does not depend on the number of vectors processed simultaneously, the length of each vector, or the input order of the data. As long as the depth is not less than this minimum, the buffer module does not overflow for any vector length and any input order.
The operator module consists of a general-purpose arithmetic unit and a signal delay unit. The arithmetic unit is chosen according to the specific vector reduction to be performed, and may have any number of pipeline stages. The number of delay cycles of the signal delay unit equals the number of pipeline stages of the arithmetic unit, so that the output data of the arithmetic unit lines up with the output label of the signal delay unit.
In each clock cycle, the operator module accepts one labeled data pair. The data part is sent to the arithmetic unit for computation and the label part is sent to the signal delay unit. The data output by the arithmetic unit and the label output by the signal delay unit are combined into a labeled data item, which is returned to the container module.
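The label delay described above can be illustrated with a small Python model. It is a sketch under stated assumptions: the class name `PipelinedOperator`, the tuple layout `(label, a, b)` and the use of `None` for an invalid label are hypothetical, and the result is computed when the pair enters the pipeline rather than spread over the stages as real hardware would do.

```python
from collections import deque
import operator

class PipelinedOperator:
    """Arithmetic unit with `stages` pipeline stages and a matching label delay line."""

    def __init__(self, op, stages):
        self.op = op
        # One slot per stage; None marks a bubble (invalid label).
        self.data_pipe = deque([None] * stages)
        self.label_pipe = deque([None] * stages)

    def clock(self, pair):
        """Advance one clock cycle.

        pair is (label, a, b) or None. Returns the (label, result) item that
        leaves the pipeline this cycle, or None if a bubble leaves.
        """
        out_label = self.label_pipe.popleft()
        out_data = self.data_pipe.popleft()
        if pair is None:
            self.label_pipe.append(None)
            self.data_pipe.append(None)
        else:
            label, a, b = pair
            self.label_pipe.append(label)          # label enters the delay line
            self.data_pipe.append(self.op(a, b))   # data enters the arithmetic unit
        return None if out_label is None else (out_label, out_data)

unit = PipelinedOperator(operator.add, stages=3)
inputs = [("a", 1, 2), ("b", 10, 20), None, None, None]
outputs = [unit.clock(p) for p in inputs]
```

Because both queues have the same length, every result leaves the model in the same cycle as the label it entered with, which is the purpose of matching the delay to the pipeline depth.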
The MUX module gates the labeled data pairs output by the container module and the buffer module to the operator module. In each clock cycle: if the container module outputs one labeled data pair, the MUX module gates that pair to the operator module; if the container module outputs two labeled data pairs, the MUX module gates the pair that is not pushed into the buffer module; if the container module outputs no labeled data pair, the MUX module gates the pair popped from the buffer module.
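The combined behavior of the buffer and the multiplexer in the three cases can be sketched as follows. This is an illustrative Python model: the function name `route_cycle`, the tuple layout of a pair and the choice to buffer the second of two pairs are assumptions, since the text only requires that one of the two pairs be buffered.

```python
from collections import deque

def route_cycle(pairs, fifo):
    """Return the pair gated to the operator this cycle, or None for a bubble.

    pairs: the 0, 1 or 2 labeled data pairs output by the container this cycle.
    fifo:  the buffer, modeled as a deque.
    """
    if len(pairs) == 1:
        return pairs[0]            # state 1: forward the pair, buffer unchanged
    if len(pairs) == 2:
        fifo.append(pairs[1])      # state 2: push one pair into the buffer
        return pairs[0]            #          and forward the other
    # state 0: pop a buffered pair; an empty buffer yields an invalid output
    return fifo.popleft() if fifo else None

fifo = deque()
per_cycle = [[("a", 1, 2), ("b", 3, 4)], [("c", 5, 6)], [], []]
sent = [route_cycle(pairs, fifo) for pairs in per_cycle]
```

In every cycle at most one pair reaches the operator, and a pair that had to wait is delivered in the next cycle in which the container produces none.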
According to a group of status signals provided by the container module, the MUX module gates the valid labeled data pair output by the container module or the buffer module to the operator module; this gating policy guarantees the correctness and accuracy of the reductions of all vectors.
In each clock cycle the container module can accept one labeled data item from outside. As long as the buffer module is no shallower than its minimum depth it does not overflow, the operator module needs to process only one labeled data pair per clock cycle, and the labeled data it outputs enters the container module without blocking. The circuit of the invention therefore never stalls and is fully pipelined.
Labeled data from outside enters the container module without blocking, and the reduction results of all vectors are also stored in the container module. Because the storage locations of the container module map one-to-one to data sets, the label can be used as an address into the container module. For the data input and output of the circuit, the label signal is thus equivalent to an address signal, and the operation timing of the circuit is the same as that of a general-purpose random access memory.
After the last labeled data item has been input, the reduction results of all vectors can be read from the container module within a bounded number of clock cycles that depends on p, where p is the total number of pipeline stages inside the circuit.
The circuit interface of the invention is simple, and its operation timing is identical to that of a general-purpose random access memory.
Brief description of the drawings
Fig. 1 is the circuit schematic of Embodiment 1 of the label-based unordered vector reduction circuit.
Fig. 2 is the circuit schematic of the container module Container.
Detailed description
The present invention is further described below with reference to the drawings, but embodiments of the invention are not limited to this description.
Embodiment 1
Figs. 1 and 2 show the circuit schematics of Embodiment 1 of the label-based unordered vector reduction circuit, which consists of the container module Container, the buffer module RBuff, the multiplexer module MUX and the operator module Operator. stb_x, tag_x and dat_x are the input ports for labeled data, dat_o is the output port for result data, and ctl_read is the control signal for reading a result.
The label part of a labeled data item {stb_x, tag_x, dat_x} from outside the circuit is {stb_x, tag_x}; the label part of the labeled data item {stb_r, tag_r, dat_r} returned by the operator module is {stb_r, tag_r}; the same holds for the other labeled data items and labeled data pairs in the circuit. The label part {stb_*, tag_*} (* is a wildcard, here and below) of every labeled data item or pair is 1+n bits wide. stb_* is 1 bit wide and is the most significant bit of the label signal; it is called the strobe signal. A value of 1 means the label is valid and 0 means it is invalid, so checking the level of the strobe signal is enough to decide whether a label is valid. tag_* is the value of a valid label and is n bits wide; there are m = 2^n valid labels, with values from 0 to 2^n - 1.
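The (1+n)-bit label layout can be illustrated with a few bit operations. The following Python sketch assumes a tag width of n = 3; the constant `N` and the helper names `pack_label`, `unpack_label` and `is_valid` are hypothetical and used only for illustration.

```python
N = 3  # assumed tag width n, giving m = 2**N valid labels

def pack_label(stb, tag, n=N):
    """Pack {stb, tag} into a (1+n)-bit word with stb as the most significant bit."""
    assert stb in (0, 1) and 0 <= tag < (1 << n)
    return (stb << n) | tag

def unpack_label(word, n=N):
    """Split a (1+n)-bit label word back into (stb, tag)."""
    return (word >> n) & 1, word & ((1 << n) - 1)

def is_valid(word, n=N):
    """A label is valid exactly when its strobe bit is 1."""
    return ((word >> n) & 1) == 1

valid_word = pack_label(1, 5)     # strobe high, tag 5
invalid_word = pack_label(0, 5)   # strobe low: invalid whatever the tag is
num_valid_labels = 1 << N         # m = 2^n
```

Only the top bit has to be examined to decide validity, so any tag value combined with a low strobe can serve as the invalid label.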
The hardware arithmetic unit in the operator module Operator is chosen according to the specific vector reduction to be performed and may have any number of pipeline stages. The signal delay unit inside the operator module, and those in other parts of the circuit, are all implemented with D flip-flops.
The container module consists of two submodules, the cache module Cache and the cache status query and update module CacheStatQAU_PT, plus a label comparator circuit. In each cycle the container module accepts one labeled data item from outside the circuit and one from the operator module. Since the container module stores at most one data item per data set, a valid labeled data pair can arise in only 3 ways: from the two labeled data items being input; from the labeled data item from outside the circuit and the corresponding data in the container; or from the labeled data item output by the operator module and the corresponding data in the container. The container module enumerates the labeled data pairs of these 3 combinations: {flg_x_matches_r, tag_x, dat_x, dat_r}, {flg_x_got_matched, tag_x, dat_x, dat_x_cache} and {flg_r_got_matched, tag_r, dat_r, dat_r_cache}, whose validity is indicated by their strobe signals flg_x_matches_r, flg_x_got_matched and flg_r_got_matched respectively. The multiplexer module MUX and the buffer module RBuff use these strobe signals to tell which pairs are valid and act accordingly.
The cache module Cache buffers the data parts of the two labeled data items currently entering the container module, and at the same time outputs the two previously stored data items so that they can be paired with the current data. Cache is implemented with a dual-port RAM operating in Read First mode, with a read latency of p_cc, typically p_cc = 1. Its capacity equals the number of available valid labels, m = 2^n, so the circuit can process the reductions of at most m data sets simultaneously. The signal tag_* is used directly as the address of Cache, so that each data set being processed has a unique storage location in Cache. When a labeled data item entering the container module is invalid, its low strobe signal drives the write enable low, and the data in Cache is unaffected. For an input labeled data item {stb_x, tag_x, dat_x}, Cache uses the value of tag_x as the address of the location that stores data of the data set labeled tag_x, and reads out the previously stored data dat_x_cache to form the labeled data pair {flg_x_got_matched, tag_x, dat_x, dat_x_cache}. In addition, if the label of {stb_x, tag_x, dat_x} is valid and differs from the label of the other labeled data item {stb_r, tag_r, dat_r}, Cache overwrites the location with dat_x after dat_x_cache has been read. The labeled data item {stb_r, tag_r, dat_r} entering the container module is handled in the same way. Reads and writes of Cache are not affected by the cache status of each data set. The data part of an input labeled data item is still stored even after it has been matched into a valid pair with data previously stored in Cache. However, because Cache operates in Read First mode and the cache status query and update module CacheStatQAU_PT keeps track of the cache status, data is never computed twice, and data not yet computed is never overwritten by new data.
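The Read First behavior of the cache can be modeled in a few lines of Python. This is a simplified sketch: the class name `ReadFirstCache` and the `(stb, tag, dat)` tuple layout are assumptions, the read latency p_cc is not modeled, and empty locations are represented by `None`.

```python
class ReadFirstCache:
    """Two-port memory addressed by tag; reads see the contents before this cycle's writes."""

    def __init__(self, n_bits):
        self.mem = [None] * (1 << n_bits)   # one location per valid label

    def cycle(self, x, r):
        """x and r are (stb, tag, dat) triples. Returns (dat_x_cache, dat_r_cache)."""
        stb_x, tag_x, dat_x = x
        stb_r, tag_r, dat_r = r
        # Read first: both ports return the previously stored words.
        dat_x_cache = self.mem[tag_x]
        dat_r_cache = self.mem[tag_r]
        # A port writes only if its label is valid and differs from the other label.
        same_label = bool(stb_x and stb_r and tag_x == tag_r)
        if stb_x and not same_label:
            self.mem[tag_x] = dat_x
        if stb_r and not same_label:
            self.mem[tag_r] = dat_r
        return dat_x_cache, dat_r_cache

cache = ReadFirstCache(n_bits=2)
first = cache.cycle((1, 2, 5), (0, 0, 0))    # stores 5 at address 2
second = cache.cycle((1, 2, 7), (0, 0, 0))   # reads back 5, then stores 7
clash = cache.cycle((1, 1, 3), (1, 1, 4))    # equal valid labels: no write
```

The second access returns the old word even though it also writes, which is what lets the stored partner be paired with the incoming data in the same cycle.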
The label comparator circuit provides the strobe signal flg_x_matches_r of the labeled data pair {flg_x_matches_r, tag_x, dat_x, dat_r}. If the labels of the two labeled data items entering the container module are both valid and equal, dat_x and dat_r form a valid data pair with label value tag_x; the label comparator circuit drives flg_x_matches_r high to indicate that {flg_x_matches_r, tag_x, dat_x, dat_r} is a valid labeled data pair, and otherwise drives flg_x_matches_r low.
The cache status query and update module CacheStatQAU_PT uses the labels of the two labeled data items entering the container module to query and update the storage status of the corresponding data sets in the container module, and provides the strobe signal flg_x_got_matched of the labeled data pair {flg_x_got_matched, tag_x, dat_x, dat_x_cache} and the strobe signal flg_r_got_matched of {flg_r_got_matched, tag_r, dat_r, dat_r_cache}. Because each data set has only two possible cache states in the container module (the container either holds data of the data set or does not), CacheStatQAU_PT contains an m-bit cache status register, each bit of which records the cache status of one data set and is addressed by the label signal. For a labeled data item {stb_x, tag_x, dat_x} entering the container module with a valid label, CacheStatQAU_PT addresses the corresponding bit of the cache status register by the label value tag_x. If the bit is 1, Cache holds data of the same data set as dat_x, so dat_x_cache and dat_x form a valid data pair; CacheStatQAU_PT drives flg_x_got_matched high to indicate that {flg_x_got_matched, tag_x, dat_x, dat_x_cache} is a valid labeled data pair, and flips the bit to 0 to indicate that Cache no longer holds data of that data set. If the bit is 0, Cache holds no data of the same data set as dat_x; CacheStatQAU_PT drives flg_x_got_matched low to indicate that {flg_x_got_matched, tag_x, dat_x, dat_x_cache} is an invalid labeled data pair, and flips the bit to 1 to indicate that Cache now holds data of that data set. If {stb_x, tag_x, dat_x} carries an invalid label, CacheStatQAU_PT simply drives flg_x_got_matched low and leaves the cache status register unchanged. The labeled data item {stb_r, tag_r, dat_r} entering the container module is handled in the same way. This behavior can be summarized by the following behavioral pseudocode of module CacheStatQAU_PT:
input: stb_x, tag_x, stb_r, tag_r
flg_x_got_matched = cache_stat[tag_x] & stb_x
flg_r_got_matched = cache_stat[tag_r] & stb_r
case ({stb_x, stb_r})
  00: cache_stat = cache_stat
  01: cache_stat = cache_stat ⊕ (1 << tag_r)
  10: cache_stat = cache_stat ⊕ (1 << tag_x)
  11: cache_stat = cache_stat ⊕ (1 << tag_x) ⊕ (1 << tag_r)
endcase
Here cache_stat is the module's internal cache status register, m bits wide; '⊕' denotes XOR and '<<' denotes left shift, and the two operations together invert the corresponding bit of cache_stat. The latency p_pt of the module depends on whether flg_x_got_matched and flg_r_got_matched are latched before being output: it is 1 if they are latched and 0 otherwise.
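The status register update can be tried out in Python. The sketch below models the register as an integer and follows the textual description, in which a valid label on port x toggles the bit addressed by tag_x and a valid label on port r toggles the bit addressed by tag_r; the function name `cache_stat_step` is an assumption, and latching (the latency p_pt) is not modeled.

```python
def cache_stat_step(cache_stat, stb_x, tag_x, stb_r, tag_r):
    """One update of the m-bit cache status register.

    Returns (flg_x_got_matched, flg_r_got_matched, new_cache_stat).
    """
    # A match is flagged when the label is valid and its status bit is already 1.
    flg_x_got_matched = (cache_stat >> tag_x) & stb_x & 1
    flg_r_got_matched = (cache_stat >> tag_r) & stb_r & 1
    # Each valid label toggles its own status bit.
    if stb_x:
        cache_stat ^= 1 << tag_x
    if stb_r:
        cache_stat ^= 1 << tag_r
    return flg_x_got_matched, flg_r_got_matched, cache_stat

# First item of data set 2 arrives: no match, bit 2 becomes 1.
fx1, fr1, s1 = cache_stat_step(0b0000, 1, 2, 0, 0)
# Second item of data set 2 arrives: match, bit 2 returns to 0.
fx2, fr2, s2 = cache_stat_step(s1, 1, 2, 0, 0)
# Both ports carry the same valid label: the bit is toggled twice and is unchanged.
fx3, fr3, s3 = cache_stat_step(0b0010, 1, 1, 1, 1)
```

The last call shows why two equal valid labels leave the register untouched, so a stored partner of that data set stays marked as present.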
The buffer module RBuff is implemented with a FIFO operating in Fall-Through mode. The signal not_empty, which indicates whether the FIFO is empty, serves as the strobe signal of the labeled data pair {not_empty, dat_o} on the output port of RBuff. The multiplexer module MUX is a 3-to-1 selection logic circuit. The behavioral pseudocode of the buffer module RBuff and the multiplexer module MUX is as follows:
A special case arises when the labels of the two labeled data items entering the container module are both valid and identical. In this case only {flg_x_matches_r, tag_x, dat_x, dat_r} is treated as the valid labeled data pair. If the container module already stores data of the same data set, {flg_x_got_matched, tag_x, dat_x, dat_x_cache} and {flg_r_got_matched, tag_r, dat_r, dat_r_cache} are also marked valid. Nevertheless, as described above, dat_x and dat_r are not written to Cache and so do not overwrite the stored data; the corresponding bit of the cache status register cache_stat in CacheStatQAU_PT is inverted twice and so keeps its value; the buffer module RBuff takes no action; and the pair gated by the multiplexer module MUX is {flg_x_matches_r, tag_x, dat_x, dat_r}. In other words, the circuit lets these two labeled data items fall straight through to the operator module Operator, and the computation remains correct and accurate. Handling the case this way avoids reading and writing the same location of Cache on both ports at once, which prevents conflicts and errors and also reduces circuit complexity.
The ctl_read signal is used to read the result of a data set from Cache and to reset the corresponding bit of the cache status register cache_stat in CacheStatQAU_PT, so as to reclaim the label occupied by that data set. When no result is being read, ctl_read is held low. When the vector reduction of a data set is complete, exactly one data item of that data set remains in the circuit, in the container module Container, and the corresponding bit of cache_stat is 1. To read the result of this data set, set {stb_x, tag_x} to the label value of the data set and drive ctl_read high. Because the result is the only data of this data set in the circuit, flg_x_matches_r cannot be high; {flg_x_got_matched, tag_x, dat_x, dat_x_cache} is output as a valid labeled data pair, and the corresponding bit of cache_stat is flipped to 0. dat_x_cache is the result of the data set, and since dat_o = dat_x_cache, the result can be read from port dat_o outside the circuit. At the same time, ctl_read being high prevents the multiplexer module MUX from gating {flg_x_got_matched, tag_x, dat_x, dat_x_cache} to the operator module Operator, so that the computations of the data sets still in progress are not disturbed and the data of this data set does not return to the container module and occupy the label again.
The internal pipeline of the circuit is never blocked; the circuit is fully pipelined. Seen from outside, the interface and operation timing of the circuit are those of a random access memory of size m: tag_x can be regarded as the address input port, dat_x as the data input port, dat_o as the data output port, {ctl_read, stb_x} as the read/write enable port, and the read latency is the latency p_c of the container module Container.
The embodiments described above do not limit the scope of protection of the invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the claims.

Claims (10)

1. An unordered vector reduction circuit based on labels, characterized in that every data item of a vector carries an additional signal indicating the vector to which it belongs, this signal being the label of the data;
the circuit comprises a container module, a buffer module, a multiplexer (MUX) module and an operator module;
Container module: in each clock cycle, the container module accepts two labeled data items, checks all data in the container module, matches data with the same label two by two into labeled data pairs, and outputs the pairs;
the two labeled data items accepted in each clock cycle are, respectively, a labeled data item from outside the unordered vector reduction circuit and the labeled data item output by the operator module;
Buffer module: performs buffering operations according to the number of labeled data pairs output by the container module;
MUX module: gates the valid labeled data pairs output by the container module and the buffer module to the operator module;
Operator module: performs the operation on the data of the valid labeled data pair it receives, combines the result with the label of that pair to form a labeled data item, and returns it to the container module.
2. circuit according to claim 1, is characterized in that, the storage size of container module equals described circuit can simultaneously treated unordered input vector number.
3. circuit according to claim 1, is characterized in that, in each clock period, the situation of the quantity that container module output tape label data are right is divided into three kinds: do not export tape label data pair, be designated as state 0; Export a pair of tape label data pair, be designated as state 1; Export two pairs of tape label data pair, be designated as state 2;
When the right quantity situation of container module output tape label data belongs to state 0, buffer zone module ejects a pair of tape label data pair;
When the right quantity situation of container module output tape label data belongs to state 1, buffer zone module keeps the state of a clock period constant;
When the right quantity situation of container module output tape label data belongs to state 2, buffer zone module wherein 1 tape label data to being pressed into.
4. circuit according to claim 3, is characterized in that, when the right quantity situation of container module output tape label data belongs to state 0, buffer zone module ejects a pair of tape label data pair; When buffer zone module is empty, the label segment that makes buffer zone module output port is invalid value.
5. circuit according to claim 3, is characterized in that, the minimum-depth of described buffer zone module is a finite value p-1, and wherein p is the pipeline series sum of container module and operator block.
6. circuit according to claim 1, it is characterized in that, operator block consists of a general-purpose operation device and a signal delay device, each clock period of operator block is accepted tape label data pair, data division is sent into general-purpose operation device and is carried out computing, label segment is sent into signal delay device and is postponed, and the label of the data of arithmetical unit output and the output of signal delay device is formed to tape label data, returns to container module.
7. according to the circuit described in right 6, it is characterized in that, general-purpose operation device is selected according to the vector reduction operations that specifically will carry out, and its pipeline series is arbitrarily; The delay period number of described signal delay device equates with the pipeline series of general-purpose operation device, and the output data of general-purpose operation device are mated with the output label of signal delay device.
8. The circuit according to claim 1, characterized in that the multiplexer module gates the labeled data pairs output by the container module and the buffer module to the arithmetic unit module;
In each clock cycle, if the container module outputs 1 labeled data pair, the multiplexer module gates that labeled data pair to the arithmetic unit module;
If the container module outputs 2 labeled data pairs, the multiplexer module gates to the arithmetic unit module the labeled data pair that is not pushed into the buffer module;
If the container module outputs 0 labeled data pairs, the multiplexer module gates to the arithmetic unit module the labeled data pair popped from the buffer module.
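The three gating cases above can be written as a small selection function. This is a sketch under stated assumptions, not the claimed multiplexer: the function name `select_pair`, the `(label, a, b)` tuple layout, and the choice to forward the first of two pairs while buffering the second are hypothetical, since the claim does not say which of the two pairs goes to the buffer.

```python
def select_pair(container_pairs, buffer_out):
    """Choose what the arithmetic unit and the buffer receive this cycle.

    container_pairs: list of 0, 1, or 2 (label, a, b) tuples output by
                     the container module in this clock cycle.
    buffer_out:      the pair popped from the buffer module (used only
                     when the container outputs nothing).
    Returns (pair_for_arithmetic_unit, pair_for_buffer).
    """
    count = len(container_pairs)
    if count == 1:
        # One pair: forward it directly; nothing is buffered.
        return container_pairs[0], None
    if count == 2:
        # Two pairs: forward one, push the other (the order is assumed).
        return container_pairs[0], container_pairs[1]
    if count == 0:
        # No pairs: forward whatever the buffer popped.
        return buffer_out, None
    raise ValueError("container outputs at most two pairs per cycle")
```

Under this model the arithmetic unit receives at most one pair per cycle in every case, so a single operator can serve all active reductions.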
9. The circuit according to claim 8, characterized in that the multiplexer module, according to a group of status signals provided by the container module, gates the valid labeled data pairs output by the container module and the buffer module to the arithmetic unit module, and the gating strategy guarantees the correctness and accuracy of the reduction operations of all vectors.
10. The circuit according to any one of claims 1 to 9, characterized in that the operation timing of the circuit is the same as that of a general-purpose random access memory.
CN201410240877.8A 2014-05-30 2014-05-30 A kind of unordered vector reduction circuit based on label Active CN103995688B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410240877.8A CN103995688B (en) 2014-05-30 2014-05-30 A kind of unordered vector reduction circuit based on label

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410240877.8A CN103995688B (en) 2014-05-30 2014-05-30 A kind of unordered vector reduction circuit based on label

Publications (2)

Publication Number Publication Date
CN103995688A (en) 2014-08-20
CN103995688B (en) 2016-10-12

Family

ID=51309866

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410240877.8A Active CN103995688B (en) 2014-05-30 2014-05-30 A kind of unordered vector reduction circuit based on label

Country Status (1)

Country Link
CN (1) CN103995688B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102004672A (en) * 2010-11-25 2011-04-06 中国人民解放军国防科学技术大学 Reduction device capable of configuring auto-increment interval of reduction target
WO2014051720A1 (en) * 2012-09-28 2014-04-03 Intel Corporation Accelerated interlane vector reduction instructions

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhuang Wei et al.: "A novel reduction network suitable for vector processors", Journal of Chinese Computer Systems (《小型微型计算机系统》), vol. 33, no. 11, 15 November 2012 (2012-11-15), pages 2498 - 2502 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI673648B (en) * 2017-04-03 2019-10-01 美商谷歌有限責任公司 Vector reduction processor
US10706007B2 (en) 2017-04-03 2020-07-07 Google Llc Vector reduction processor
US11061854B2 (en) 2017-04-03 2021-07-13 Google Llc Vector reduction processor
US11940946B2 (en) 2017-04-03 2024-03-26 Google Llc Vector reduction processor
CN111105042A (en) * 2019-12-13 2020-05-05 广东浪潮大数据研究有限公司 Parallel message processing method, system and related device
CN111105042B (en) * 2019-12-13 2023-07-25 广东浪潮大数据研究有限公司 Parallel message processing method, system and related device

Also Published As

Publication number Publication date
CN103995688B (en) 2016-10-12

Similar Documents

Publication Publication Date Title
CN105740168B (en) A kind of fault-tolerant directory caching controller
CN104067282B (en) Counter operation in state machine lattice
CN107301455B (en) Hybrid cube storage system for convolutional neural network and accelerated computing method
CN101996147B (en) Method for realizing dual-port RAM (Random-Access memory) mutual exclusion access
CN104011736B (en) For the method and system of the detection in state machine
CN103730149B (en) A kind of read-write control circuit of dual-ported memory
CN108475194A (en) Register communication in on-chip network structure
US7987322B2 (en) Snoop request management in a data processing system
TW201835906A (en) Apparatuses and methods for compute in data path
CN106372008B (en) A kind of data cache method and device
EP3264415B1 (en) Methods and apparatus for smart memory interface
NL7908893A (en) FLOATING, COMMA PROCESSOR, PROVIDED WITH SIMULTANEOUS EXPONENT / MANTISSE OPERATION.
CN101918925B (en) Second chance replacement mechanism for a highly associative cache memory of a processor
CN103890857A (en) Shiftable memory employing ring registers
CN103927270A (en) Shared data caching device for a plurality of coarse-grained dynamic reconfigurable arrays and control method
CN108632624A (en) Image processing method, device, terminal device and readable storage medium storing program for executing
JP6229024B2 (en) A memory having an information search function, a method of using the same, an apparatus, and an information processing method.
CN102736888B (en) With the data retrieval circuit of synchronization of data streams
CN103995688A (en) Disordered vector reduction circuit based on labels
CN113270126A (en) Stream access memory device, system and method
CN106415483A (en) Floating point unit with support for variable length numbers
US3302185A (en) Flexible logic circuits for buffer memory
CN104679670A (en) Shared data caching structure and management method for FFT (fast Fourier transform) and FIR (finite impulse response) algorithms
CN104035898A (en) Memory access system based on VLIW (Very Long Instruction Word) type processor
CN100495384C (en) Data inputting/outputting construction in coarse-grain re-arrangable computing structure

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant