CN103995688B - A label-based unordered vector reduction circuit - Google Patents


Info

Publication number
CN103995688B
Authority
CN
China
Prior art keywords
module
labeled data
label data
output
Legal status
Active
Application number
CN201410240877.8A
Other languages
Chinese (zh)
Other versions
CN103995688A (en)
Inventor
Huang Yihua (黄以华)
Wei Ming (韦铭)
Current Assignee
SYSU CMU Shunde International Joint Research Institute
National Sun Yat Sen University
Original Assignee
SYSU CMU Shunde International Joint Research Institute
National Sun Yat Sen University
Priority date
2014-05-30
Filing date
2014-05-30
Publication date
2016-10-12
Application filed by SYSU CMU Shunde International Joint Research Institute and National Sun Yat Sen University
Priority to CN201410240877.8A
Publication of CN103995688A
Application granted
Publication of CN103995688B


Abstract

The invention discloses a label-based unordered vector reduction circuit. Every datum in a vector carries a signal indicating the vector to which it belongs; this signal is the datum's label. The unordered vector reduction circuit comprises a container module, a buffer module, a multiplexer (MUX) module, and an operator module. The circuit can process the reductions of multiple independent vectors of arbitrary length simultaneously, and the data may arrive in any order. It uses only one general-purpose hardware arithmetic unit and is fully pipelined. It is flexible and scalable, with no complex internal control algorithm or logic. Its interface is simple, and its operation timing is the same as that of an ordinary random-access memory.

Description

A label-based unordered vector reduction circuit
Technical field
The present invention relates to the field of vector reduction circuits, and more particularly to a label-based unordered vector reduction circuit that can perform multiple independent vector reductions simultaneously, with data arriving in any order.
Background art
Vector reduction is the process of reducing a series of input values, i.e. a vector, to a single scalar value by repeatedly applying an operation. This form of vector computation underlies a large amount of scientific and engineering computing, and its core operation generally satisfies the commutative and associative laws. Common vector reductions include the sum and the product of all elements of a vector and finding its maximum or minimum element. A vector reduction is a multi-step operation in which later steps need the results of earlier steps.
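As a concrete illustration of this definition, the following Python sketch reduces one small integer vector with the four operations named above. It is illustrative only and not part of the patent; it also shows that, for a commutative and associative operation, the order in which elements are combined does not change the result.

```python
import functools
import operator
import random

vector = [3, 1, 4, 1, 5, 9, 2, 6]

# Each reduction folds the vector into one scalar with a binary operation;
# every step consumes the result of the previous step.
total = functools.reduce(operator.add, vector)     # sum of all elements
product = functools.reduce(operator.mul, vector)   # product of all elements
largest = functools.reduce(max, vector)            # maximum element
smallest = functools.reduce(min, vector)           # minimum element

# The operations are commutative and associative, so combining the elements in a
# different order gives the same result (exactly so for integers; floating-point
# addition is only approximately associative).
shuffled = random.sample(vector, len(vector))
assert functools.reduce(operator.add, shuffled) == total
assert functools.reduce(max, shuffled) == largest

print(total, product, largest, smallest)  # 31 6480 9 1
```

This order independence is what allows a circuit to combine the elements of a vector in whatever order they happen to arrive.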
In some embedded applications with strict real-time requirements, vector reduction must be implemented as a hardware circuit on a chip such as an FPGA or ASIC. However, to reach high operating frequencies, the hardware arithmetic units for most complex operations, such as floating-point addition and floating-point multiplication, are usually deeply pipelined, with a latency of more than one clock cycle. A vector reduction circuit that uses multi-cycle arithmetic units must schedule each step of the computation and store and dispatch the result of each step so that the arithmetic unit's pipeline stays as full as possible, ensuring the circuit's throughput. Moreover, for resource efficiency and overall system performance, a vector reduction circuit should use as few hardware arithmetic units as possible, should handle the reductions of multiple independent vectors simultaneously, and should be fully pipelined so that it never stalls the input data.
Some existing vector reduction circuits meet these requirements, but when processing the reductions of multiple vectors at the same time they require that all data of one vector enter the circuit before any data of the next vector; that is, the input data must be ordered. In some applications, however, the data to be processed arrive out of order, with the data of several vectors interleaved. For example, when the vector reduction circuit is connected to several upstream modules through a bus arbiter that uses a probabilistic arbitration algorithm such as the lottery algorithm, or when it is part of a network-on-chip, the order of the input data is essentially unpredictable. Existing vector reduction circuits cannot process such input data.
In addition, from a practical standpoint, a vector reduction circuit should have a standard, friendly interface without complicated operation timing, to ensure good compatibility with upstream and downstream modules. Most existing vector reduction circuits pay little attention to this, which makes them inconvenient in practical applications.
Summary of the invention
To address the problems of existing vector reduction circuits, the present invention proposes a label-based unordered vector reduction circuit that can simultaneously process the reductions of multiple independent vectors of arbitrary length, with data arriving in any order. The circuit is flexible and highly scalable, and has no complex internal control algorithm or logic.
To achieve these goals, the technical solution of the present invention is as follows:
In a label-based unordered vector reduction circuit, every datum of a vector carries a signal indicating the vector to which it belongs; this signal is the datum's label.
The circuit comprises a container module, a buffer module, a MUX module and an operator module.
Container module: in each clock cycle, the container module accepts two labeled data, checks all data in the container module, pairs data with the same label into labeled data pairs, and outputs them.
The two labeled data accepted by the container module each clock cycle are, respectively, labeled data from outside the unordered vector reduction circuit and labeled data output by the operator module.
Buffer module: performs buffering operations according to the number of labeled data pairs output by the container module.
MUX module: gates the valid labeled data pairs output by the buffer module through to the operator module.
Operator module: operates on the data of each valid labeled data pair it receives, combines the result with the label of that pair to form labeled data, and returns them to the container module.
The main operation of a vector reduction is typically a binary operation that satisfies the commutative and associative laws (multiplication, addition, maximum, minimum, etc.). The set formed by all original data of a vector together with all intermediate results produced during its reduction is called a data set. For the reductions of multiple vectors, the invention proceeds as follows. A container module is provided; in each clock cycle, the current input datum and the intermediate result leaving the arithmetic unit are put into the container, and two data belonging to the same data set are selected from the container as a pair and sent to the arithmetic unit for computation. Once the data of all vectors have entered the circuit, no two data in the container belong to the same data set, and the arithmetic unit's pipeline is empty, the reductions of all vectors are complete, and the data in the container are the reduction results of all vectors. Clearly, this processing does not depend on the order in which the vectors' data arrive.
Because the input data are a mixture of data from several vectors, each datum inherently carries a signal indicating the vector to which it belongs. This signal is treated as the label of the data signal and is attached to it. The container relies on the labels to distinguish the data of each data set and to store and match data accordingly. Two data with the same label can be combined into a pair and sent to the arithmetic unit, and the result carries the same label. In this way each data set corresponds to exactly one label value. Invalid data can be given a label different from the labels of all data sets, marking them as invalid and not belonging to any data set, which ensures the correctness and accuracy of the computation.
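The processing described above can be modeled at the cycle level in a few lines of Python. This is a minimal behavioral sketch under stated assumptions, not the patented circuit: the container is a dictionary holding at most one value per label, the buffer is a deque, and the arithmetic unit is a delay line of `latency` stages. Unlike the embodiment, whose cache has a one-cycle read latency, the container here is assumed to be combinational (zero latency). The function name `label_reduce` is hypothetical.

```python
import random
from collections import deque


def label_reduce(stream, op, latency):
    """Cycle-level behavioral sketch of label-based unordered vector reduction.

    stream  -- list of (label, value) items, data of several vectors interleaved
    op      -- commutative, associative binary function (the operator module)
    latency -- pipeline depth of the single arithmetic unit, >= 1
    Returns (container, cycles): container maps each label to its reduced value.
    """
    container = {}                    # label -> the single stored value of that data set
    fifo = deque()                    # buffer for the second pair in state 2
    pipe = deque([None] * latency)    # arithmetic-unit stages: (label, value) or None
    inputs = deque(stream)
    cycles = 0
    while inputs or fifo or any(stage is not None for stage in pipe):
        cycles += 1
        r = pipe.popleft()                          # labeled result returning from the operator
        x = inputs.popleft() if inputs else None    # at most one external input per cycle
        pairs = []
        if x is not None and r is not None and x[0] == r[0]:
            pairs.append((x[0], x[1], r[1]))        # the two inputs pair with each other
        else:
            for item in (x, r):
                if item is None:
                    continue
                label, value = item
                if label in container:              # pair with the stored value of its data set
                    pairs.append((label, container.pop(label), value))
                else:
                    container[label] = value        # otherwise keep it as the stored value
        if len(pairs) == 2:                         # state 2: buffer one pair
            fifo.append(pairs.pop())
        elif not pairs and fifo:                    # state 0: take a buffered pair
            pairs.append(fifo.popleft())
        if pairs:
            label, a, b = pairs[0]
            pipe.append((label, op(a, b)))          # the result carries the same label
        else:
            pipe.append(None)                       # bubble
    return container, cycles


data = [(label, v) for label in range(3) for v in range(1, 11)]
random.shuffle(data)                                # arbitrary interleaving of 3 vectors
results, cycles = label_reduce(data, lambda a, b: a + b, latency=4)
print(sorted(results.items()))                      # [(0, 55), (1, 55), (2, 55)]
```

Because the pairing rule only ever combines values that share a label, the shuffled input order does not affect the results; it only affects how many cycles the run takes.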
Each clock cycle, the container module accepts two labeled data, one from outside the circuit and one output by the operator module, examines all of its internal data including the two current inputs, pairs data with the same label into labeled data pairs, and outputs all of them.
Because data that can be paired are paired and output immediately, the container module stores at most one datum per data set. Hence the storage capacity of the container module equals the number of unordered input vectors the circuit can process simultaneously. As long as the number of vectors processed simultaneously does not exceed the storage capacity of the container module, the circuit computes correctly and accurately regardless of the input order of the data.
Because a labeled data pair can only be formed either by the two input labeled data or by one input labeled datum and a datum previously stored in the container module, in each clock cycle the number of labeled data pairs output by the container module falls into one of three cases: no pair, denoted state 0; one pair, denoted state 1; two pairs, denoted state 2.
The buffer module performs buffering according to the number of labeled data pairs output by the container module, to resolve the conflict that arises when the container module outputs two labeled data pairs at once. Specifically:
When the container module outputs no labeled data pair (state 0), the buffer module pops one labeled data pair;
When the container module outputs one labeled data pair (state 1), the buffer module keeps its state from the previous clock cycle unchanged;
When the container module outputs two labeled data pairs (state 2), the buffer module pushes one of them.
In state 0, if the buffer module is empty, the label part of its output port is set to an invalid value, to ensure correct and accurate computation.
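The three states can be expressed as a small function. This sketch assumes each input is either None (invalid label) or a (label, value) tuple, and it represents the container's contents only by the set of labels it currently holds; `container_state` and `BUFFER_ACTION` are hypothetical names. The equal-label case returns 1 because, as the embodiment explains, the two inputs are then paired directly with each other.

```python
def container_state(x, r, stored_labels):
    """Number of labeled pairs the container emits in one cycle (state 0, 1 or 2).

    x, r          -- (label, value) from outside / from the operator, or None if invalid
    stored_labels -- labels for which the container currently holds one value
    """
    if x is not None and r is not None and x[0] == r[0]:
        return 1                          # the two inputs form the only valid pair
    # Otherwise each valid input pairs only with the stored value of its own data set.
    return sum(1 for item in (x, r)
               if item is not None and item[0] in stored_labels)


# Buffer action taken in each state, as listed above.
BUFFER_ACTION = {0: "pop one pair (if not empty)", 1: "hold", 2: "push one pair"}

state = container_state((1, 5.0), (2, 7.0), {1, 2})
print(state, BUFFER_ACTION[state])       # 2 push one pair
```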
The circuit of the present invention places no requirement on the push/pop order of the buffer module. The minimum depth of the buffer is a finite value, p-1, independent of the number of vectors processed simultaneously, the length of each vector, and the input order of each vector's data, where p is the total number of pipeline stages of the container module and the operator module. As long as the buffer is at least this deep, the buffer module will not overflow for any vector length or data input order.
The operator module consists of a general-purpose arithmetic unit and a signal delay unit. The arithmetic unit is chosen according to the specific vector reduction to be performed and may have any number of pipeline stages. The delay of the signal delay unit, in clock cycles, equals the number of pipeline stages of the arithmetic unit, so that the output data of the arithmetic unit match the output labels of the signal delay unit.
Each clock cycle, the operator module accepts one labeled data pair: the data part is sent to the arithmetic unit for computation and the label part is sent to the signal delay unit. The data output by the arithmetic unit and the label output by the signal delay unit form a labeled datum, which returns to the container module.
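The point of the equal delays can be shown with a small Python model in which the data path and the label path are both `latency` registers long, so a result always leaves together with its own label. This is a simplified sketch: the result is computed on entry and then merely delayed, whereas a real pipelined arithmetic unit computes it progressively across its stages. The class name `OperatorModule` is hypothetical.

```python
from collections import deque


class OperatorModule:
    """Pipelined arithmetic unit plus a label delay line of the same length."""

    def __init__(self, op, latency):
        self.op = op
        # Data path: stands in for an arithmetic unit with `latency` pipeline stages.
        self.data_pipe = deque([None] * latency)
        # Label path: `latency` registers (D flip-flops in hardware).
        self.label_pipe = deque([None] * latency)

    def clock(self, pair):
        """Accept one (label, a, b) pair or None; return the (label, result) leaving this cycle."""
        if pair is None:
            self.data_pipe.append(None)
            self.label_pipe.append(None)
        else:
            label, a, b = pair
            self.data_pipe.append(self.op(a, b))   # simplification: computed on entry
            self.label_pipe.append(label)
        result, label = self.data_pipe.popleft(), self.label_pipe.popleft()
        return None if label is None else (label, result)


unit = OperatorModule(lambda a, b: a + b, latency=3)
outputs = [unit.clock(p) for p in [(7, 1, 2), (8, 10, 20), None, None, None, None]]
print(outputs)   # [None, None, None, (7, 3), (8, 30), None]
```

Each pair appears at the output exactly `latency` cycles after it enters, still carrying its own label, which is why the delay unit must match the arithmetic unit's pipeline depth.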
The MUX module gates the labeled data pairs output by the container module and the buffer module through to the operator module. In each clock cycle: if the container module outputs one labeled data pair, the MUX module forwards that pair to the operator module; if the container module outputs two labeled data pairs, the MUX module forwards the pair that is not pushed into the buffer module; if the container module outputs no labeled data pair, the MUX module forwards the pair popped from the buffer module.
Using a set of status signals provided by the container module, the MUX module gates the valid labeled data pairs output by the container module and the buffer module through to the operator module; this gating scheme ensures the correctness and accuracy of the reductions of all vectors.
The container module can accept one external labeled datum every clock cycle; the buffer module never overflows as long as its depth is at least the minimum depth; the operator module need process only one labeled data pair per clock cycle; and the labeled data it outputs can enter the container module without blocking. Hence the circuit of the invention never stalls and is fully pipelined.
External labeled data enter the container module without blocking, and the reduction results of all vectors are stored in the container module. Because the storage locations of the container module correspond one-to-one to the data sets, the container module can be addressed by label. Thus, for the circuit's data input and output, the label signal is equivalent to an address signal, and the operation timing of the circuit is the same as that of an ordinary random-access memory.
After the last labeled datum enters the circuit, the reduction results of all vectors can be read from the container module after at most … clock cycles, where p is the total number of pipeline stages inside the circuit.
The circuit interface of the present invention is simple, and its operation timing is the same as that of an ordinary random-access memory.
Brief description of the drawings
Fig. 1 is the circuit schematic of Embodiment 1 of the label-based unordered vector reduction circuit.
Fig. 2 is the circuit schematic of the container module Container.
Detailed description of the invention
The present invention is further described below with reference to the drawings, but the embodiments of the present invention are not limited thereto.
Embodiment 1
Figs. 1 and 2 show the circuit schematic of Embodiment 1 of the label-based unordered vector reduction circuit, which consists of the container module Container, the buffer module RBuff, the multiplexer module MUX and the operator module Operator. stb_x, tag_x and dat_x are the input ports for labeled data, dat_o is the output port for result data, and ctl_read is the control signal for reading results.
The label part of the labeled datum {stb_x, tag_x, dat_x} from outside the circuit is {stb_x, tag_x}, and the label part of the labeled datum {stb_r, tag_r, dat_r} returned by the operator module is {stb_r, tag_r}; the same applies to the other labeled data and labeled data pairs in the circuit. The label part {stb_*, tag_*} (* is a wildcard; likewise below) of every labeled datum or labeled data pair is 1+n bits wide. stb_* is 1 bit wide and is the most significant bit of the label signal, called the strobe signal: a value of 1 means the label is valid and 0 means it is invalid, so the validity of a label is determined simply by checking whether its strobe is high. tag_* is the value of a valid label and is n bits wide, so m = 2^n valid labels are available, ranging from 0 to 2^n - 1.
The hardware arithmetic unit in the operator module Operator is chosen according to the vector reduction to be performed and may have any number of pipeline stages. The signal delay units inside the operator module and elsewhere in the circuit are all implemented with D flip-flops.
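The label layout can be illustrated with integer bit operations. In this sketch, which assumes the label is packed as a single (1+n)-bit integer with the strobe in the most significant bit, `pack_label`, `unpack_label` and `is_valid` are hypothetical helper names.

```python
def pack_label(stb, tag, n):
    """Place the 1-bit strobe above the n-bit tag, giving a (1 + n)-bit label."""
    if stb not in (0, 1) or not 0 <= tag < (1 << n):
        raise ValueError("stb must be 0 or 1 and tag must fit in n bits")
    return (stb << n) | tag


def unpack_label(label, n):
    """Split a (1 + n)-bit label into (stb, tag)."""
    return (label >> n) & 1, label & ((1 << n) - 1)


def is_valid(label, n):
    """Only the strobe (most significant bit) decides whether a label is valid."""
    return (label >> n) & 1 == 1


n = 3
m = 1 << n                        # m = 2**n = 8 valid tags: 0 .. 7
label = pack_label(1, 5, n)       # 0b1101 == 13
print(m, label, unpack_label(label, n), is_valid(pack_label(0, 5, n), n))
# 8 13 (1, 5) False
```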
The container module consists of two submodules, the cache module Cache and the cache-status query-and-update module CacheStatQAU_PT, plus a label comparison circuit. Each cycle, the container module accepts one labeled datum from outside the circuit and one from the operator module. Because the container module stores at most one datum per data set, a valid labeled data pair can take only three forms: the two input labeled data; the external labeled datum and the corresponding datum in the container; or the labeled datum output by the operator module and the corresponding datum in the container. The container module outputs the labeled data pairs for these three combinations, {flg_x_matches_r, tag_x, dat_x, dat_r}, {flg_x_got_matched, tag_x, dat_x, dat_x_cache} and {flg_r_got_matched, tag_r, dat_r, dat_r_cache}, whose validity is indicated by their strobe signals flg_x_matches_r, flg_x_got_matched and flg_r_got_matched respectively. The MUX module MUX and the buffer module RBuff use these strobe signals to determine which pairs are valid and act accordingly.
The cache module Cache stores the data parts of the two labeled data currently entering the container module and, at the same time, outputs the two previously stored data used to form the current data pairs. Cache is implemented as a dual-port RAM operating in read-first mode, with read latency p_cc, typically p_cc = 1. Its capacity equals the number of available valid labels, m = 2^n, so the circuit can process the reductions of at most m data sets simultaneously. The tag_* signals are used directly as addresses of Cache, so each data set being processed has a unique storage location in Cache. When a labeled datum entering the container module is invalid, its low strobe signal deasserts the write enable, so the data in Cache are not affected. For the input labeled datum {stb_x, tag_x, dat_x}, Cache uses the value of tag_x as the address of the location storing the data of the data set with label value tag_x, and reads out the previously stored datum dat_x_cache to form the labeled data pair {flg_x_got_matched, tag_x, dat_x, dat_x_cache}. At the same time, if the label of {stb_x, tag_x, dat_x} is valid and differs from the label of the other labeled datum {stb_r, tag_r, dat_r}, Cache overwrites dat_x_cache with dat_x after reading it. The labeled datum {stb_r, tag_r, dat_r} entering the container module is handled similarly. Reads and writes of Cache do not depend on the cache status of each data set. Even after an input labeled datum has been paired with a datum previously stored in Cache to form a valid data pair, its data part is still written; but because Cache operates in read-first mode and CacheStatQAU_PT records the cache status, data are never computed twice, and data not yet computed are never overwritten by new data.
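The read-first behavior and the write-enable rule described above can be sketched in Python as follows. This is a behavioral model under simplifying assumptions: each call represents one clock cycle and returns the read data immediately (the read latency p_cc is not modeled), and `ReadFirstDualPortRam` and `cache_cycle` are hypothetical names rather than a vendor RAM primitive.

```python
class ReadFirstDualPortRam:
    """Two-port RAM in read-first mode: a read returns the contents before this cycle's write."""

    def __init__(self, depth):
        self.mem = [None] * depth

    def cycle(self, port_a, port_b):
        """Each port is (addr, write_data, write_enable); returns the old data at both addresses."""
        old_a, old_b = self.mem[port_a[0]], self.mem[port_b[0]]   # read first ...
        for addr, data, we in (port_a, port_b):                  # ... then write
            if we:
                self.mem[addr] = data
        return old_a, old_b


def cache_cycle(ram, x, r):
    """x, r are (stb, tag, dat). Returns (dat_x_cache, dat_r_cache).

    A port writes only if its label is valid and the two labels are not valid and equal,
    so an invalid input (stb = 0) never changes the stored data.
    """
    stb_x, tag_x, dat_x = x
    stb_r, tag_r, dat_r = r
    same = bool(stb_x and stb_r and tag_x == tag_r)
    return ram.cycle((tag_x, dat_x, bool(stb_x) and not same),
                     (tag_r, dat_r, bool(stb_r) and not same))


ram = ReadFirstDualPortRam(4)            # m = 4 labels (n = 2)
cache_cycle(ram, (1, 2, 10.0), (0, 0, 0.0))
print(cache_cycle(ram, (1, 2, 5.0), (1, 3, 7.0)))   # (10.0, None)
print(ram.mem)                                       # [None, None, 5.0, 7.0]
```

Note that the second call both returns the old value 10.0 (so it can be paired) and stores 5.0 in its place; whether the stored value is still "owned" by a data set is tracked separately by the status register, not by the RAM.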
The label comparison circuit provides the strobe signal flg_x_matches_r of the labeled data pair {flg_x_matches_r, tag_x, dat_x, dat_r}. If the labels of the two labeled data entering the container module are both valid and equal, dat_x and dat_r form a valid data pair whose label value is tag_x; the label comparison circuit then drives flg_x_matches_r high to indicate that {flg_x_matches_r, tag_x, dat_x, dat_r} is a valid labeled data pair. Otherwise, it drives flg_x_matches_r low.
Based on the labels carried by the two labeled data entering the container module, the cache-status query-and-update module CacheStatQAU_PT queries and updates the storage status of the corresponding data sets in the container module, and provides the strobe signal flg_x_got_matched of the labeled data pair {flg_x_got_matched, tag_x, dat_x, dat_x_cache} and the strobe signal flg_r_got_matched of {flg_r_got_matched, tag_r, dat_r, dat_r_cache}. Because a data set has only two possible cache states in the container module (the container either holds a datum of this data set or it does not), the module contains an m-bit cache-status register in which each bit records the cache state of one data set and is addressed by the label signal. For a labeled datum {stb_x, tag_x, dat_x} entering the container module that carries a valid label, module CacheStatQAU uses its label value tag_x to address the corresponding bit of the cache-status register. If this bit is 1, Cache holds a datum of the same data set as dat_x, i.e. dat_x_cache and dat_x form a valid data pair; the module then drives flg_x_got_matched high to indicate that {flg_x_got_matched, tag_x, dat_x, dat_x_cache} is a valid labeled data pair, and flips the bit to 0 to indicate that Cache no longer holds a datum of the same data set as dat_x. If this bit is 0, Cache holds no datum of the same data set as dat_x; the module drives flg_x_got_matched low to indicate that {flg_x_got_matched, tag_x, dat_x, dat_x_cache} is an invalid labeled data pair, and flips the bit to 1 to indicate that Cache now holds a datum of the same data set as dat_x. If {stb_x, tag_x, dat_x} carries an invalid label, module CacheStatQAU simply drives flg_x_got_matched low and leaves the cache-status register unchanged. The labeled datum {stb_r, tag_r, dat_r} entering the container module is handled similarly. This behavior can be summarized by the following pseudocode describing module CacheStatQAU_PT:
Here cache_stat is the module's internal cache-status register, m bits wide; '^' denotes XOR and '<<' denotes a left shift, and the two operations together invert the corresponding bit of cache_stat. The module latency p_pt is 1 if flg_x_got_matched and flg_r_got_matched are latched before being output, and 0 otherwise.
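The query-and-update behavior described above can be sketched in Python as follows; this is a reconstruction from the prose, not the patent's own pseudocode. It assumes both queries read the register as it was at the start of the cycle, which is consistent with the equal-label case described later (both flags may be valid and the bit is inverted twice). The output latch (p_pt) and the ctl_read path are omitted, and the class and method names are hypothetical.

```python
class CacheStatQAU:
    """Behavioral sketch of the m-bit cache-status register and its query/update logic."""

    def __init__(self, n):
        self.m = 1 << n          # one status bit per label, m = 2**n
        self.cache_stat = 0      # bit t == 1: the cache holds one datum of data set t

    def step(self, stb_x, tag_x, stb_r, tag_r):
        """Return (flg_x_got_matched, flg_r_got_matched) and update cache_stat."""
        # Both queries see the register as it was at the start of the cycle.
        flg_x = bool(stb_x) and (self.cache_stat >> tag_x) & 1 == 1
        flg_r = bool(stb_r) and (self.cache_stat >> tag_r) & 1 == 1
        # A valid input inverts its bit with XOR: 1 -> 0 when it pairs with the stored
        # datum, 0 -> 1 when it becomes the stored datum. Invalid inputs change nothing.
        if stb_x:
            self.cache_stat ^= 1 << tag_x
        if stb_r:
            self.cache_stat ^= 1 << tag_r
        return flg_x, flg_r


q = CacheStatQAU(2)
print(q.step(1, 1, 0, 0), bin(q.cache_stat))   # (False, False) 0b10
print(q.step(1, 1, 1, 2), bin(q.cache_stat))   # (True, False) 0b100
print(q.step(1, 2, 1, 2), bin(q.cache_stat))   # (True, True) 0b100
```

The last call shows the equal-label case: both flags are reported valid, but the bit is inverted twice and so keeps its value.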
The buffer module RBuff is implemented with a FIFO operating in fall-through mode; the signal not_empty, which indicates whether the FIFO is non-empty, serves as the strobe of the labeled data pair {not_empty, dat_o} at the output port of RBuff. The MUX module MUX is a 3-to-1 selection logic circuit. The behavior of RBuff and MUX is described by the following pseudocode:
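One cycle of RBuff and MUX can be sketched in Python, reconstructed from the state rules and the ctl_read description rather than from the patent's pseudocode. Assumptions: a candidate pair whose strobe is low is passed as None; in state 2 the operator-side pair is the one pushed into the FIFO; a deque stands in for the fall-through FIFO, whose head is visible without a separate read cycle. The function name `rbuff_mux_step` is hypothetical.

```python
from collections import deque


def rbuff_mux_step(fifo, x_r, x_cache, r_cache, ctl_read=False):
    """One clock cycle of RBuff (fall-through FIFO) and the MUX.

    x_r, x_cache, r_cache -- the container's candidate pairs (tag, a, b), or None when
                             their strobe (flg_x_matches_r, flg_x_got_matched,
                             flg_r_got_matched) is low
    Returns the pair sent to the operator, or None (invalid label at the operator input).
    """
    if ctl_read:
        x_cache = None               # a read-out result must not re-enter the operator
    if x_r is not None:              # equal labels: the inputs fall through as one pair,
        return x_r                   # and RBuff takes no action
    if x_cache is not None and r_cache is not None:
        fifo.append(r_cache)         # state 2: push one pair, forward the other
        return x_cache
    if x_cache is not None or r_cache is not None:
        return x_cache if x_cache is not None else r_cache   # state 1: RBuff holds
    not_empty = len(fifo) > 0        # state 0: the FIFO head is already at the output
    return fifo.popleft() if not_empty else None


fifo = deque()
print(rbuff_mux_step(fifo, None, (1, 2.0, 3.0), (2, 4.0, 5.0)))   # (1, 2.0, 3.0)
print(rbuff_mux_step(fifo, None, None, None))                     # (2, 4.0, 5.0)
print(rbuff_mux_step(fifo, None, None, None))                     # None
```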
The case in which the labels of the two labeled data entering the container module are both valid and identical is special. In this case only {flg_x_matches_r, tag_x, dat_x, dat_r} is a valid labeled data pair. If the container module also holds a datum of the same data set, {flg_x_got_matched, tag_x, dat_x, dat_x_cache} and {flg_r_got_matched, tag_r, dat_r, dat_r_cache} will also be marked valid. Nevertheless, as described above, dat_x and dat_r are not written into Cache and do not overwrite the original datum, the corresponding bit of the cache-status register cache_stat in CacheStatQAU is inverted twice and therefore keeps its value, the buffer module RBuff takes no action, and the MUX module MUX gates through the pair {flg_x_matches_r, tag_x, dat_x, dat_r}. In other words, the circuit lets the two labeled data fall through directly to the operator module Operator, and the computation remains correct and accurate. This handling avoids reading and writing the same location of Cache through both ports at once, preventing conflicts and errors, and also reduces circuit complexity.
The ctl_read signal is used to read the result of a data set from Cache and to reset the corresponding bit of the cache-status register cache_stat in CacheStatQAU_PT, releasing the label occupied by that data set. When no result is being read, ctl_read must be held low. When the vector reduction of a data set is complete, the data set has only one datum left in the circuit, stored in the container module Container, and the corresponding bit of cache_stat is 1. To read the result of this data set, set {stb_x, tag_x} to the label value of the data set and drive ctl_read high. Because the result is the only datum of this data set in the circuit, flg_x_matches_r cannot be high, so {flg_x_got_matched, tag_x, dat_x, dat_x_cache} is output as a valid labeled data pair and the corresponding bit of cache_stat is flipped to 0. dat_x_cache is the result of this data set; since dat_o = dat_x_cache, the result can be read from port dat_o outside the circuit. At the same time, ctl_read being high prevents the MUX module from gating {flg_x_got_matched, tag_x, dat_x, dat_x_cache} through to the operator module Operator, so the computation of the current data set is not disturbed and the data of this data set do not return to the container module and occupy the label again.
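The read-out path can be sketched in Python as follows. This model keeps only what the read needs, one cache slot and one status bit per label; the `finish` method is a hypothetical stand-in for a reduction that has completed with exactly one datum left in the container, and `ResultStore` is likewise a hypothetical name.

```python
class ResultStore:
    """Sketch of the read-out path: one cache slot and one status bit per label."""

    def __init__(self, n):
        self.cache = [None] * (1 << n)   # Cache, addressed by tag
        self.cache_stat = 0              # status bits as in CacheStatQAU_PT

    def finish(self, tag, value):
        """Model a data set whose reduction has left exactly one datum in the container."""
        self.cache[tag] = value
        self.cache_stat |= 1 << tag

    def read(self, tag):
        """Drive ctl_read high with {stb_x, tag_x} = {1, tag}; return dat_o."""
        if not (self.cache_stat >> tag) & 1:
            raise ValueError(f"label {tag} holds no result")
        self.cache_stat ^= 1 << tag      # the bit is inverted to 0, freeing the label
        return self.cache[tag]           # dat_o = dat_x_cache


store = ResultStore(2)
store.finish(3, 42.0)
print(store.read(3), store.cache_stat)   # 42.0 0
```

After the read the status bit is 0, so the next datum arriving with the same label starts a new data set instead of pairing with the old result.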
No pipeline stage inside the circuit ever stalls; the circuit is fully pipelined. Externally, the interface and operation timing of the circuit are exactly those of a random-access memory of size m: tag_x can be regarded as the address input port, dat_x as the data input port, dat_o as the data output port, and {ctl_read, stb_x} as the read/write enable ports, with a read latency equal to the latency p_c of the container module Container.
The embodiments of the invention described above do not limit the scope of protection of the present invention. Any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention shall fall within the scope of the claims of the present invention.

Claims (10)

1. A label-based unordered vector reduction circuit, characterized in that every datum of a vector carries a signal indicating the vector to which it belongs, this signal being the label of the datum;
the circuit comprises a container module, a buffer module, a MUX module and an operator module;
the container module: in each clock cycle, the container module accepts two labeled data, checks all data in the container module, pairs data with the same label into labeled data pairs, and outputs them;
the two labeled data accepted by the container module each clock cycle are, respectively, labeled data from outside the unordered vector reduction circuit and labeled data output by the operator module;
the buffer module: performs buffering operations according to the number of labeled data pairs output by the container module;
the MUX module: gates the valid labeled data pairs output by the buffer module through to the operator module;
the operator module: operates on the data of each valid labeled data pair received, combines the result with the label of that pair to form labeled data, and returns them to the container module.
2. The circuit according to claim 1, characterized in that the storage capacity of the container module equals the number of unordered input vectors that the circuit can process simultaneously.
3. The circuit according to claim 1, characterized in that in each clock cycle the number of labeled data pairs output by the container module falls into one of three cases: no labeled data pair, denoted state 0; one labeled data pair, denoted state 1; two labeled data pairs, denoted state 2;
when the container module is in state 0, the buffer module pops one labeled data pair;
when the container module is in state 1, the buffer module keeps its state from the previous clock cycle unchanged;
when the container module is in state 2, the buffer module pushes one of the labeled data pairs.
4. The circuit according to claim 3, characterized in that when the container module is in state 0, the buffer module pops one labeled data pair; when the buffer module is empty, the label part of the buffer module's output port is set to an invalid value.
5. The circuit according to claim 3, characterized in that the minimum depth of the buffer module is a finite value p-1, where p is the total number of pipeline stages of the container module and the operator module.
6. The circuit according to claim 1, characterized in that the operator module consists of a general-purpose arithmetic unit and a signal delay unit; each clock cycle the operator module accepts one labeled data pair, sends the data part to the arithmetic unit for computation and the label part to the signal delay unit, and combines the data output by the arithmetic unit with the label output by the signal delay unit into labeled data, which return to the container module.
7. The circuit according to claim 6, characterized in that the general-purpose arithmetic unit is chosen according to the specific vector reduction to be performed and may have any number of pipeline stages; the delay of the signal delay unit, in clock cycles, equals the number of pipeline stages of the arithmetic unit, so that the output data of the arithmetic unit match the output labels of the signal delay unit.
8. The circuit according to claim 1, characterized in that the MUX module gates the tagged data pairs output by the container module and the buffer module to the operator module;
In each clock cycle, if the container module outputs 1 tagged data pair, the MUX module gates that tagged data pair to the operator module;
if the container module outputs 2 tagged data pairs, the MUX module gates the tagged data pair that is not pushed into the buffer module to the operator module;
if the container module outputs 0 tagged data pairs, the MUX module gates the tagged data pair popped from the buffer module to the operator module.
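The three cases of claim 8 (with the empty-buffer rule of claim 4) amount to a small per-cycle routing decision. The sketch below is a hypothetical illustration, not the patented circuit. The function name `route_one_cycle`, the tuple layout, FIFO order for the buffer, modelling an invalid label as `None`, and the choice of which of two pairs is buffered are all assumptions; the claim only requires that the pair *not* pushed into the buffer goes to the operator module.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

TaggedPair = Tuple[int, float, float]   # (label, operand_a, operand_b), assumed


def route_one_cycle(container_out: List[TaggedPair],
                    buffer: Deque[TaggedPair]) -> Optional[TaggedPair]:
    """Return the tagged pair the MUX module gates to the operator module.

    container_out holds the 0, 1 or 2 pairs emitted by the container module
    in this clock cycle; buffer stands in for the buffer module.
    """
    if len(container_out) == 1:
        return container_out[0]
    if len(container_out) == 2:
        # Assumption: the second pair is the one pushed into the buffer module.
        buffer.append(container_out[1])
        return container_out[0]
    if len(container_out) == 0:
        # State 0: forward the pair popped from the buffer module, if any.
        # An empty buffer yields an invalid label, modelled here as None (idle).
        return buffer.popleft() if buffer else None
    raise ValueError("the container module emits at most 2 pairs per cycle")


buf: Deque[TaggedPair] = deque()
print(route_one_cycle([(3, 1.0, 2.0), (4, 5.0, 6.0)], buf))  # (3, 1.0, 2.0)
print(route_one_cycle([], buf))                              # (4, 5.0, 6.0)
print(route_one_cycle([], buf))                              # None
```

In every case exactly one pair (or a bubble) enters the operator module per cycle, which is what allows a single fully pipelined arithmetic unit to serve all vectors.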
9. The circuit according to claim 8, characterized in that, based on a set of status signals provided by the container module, the MUX module gates the valid tagged data pairs output by the container module and the buffer module to the operator module; this gating scheme guarantees the correctness and accuracy of the reduction operations for all vectors.
10. The circuit according to any one of claims 1 to 9, characterized in that the operating timing of the circuit is identical to that of a general-purpose random access memory.
CN201410240877.8A 2014-05-30 2014-05-30 A kind of unordered vector reduction circuit based on label Active CN103995688B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410240877.8A CN103995688B (en) 2014-05-30 2014-05-30 A kind of unordered vector reduction circuit based on label

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410240877.8A CN103995688B (en) 2014-05-30 2014-05-30 A kind of unordered vector reduction circuit based on label

Publications (2)

Publication Number Publication Date
CN103995688A CN103995688A (en) 2014-08-20
CN103995688B true CN103995688B (en) 2016-10-12

Family

ID=51309866

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410240877.8A Active CN103995688B (en) 2014-05-30 2014-05-30 A kind of unordered vector reduction circuit based on label

Country Status (1)

Country Link
CN (1) CN103995688B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10108581B1 (en) 2017-04-03 2018-10-23 Google Llc Vector reduction processor
CN111105042B (en) * 2019-12-13 2023-07-25 广东浪潮大数据研究有限公司 Parallel message processing method, system and related device

Citations (1)

Publication number Priority date Publication date Assignee Title
CN102004672A (en) * 2010-11-25 2011-04-06 中国人民解放军国防科学技术大学 Reduction device capable of configuring auto-increment interval of reduction target

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9588766B2 (en) * 2012-09-28 2017-03-07 Intel Corporation Accelerated interlane vector reduction instructions

Patent Citations (1)

Publication number Priority date Publication date Assignee Title
CN102004672A (en) * 2010-11-25 2011-04-06 中国人民解放军国防科学技术大学 Reduction device capable of configuring auto-increment interval of reduction target

Non-Patent Citations (1)

Title
A novel reduction network for vector processors; Zhuang Wei et al.; Journal of Chinese Computer Systems (《小型微型计算机系统》); 2012-11-15; Vol. 33, No. 11; pp. 2498-2502 *

Also Published As

Publication number Publication date
CN103995688A (en) 2014-08-20

Similar Documents

Publication Publication Date Title
CN110689126B (en) Device for executing neural network operation
CN104571949B (en) Realize calculating processor and its operating method merged with storage based on memristor
CN107169563B (en) Processing system and method applied to two-value weight convolutional network
US11422720B2 (en) Apparatuses and methods to change data category values
CN105740168B (en) A kind of fault-tolerant directory caching controller
CN107301455B (en) Hybrid cube storage system for convolutional neural network and accelerated computing method
CN108475194A (en) Register communication in on-chip network structure
CN107729989A (en) A kind of device and method for being used to perform artificial neural network forward operation
CN107392309A (en) A kind of general fixed-point number neutral net convolution accelerator hardware structure based on FPGA
CN108713196A (en) The data transmission carried out using bit vector arithmetic unit
CN101918925B (en) Second chance replacement mechanism for a highly associative cache memory of a processor
CN107683505A (en) For calculating the device and method of the cache memory enabled
CN104699631A (en) Storage device and fetching method for multilayered cooperation and sharing in GPDSP (General-Purpose Digital Signal Processor)
CN103730149B (en) A kind of read-write control circuit of dual-ported memory
US10114795B2 (en) Processor in non-volatile storage memory
CN103890857A (en) Shiftable memory employing ring registers
CN104679691B (en) A kind of multinuclear DMA segment data transmission methods using host count for GPDSP
CN106372008B (en) A kind of data cache method and device
CN108257078A (en) Memory knows the source of reordering
CN110476212A (en) Device and method for data switching networks in memory
US11705207B2 (en) Processor in non-volatile storage memory
CN103995688B (en) A kind of unordered vector reduction circuit based on label
CN103761072A (en) Coarse granularity reconfigurable hierarchical array register file structure
CN106339327B (en) A kind of computer system and blade server cabinet
CN109800867B (en) Data calling method based on FPGA off-chip memory

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant