CN102360281B - Multifunctional fixed-point multiply-add (MAC) operation device for microprocessor

Publication number
CN102360281B
CN102360281B CN201110336974.3A CN201110336974A CN102360281B CN 102360281 B CN102360281 B CN 102360281B CN 201110336974 A CN201110336974 A CN 201110336974A CN 102360281 B CN102360281 B CN 102360281B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201110336974.3A
Other languages
Chinese (zh)
Other versions
CN102360281A (en)
Inventor
陈书明
李国强
万江华
李振涛
彭元喜
杨惠
陈胜刚
孙书为
陈海燕
王海波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology

Abstract

The invention discloses a multifunctional fixed-point multiply-add (MAC) operation device for a microprocessor. The device comprises an instruction dispatch unit, an instruction decoding unit, a storage unit and an instruction operation unit. The instruction operation unit comprises a four-stage pipeline operation structure for multifunctional fixed-point MAC operations and a result selection module that obtains the output of the pipeline and writes it back to the storage unit. From input end to output end, the four-stage pipeline operation structure comprises in sequence a two-stage multiplier operation station, an adder operation station, and a compound operation station that performs complex-number, dot-product and 32-bit multiplication operations. The two-stage multiplier operation station comprises a plurality of single instruction multiple data (SIMD) multipliers arranged in parallel, and the adder operation station comprises a plurality of SIMD adders arranged in parallel. The device supports many kinds of fixed-point multiply-accumulate operations and has the advantages of low hardware resource usage, high hardware reuse, good scalability and small program code size.

Description

Multifunctional fixed-point multiply-add (MAC) operation device for microprocessor
Technical field
The present invention relates to the arithmetic units of microprocessors, and specifically to a multifunctional fixed-point multiply-add cell (Multiply Add Cell, MAC) operation device suitable for microprocessors, including single instruction multiple data (Single Instruction Multiple Data, SIMD) DSPs.
Background art
In applications such as image processing, radar signal processing and modern communications, the volume of data is large and the demands on computational precision and real-time performance are high, so very high-performance microprocessors are usually required. Because these algorithms are both multiplication-intensive and addition-intensive and involve large numbers of fixed-point multiply-accumulate operations, including fixed-point multiply-add/multiply-subtract, dot-product and complex-number operations, the fixed-point data processing capability of the microprocessor is increasingly important.
To address these application characteristics, existing research has proposed various operating mechanisms and hardware structures for implementing such fixed-point multiply-accumulate operations so as to support large numbers of multiplications, for example the M unit of the TI C64 series. However, the prior art generally has the following shortcomings: 1) it implements only a few functions such as fixed-point multiplication or fixed-point multiply-add and cannot support operations such as addition and subtraction, so its functionality is limited; 2) it occupies considerable hardware resources, with low hardware reuse, poor scalability and large program code size.
Summary of the invention
In view of the above problems of the prior art, the present invention provides a multifunctional fixed-point multiply-add (MAC) operation device for a microprocessor that supports many kinds of fixed-point multiply-accumulate operations, occupies few hardware resources, and offers high hardware reuse, good scalability and small program code size.
To solve the above technical problem, the present invention adopts the following technical solution:
A multifunctional fixed-point multiply-add (MAC) operation device for a microprocessor comprises an instruction dispatch unit, an instruction decoding unit, a storage unit and an instruction operation unit. The instruction operation unit comprises a four-stage pipeline operation structure for multifunctional fixed-point MAC operations and a result selection module that obtains the output of the four-stage pipeline operation structure and writes it back to the storage unit. From input end to output end, the four-stage pipeline operation structure comprises in sequence a two-stage multiplier operation station, an adder operation station, and a compound operation station for complex-number, dot-product and 32-bit multiplication operations. The two-stage multiplier operation station comprises a plurality of SIMD multipliers arranged in parallel, and the adder operation station comprises a plurality of SIMD adders arranged in parallel. The two-stage multiplier operation station, the adder operation station and the compound operation station are each connected to the result selection module.
As further improvements of the above technical solution:
The SIMD multiplier comprises a first-stage multiplication module for performing conventional multiplication and SIMD multiplication, and a second-stage multiplication module for performing sign extension and concatenation; the first-stage multiplication module and the second-stage multiplication module are connected in series.
The SIMD multiplier comprises a SIMD control signal input for selecting the SIMD multiplication mode, connected to both the first-stage multiplication module and the second-stage multiplication module. When an inactive signal is applied to the SIMD control signal input, the two modules perform ordinary multiplication on the input operands; when an active signal is applied, they perform SIMD multiplication on the input operands.
The SIMD adder is a 40-bit SIMD fixed-point adder comprising five 8-bit addition modules connected in series.
The compound operation station comprises a complex-number instruction processing module, a dot-product instruction processing module and a 32-bit multiplication instruction processing module.
The instruction decoding unit comprises an instruction distinguishing module, a 32-bit instruction decoding module and a 16-bit instruction decoding module. The inputs of the 32-bit instruction decoding module and the 16-bit instruction decoding module are each connected to the instruction dispatch unit through the instruction distinguishing module; the output of the 32-bit instruction decoding module is connected to the storage unit and to the instruction operation unit, and the output of the 16-bit instruction decoding module is likewise connected to the storage unit and to the instruction operation unit.
The storage unit comprises a local register and an accumulator, each connected to the instruction operation unit.
The local register and the accumulator each have a dual-input, dual-output structure.
The present invention has the following advantages:
1. The instruction operation unit of the present invention comprises a four-stage pipeline operation structure that fuses SIMD multiplication, SIMD addition/subtraction, SIMD multiply-add/multiply-subtract, 8-bit dot product, 16-bit dot product, 8-bit complex and 16-bit complex operations into one architecture. It supports efficient fixed-point SIMD multiplication, SIMD addition, SIMD subtraction, SIMD multiply-add, SIMD multiply-subtract, dot-product and complex operations, and is applicable both to SIMD microprocessors and to DSP microprocessors. It has the advantages of supporting a rich variety of fixed-point multiplication-class operations, reasonable station partitioning, low resource usage, high code reuse, good computing performance, many functions, strong scalability and wide applicability.
2. The two-stage multiplier operation station of the present invention comprises a plurality of SIMD multipliers arranged in parallel; multiplication instructions, multiply-add/multiply-subtract instructions, dot-product instructions and complex instructions are all implemented by reusing the SIMD multipliers. The adder operation station comprises a plurality of SIMD adders arranged in parallel; addition instructions, subtraction instructions, multiply-add/multiply-subtract instructions, dot-product instructions and complex instructions are all implemented by reusing the SIMD adders. Code reuse is therefore high.
3. The instruction decoding unit of the present invention further comprises an instruction distinguishing module, a 32-bit instruction decoding module and a 16-bit instruction decoding module, so instructions of different widths can be handled separately, which helps improve hardware utilization.
4. The storage unit of the present invention further comprises a local register and an accumulator, each connected to the instruction operation unit. The device can therefore handle the fixed-point multiplication-class operations of various DSPs, schedules and selects operands efficiently, and can complete all multiplication-class instructions within a single functional unit; when registers are insufficient, the accumulator can also be used as a register for operand scheduling and selection.
5. The local register and the accumulator of the present invention further have a dual-input, dual-output structure, so they can store the parameters for several kinds of SIMD addition as well as for SIMD multiplication, giving high hardware reuse, many functions and strong scalability.
Brief description of the drawings
Fig. 1 is a schematic block diagram of an embodiment of the present invention.
Fig. 2 is a schematic block diagram of the four-stage pipeline operation structure of the embodiment.
Fig. 3 is a detailed circuit block diagram of the embodiment.
Fig. 4 is a schematic block diagram of the SIMD multiplier of the embodiment.
Fig. 5 is a schematic block diagram of the SIMD adder of the embodiment.
Fig. 6 is a schematic block diagram of the instruction decoding unit of the embodiment.
Reference numerals: 1, instruction dispatch unit; 2, instruction decoding unit; 21, instruction distinguishing module; 22, 32-bit instruction decoding module; 23, 16-bit instruction decoding module; 3, storage unit; 31, local register; 32, accumulator; 4, instruction operation unit; 5, four-stage pipeline operation structure; 51, two-stage multiplier operation station; 511, SIMD multiplier; 512, first-stage multiplication module; 513, second-stage multiplication module; 52, adder operation station; 521, SIMD adder; 522, addition module; 53, compound operation station; 531, complex-number instruction processing module; 532, dot-product instruction processing module; 533, 32-bit multiplication instruction processing module; 6, result selection module; 71, 1-cycle instruction data and control signal bus; 72, 1-cycle and 3-cycle instruction write-back result and address signal bus; 73, 2-cycle instruction write-back result and address signal bus; 74, write-back bus.
Detailed description of the embodiment
As shown in Fig. 1, Fig. 2 and Fig. 3, the multifunctional fixed-point multiply-add (MAC) operation device for a microprocessor of this embodiment comprises an instruction dispatch unit 1, an instruction decoding unit 2, a storage unit 3 and an instruction operation unit 4. The instruction operation unit 4 comprises a four-stage pipeline operation structure 5 for multifunctional fixed-point MAC operations and a result selection module 6 that obtains the output of the four-stage pipeline operation structure 5 and writes it back to the storage unit 3. From input end to output end, the four-stage pipeline operation structure 5 comprises in sequence a two-stage multiplier operation station 51, an adder operation station 52, and a compound operation station 53 for complex-number, dot-product and 32-bit multiplication operations. The two-stage multiplier operation station 51 comprises four SIMD multipliers 511 arranged in parallel, and the adder operation station 52 comprises two SIMD adders 521 arranged in parallel. The two-stage multiplier operation station 51, the adder operation station 52 and the compound operation station 53 are each connected to the result selection module 6.
As shown in Fig. 3, this embodiment adopts a four-stage pipelined design with the stations arranged as described above. The two-stage multiplier operation station 51 performs multiplication; the adder operation station 52 performs addition and subtraction and the saturation of the corresponding results; the compound operation station 53 performs result shifting, concatenation, correction, saturation and rounding, and comprises a complex-number instruction processing module 531, a dot-product instruction processing module 532 and a 32-bit multiplication instruction processing module 533. The output of the two-stage multiplier operation station 51 is connected to the result selection module 6 through the 2-cycle instruction write-back result and address signal bus 73. The input of the adder operation station 52 is connected to the instruction decoding unit 2 and the storage unit 3 through the 1-cycle instruction data and control signal bus 71, and its output is connected to the result selection module 6 through the 1-cycle and 3-cycle instruction write-back result and address signal bus 72. In addition, the two-stage multiplier operation station 51 may contain more SIMD multipliers 511 and the adder operation station 52 may contain more SIMD adders 521; the principle is the same as in this embodiment.
In this embodiment, the storage unit 3 comprises a local register 31 and an accumulator 32, each connected to the instruction operation unit 4. Both the local register 31 and the accumulator 32 have a dual-input, dual-output structure: each is connected to the instruction decoding unit 2 through two input ports and to the instruction operation unit 4 through two output ports. Take 8-bit SIMD addition as an example: 8-bit SIMD addition adds n pairs of corresponding 8-bit fields of Src1 and Src2. Because the adder operation station 52 has two SIMD adders 521, the dual-input, dual-output structure allows two kinds of 8-bit SIMD addition: using two input ports to perform four 8-bit additions, or using four input ports to perform eight 8-bit additions. The dual-input, dual-output structure applies equally to SIMD multiplication, and has the advantages of high hardware reuse, many functions and strong scalability.
As shown in Fig. 4, the SIMD multiplier 511 comprises a first-stage multiplication module 512 for performing conventional multiplication and SIMD multiplication, and a second-stage multiplication module 513 for performing sign extension and concatenation. The first-stage multiplication module 512 and the second-stage multiplication module 513 are connected in series; the input of the first-stage multiplication module 512 is connected to the instruction decoding unit 2 and to the storage unit 3, and the output of the second-stage multiplication module 513 is connected to the adder operation station 52.
The SIMD multiplier 511 comprises a SIMD control signal input for selecting the SIMD multiplication mode, connected to both the first-stage multiplication module 512 and the second-stage multiplication module 513. When an inactive signal is applied to the SIMD control signal input, the two modules perform ordinary multiplication on the input operands; when an active signal is applied, they perform SIMD multiplication on the input operands. In this embodiment, the SIMD control signal takes part, in the first-stage multiplication module 512, in the operand sign extension and processing, sign preprocessing and latching steps; in the second-stage multiplication module 513, it takes part in the selection performed by the multiplexer (MUX). Besides the SIMD control signal (SIMD), the inputs of the SIMD multiplier 511 include two operands, Src1 and Src2, and two sign signals, Sign1 and Sign2. The SIMD control signal thus switches the SIMD multiplier 511 between SIMD operation and ordinary operation.
(1) In SIMD mode, the multiplier performs two 8-bit SIMD multiplications in parallel:
Dst[15:0] = Src1[7:0] × Src2[7:0]
Dst[31:16] = Src1[15:8] × Src2[15:8]
A single 32-bit product is still produced; its upper and lower 16 bits hold the results of the two 8-bit multiplications respectively. In this mode the two 16×8 multipliers each perform one 8×8 operation.
To compute the lower 16-bit result, the low 8 bits of Src1 are sign-extended to 16 bits and fed, together with the low 8 bits of Src2, into the first 16×8 multiplier; the low 16 bits of the resulting 24-bit value Dst_L are the required low SIMD result.
To compute the upper 16-bit result, the low 8 bits of Src1 are set to zero and the resulting value is fed, together with the high 8 bits of Src2, into the second 16×8 multiplier; the high 16 bits of the resulting 24-bit value Dst_H are the required high SIMD result.
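The SIMD-mode data path described above can be sketched in Python as a behavioural model. This is a minimal sketch under assumptions, not the patented circuit: the helper names `to_signed`, `mul16x8` and `simd_mul8` are hypothetical, both lanes are treated as signed, and the Sign1/Sign2 inputs are not modelled.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mul16x8(a16, b8):
    """Hypothetical signed 16x8 multiplier; returns the 24-bit product pattern."""
    return (to_signed(a16, 16) * to_signed(b8, 8)) & 0xFFFFFF


def simd_mul8(src1, src2):
    """Two signed 8x8 multiplications packed into one 32-bit Dst (assumed signed lanes)."""
    # Low lane: Src1[7:0] sign-extended to 16 bits, times Src2[7:0].
    low_in = to_signed(src1, 8) & 0xFFFF
    dst_l = mul16x8(low_in, src2)
    # High lane: Src1 with its low 8 bits zeroed, times Src2[15:8].
    high_in = src1 & 0xFF00
    dst_h = mul16x8(high_in, src2 >> 8)
    # Dst[15:0] = Dst_L[15:0]; Dst[31:16] = Dst_H[23:8].
    return ((dst_h >> 8) << 16) | (dst_l & 0xFFFF)


# Lanes: high 3 x (-3) = -9 (0xFFF7), low (-2) x 5 = -10 (0xFFF6).
print(hex(simd_mul8(0x03FE, 0xFD05)))  # 0xfff7fff6
```

Zeroing the low byte of Src1 scales the high-lane operand by 256, which is why the wanted product appears in bits [23:8] of Dst_H rather than in its low bits.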
(2) In ordinary mode, the multiplier behaves as a conventional multiplier and performs a conventional 16-bit multiplication:
Dst[31:0] = Src1[15:0] × Src2[15:0]
The two 16×8 multipliers compute Src1[15:0] × Src2[15:8] and Src1[15:0] × Src2[7:0] respectively, giving Dst_H and Dst_L. Dst_L[23:8] is sign-extended to 24 bits and added to Dst_H, giving a 24-bit result. Concatenating this result with Dst_L[7:0] gives the required Dst[31:0].
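The ordinary-mode recombination can be checked with a short Python model. This is a sketch under stated assumptions: the names `to_signed`, `mul16x8` and `mul16` are hypothetical, both 16-bit operands are taken as signed, and the low byte Src2[7:0] is assumed to enter its multiplier as an unsigned value (the text leaves this to the sign signals).

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mul16x8(a16, b8, b_signed):
    """Hypothetical 16x8 multiplier with a sign control for the 8-bit operand."""
    a = to_signed(a16, 16)
    b = to_signed(b8, 8) if b_signed else b8 & 0xFF
    return (a * b) & 0xFFFFFF  # 24-bit product pattern


def mul16(src1, src2):
    """Signed 16x16 multiplication built from two 16x8 partial products."""
    src2 &= 0xFFFF
    dst_h = mul16x8(src1, src2 >> 8, b_signed=True)   # Src1 x Src2[15:8]
    dst_l = mul16x8(src1, src2, b_signed=False)       # Src1 x Src2[7:0] (assumed unsigned)
    # Dst_L[23:8] is sign-extended and added to Dst_H, keeping 24 bits.
    upper = (to_signed(dst_h, 24) + to_signed(dst_l >> 8, 16)) & 0xFFFFFF
    # The 24-bit sum is concatenated with Dst_L[7:0] to form Dst[31:0].
    return (upper << 8) | (dst_l & 0xFF)


# -1234 x 5678, with operands given as 16-bit patterns.
print(to_signed(mul16(0xFB2E, 0x162E), 32))
```

The identity behind the splice is Dst = (Dst_H << 8) + Dst_L, so the low 8 bits of Dst_L pass through unchanged and only the upper 24 bits need an adder.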
As shown in Fig. 5, the SIMD adder 521 is a 40-bit SIMD fixed-point adder comprising five 8-bit addition modules 522 connected in series. In one cycle, the SIMD adder 521 can complete four 8-bit signed or unsigned additions/subtractions, or two 16-bit signed or unsigned additions/subtractions, or one 32-bit signed or unsigned addition/subtraction, or one 40-bit signed or unsigned addition/subtraction.
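One plausible way to obtain these four modes from five chained 8-bit adders is to cut the carry chain at lane boundaries. The Python sketch below illustrates that idea only; the mode names, the tables `SLICES` and `LANE_STARTS`, and the function names are hypothetical, and subtraction, signedness and saturation are left out.

```python
def add8(a, b, carry_in):
    """One 8-bit addition module: returns (sum byte, carry out)."""
    total = (a & 0xFF) + (b & 0xFF) + carry_in
    return total & 0xFF, total >> 8


# Number of 8-bit slices used per mode, and the slices at which a new lane
# starts (the incoming carry is forced to zero there). Assumed encoding.
SLICES = {"4x8": 4, "2x16": 4, "1x32": 4, "1x40": 5}
LANE_STARTS = {"4x8": {0, 1, 2, 3}, "2x16": {0, 2}, "1x32": {0}, "1x40": {0}}


def simd_add(src1, src2, mode):
    """Lane-wise wrapping addition using a chain of 8-bit adders."""
    result, carry = 0, 0
    for i in range(SLICES[mode]):
        if i in LANE_STARTS[mode]:
            carry = 0  # break the carry chain at a lane boundary
        byte, carry = add8(src1 >> (8 * i), src2 >> (8 * i), carry)
        result |= byte << (8 * i)
    return result


for mode in ("4x8", "2x16", "1x32"):
    print(mode, hex(simd_add(0x00FFFFFF, 0x00000001, mode)))
# 4x8 0xffff00
# 2x16 0xff0000
# 1x32 0x1000000
```

The same inputs give three different results because the carry out of the lowest byte is dropped, stops at bit 16, or ripples through all 32 bits depending on the mode.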
As shown in Fig. 6, the instruction decoding unit 2 comprises an instruction distinguishing module 21, a 32-bit instruction decoding module 22, a 16-bit instruction decoding module 23 and a latch. The inputs of the 32-bit instruction decoding module 22 and the 16-bit instruction decoding module 23 are each connected to the instruction dispatch unit 1 through the instruction distinguishing module 21; the output of the 32-bit instruction decoding module 22 is connected to the storage unit 3 and to the instruction operation unit 4, and the output of the 16-bit instruction decoding module 23 is likewise connected to the storage unit 3 and to the instruction operation unit 4. By handling instructions of different widths separately, the instruction decoding unit 2 helps improve hardware utilization. The instruction distinguishing module 21 distinguishes 16-bit instructions, 32-bit instructions and invalid instructions. The 32-bit instruction decoding module 22 decodes 32-bit instructions and the 16-bit instruction decoding module 23 decodes 16-bit instructions; the read signals and read addresses obtained by decoding are sent to the local register 31 or the accumulator 32. The latch holds the control signals for one cycle before outputting them to the instruction operation unit 4. In summary, the instruction decoding unit 2 decodes the instructions dispatched by the instruction dispatch unit 1, sends operand read requests and read addresses to the local register 31 and the accumulator 32, and sends the execution control signals to the instruction operation unit 4 after latching them for one cycle. The instruction operation unit 4 performs the various operations on the operands according to the control signals, obtains the result and writes it back to the local register 31 or the accumulator 32 of the storage unit 3.
The operation of the embodiment is described below using specific instructions as examples.
1. SIMD multiplication instructions.
SIMD multiplication instructions comprise 8-bit, 16-bit and 32-bit multiplication instructions, each available as unsigned × unsigned, unsigned × signed, signed × unsigned and signed × signed.
The 8-bit and 16-bit multiplication instructions complete entirely within the first-stage multiplication module 512 and the second-stage multiplication module 513.
The algorithm for 32-bit multiplication is: Dst = (HH<<32) + (HL<<16) + (LH<<16) + LL, where Dst is the 64-bit result, HH is the 32-bit product of the high 16 bits of operand 1 and the high 16 bits of operand 2, HL is the 32-bit product of the high 16 bits of operand 1 and the low 16 bits of operand 2, LH is the 32-bit product of the low 16 bits of operand 1 and the high 16 bits of operand 2, and LL is the 32-bit product of the low 16 bits of operand 1 and the low 16 bits of operand 2. A 32-bit multiplication instruction first performs the multiplications in the four SIMD multipliers 511, then completes the first shift-and-add in the adder operation station 52, and finally completes the last shift-and-add and correction in the compound operation station 53. The implementation is as follows:
1) The four SIMD multipliers 511 compute:
HH = Src1_H × Src2_H, HL = Src1_H × Src2_L, LH = Src1_L × Src2_H, LL = Src1_L × Src2_L, where Src1_H is the high 16 bits of operand 1, Src2_H is the high 16 bits of operand 2, Src1_L is the low 16 bits of operand 1, and Src2_L is the low 16 bits of operand 2.
2) In the adder operation station 52, the high 16 bits of HL are sign-extended to 32 bits and added to the 32-bit HH in the first SIMD adder, giving a 32-bit result; concatenating this result with the low 16 bits of HL gives a 48-bit result. The second SIMD adder uses the same method to compute (LH<<16)+LL, giving a 32-bit result.
3) In the compound operation station 53, the 32-bit result from the second SIMD adder 521 of the adder operation station 52 is extended to 48 bits and added to the 48-bit result from the adder operation station 52. This step is carried out in the 32-bit multiplication instruction processing module 533 using a 48-bit adder; concatenating its result with the low 16 bits of LL gives the 64-bit result of the 32-bit multiplication.
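The three steps can be traced with a small Python model. This is a sketch restricted to the unsigned × unsigned case, so the extensions are zero-extensions rather than the sign extensions the text mentions; the function name `mul32_unsigned` and its variable names are hypothetical.

```python
def mul32_unsigned(src1, src2):
    """Unsigned 32x32 -> 64-bit multiplication following the three pipeline steps."""
    s1_h, s1_l = (src1 >> 16) & 0xFFFF, src1 & 0xFFFF
    s2_h, s2_l = (src2 >> 16) & 0xFFFF, src2 & 0xFFFF

    # Step 1: four 16x16 partial products, one per SIMD multiplier.
    hh, hl, lh, ll = s1_h * s2_h, s1_h * s2_l, s1_l * s2_h, s1_l * s2_l

    # Step 2, first adder: HH + HL[31:16], spliced with HL[15:0] -> 48 bits.
    a48 = ((hh + (hl >> 16)) << 16) | (hl & 0xFFFF)
    # Step 2, second adder: LH + LL[31:16] -> 32 bits (upper part of (LH<<16)+LL).
    b32 = lh + (ll >> 16)

    # Step 3: 48-bit addition, then splice with LL[15:0] -> 64 bits.
    upper48 = (a48 + b32) & 0xFFFFFFFFFFFF
    return (upper48 << 16) | (ll & 0xFFFF)


print(hex(mul32_unsigned(0xFFFFFFFF, 0xFFFFFFFF)))  # 0xfffffffe00000001
```

Because the low 16 bits of LL never take part in an addition, every adder in the chain can be 16 bits narrower than the value it contributes to, which is consistent with the 48-bit adder named in step 3.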
The algorithm for 32-bit by 16-bit multiplication is: Dst = (HL<<16) + LL, where Dst is the 48-bit result, HL is the 32-bit product of the high 16 bits of operand 1 and the low 16 bits of operand 2, and LL is the 32-bit product of the low 16 bits of operand 1 and the low 16 bits of operand 2. HL and LL are computed in two SIMD multipliers. In the adder operation station, the high 16 bits of LL are extended to 32 bits and added to HL, giving a 32-bit result, which is then concatenated with the low 16 bits of LL to give the final 48-bit result Dst. 16-bit by 32-bit multiplication follows by analogy.
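As a quick check of Dst = (HL<<16) + LL, the following Python sketch models the unsigned case only; `mul32x16_unsigned` is a hypothetical name. It adds the high 16 bits of LL to HL and splices the low 16 bits of LL underneath to form the 48-bit result.

```python
def mul32x16_unsigned(src1, src2):
    """Unsigned 32x16 -> 48-bit multiplication from two 16x16 products."""
    s1_h, s1_l = (src1 >> 16) & 0xFFFF, src1 & 0xFFFF
    s2 = src2 & 0xFFFF

    # Two SIMD multipliers produce the partial products.
    hl = s1_h * s2
    ll = s1_l * s2

    # Adder station: HL + LL[31:16] gives the upper 32 bits of the result.
    upper32 = hl + (ll >> 16)
    # Splice with LL[15:0] to obtain the 48-bit Dst.
    return (upper32 << 16) | (ll & 0xFFFF)


print(hex(mul32x16_unsigned(0xFFFFFFFF, 0xFFFF)))  # 0xfffefffF0001 in lower case
```

Only one adder pass is needed here, so no work is left for the compound operation station.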
2. SIMD addition instructions.
SIMD addition instructions comprise 8-bit, 16-bit, 32-bit and 40-bit addition instructions, supporting signed and unsigned addition as well as immediate addition. The instruction operation comes from the instruction decoding unit 2 and the source operands come from the storage unit 3; both are sent directly to the adder operation station 52, where the SIMD adder 521 completes the operation.
3. SIMD subtraction instructions.
SIMD subtraction instructions comprise 8-bit, 16-bit, 32-bit and 40-bit subtraction instructions, supporting signed and unsigned subtraction as well as immediate subtraction. The instruction operation comes from the instruction decoding unit 2 and the source operands come from the storage unit 3; both are sent directly to the adder operation station 52, where the SIMD adder 521 completes the operation.
4. SIMD multiply-add instructions.
SIMD multiply-add instructions comprise 8-bit and 16-bit multiply-add instructions. A SIMD multiply-add instruction is a three-operand instruction whose operands are a multiplier, a multiplicand and an addend; the multiplier and multiplicand come from the local register 31 and the addend comes from the accumulator 32. The multiplication of the multiplier and multiplicand is completed in the SIMD multiplier 511, and the addition is completed on its result in the SIMD adder 521 of the adder operation station 52. The result is sent to the result selection module 6 over the 3-cycle instruction write-back bus within the 1-cycle and 3-cycle instruction write-back result and address signal bus 72, and is then written back to the accumulator 32 via the write-back bus 74.
5. SIMD multiply-subtract instructions.
SIMD multiply-subtract instructions comprise 8-bit and 16-bit multiply-subtract instructions. A SIMD multiply-subtract instruction is a three-operand instruction whose operands are a multiplier, a multiplicand and a subtrahend; the multiplier and multiplicand come from the local register 31 and the subtrahend comes from the accumulator 32. The multiplication of the multiplier and multiplicand is completed in the SIMD multiplier 511, and the subtraction is completed on its result in the SIMD adder 521 of the adder operation station 52. The result is sent to the result selection module 6 over the 3-cycle instruction write-back bus within the 1-cycle and 3-cycle instruction write-back result and address signal bus 72, and is then written back to the accumulator 32 via the write-back bus 74.
6. Dot-product instructions.
Dot-product instructions comprise 8-bit and 16-bit dot-product instructions. The multiplication part of a dot-product instruction is completed in the SIMD multipliers 511, the summation part is completed in the adder operation station 52, and result correction, such as whether to round, is completed in the compound operation station 53.
7. Complex-number instructions.
Complex-number instructions comprise 8-bit and 16-bit complex instructions. The multiplication part of a complex instruction is completed in the SIMD multipliers 511, the summation part is completed in the adder operation station 52, and result correction, such as whether to round, is completed in the compound operation station 53.
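To illustrate how a complex multiplication could map onto the three stations, the Python sketch below splits (a + bi)(c + di) into four products, two sums and a final correction. The mapping of one product per SIMD multiplier and one component per SIMD adder is an assumption suggested by the counts in this embodiment (four multipliers, two adders), and saturation to 32 bits is only a stand-in for the unspecified correction step; `saturate32` and `complex_mul16` are hypothetical names.

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)


def saturate32(value):
    """Clamp a Python integer to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, value))


def complex_mul16(x, y):
    """Multiply two complex numbers with signed 16-bit parts: x = (a, b), y = (c, d)."""
    (a, b), (c, d) = x, y
    # Multiplier station: four 16x16 products (assumed one per SIMD multiplier).
    ac, bd, ad, bc = a * c, b * d, a * d, b * c
    # Adder station: real and imaginary parts (assumed one SIMD adder each).
    real = ac - bd
    imag = ad + bc
    # Compound station: final correction, modelled here as saturation only.
    return saturate32(real), saturate32(imag)


print(complex_mul16((3, 4), (5, -6)))  # (39, 2)
```

The imaginary part can reach 2^31 when all four inputs are -32768, which is one reason a correction step after the adders is needed at all.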
SIMD addition and SIMD subtraction complete in 1 cycle. Among the SIMD multiplications, 8-bit and 16-bit multiplications are 2-cycle instructions, 32-bit by 16-bit and 16-bit by 32-bit multiplications complete in 3 cycles, and 32-bit multiplication completes in 4 cycles. SIMD multiply-add and SIMD multiply-subtract complete in 3 cycles, and dot-product and complex instructions complete in 4 cycles. The result selection module 6 only selects among the instructions that the stations have finished computing, so that instructions with different cycle counts do not flow out at the same time. The selected result is then sent to the write-back bus 74, which accesses the local register 31 and the accumulator 32, so that the result is written back to the local register 31 or the accumulator 32.
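The need for result selection follows from the mixed latencies: instructions issued in different cycles can finish in the same cycle. The Python sketch below tabulates the cycle counts listed above and reports such collisions. The operation names, the assumption that a result is ready exactly at issue cycle plus latency, and the function `writeback_conflicts` are all hypothetical; the text does not specify the arbitration policy, so none is modelled.

```python
from collections import defaultdict

# Cycle counts taken from the paragraph above; the key names are made up here.
LATENCY = {
    "simd_add": 1, "simd_sub": 1,
    "mul8": 2, "mul16": 2,
    "mul32x16": 3, "mul16x32": 3,
    "simd_mul_add": 3, "simd_mul_sub": 3,
    "mul32": 4, "dot": 4, "complex": 4,
}


def writeback_conflicts(schedule):
    """Return {cycle: [ops]} for every cycle in which more than one result is ready.

    `schedule` is a list of (issue_cycle, op) pairs.
    """
    ready = defaultdict(list)
    for issue_cycle, op in schedule:
        ready[issue_cycle + LATENCY[op]].append(op)
    return {cycle: ops for cycle, ops in ready.items() if len(ops) > 1}


# A 4-cycle, a 3-cycle and a 1-cycle instruction all finish in cycle 4.
print(writeback_conflicts([(0, "mul32"), (1, "simd_mul_add"), (3, "simd_add")]))
# {4: ['mul32', 'simd_mul_add', 'simd_add']}
```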
The above is only a preferred embodiment of the present invention. The protection scope of the present invention is not limited to the above embodiment; all technical solutions within the concept of the present invention fall within its protection scope. It should be noted that, for those of ordinary skill in the art, improvements and modifications made without departing from the principle of the present invention shall also be regarded as falling within the protection scope of the present invention.

Claims (2)

1. A multifunctional fixed-point multiply-accumulate (MAC) operation device for a microprocessor, comprising an instruction dispatch unit (1), an instruction decoding unit (2), a storage unit (3) and an instruction operation unit (4), characterized in that: the instruction operation unit (4) comprises a four-stage pipeline operation structure (5) for multifunctional fixed-point MAC operations and a result selection module (6) for obtaining the output result of the four-stage pipeline operation structure (5) and writing the output result back to the storage unit (3); the storage unit (3) comprises a local register (31) and an accumulator (32), each of which is connected to the instruction operation unit (4); the local register (31) and the accumulator (32) each have a dual-input, dual-output structure, each being connected to the instruction decoding unit (2) through two input ports and to the instruction operation unit (4) through two output ports; the four-stage pipeline operation structure (5) comprises, in order from input end to output end, a two-stage multiplier operation station (51), an adder operation station (52), and a compound operation station (53) for performing complex-number, dot-product and 32-bit multiplication operations; the two-stage multiplier operation station (51) comprises a plurality of SIMD multipliers (511) arranged in parallel; the adder operation station (52) comprises a plurality of SIMD adders (521) arranged in parallel; the two-stage multiplier operation station (51), the adder operation station (52) and the compound operation station (53) are each connected to the result selection module (6); each SIMD multiplier (511) comprises a first-stage multiplication module (512) for performing conventional multiplication and SIMD multiplication and a second-stage multiplication module (513) for performing sign extension and concatenation, the first-stage multiplication module (512) and the second-stage multiplication module (513) being connected in series; each SIMD multiplier (511) comprises a SIMD control signal input for selecting the SIMD multiplication mode, the SIMD control signal input being connected to both the first-stage multiplication module (512) and the second-stage multiplication module (513); when an invalid signal is applied to the SIMD control signal input, the first-stage multiplication module (512) and the second-stage multiplication module (513) perform ordinary multiplication on the input operands, and when a valid signal is applied to the SIMD control signal input, they perform SIMD multiplication on the input operands; each SIMD adder (521) is a 40-bit SIMD fixed-point adder comprising five 8-bit addition modules (522) connected in series; and the compound operation station (53) comprises a complex-number instruction processing module (531), a dot-product instruction processing module (532) and a 32-bit multiplication instruction processing module (533).
2. The multifunctional fixed-point multiply-accumulate (MAC) operation device for a microprocessor according to claim 1, characterized in that: the instruction decoding unit (2) comprises an instruction distinguishing module (21), a 32-bit instruction decoding module (22) and a 16-bit instruction decoding module (23); the input of the 32-bit instruction decoding module (22) and the input of the 16-bit instruction decoding module (23) are each connected to the instruction dispatch unit (1) through the instruction distinguishing module (21); the output of the 32-bit instruction decoding module (22) is connected to both the storage unit (3) and the instruction operation unit (4); and the output of the 16-bit instruction decoding module (23) is connected to both the storage unit (3) and the instruction operation unit (4).
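As an illustration only, the following is a minimal behavioral sketch in Python of two elements recited in claim 1: the 40-bit SIMD adder (521) built from five 8-bit addition modules (522) in series, and the SIMD multiplier (511) whose control signal switches between ordinary and SIMD multiplication. The claims do not state lane widths or how the carry chain is segmented, so the names `simd_add40`, `simd_multiply`, `to_signed` and `carry_links`, as well as the choice of one 16x16 product versus two 8x8 products, are assumptions made for this sketch and are not taken from the patent.

```python
SLICE_BITS = 8    # width of one addition module (522)
NUM_SLICES = 5    # five slices in series give a 40-bit adder


def simd_add40(a, b, carry_links=(True, True, True, True)):
    """Add two 40-bit words with five chained 8-bit adder slices.

    carry_links[i] (hypothetical parameter) says whether the carry out of
    slice i is passed on to slice i + 1. All True gives one 40-bit addition;
    a False entry cuts the chain there, producing independent SIMD lanes.
    """
    mask = (1 << SLICE_BITS) - 1
    result = 0
    carry = 0
    for i in range(NUM_SLICES):
        slice_a = (a >> (SLICE_BITS * i)) & mask
        slice_b = (b >> (SLICE_BITS * i)) & mask
        total = slice_a + slice_b + carry
        result |= (total & mask) << (SLICE_BITS * i)
        # Forward the carry only across an enabled link; the carry out of
        # the top slice is discarded (wrap-around modulo 2**40).
        carry = (total >> SLICE_BITS) if i < NUM_SLICES - 1 and carry_links[i] else 0
    return result


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def simd_multiply(x, y, simd):
    """Two-step multiply selected by a SIMD control flag.

    Assumed widths for this sketch: simd=False performs one signed 16x16
    multiply with a 32-bit product; simd=True performs two signed 8x8
    multiplies with 16-bit products.
    """
    # Step 1 (compare module 512): form the raw signed products.
    if simd:
        products = [to_signed(x >> s, 8) * to_signed(y >> s, 8) for s in (0, 8)]
        width = 16
    else:
        products = [to_signed(x, 16) * to_signed(y, 16)]
        width = 32
    # Step 2 (compare module 513): wrap each product to its lane width in
    # two's complement (sign extension) and concatenate the lanes.
    out = 0
    for lane, product in enumerate(products):
        out |= (product & ((1 << width) - 1)) << (lane * width)
    return out
```

With every carry link enabled, `simd_add40(0x00FFFFFFFF, 1)` lets the carry ripple across all slices; with every link cut, the same operands are added as five separate 8-bit lanes and the carry out of the lowest lane is dropped. Modeling the segmentation as per-boundary carry links keeps the five slices identical, which is one plausible way to reuse the same hardware for both modes.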
CN201110336974.3A 2011-10-31 2011-10-31 Multifunctional fixed-point media access control (MAC) operation device for microprocessor Active CN102360281B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110336974.3A CN102360281B (en) 2011-10-31 2011-10-31 Multifunctional fixed-point media access control (MAC) operation device for microprocessor

Publications (2)

Publication Number Publication Date
CN102360281A (en) 2012-02-22
CN102360281B (en) 2014-04-02

Family

ID=45585615

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110336974.3A Active CN102360281B (en) 2011-10-31 2011-10-31 Multifunctional fixed-point media access control (MAC) operation device for microprocessor

Country Status (1)

Country Link
CN (1) CN102360281B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106227508A (en) * 2016-07-25 2016-12-14 中国科学院计算技术研究所 A kind of without back edge data stream round-robin method, system, device, chip

Families Citing this family (6)

Publication number Priority date Publication date Assignee Title
CN104243131B (en) * 2014-09-29 2017-12-08 瑞斯康达科技发展股份有限公司 A kind of clock synchronizing method and device
CN104901651A (en) * 2015-06-25 2015-09-09 福州瑞芯微电子有限公司 Realizing circuit and method of digital filter
CN106775579B (en) * 2016-11-29 2019-06-04 北京时代民芯科技有限公司 Floating-point operation accelerator module based on configurable technology
WO2019023910A1 (en) * 2017-07-31 2019-02-07 深圳市大疆创新科技有限公司 Data processing method and device
CN109214273A (en) * 2018-07-18 2019-01-15 平安科技(深圳)有限公司 Facial image comparison method, device, computer equipment and storage medium
CN108985232A (en) * 2018-07-18 2018-12-11 平安科技(深圳)有限公司 Facial image comparison method, device, computer equipment and storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN1147307A (en) * 1994-05-03 1997-04-09 先进Risc机器有限公司 Data processing with multiple instruction sets
CN1366234A (en) * 2000-12-19 2002-08-28 国际商业机器公司 Operation circuit and operation method
CN1481526A (en) * 2000-12-13 2004-03-10 [assignee name unreadable] Cryptographic processor
CN1598757A (en) * 2004-09-02 2005-03-23 中国人民解放军国防科学技术大学 Design method of number mixed multipler for supporting single-instruction multiple-operated
CN1963745A (en) * 2006-12-01 2007-05-16 浙江大学 High speed split multiply accumulator apparatus

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US7660841B2 (en) * 2004-02-20 2010-02-09 Altera Corporation Flexible accumulator in digital signal processing circuitry

Also Published As

Publication number Publication date
CN102360281A (en) 2012-02-22

Similar Documents

Publication Publication Date Title
CN102360281B (en) Multifunctional fixed-point media access control (MAC) operation device for microprocessor
US10445451B2 (en) Processors, methods, and systems for a configurable spatial accelerator with performance, correctness, and power reduction features
US10514912B2 (en) Vector multiplication with accumulation in large register space
CN102231102B (en) Method for processing RSA password based on residue number system and coprocessor
TWI537823B (en) Methods, apparatus, instructions and logic to provide vector population count functionality
CN101916177B (en) Configurable multi-precision fixed point multiplying and adding device
WO2009035185A1 (en) Reconfigurable array processor for floating-point operations
CN104111816A (en) Multifunctional SIMD structure floating point fusion multiplying and adding arithmetic device in GPDSP
CN102122275A (en) Configurable processor
CN102012893B (en) Extensible vector operation device
CN102495719A (en) Vector floating point operation device and method
CN102411558A (en) Vector processor oriented large matrix multiplied vectorization realizing method
CN102184092A (en) Special instruction set processor based on pipeline structure
CN103970720A (en) Embedded reconfigurable system based on large-scale coarse granularity and processing method of system
US20200334042A1 (en) Method and device (universal multifunction accelerator) for accelerating computations by parallel computations of middle stratum operations
WO2014004394A1 (en) Vector multiplication with operand base system conversion and re-conversion
CN105607889A (en) Fixed-point and floating-point operation part with shared multiplier structure in GPDSP
CN102279818A (en) Vector data access and storage control method supporting limited sharing and vector memory
CN102339217A (en) Fusion processing device and method for floating-point number multiplication-addition device
CN106575219A (en) Instruction and logic for a vector format for processing computations
CN100367191C (en) Fast pipeline type divider
CN101630244B (en) System and method of double-scalar multiplication of streamlined elliptic curve
CN104407836A (en) Device and method of carrying out cascaded multiply accumulation operation by utilizing fixed-point multiplier
CN102629238B (en) Method and device for supporting vector condition memory access
CN102012802B (en) Vector processor-oriented data exchange method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant