CN105512088B

CN105512088B - A reconfigurable processor architecture and reconfiguration method thereof

Info

Publication number: CN105512088B
Application number: CN201510868187.1A
Authority: CN
Inventors: 刘小明; 洪; 洪一; 马强; 黄光红; 王媛; 刘谷; 李岩; 万晓佳
Original assignee: CETC 38 Research Institute
Current assignee: CETC 38 Research Institute
Priority date: 2015-11-27
Filing date: 2015-11-27
Publication date: 2018-08-10
Anticipated expiration: 2035-11-27
Also published as: CN105512088A

Abstract

A reconfigurable processor architecture internally comprises 4 instruction caches, 16 operation macros, a shared data memory, a shared program memory and peripherals. The 16 operation macros are connected to the instruction caches through 4 instruction pipelines; the instruction caches are connected to a crossbar switch; each peripheral is also connected to the crossbar switch; the operation macros and the crossbar switch are connected to the shared data memory; the instruction caches are connected to the shared program memory; and the instruction bus is pipelined. The reconfigurable processor has two working modes: a discrete mode and a reconfigured mode. The reconfigured mode recombines the operation macros and instruction pipelines to build logical cores of different scales. The present invention also provides a reconfiguration method. The advantages of the invention are that the processor structure can be recombined for different application requirements, so that the computing platform is generalized across applications with different characteristics, and that a unified processor architecture is easier for users to learn.

Description

A reconfigurable processor architecture and reconfiguration method thereof

Technical field

The present invention relates to a reconfigurable processor architecture and belongs to the technical field of microprocessor design.

Background art

In modern signal processing applications, the polarization between two parallel processing modes, task-level parallelism and data-level parallelism, is becoming increasingly severe. In engineering practice, different processor hardware platforms must be built to meet the requirements of different application fields. Task-level parallelism calls for a multi-core processor platform that can handle concurrent tasks. Data-level parallelism calls for a single-core processor platform: for single-task applications, the system is built around a single-core processor with strong processing capability. The diversity of task requirements leads to a variety of processor architectures, which makes them inconvenient for users to learn and implement.

A processor with a reconfigurable structure can recombine its structure for different application requirements, with the aim of unifying the computing platform across applications with different characteristics. This is currently a research hotspot in China and abroad.

Summary of the invention

The technical problem to be solved by the present invention is to provide a reconfigurable processor architecture that allows the computing platform to be generalized across applications with different characteristics.

The present invention solves the above technical problem with the following technical solution: a reconfigurable processor architecture that internally comprises 4 instruction caches, 16 operation macros, a shared data memory, a shared program memory and peripherals. The 16 operation macros are connected to the instruction caches through 4 instruction pipelines; the instruction caches are connected to a crossbar switch; each peripheral is also connected to the crossbar switch; the operation macros and the crossbar switch are connected to the shared data memory; the instruction caches are connected to the shared program memory; and the instruction bus is pipelined;

The reconfigurable processor has two working modes: a discrete mode and a reconfigured mode;

Discrete mode: in this mode the processor forms a 4-core homogeneous processor, in which each core contains 4 operation macros, 1 instruction cache and 1 instruction pipeline;

Reconfigured mode: this mode recombines the operation macros and instruction pipelines to build logical cores of different scales. As required, the processor is internally configured as 1 core, 2 cores, 3 cores or 4 cores, and each core contains 1 to 16 operation macros.

As a further specific technical solution, each instruction pipeline can issue eight 48-bit instruction words at a time, a single instruction can drive at most 16 operation macros to take part in a computation, and instruction pipelines and operation macros that are not in use are placed in a low-power mode.

As a further specific technical solution, each operation macro comprises a private data memory, dispatch and decode logic, an arithmetic unit and a direct memory access controller;

The arithmetic unit consists of a local general-purpose register file of 128 × 32-bit words, 8 arithmetic logic units, 8 multipliers, 4 shifters and 1 super-computing unit, and supports computation on 16-bit, 32-bit, single-precision and double-precision floating-point format data. The general-purpose registers serve as temporary data storage and handle the data exchange between the arithmetic unit and the private data memory inside the macro. Each operation macro contains 3 independent 64-bit data buses.

As a further specific technical solution, the instruction bus of the reconfigurable processor architecture is pipelined, with 15 stages in total, divided into a front segment of 7 stages and a back segment of 8 stages. The front segment is the part shared within a logical core, one per logical core and 4 in total; the back segment is located inside each operation macro and is private to that macro, 16 in total.

As a further specific technical solution, data and programs are stored with a bit width of 32 bits and are byte-addressed. The processor uses a 48-bit instruction set with a machine code width of 48 bits, each logical core receives at most eight 48-bit single-word instructions per clock cycle, and the entry addresses of all subroutines and interrupt service routines are aligned to sixteen 32-bit words.

As a further specific technical solution, the processor architecture includes three classes of program control registers: operation-macro private registers, public registers and pipeline synchronization control registers. The private program control registers in an operation macro are set according to the information of that macro; the public program control registers in the operation macros and the synchronization controllers of the 4 pipelines are updated according to the private registers of all operation macros; and the synchronization of the operation macros and the pipeline front segments is governed by the in-macro public registers and the pipeline synchronization control registers.

As a further specific technical solution, the private data memory inside an operation macro is connected to the in-macro register file used for data exchange by 3 data buses, each with a data width of 64 bits. The in-macro private data memory is divided into 6 data blocks, each block contains 2 data banks, each bank has a data width of 32 bits, and the data banks are built from dual-port SRAM.

As a further specific technical solution, the shared data memory is divided into 4 storage blocks, each data block is divided into 32 storage banks, and the operation macros are connected to the shared data memory through a shared data memory interface.

The invention also discloses a reconfiguration method for the reconfigurable processor architecture described in any of the above solutions, comprising the following steps:

Step 1: add a reconfiguration configuration module, attached to the crossbar switch as a shared peripheral; the module contains a group of configuration registers used to set the reconfiguration information;

Step 2: after power-on reset, the processor enters the loading process of the user code; at this point the processor is in the discrete working mode, which is equivalent to program loading on a 4-core homogeneous processor;

Step 3: the reconfiguration configuration module is set by memory access instructions. After all configuration registers have been set and program loading has finished, the structural recombination is completed by a reconfiguration instruction, and execution of the user code starts only after the reconfiguration instruction has completed.

As a further specific technical solution, the instruction bus of the reconfigurable processor architecture is pipelined, with 15 stages in total, divided into a front segment of 7 stages and a back segment of 8 stages. The front segment is the part shared within a logical core, one per logical core and 4 in total; the back segment is located inside each operation macro and is private to that macro, 16 in total. The front segment of the instruction pipeline fetches instruction packets from the instruction cache, extracts from each packet the execute line of instructions that can be executed simultaneously, and passes it to the back segment for processing; at the same time it controls its own execution according to the synchronization control information fed back by the back segment. The back segment determines from the reconfiguration configuration information which logical core its operation macro belongs to, and selects the corresponding front segment to execute the user program.

The advantages of the invention are that the reconfigurable processor architecture can recombine the processor structure for different application requirements, so that the computing platform is generalized across applications with different characteristics, and that a unified processor architecture is easier for users to learn.

Brief description of the drawings

Fig. 1 is a schematic diagram of the reconfiguration of the reconfigurable processor architecture of the invention;

Fig. 2 is a diagram of the reconfigurable processor architecture of the invention;

Fig. 3 is a micro-architecture diagram of an operation macro in the reconfigurable processor architecture of the invention;

Fig. 4 shows how the private, public and pipeline synchronization control registers are set in relation to one another;

Fig. 5 shows the 48-bit machine instruction format;

Fig. 6 shows the assembly instruction format.

Detailed description of the embodiments

The present invention is described in detail below with reference to the drawings.

The reconfigurable processor architecture takes the operation macro, built from a group of arithmetic and storage units, as the smallest reconfigurable granule. It uses static reconfiguration to recombine the internal structure of the processor so as to suit the requirements of different application scenarios.

The reconfigurable processor architecture internally comprises 4 instruction caches (Instruction Cache), 16 operation macros, a shared data memory, a shared program memory and peripherals. The 16 operation macros are connected to the instruction caches through the instruction pipelines; the instruction caches are connected to the crossbar switch; each peripheral is also connected to the crossbar switch; the operation macros and the crossbar switch are connected to the shared data memory; and the instruction caches are connected to the shared program memory. The instruction bus is pipelined.

The peripherals in this embodiment include an extended memory, an interrupt controller, a serializer/deserializer, timers, general-purpose input/output ports and a universal asynchronous receiver/transmitter.

Referring to Fig. 1, the reconfigurable processor has two main working modes: a discrete mode and a reconfigured mode.

Discrete mode: in this mode the processor forms a 4-core homogeneous processor, in which each core contains 4 operation macros, 1 instruction cache and 1 instruction pipeline. The operation macros are grouped as follows: macros 0 to 3 form logical core 0; macros 4 to 7 form logical core 1; macros 8 to 11 form logical core 2; macros 12 to 15 form logical core 3.
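The grouping of macros into logical cores in discrete mode can be sketched as a simple integer division. The following is only an illustrative model; the function names `discrete_core_of` and `discrete_layout` are hypothetical and do not appear in the patent.

```python
NUM_MACROS = 16        # operation macros in the processor
MACROS_PER_CORE = 4    # discrete mode: 4 macros per logical core


def discrete_core_of(macro_id: int) -> int:
    """Return the logical core that owns an operation macro in discrete mode."""
    if not 0 <= macro_id < NUM_MACROS:
        raise ValueError(f"macro id out of range: {macro_id}")
    return macro_id // MACROS_PER_CORE


def discrete_layout() -> dict[int, list[int]]:
    """Return {logical core: [macro ids]} for the fixed discrete-mode layout."""
    layout: dict[int, list[int]] = {}
    for macro_id in range(NUM_MACROS):
        layout.setdefault(discrete_core_of(macro_id), []).append(macro_id)
    return layout


if __name__ == "__main__":
    for core, macros in discrete_layout().items():
        print(f"logical core {core}: macros {macros[0]} to {macros[-1]}")
```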

Reconfigured mode: this mode can recombine the operation macros and instruction pipelines to build logical cores of different scales. Under this condition the processor can internally be 1 core, 2 cores, 3 cores or 4 cores as required, and the number of operation macros inside each core can differ. Each logical core contains 1 to 16 operation macros, each instruction pipeline can issue eight 48-bit instruction words at a time, and a single instruction can drive at most 16 operation macros to take part in a computation. Instruction pipelines and operation macros that are not in use can be placed in a low-power mode.

The recombined structure shown on the right of Fig. 1 is a single core containing 16 operation macros, which uses only one instruction pipeline.
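The constraints on a reconfigured layout (1 to 4 logical cores, 1 to 16 macros per core, unused resources in low-power mode) can be checked with a short sketch. It rests on assumptions that the patent does not state: each core is given as a contiguous range of macro numbers, and logical core i uses instruction pipeline i. The function name `plan_reconfiguration` is hypothetical.

```python
NUM_MACROS = 16
MAX_CORES = 4  # one instruction pipeline (front segment) per logical core


def plan_reconfiguration(core_ranges):
    """Validate a layout given as [(first_macro, last_macro), ...], inclusive.

    Returns (owner, idle_macros, idle_pipelines) where owner maps each used
    macro to its logical core, and the idle lists name the resources that
    could be placed in low-power mode.
    """
    if not 1 <= len(core_ranges) <= MAX_CORES:
        raise ValueError("a layout needs between 1 and 4 logical cores")
    owner = {}
    for core_id, (first, last) in enumerate(core_ranges):
        if not 0 <= first <= last < NUM_MACROS:
            raise ValueError(f"bad macro range for core {core_id}")
        for macro_id in range(first, last + 1):
            if macro_id in owner:
                raise ValueError(f"macro {macro_id} assigned twice")
            owner[macro_id] = core_id
    idle_macros = sorted(set(range(NUM_MACROS)) - set(owner))
    # Assumption: core i uses pipeline i, so the remaining pipelines are idle.
    idle_pipelines = list(range(len(core_ranges), MAX_CORES))
    return owner, idle_macros, idle_pipelines


if __name__ == "__main__":
    # The single 16-macro core shown on the right of Fig. 1.
    print(plan_reconfiguration([(0, 15)])[1:])
    # Two cores of unequal size; macros 12 to 15 are left unused.
    print(plan_reconfiguration([(0, 9), (10, 11)])[1:])
```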

The reconfiguration method of the reconfigurable processor architecture comprises the following steps:

Step 1: add a reconfiguration configuration module (core_fusion_config), attached to the crossbar switch as a shared peripheral. The module contains a group of configuration registers used to set the reconfiguration information, such as the instruction pipeline selection register;

Step 2: after power-on reset, the processor enters the loading process of the user code; at this point the processor is in the discrete working mode, which is equivalent to program loading on a 4-core homogeneous processor;

Step 3: the reconfiguration configuration module is set by memory access instructions. After all configuration registers have been set and program loading has finished, the structural recombination is completed by a reconfiguration instruction (core_fusion), and execution of the user code starts only after the reconfiguration instruction has completed.
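The ordering imposed by the steps above can be illustrated with a small state model. It is only a sketch: the class `ReconfigurableProcessor`, its method names and the register name `pipeline_select` are hypothetical, and only the names core_fusion_config and core_fusion come from the description.

```python
class ReconfigurableProcessor:
    """Toy model of the power-on, configure, load and fuse sequence."""

    def __init__(self):
        # After power-on reset the processor is in discrete mode (4 cores).
        self.mode = "discrete"
        self.core_fusion_config = {}   # configuration registers of the module
        self.program_loaded = False
        self.running_user_code = False

    def write_config(self, register, value):
        """Set one configuration register with a memory access instruction."""
        if self.mode != "discrete":
            raise RuntimeError("static reconfiguration: configure before fusing")
        self.core_fusion_config[register] = value

    def load_program(self):
        """Load the user code while still in discrete mode."""
        self.program_loaded = True

    def core_fusion(self):
        """Reconfiguration instruction: recombine, then start the user code."""
        if not self.core_fusion_config:
            raise RuntimeError("configuration registers have not been set")
        if not self.program_loaded:
            raise RuntimeError("program loading has not finished")
        self.mode = "reconfigured"
        self.running_user_code = True


if __name__ == "__main__":
    cpu = ReconfigurableProcessor()
    cpu.write_config("pipeline_select", 0)  # hypothetical register name
    cpu.load_program()
    cpu.core_fusion()
    print(cpu.mode, cpu.running_user_code)
```

The model rejects a core_fusion call issued before configuration or loading is complete, which mirrors the requirement that user code starts only after the reconfiguration instruction has completed.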

The operation macro described above is the smallest reconfigurable granule. As shown in Fig. 3, each operation macro comprises a private data memory, dispatch and decode logic, an arithmetic unit and a DMA (Direct Memory Access) controller. The arithmetic unit consists of a local general-purpose register file of 128 × 32-bit words, 8 arithmetic logic units (ALU), 8 multipliers, 4 shifters and 1 super-computing unit (SPU), and supports computation on 16-bit, 32-bit, single-precision and double-precision floating-point format data. The general-purpose registers serve as temporary data storage and handle the data exchange between the arithmetic unit and the private data memory inside the macro. Each operation macro contains 3 independent 64-bit data buses, which can carry out at most 2 writes and 1 read, or 2 reads and 1 write, simultaneously. The data storage space inside a macro is divided into 6 blocks, each 64 KB in size.
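The figures in this paragraph imply a few derived capacities, worked out below as plain arithmetic. The sketch assumes that 64 KB means 64 × 1024 bytes; the variable names are illustrative only.

```python
NUM_MACROS = 16
REGISTERS = 128          # general-purpose registers per macro
REGISTER_BITS = 32
BLOCKS_PER_MACRO = 6     # private data memory blocks per macro
BLOCK_KIB = 64           # assumption: 1 KB = 1024 bytes
BUSES = 3                # independent data buses per macro
BUS_BITS = 64

# Size of one macro's general-purpose register file, in bytes.
register_file_bytes = REGISTERS * REGISTER_BITS // 8

# Private data memory per macro and across all 16 macros, in KiB.
private_kib_per_macro = BLOCKS_PER_MACRO * BLOCK_KIB
private_kib_total = private_kib_per_macro * NUM_MACROS

# Upper bound on bits moved per cycle between memory and registers in a macro.
peak_bus_bits_per_cycle = BUSES * BUS_BITS

if __name__ == "__main__":
    print(register_file_bytes, private_kib_per_macro,
          private_kib_total, peak_bus_bits_per_cycle)
```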

The instruction bus of the reconfigurable processor architecture is pipelined, with 15 stages in total, divided into a front segment of 7 stages and a back segment of 8 stages. The front segment is the part shared within a logical core, one per logical core and 4 in total; the back segment is located inside each operation macro and is private to that macro, 16 in total. The front segment fetches instruction packets from the instruction cache, extracts from each packet the execute line of instructions that can be executed simultaneously, and passes it to the back segment for processing; at the same time it controls its own execution according to the synchronization control information fed back by the back segment. The back segment determines from the reconfiguration configuration information which logical core its operation macro belongs to, and selects the corresponding front segment to execute the user program.

The data and programs of the processor use a unified shared memory space. Data and programs are stored with a bit width of 32 bits and are byte-addressed. The processor uses a 48-bit instruction set with a machine code width of 48 bits, and each logical core can receive at most eight 48-bit single-word instructions per clock cycle. Because 48 is not a power of two, two requirements must be reconciled: one fetch must deliver at least 8 × 48 bits (that is, 12 × 32 bits), and the instruction cache design must stay simple. If one fetch were 12 × 32 bits, then, because 12 is not a power of two, the instruction cache miss logic would become extremely complex and the PC generator would need an add-12 operation. In actual fetching, the control logic therefore fetches from the private level-1 instruction cache in units aligned to sixteen 32-bit words and places the result in the front-segment buffer of the pipeline. The pipeline front segment issues a complete execute line to the logical core according to the end-of-line (EOL) flag of the instructions. If one fetch by the control logic does not yield a complete execute line, the execute line must be completed by splicing with the next fetch line. To simplify the instruction splicing logic, the entry addresses of all subroutines and interrupt service routines must be aligned to sixteen 32-bit words.
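The fetch arithmetic and the splicing of execute lines across fetch boundaries can be sketched as follows. This is a simplified model rather than the patented logic: instructions are represented as (name, eol) pairs instead of 48-bit words, and the function names are hypothetical.

```python
WORD_BYTES = 4                            # 32-bit storage word, byte-addressed
FETCH_WORDS = 16                          # one fetch is sixteen 32-bit words
FETCH_BYTES = FETCH_WORDS * WORD_BYTES    # 64 bytes per fetch line
MAX_ISSUE = 8                             # at most eight instructions per line


def is_valid_entry(address: int) -> bool:
    """Subroutine and interrupt entries must sit on a fetch-line boundary."""
    return address % FETCH_BYTES == 0


def align_up(address: int) -> int:
    """Round a byte address up to the next fetch-line boundary."""
    return (address + FETCH_BYTES - 1) // FETCH_BYTES * FETCH_BYTES


def split_execute_lines(instructions):
    """Group (name, eol) pairs into execute lines closed by an EOL flag.

    Returns (complete_lines, pending); pending holds the instructions of an
    unfinished line that must be spliced with the next fetch.
    """
    complete, current = [], []
    for name, eol in instructions:
        current.append(name)
        if len(current) > MAX_ISSUE:
            raise ValueError("execute line longer than 8 instructions")
        if eol:
            complete.append(current)
            current = []
    return complete, current


if __name__ == "__main__":
    # Eight 48-bit instructions occupy exactly twelve 32-bit words.
    print(8 * 48 == 12 * 32, FETCH_BYTES, align_up(65))
    print(split_execute_lines([("a", False), ("b", True), ("c", False)]))
```

A fetch line of 16 words holds 512 bits, which is more than the 384 bits of a full execute line but not a whole number of 48-bit instructions, so lines regularly straddle fetch boundaries; this is why the pending part is carried over.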

One of the most significant problems facing a reconfigurable processor architecture is the synchronization of operations among the operation macros that belong to the same logical core. The main situations that cause synchronization problems are:

1. multi-macro synchronization problems caused by data dependencies inside a single macro;

2. multi-macro synchronization caused by program control.

Synchronization control is addressed from two sides: the processor architecture and programming constraints. On the programming-constraint side, data dependencies in the processor are detected and handled by software. When programming in assembly language, the toolchain alerts the programmer by means of a compile warning as soon as it detects a data dependency, and the programmer can selectively modify the code; when programming in a high-level language such as C, the compiler resolves the dependency by itself.

On the processor-architecture side, control signals are localized by using program control registers, so that global control signals are avoided as far as possible. The dispatch and decode logic is distributed into each operation macro, and synchronization operations are split into two classes: those inside the operation macro and those in the front segment of the instruction pipeline. When synchronization operations such as a pipeline flush or stall occur, the pipeline front segment and the operation macros each handle them according to their own control registers.

As shown in Fig. 4, the program control registers can be divided into three classes: operation-macro private registers, public registers and pipeline synchronization control registers. The private registers in an operation macro are set according to the information of that macro; the public registers in the operation macros and the synchronization controllers of the 4 pipelines are updated according to the private registers of all operation macros. The synchronization of the operation macros and the pipeline front segments is governed by the in-macro public registers and the pipeline synchronization control registers. Localizing the synchronization control information avoids an excess of global control signals, which would affect chip timing, and at the same time makes it easy to reconfigure an arbitrary number of operation macros into a logical core.
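The relationship between private and public registers can be illustrated with a toy model. The patent does not specify the update rule; the sketch below assumes, purely for illustration, that each macro has a private stall flag and that the public flag of a logical core is the logical OR of the private flags of its macros. The function name `public_stall` is hypothetical.

```python
def public_stall(private_stall, owner):
    """Derive one public stall flag per logical core.

    private_stall: {macro id: bool}, the private flag raised by each macro.
    owner:         {macro id: logical core id}, from the reconfiguration.
    Assumption: a core stalls if any of its macros requests a stall.
    """
    flags = {}
    for macro_id, core_id in owner.items():
        requested = private_stall.get(macro_id, False)
        flags[core_id] = flags.get(core_id, False) or requested
    return flags


if __name__ == "__main__":
    # Core 0 owns macros 0 and 1, core 1 owns macro 2; macro 1 stalls.
    owner = {0: 0, 1: 0, 2: 1}
    print(public_stall({1: True}, owner))
```

Because the flag is computed per logical core from that core's own macros, a stall in one core does not propagate to the others, which is the localization property the description emphasizes.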

The memory in the present invention is divided into two types, on-chip memory and off-chip extended memory, and uses a hierarchical storage structure with three storage levels. First, each instruction pipeline has its own level-1 instruction cache, and each operation macro has its own private data memory. Second, there are the data and program memories shared by all instruction pipelines, operation macros and peripherals. Third, there is the external extended memory, such as DDR3.

The operation macro is the smallest unit of logical-core reconfiguration, and all arithmetic and logic operations are carried out in the arithmetic unit inside the macro. To ensure that the operation macros can be flexibly reconfigured and that the arithmetic units run at full load, a private data memory is provided inside each operation macro. It is connected to the in-macro register file used for data exchange by 3 data buses (two reads and one write, or two writes and one read), each with a data width of 64 bits. The in-macro private data memory is divided into 6 data blocks, each block contains 2 data banks, each bank has a data width of 32 bits, and the data banks are built from dual-port SRAM.
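How a byte address might map onto this block and bank organization can be sketched as follows. The patent gives only the counts and widths, so the mapping itself is an assumption: blocks are laid out linearly in 64 KB regions (taken as 64 × 1024 bytes), and consecutive 32-bit words alternate between the two banks of a block so that one 64-bit bus access touches both banks. The function names are hypothetical.

```python
BLOCKS = 6
BLOCK_BYTES = 64 * 1024     # assumption: 64 KB = 65536 bytes per block
BANKS_PER_BLOCK = 2
BANK_WORD_BYTES = 4         # each bank is 32 bits wide


def decode_private_address(address: int):
    """Map a byte address to (block, bank, row) under the assumed layout."""
    if not 0 <= address < BLOCKS * BLOCK_BYTES:
        raise ValueError(f"address outside private memory: {address:#x}")
    block, offset = divmod(address, BLOCK_BYTES)
    word = offset // BANK_WORD_BYTES
    # Assumed word interleaving: even words in bank 0, odd words in bank 1.
    return block, word % BANKS_PER_BLOCK, word // BANKS_PER_BLOCK


def banks_for_bus_access(address: int):
    """Return the two (block, bank, row) targets of one aligned 64-bit access."""
    if address % 8 != 0:
        raise ValueError("64-bit accesses are assumed to be 8-byte aligned")
    return [decode_private_address(address),
            decode_private_address(address + BANK_WORD_BYTES)]


if __name__ == "__main__":
    print(decode_private_address(0x10008))
    print(banks_for_bus_access(8))
```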

The 16 operation macros and the on-chip peripherals communicate with the on-chip shared SRAM and the off-chip extended DDR3 memory through the extended memory interface. Because several masters may access the shared data memory at the same time, the shared data memory in this architecture is divided into 4 storage blocks, and each data block is divided into 32 storage banks, in order to improve memory access efficiency. Besides the shared program memory, user instructions can also be stored in the external extended DDR3. When the address accessed by an instruction cache is not in the program memory, the DDR3 can be reached through the crossbar switch.

An assembly instruction in the present invention consists mainly of predicate information, operation-macro information and operation description information; the machine instruction is shown in Fig. 5. Any instruction can be executed conditionally according to one bit of the 16-bit register P[15:0] inside the operation macro, the bit being specified by bits 35 to 32 of the machine code. The scale of each logical core can be recombined arbitrarily, in units of operation macros, according to actual requirements, and each logical core may contain from 1 to 16 operation macros. The operation macros driven by each instruction are specified by the operation-macro information in the assembly instruction, and the macros driven by any one instruction must be consecutive in logical number; C and D in Fig. 6 denote the first and last numbers of the operation macros.

Instruction execution inside each operation macro is controlled only by the P[15:0] register of that macro. Bit 0 of the 16-bit control register, P0, is a special register: the programmer can only read it and cannot assign to it, and this bit is set after power-on reset. When the programmer does not specify a predicate register, the operation macros used are controlled by P0. For example, [P1] M0_1R3=R2+R1 means that the 32-bit addition executed in operation macros 0 and 1 is controlled by the P1 register of each macro, whereas M0_1R3=R2+R1 means that the 32-bit addition executed in operation macros 0 and 1 is controlled by the P0 register of each macro. The processor can select, by means of a parameter, whether execution takes place when the predicate register is '1' or when it is '0'. For example, [P1] M0_1R3=R2+R1 executes the addition when P1 is '1', and [!P1] M0_1R3=R2+R1 executes the addition when P1 is '0'. When the predicate register description is omitted from an assembly instruction, the instruction is executed when P0 is '1'; that is, M0_1R3=R2+R1 is equivalent to [P0] M0_1R3=R2+R1. When the operation-macro description is omitted from an assembly instruction, only the operation macro with logical number 0 executes the instruction; that is, [P2] R3=R2+R1 is equivalent to [P2] M0_0R3=R2+R1.
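The defaults described above (P0 when no predicate is given, macro 0 when no macro range is given) can be illustrated with a small parser for the example syntax. This is a sketch built only from the four example forms in the text; the regular expression, the function names and the dictionary layout are hypothetical, and the full assembly grammar of Fig. 6 is not reproduced here.

```python
import re

# Optional "[Pn]" or "[!Pn]" predicate, optional "Mc_d" macro range, then
# the operation text, for example "[!P1] M0_1R3=R2+R1".
PATTERN = re.compile(
    r"^\s*(?:\[(?P<neg>!?)P(?P<pred>\d+)\]\s*)?"
    r"(?:M(?P<first>\d+)_(?P<last>\d+))?"
    r"(?P<op>.+)$"
)


def parse(text: str) -> dict:
    """Split one assembly instruction into predicate, macros and operation."""
    match = PATTERN.match(text)
    if match is None:
        raise ValueError(f"cannot parse: {text!r}")
    pred = int(match.group("pred")) if match.group("pred") is not None else 0
    if not 0 <= pred <= 15:
        raise ValueError("predicate must be one of P0 to P15")
    if match.group("first") is None:
        first = last = 0             # omitted macro range: macro 0 only
    else:
        first, last = int(match.group("first")), int(match.group("last"))
    if not 0 <= first <= last <= 15:
        raise ValueError("macro range must be consecutive within 0 to 15")
    return {
        "pred": pred,
        "execute_when": 0 if match.group("neg") == "!" else 1,
        "macros": list(range(first, last + 1)),
        "op": match.group("op").strip(),
    }


def macros_that_execute(instr: dict, p_registers: dict) -> list:
    """Return the macros whose own P register enables the instruction.

    p_registers maps a macro number to the 16-bit value of its P[15:0].
    """
    return [m for m in instr["macros"]
            if (p_registers[m] >> instr["pred"]) & 1 == instr["execute_when"]]


if __name__ == "__main__":
    instr = parse("[P1] M0_1R3=R2+R1")
    # Macro 0 has P1 set, macro 1 has only P0 set.
    print(instr, macros_that_execute(instr, {0: 0b11, 1: 0b01}))
```

Each macro consults only its own P register, so the same instruction can execute in one macro of the range and be suppressed in another.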

The foregoing describes only preferred embodiments of the invention and is not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims

1. A reconfigurable processor architecture, characterized in that: it internally comprises 4 instruction caches, 16 operation macros, a shared data memory, a shared program memory and peripherals; the 16 operation macros are connected to the instruction caches through 4 instruction pipelines; the instruction caches are connected to a crossbar switch; each peripheral is also connected to the crossbar switch; the operation macros and the crossbar switch are connected to the shared data memory; the instruction caches are connected to the shared program memory; and the instruction bus is pipelined;

Discrete mode: in this mode the processor forms a 4-core homogeneous processor, in which each core contains 4 operation macros, 1 instruction cache and 1 instruction pipeline;

Reconfigured mode: this mode recombines the operation macros and instruction pipelines to build logical cores of different scales; as required, the processor is internally configured as 1 core, 2 cores, 3 cores or 4 cores, and each core contains 1 to 16 operation macros.

2. The reconfigurable processor architecture according to claim 1, characterized in that: each instruction pipeline can issue eight 48-bit instruction words at a time, a single instruction can drive at most 16 operation macros to take part in a computation, and instruction pipelines and operation macros that are not in use are placed in a low-power mode.

3. The reconfigurable processor architecture according to claim 1, characterized in that: each operation macro comprises a private data memory, dispatch and decode logic, an arithmetic unit and a direct memory access controller;

the arithmetic unit consists of a local general-purpose register file of 128 × 32-bit words, 8 arithmetic logic units, 8 multipliers, 4 shifters and 1 super-computing unit, and supports computation on 16-bit, 32-bit, single-precision and double-precision floating-point format data; the general-purpose registers serve as temporary data storage and handle the data exchange between the arithmetic unit and the private data memory inside the macro; and each operation macro contains 3 independent 64-bit data buses.

4. The reconfigurable processor architecture according to claim 3, characterized in that: the instruction bus of the reconfigurable processor architecture is pipelined, with 15 stages in total, divided into a front segment of 7 stages and a back segment of 8 stages, wherein the front segment is the part shared within a logical core, one per logical core and 4 in total; and the back segment is located inside each operation macro and is private to that macro, 16 in total.

5. The reconfigurable processor architecture according to claim 1, characterized in that: data and programs are stored with a bit width of 32 bits and are byte-addressed; the processor uses a 48-bit instruction set with a machine code width of 48 bits; each logical core receives at most eight 48-bit single-word instructions per clock cycle; and the entry addresses of all subroutines and interrupt service routines are aligned to sixteen 32-bit words.

6. The reconfigurable processor architecture according to claim 4, characterized in that: it includes three classes of program control registers, namely operation-macro private registers, public registers and pipeline synchronization control registers; the private program control registers in an operation macro are set according to the information of that macro; the public program control registers in the operation macros and the synchronization controllers of the 4 pipelines are updated according to the private registers of all operation macros; and the synchronization of the operation macros and the pipeline front segments is governed by the in-macro public registers and the pipeline synchronization control registers.

7. The reconfigurable processor architecture according to claim 3, characterized in that: the private data memory inside an operation macro is connected to the in-macro register file used for data exchange by 3 data buses, each with a data width of 64 bits; the in-macro private data memory is divided into 6 data blocks, each block contains 2 data banks, each bank has a data width of 32 bits, and the data banks are built from dual-port SRAM.

8. The reconfigurable processor architecture according to claim 1, characterized in that: the shared data memory is divided into 4 storage blocks, each data block is divided into 32 storage banks, and the operation macros are connected to the shared data memory through a shared data memory interface.

9. A reconfiguration method for the reconfigurable processor architecture according to any one of claims 1 to 8, characterized in that it comprises the following steps:

Step 1: add a reconfiguration configuration module, attached to the crossbar switch as a shared peripheral; the reconfiguration configuration module contains a group of configuration registers used to set the reconfiguration information;

Step 2: after power-on reset, the processor enters the loading process of the user code; at this point the processor is in the discrete working mode, which is equivalent to program loading on a 4-core homogeneous processor;

Step 3: the reconfiguration configuration module is set by memory access instructions; after all configuration registers have been set and program loading has finished, the structural recombination is completed by a reconfiguration instruction, and execution of the user code starts only after the reconfiguration instruction has completed.

10. The reconfiguration method of the reconfigurable processor architecture according to claim 9, characterized in that: the instruction bus of the reconfigurable processor architecture is pipelined, with 15 stages in total, divided into a front segment of 7 stages and a back segment of 8 stages, wherein the front segment is the part shared within a logical core, one per logical core and 4 in total; the back segment is located inside each operation macro and is private to that macro, 16 in total; the front segment of the instruction pipeline fetches instruction packets from the instruction cache, extracts from each packet the execute line of instructions that can be executed simultaneously, and passes it to the back segment for processing, while controlling its own execution according to the synchronization control information fed back by the back segment; and the back segment determines from the reconfiguration configuration information which logical core its operation macro belongs to, and selects the corresponding front segment to execute the user program.