Summary of the invention
The problem to be solved by the present invention is to provide a RISC processor and a data access method thereof that accelerate virtual machine context switching and improve computing performance.
To achieve the object of the invention, a RISC processor is provided, comprising an instruction module, a physical register file, a decoder, an execution unit, and a memory;
The instruction module comprises a memory-access extended instruction module, and the extended instruction module comprises access instructions that operate on a multiple of the data width;
The decoder comprises a judging module, a multiple-width store decoding module, and a multiple-width load decoding module, wherein:
The judging module is used to determine the type of the instruction input to the decoder;
The multiple-width store decoding module is used, when the input instruction is a store instruction among the memory-access extended instructions, to expand the source register from one register into a plurality of adjacent registers and then output the operation to the execution unit for execution;
The multiple-width load decoding module is used, when the input instruction is a load instruction among the memory-access extended instructions, to decode the load instruction into a plurality of internal operations, expand the destination register from one register into a plurality of adjacent registers, assign these registers to the plurality of internal operations, and output the operations to the execution unit for execution.
The execution unit comprises a merge unit, which is used to merge the plurality of internal operations after the multiple-width load decoding module has decoded a load instruction and before the execution unit executes it.
The multiple-data-width access instructions comprise four classes of multiple-data-width memory-access extended instructions in total: multiple-data-width load from memory and store to memory, and multiple-data-width load from memory and store to memory for floating-point registers.
The merging method is as follows: if a plurality of adjacent operations about to enter the issue queue of the execution unit are found to have been decoded from one multiple-data-width load instruction, the subsequent operations do not enter the issue queue, and their target physical register numbers are saved as the high-order target physical register numbers of the first operation.
To achieve the object of the invention, a data access method for a RISC processor is also provided, comprising the following steps:
Step A: the processor first fetches an instruction and inputs it to the decoder;
Step B: the decoder determines the instruction type, and identifies and decodes multiple-data-width instructions;
Step C: the decoded multiple-data-width instruction is sent to the execution unit for execution.
Step B comprises the following steps:
If the input instruction belongs to the existing instruction set, the decoder translates it into an internal operation;
If the input instruction is a multiple-width memory-access extended store instruction, the decoder expands the source register from one register into a plurality of adjacent registers;
If the input instruction is a multiple-width memory-access extended load instruction, the decoder expands the destination register from one register into a plurality of adjacent registers and splits the load into a plurality of internal operations, the adjacent registers being the respective destination registers of these internal operations.
Step C comprises the following steps:
After the decoder has decoded a multiple-data-width load instruction and before the execution unit executes it, the plurality of internal operations are merged.
The merging method is as follows: if a plurality of adjacent operations about to enter the issue queue of the execution unit are found to have been decoded from one multiple-data-width load instruction, the subsequent operations do not enter the issue queue, and their target physical register numbers are saved as the high-order target physical register numbers of the first operation.
The multiple-data-width access instructions comprise four classes of multiple-data-width memory-access extended instructions in total: multiple-data-width load from memory and store to memory, and multiple-data-width load from memory and store to memory for floating-point registers.
The beneficial effects of the invention are as follows: the RISC processor and data access method of the invention provide memory-access extended instructions whose data width is a multiple of the original data width, which accelerates virtual machine context switching and also improves the performance of the computer.
Embodiment
To make the purpose, technical solution, and advantages of the invention clearer, the RISC processor and data access method of the invention are described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
The embodiment takes a RISC processor device with the MIPS64 instruction set, and double-data-width access instructions, as an example to describe the RISC processor device and data access method of the invention. It should be noted, however, that this does not limit the claims of the invention.
The invention provides a RISC processor which, as shown in Fig. 1, comprises an instruction module 1, a physical register file 2, a decoder 3, an execution unit 4, and a memory (main memory) 5.
The instruction module 1 comprises the instructions of the MIPS instruction set;
The physical register file 2 comprises general-purpose registers and floating-point registers.
The instruction module 1 further comprises a memory-access extended instruction module 11, and the extended instruction module 11 comprises access instructions that operate on a multiple of the data width. The multiple-data-width access instructions comprise four classes of multiple-data-width memory-access extended instructions in total: multiple-data-width load from memory and store to memory, and multiple-data-width load from memory and store to memory for floating-point registers.
The memory-access extended instructions are a subset of the inputs of the decoder 3; the decoder 3 receives a memory-access extended instruction and decodes it. Depending on whether the instruction reads or writes memory, the decoder 3 outputs a plurality of internal operations (for a load) or a single internal operation (for a store).
The decoder 3 comprises a judging module 31, a multiple-width store decoding module 32, and a multiple-width load decoding module 33, wherein:
The judging module 31 is used to determine the type of the instruction input to the decoder 3.
The multiple-width store decoding module 32 is used, when the input instruction is a store instruction among the memory-access extended instructions, to expand the source register from one register into a plurality of adjacent registers and then output the operation to the execution unit for execution;
The multiple-width load decoding module 33 is used, when the input instruction is a load instruction among the memory-access extended instructions, to decode the load instruction into a plurality of internal operations, expand the destination register from one register into a plurality of adjacent registers, assign these registers to the plurality of internal operations, and output the operations to the execution unit for execution.
The execution unit 4 comprises a merge unit 41, which is used to merge the plurality of internal operations after the multiple-width load decoding module has decoded a load instruction and before the execution unit executes it.
In one embodiment, the invention proposes four multiple-data-width memory-access extended instructions in total: double-data-width load-from-memory and store-to-memory instructions, and double-data-width load-from-memory and store-to-memory instructions for floating-point registers.
In one embodiment, the encoding format of the multiple-data-width extended instructions provided by the invention is shown in Fig. 2. It uses the values of the empty LWC2 and SWC2 slots reserved in the existing MIPS64 instruction set, where the upper 6 bits (bits 31:26) of the 32-bit instruction are the opcode field. The LWC2 (opcode 110010) and SWC2 (opcode 111010) instruction slots are both specified by MIPS as freely definable by the user. The memory addressing mode is base + 8-bit offset.
Here, the 5-bit base field denotes the base address, the 5-bit rt (Register Target (Source/Destination)) field denotes the source/destination register, offset denotes the address offset, and the final 6-bit func field distinguishes the individual extended instructions.
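The field layout described above can be illustrated with a short sketch. It is not the decoder of the invention: only the opcode position (bits 31:26) and the two opcode values are taken from the text. The positions of base (bits 25:21) and rt (bits 20:16) are assumed to follow the usual MIPS load/store layout, func is assumed to occupy the lowest 6 bits, and the position of the 8-bit offset is given only in Fig. 2, so it is deliberately not extracted here. The names `decode_fields` and `uses_extension_slot` are hypothetical.

```python
LWC2_OPCODE = 0b110010  # slot reused for the load-type extended instructions
SWC2_OPCODE = 0b111010  # slot reused for the store-type extended instructions


def decode_fields(word):
    """Split a 32-bit instruction word into the fields discussed in the text."""
    if not 0 <= word < (1 << 32):
        raise ValueError("instruction word must fit in 32 bits")
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31:26, as stated in the text
        "base": (word >> 21) & 0x1F,    # ASSUMED bits 25:21 (usual MIPS position)
        "rt": (word >> 16) & 0x1F,      # ASSUMED bits 20:16 (usual MIPS position)
        "func": word & 0x3F,            # ASSUMED to be the lowest 6 bits
    }


def uses_extension_slot(word):
    """True if the word falls in the LWC2 or SWC2 opcode slot."""
    return decode_fields(word)["opcode"] in (LWC2_OPCODE, SWC2_OPCODE)


# Build a word in the LWC2 slot with base = 29, rt = 4 and an arbitrary func value.
example_word = (LWC2_OPCODE << 26) | (29 << 21) | (4 << 16) | 0b000001
example_fields = decode_fields(example_word)
```

In this sketch a word is routed to the extended-instruction path purely by its opcode; the func value would then select one of the four extended instructions, with the actual func encodings left to Fig. 2.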
The four extended instructions of the invention are custom instructions added in the space that the MIPS64 instruction set leaves open for user extension.
For the store instruction SQ, the decoder 3 outputs a single internal operation sq;
For the load instruction LQ, the decoder 3 outputs two adjacent internal operations lq1 and lq2, where lq1 carries the logical register number for the low 64 bits of the LQ instruction and lq2 carries the logical register number for the high 64 bits.
After the decoder 3 has decoded an instruction, the result is sent to the issue queue (not shown) of the execution unit 4. The issue queue selects operations whose operands are ready and issues them to the memory access unit (not shown) of the execution unit 4.
For a load instruction that reads a 128-bit quadword from memory, the merge unit 41 of the execution unit merges the two internal operations as they enter the issue queue. The merging method is as follows: if two adjacent operations about to enter the queue are found to have been decoded from one 128-bit quadword load instruction, the latter operation does not enter the issue queue, and its target physical register number is stored as the high-64-bit target physical register number of the former operation.
The merged load operation has two target physical register numbers, and it can be issued to the memory access unit of the execution unit for execution once its address is determined (that is, once the corresponding source physical register is ready).
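The split-then-merge behaviour described above can be sketched as follows. This is a simplified software model under stated assumptions, not the hardware of the invention: the `InternalOp` record, the `quad_load_id` tag used to recognise two operations decoded from the same LQ, and the function names are all hypothetical. Only the operation names lq1 and lq2 and the rule that lq2 stays out of the issue queue come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InternalOp:
    name: str                           # e.g. "lq1", "lq2" (names from the text)
    dest: Optional[int] = None          # target physical register number
    dest_hi: Optional[int] = None       # high-64-bit target, filled in by the merge
    quad_load_id: Optional[int] = None  # HYPOTHETICAL tag: ops from the same LQ share it


def decode_lq(dest_lo: int, dest_hi: int, quad_load_id: int) -> List[InternalOp]:
    """Decoder output for LQ: two adjacent internal operations."""
    return [
        InternalOp("lq1", dest=dest_lo, quad_load_id=quad_load_id),
        InternalOp("lq2", dest=dest_hi, quad_load_id=quad_load_id),
    ]


def enqueue_with_merge(queue: List[InternalOp], incoming: List[InternalOp]) -> List[InternalOp]:
    """Append operations to the issue queue, folding each lq2 into its lq1."""
    i = 0
    while i < len(incoming):
        op = incoming[i]
        nxt = incoming[i + 1] if i + 1 < len(incoming) else None
        same_load = (
            nxt is not None
            and op.name == "lq1"
            and nxt.name == "lq2"
            and op.quad_load_id is not None
            and op.quad_load_id == nxt.quad_load_id
        )
        if same_load:
            op.dest_hi = nxt.dest  # keep lq2's target as the high-64-bit target of lq1
            queue.append(op)       # only the first operation enters the queue
            i += 2                 # lq2 itself is dropped
        else:
            queue.append(op)
            i += 1
    return queue
```

After the merge, the single queue entry carries both target physical register numbers, so the memory access unit can perform one 128-bit access and write two results. The physical register numbers need not be adjacent, which is why the entry stores both explicitly.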
The memory access unit is the component that performs data access: according to the instruction, it fetches data from memory or stores data to memory. This process is standard existing technology familiar to those skilled in the art and is therefore not described in detail in the embodiments of the invention.
The four instructions are described below: the double-data-width load-from-memory and store-to-memory instructions, and the double-data-width load-from-memory and store-to-memory instructions for floating-point registers:
GsLQ rt, offset(base) / load a quadword from memory
Loads a quadword from memory into registers. First, the signed 8-bit offset is added to the contents of GPR[base] to obtain the effective address; then the 128-bit quadword is fetched from memory at this aligned effective address and stored in 2 adjacent general-purpose registers.
If rt is even, the data is stored in registers rt and rt+1; if rt is odd, it is stored in registers rt-1 and rt.
The effective address must be aligned: if any of the low 4 address bits is nonzero, an address error exception occurs.
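The GsLQ behaviour described above can be modelled in a few lines. The sketch below is illustrative only and rests on assumptions the text does not settle: memory is modelled as a little-endian `bytearray`, the lower-numbered register of the pair is assumed to receive the low 64 bits, the offset is added without scaling, and `AddressError`, `sign_extend8`, `register_pair`, and `gslq` are hypothetical names.

```python
MASK64 = (1 << 64) - 1


class AddressError(Exception):
    """Stands in for the address error exception named in the text."""


def sign_extend8(value):
    """Interpret the low 8 bits of value as a signed offset."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value


def register_pair(rt):
    """Even rt -> (rt, rt+1); odd rt -> (rt-1, rt)."""
    even = rt & ~1
    return even, even + 1


def gslq(gpr, memory, rt, offset, base):
    """Load a 128-bit quadword into two adjacent general-purpose registers."""
    addr = (gpr[base] + sign_extend8(offset)) & MASK64
    if addr & 0xF:  # any of the low 4 address bits set -> misaligned
        raise AddressError(hex(addr))
    lo_reg, hi_reg = register_pair(rt)
    quad = int.from_bytes(memory[addr:addr + 16], "little")  # ASSUMED byte order
    gpr[lo_reg] = quad & MASK64  # ASSUMED: lower-numbered register gets the low half
    gpr[hi_reg] = quad >> 64
```

For example, with rt = 7 the pair is registers 6 and 7, and an offset field of 0xF0 is read as -16, so the quadword comes from the address 16 bytes below GPR[base].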
GsLQC1 ft, offset(base) / load a quadword from memory into floating-point registers
First, the signed 8-bit offset is added to the contents of GPR[base] to obtain the effective address; then the 128-bit quadword is fetched from memory at this aligned effective address and stored in 2 adjacent floating-point registers.
If ft is even, the data is stored in registers ft and ft+1; if ft is odd, it is stored in registers ft-1 and ft.
The effective address must be aligned: if any of the low 4 address bits is nonzero, an address error exception occurs.
GsSQ rt, offset(base) / store a quadword to memory
First, the signed 8-bit offset is added to the contents of GPR[base] to obtain the effective address; then the quadword held in 2 adjacent general-purpose registers is stored to memory at the effective address.
If rt is even, the values in registers rt and rt+1 are stored to memory; if rt is odd, the values in registers rt-1 and rt are stored to memory.
The effective address must be aligned: if any of the low 4 address bits is nonzero, an address error exception occurs.
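The store direction can be sketched in the same way. As before, this is a minimal model rather than the implementation of the invention: the little-endian `bytearray` memory, the choice that the lower-numbered register supplies the low 64 bits, the unscaled offset, and the names `AddressError`, `sign_extend8`, and `gssq` are all assumptions made for illustration.

```python
MASK64 = (1 << 64) - 1


class AddressError(Exception):
    """Stands in for the address error exception named in the text."""


def sign_extend8(value):
    """Interpret the low 8 bits of value as a signed offset."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value


def gssq(gpr, memory, rt, offset, base):
    """Store the 128-bit value held in two adjacent registers to memory."""
    addr = (gpr[base] + sign_extend8(offset)) & MASK64
    if addr & 0xF:  # any of the low 4 address bits set -> misaligned
        raise AddressError(hex(addr))
    even = rt & ~1  # even rt -> (rt, rt+1); odd rt -> (rt-1, rt)
    # ASSUMED: the lower-numbered register supplies the low 64 bits.
    quad = ((gpr[even + 1] & MASK64) << 64) | (gpr[even] & MASK64)
    memory[addr:addr + 16] = quad.to_bytes(16, "little")  # ASSUMED byte order
```

One such store writes the contents of two 64-bit registers, so saving a block of registers takes half as many store instructions as saving them one register at a time.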
GsSQC1 ft, offset(base) / store a quadword from floating-point registers to memory
First, the signed 8-bit offset is added to the contents of GPR[base] to obtain the effective address; then the quadword held in 2 adjacent floating-point registers is stored to memory at the effective address. If ft is even, the values in registers ft and ft+1 are stored to memory; if ft is odd, the values in registers ft-1 and ft are stored to memory.
The effective address must be aligned: if any of the low 4 address bits is nonzero, an address error exception occurs.
According to the invention, a data access method for a RISC processor is also provided.
Fig. 3 is a flowchart of the data access method of the RISC processor, which comprises the following steps:
Step S100: the processor first fetches an instruction and inputs it to the decoder 3;
Step S200: the decoder 3 determines the instruction type, and identifies and decodes multiple-data-width instructions;
If the instruction belongs to the existing MIPS instruction set, the decoder 3 translates it into an internal operation, for example by providing the corresponding opcode (OP), source registers, destination register, and so on;
If the input instruction is one of the multiple-width memory-access extended instructions proposed by the invention, the decoder 3 automatically expands the source or destination register from one register into two paired registers;
If rt is even, the pair is registers rt and rt+1; if rt is odd, the pair is registers rt-1 and rt.
If the input instruction is a multiple-data-width load instruction, the decoder 3 automatically expands the destination register from one register into a plurality of adjacent registers and splits the load into a plurality of internal operations, the paired registers being the respective destination registers of these internal operations.
If the input instruction is a multiple-data-width store instruction, the decoder expands the source register from one register into a plurality of adjacent registers.
The processor first fetches an instruction and inputs it to the decoder, which determines the instruction type. If the instruction belongs to the original MIPS instruction set, the decoder translates it into an internal operation, for example by providing the corresponding operation (OP), source registers, destination register, and so on, and outputs it to the execution unit for execution. If the input instruction is a store operation among the memory-access extended instructions proposed by the invention, the decoder automatically expands the source register from one register into two paired registers and then outputs the operation to the execution unit for execution.
The processor first fetches an instruction and inputs it to the decoder 3; the decoder 3 determines the instruction type and converts the instruction into an internal operation. From the point of view of the processor's functional units, the encoding of an internal operation is more regular than that of an instruction, which helps simplify the internal logic. In a RISC processor, external instructions usually map one-to-one onto internal operations. The internal operation output by the decoder consists of several fields, such as the opcode (op), the extended opcode (fmt), the source register numbers, the destination register number, the immediate, and so on.
Step S300: the decoded multiple-data-width instruction is sent to the execution unit 4 for execution.
After the decoder 3 has decoded the input instruction, the result is sent to the execution unit 4. In the execution unit 4, if the instruction is a load instruction, the 2 internal operations lq1 and lq2 of the load instruction LQ are merged into one operation, which is sent to the memory access unit of the execution unit for execution.
If the input instruction is the 128-bit quadword store instruction SQ among the memory-access extended instructions proposed by the invention, the decoder 3 expands the source register number from one number into two paired register numbers. For example, register 4 is expanded into register numbers 4 and 5, and register 7 is expanded into 6 and 7. These are sent to the issue queue of the execution unit as the two source register numbers of the internal operation.
If the input instruction is a load operation among the memory-access extended instructions proposed by the invention, the decoder 3 decodes the load into two internal operations and likewise automatically expands the destination register from one register into two paired registers, which are assigned to the two internal operations. The operations are output to the execution unit 4, where they are merged into one operation and sent to the memory access unit of the execution unit for execution.
That is, if the input instruction is the 128-bit quadword load instruction LQ among the memory-access extended instructions proposed by the invention, the decoder expands the destination register number from one number into two paired register numbers and splits the instruction into two adjacent internal operations lq1 and lq2, each carrying one of the two destination register numbers, which are sent toward the issue queue of the execution unit 4. The execution unit receives the internal operations from the decoder, selects those whose source physical registers are ready, and issues them to its memory access unit. Because the LQ instruction is issued to the memory access unit as a single access operation, the merge unit of the execution unit merges the two internal operations. The method is to store the target physical register of lq2 as the high-64-bit target physical register of the lq1 operation, while lq2 itself does not enter the issue queue. The memory access unit then performs the access; for lq1 the result contains two value fields, which are written to the corresponding physical registers respectively.
The RISC processor and data access method of the invention provide memory-access extended instructions whose data width is a multiple of the original data width, which accelerates virtual machine context switching and also improves the performance of the computer.
From the description of the specific embodiments of the invention in conjunction with the drawings, other aspects and features of the invention will be apparent to those skilled in the art.
The specific embodiments of the invention have been described and illustrated above. These embodiments should be regarded as exemplary only and are not intended to limit the invention; the invention should be interpreted according to the appended claims.