WO2020108212A1 - Register access timing management method, processor, electronic device, and computer-readable storage medium - Google Patents

Register access timing management method, processor, electronic device, and computer-readable storage medium

Info

Publication number
WO2020108212A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
register
clock cycle
access
executing
Prior art date
Application number
PCT/CN2019/114336
Inventor
曹庆新
李炜
Original Assignee
深圳云天励飞技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳云天励飞技术有限公司
Publication of WO2020108212A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098 Register arrangements
    • G06F9/30141 Implementation provisions of register files, e.g. ports
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3867 Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating

Definitions

  • the present application relates to the field of register resource management, and in particular, to a register access timing management method, processor, electronic device, and computer-readable storage medium.
  • An instruction is usually divided into multiple stages for execution so that multiple instructions can be executed in parallel, which improves the execution efficiency of program instructions and lets the processor reach its highest frequency.
  • For example, instruction 2 needs to read, in the third clock cycle, the data that instruction 1 writes back to general register R1, but instruction 1 can only write that data back to R1 in the fourth clock cycle, so the data instruction 2 reads from R1 in the third clock cycle is not the data it actually needs.
  • instruction 1 and instruction 2 write data back to the same register in the same clock cycle, causing access time conflicts.
  • the processor needs to manage the register access timing of different instructions to avoid register access conflicts.
  • the existing management method of register access timing based on hardware implementation is difficult to design and high in implementation cost.
  • the management of register access timing based on the compiler will cause the compiler to take into account both optimization compilation and elimination of register access conflicts, thereby increasing the design difficulty of the compiler and affecting the overall performance of the processor.
  • Embodiments of the present application provide a register access timing management method, device, electronic device, and computer-readable storage medium, which can reduce the complexity of processor hardware design and increase the resource utilization rate of registers.
  • a first aspect of an embodiment of the present application provides a method for managing register access timing, including:
  • the timing relationship between executing the first instruction to access the register and executing the second instruction to access the register is determined.
  • the second aspect of the embodiments of the present application provides a processor, including:
  • an obtaining module configured to obtain first access information for executing a first instruction to access a register in each clock cycle of multiple clock cycles, where the first instruction is an instruction that has been determined to take effect when that clock cycle arrives;
  • a decoding module configured to, when a second instruction is received, determine second access information for executing the second instruction to access the register;
  • the detection module is configured to determine the timing relationship between executing the first instruction to access the register and executing the second instruction to access the register according to the first access information and the second access information.
  • a third aspect of the embodiments of the present application discloses an electronic device, including: a processor, a memory, a communication interface, and a bus;
  • the processor, the memory, and the communication interface are connected through the bus and complete communication with each other;
  • the memory stores executable program code
  • the processor runs the program corresponding to the executable program code by reading the executable program code stored in the memory, so as to perform the operations in the method for managing register access timing disclosed in the first aspect of the embodiments of the present application.
  • an embodiment of the present application provides a storage medium, wherein the storage medium is used to store an application program, and the application program is used to execute, at runtime, the method for managing register access timing disclosed in the first aspect of the embodiments of the present application.
  • an embodiment of the present application provides an application program, wherein the application program is used to execute a register access timing management method disclosed in the first aspect of the embodiment of the present application at runtime.
  • The timing relationship between executing the first instruction to access the register and executing the second instruction to access the register is thereby determined. The access information of the registers accessed by the instructions executed in each clock cycle, together with the status of the register access ports, can be saved in a hardware information table, so that the access timing of the registers can be managed simply by maintaining the hardware information table.
  • Conflict detection is performed in the decoding stage based on the hardware information table.
  • the hardware information table can also be used to detect and utilize idle register resources in a timely manner, thereby effectively improving the resource utilization rate of the registers.
  • FIG. 1 is a schematic structural diagram of a processor provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a method for managing register access timing provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of another method for managing register access timing provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of another processor provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 1 is a schematic structural diagram of a processor according to an embodiment of the present application.
  • the processor includes an instruction execution pipeline, multiple data ports, and multiple general-purpose registers.
  • the general register is simply referred to as a register in the following.
  • the data port includes a write-back port of a register. Multiple data ports can simultaneously write data to multiple different registers, but multiple data ports cannot simultaneously write data to one register.
  • the execution process of each instruction is divided into multiple stages, including: the instruction decoding (DEC) stage, the register read (RF) stage, the operation execution (EXE) stage, memory access stage 0 (M0), memory access stage 1 (M1), memory access stage 2 (M2), and the data write-back (WB) stage.
  • the memory access stage may include memory access stage 0 (M0), memory access stage 1 (M1) and memory access stage 2 (M2), so the instruction execution pipeline has 7 stages.
  • the instruction to be executed will enter each stage of the instruction execution pipeline in order to be executed in sequence.
  • the DEC stage decodes the instruction to be executed, which determines the register holding the operand of the instruction;
  • the EXE stage operates on the operand according to the operation logic of the instruction;
  • the M0-M2 stages are for memory access instructions and include reading data from the memory or writing data to the memory;
  • the WB stage refers to the process of writing the result of instruction execution back to the register through the data port.
  • Each stage in the instruction execution pipeline has its own independent circuit to process. After the instruction to be executed enters the pipeline, it will go to the next stage after each stage is completed, and the circuit of the previous stage can process other instructions.
  • the processor can simultaneously execute instruction 1 in the M2 stage, instruction 2 in the M1 stage, instruction 3 in the M0 stage, and instruction 4 in the EXE stage.
  • the execution process of each instruction can also be split into the fetch stage, decoding stage, execution stage, memory access stage and write-back stage.
  • the implementation process of the pipeline technology is similar in that case.
  • From the implementation process of the instruction execution pipeline, it can be seen that the instruction execution efficiency of a processor using pipeline technology can be multiplied compared with a processor without it. Based on the above processor, the embodiments of the present application propose the following method for managing register access timing.
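The stage overlap described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: one instruction is issued per clock cycle with no stalls, and the names `STAGES`, `stage_of`, and `occupancy` are hypothetical helpers, not part of the application.

```python
# The seven stages named in the text, in pipeline order.
STAGES = ["DEC", "RF", "EXE", "M0", "M1", "M2", "WB"]


def stage_of(issue_cycle, cycle):
    """Return the stage an instruction occupies in `cycle`, or None if it is not in flight."""
    index = cycle - issue_cycle
    if 0 <= index < len(STAGES):
        return STAGES[index]
    return None


def occupancy(issue_cycles, cycle):
    """Map each in-flight instruction name to the stage it occupies in `cycle`."""
    result = {}
    for name, issued in issue_cycles.items():
        stage = stage_of(issued, cycle)
        if stage is not None:
            result[name] = stage
    return result


# Assumption: four instructions enter DEC back to back, one per clock cycle.
issue_cycles = {
    "instruction 1": 0,
    "instruction 2": 1,
    "instruction 3": 2,
    "instruction 4": 3,
}
snapshot = occupancy(issue_cycles, 5)
print(snapshot)
```

Under these assumptions, the snapshot for cycle 5 places instruction 1 in M2, instruction 2 in M1, instruction 3 in M0, and instruction 4 in EXE, which matches the example in the text.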
  • FIG. 2 is a schematic flowchart of a method for managing register access timing provided by an embodiment of the present application.
  • This embodiment addresses the case where two instructions write data to the same register in the same clock cycle; the execution subject is a processor.
  • the method in the embodiments of the present application includes:
  • S201 Acquire first access information for executing a first instruction to access a register in each clock cycle of multiple clock cycles.
  • the first instruction is an instruction that has been determined to take effect when each clock cycle arrives.
  • the clock cycle is the basic time unit in the processor, and the length of one clock cycle is equal to the reciprocal of the processor's main frequency.
  • The instructions to be executed in the current clock cycle and in at least one clock cycle after the current clock cycle can be determined, where the instruction that takes effect when a clock cycle arrives is the instruction to be executed in that clock cycle.
  • the clock cycle in which the instruction becomes effective can be determined, so as to determine the instruction effective in each clock cycle.
  • the decoding result also includes access information to access the register during the execution of the instruction.
  • the access information may include identification information of a register to be accessed by executing an instruction effective within the clock cycle, and a data port used to access the register.
  • the embodiments of the present application mainly focus on registers to which data needs to be written when the effective instruction is executed within each clock cycle.
  • For example, the processor includes 2 data ports (wp0 and wp1) and 32 registers (register 1, register 2, ..., register 32), and the instruction execution pipeline in the processor has 7 stages. In order to cover the pipeline
  • the instruction that is in effect at the time, F[2] denotes the instruction that takes effect when the second clock cycle after the current one arrives, ..., and F[6] denotes the instruction that takes effect six clock cycles after the current one.
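One way to picture the per-cycle access information in this example is a table with one slot per clock cycle. This is a minimal sketch, assuming a window of seven slots F[0] to F[6], the two ports wp0 and wp1, and 32 registers. The class `CycleEntry` and the function `new_table` are hypothetical names, and the layout is an illustration rather than the layout used in the application.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

PIPELINE_DEPTH = 7       # 7-stage pipeline, so the window covers F[0]..F[6]
PORTS = ("wp0", "wp1")   # the two data ports from the example
NUM_REGISTERS = 32       # register 1 .. register 32


@dataclass
class CycleEntry:
    """Access information for one clock cycle (one slot F[k])."""

    # Port name -> number of the register written through that port, or None if idle.
    writes: Dict[str, Optional[int]] = field(
        default_factory=lambda: {port: None for port in PORTS}
    )

    def idle_ports(self) -> List[str]:
        """Ports that no instruction uses in this cycle."""
        return [port for port in PORTS if self.writes[port] is None]

    def written_registers(self) -> Set[int]:
        """Registers that receive data in this cycle."""
        return {reg for reg in self.writes.values() if reg is not None}


def new_table() -> List[CycleEntry]:
    """Create an empty table with one entry per clock cycle in the window."""
    return [CycleEntry() for _ in range(PIPELINE_DEPTH)]


table = new_table()
table[3].writes["wp0"] = 5   # assumed example: register 5 is written via wp0 in F[3]
print(table[3].idle_ports())
print(table[3].written_registers())
```

Keeping the port status and the target register in the same slot means a single lookup per clock cycle answers both "is a port free?" and "which registers are being written?".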
  • the processor may continue to receive new to-be-executed instructions input by the application program or system while executing the instructions whose execution order and execution time have been determined.
  • The second instruction is decoded immediately, without delay, to obtain the second access information for executing the instruction to access the register, so the decoding of the second instruction occurs in the current clock cycle.
  • the second access information may include identification information of a register to which data needs to be written during the execution of the second instruction, and time information (for example, after N clock cycles) in which the write operation occurs.
  • For example, the processor receives instruction 1 while executing F[0] and decodes instruction 1 in the current clock cycle, from which it obtains that the instruction needs to write data back to register RW N clock cycles after the current clock cycle. Note that writing data back to register RW after N clock cycles presupposes that instruction 1 enters the RF stage smoothly in the next clock cycle, that is, that the instruction is not delayed.
  • S203 Determine, according to the first access information and the second access information, a timing relationship between executing the first instruction to access the register and executing the second instruction to access the register.
  • the first access information includes first identification information of a register accessed by executing the first instruction within each clock cycle, where the register is a register to which data is written during execution of the first instruction.
  • the second access information includes second identification information of the register accessed by executing the second instruction, and time information of accessing the register by executing the second instruction, where the register is also a register to which data is written during execution of the second instruction.
  • The target clock cycle among the multiple clock cycles may first be determined according to the time information in the second access information, where the target clock cycle is the clock cycle in which the second instruction is executed to access the register. For example, if the time information is 3 clock cycles, the current clock cycle is used as a reference to determine that the target clock cycle is the third clock cycle after the current clock cycle.
  • When the first identification information and the second identification information are the same, the first instruction and the second instruction write data back to the same register within the target clock cycle, so the timing relationship between executing the first instruction to access the register and executing the second instruction to access the register is determined to be a time conflict.
  • If the second instruction completes DEC decoding in the current clock cycle and is then executed in the normal pipeline order, it will definitely conflict with the first instruction in a clock cycle after the current clock cycle, resulting in an incorrect execution result of the instruction.
  • the first access information also includes port status information of each data port in each clock cycle. Therefore, the usage status of each data port within the target clock cycle can first be determined according to the port status information, and when at least one of the plurality of data ports is in an idle state, the first identification information and the second identification information are compared.
  • the above-mentioned process of determining the timing relationship between executing the first instruction to write data to the register RW and executing the second instruction to write data to the register RW can be expressed by the logical expression shown in formula (1).
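Formula (1) is not reproduced in this text. As a rough stand-in, the check described above can be sketched as follows: find the target clock cycle from the time information, look at the port status for that cycle, and compare register identifiers. The function name `write_conflict` and the table layout (one dict per clock cycle, mapping port name to register number or `None`) are assumptions made for illustration. Treating "no idle port" as a conflict is also an assumption, since the text only describes the comparison when a port is idle.

```python
def write_conflict(table, target_reg, n_cycles):
    """Return True if writing `target_reg` N clock cycles from now would conflict.

    `table[k]` maps each port name to the register written through that port
    in the k-th clock cycle after the current one (None means the port is idle).
    """
    slot = table[n_cycles]  # target clock cycle, counted from the current cycle
    if all(reg is not None for reg in slot.values()):
        # Assumption: with every port occupied, the write cannot proceed either.
        return True
    # Same register written in the same clock cycle -> time conflict.
    return target_reg in slot.values()


# Assumed example state: two ports, seven clock cycles.
table = [{"wp0": None, "wp1": None} for _ in range(7)]
table[3]["wp0"] = 5                  # an earlier instruction writes register 5 in F[3]
table[4] = {"wp0": 1, "wp1": 2}      # both ports are occupied in F[4]

print(write_conflict(table, 5, 3))   # same register, same cycle
print(write_conflict(table, 6, 3))   # different register, wp1 is idle
print(write_conflict(table, 9, 4))   # no idle port in the target cycle
```

Because the check only reads one slot of the table, it fits in the DEC stage without scanning the whole window.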
  • The processor may delay execution of the second instruction, where the second instruction may be stalled by a hardware mechanism in the current clock cycle and held in the DEC stage.
  • A data port in the idle state can be selected, according to the port status information for the target clock cycle, to write to the register accessed by executing the second instruction, so that the free port is seized quickly.
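Selecting an idle port for the target clock cycle could look like the sketch below. The "first idle port wins" policy and the names `pick_idle_port` and `claim_port` are assumptions; the text does not say how to choose when more than one port is free.

```python
def pick_idle_port(slot):
    """Return the name of the first idle port in `slot`, or None if all ports are busy."""
    for port, reg in slot.items():
        if reg is None:
            return port
    return None


def claim_port(slot, reg):
    """Reserve an idle port in `slot` for a write to `reg`; return the port name or None."""
    port = pick_idle_port(slot)
    if port is not None:
        slot[port] = reg
    return port


# Assumed example: wp0 is already used for register 5 in the target clock cycle.
slot = {"wp0": 5, "wp1": None}
first = claim_port(slot, 8)    # takes wp1
second = claim_port(slot, 9)   # nothing left to claim
print(first, second, slot)
```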
  • the first access information may be updated according to the timing relationship determined within the clock cycle and the second access information. For example, if the current clock cycle is T0 and the next clock cycle of T0 is T1, when T0 ends and T1 arrives, the first access information may be updated in T1 according to the timing relationship determined in T0 and the second access information.
  • the update of the first access information includes the following three cases:
  • The first case: if, after DEC decoding, it is determined that the second instruction does not need to write data back to a register, then regardless of whether the timing relationship is a time conflict, the first access information is updated according to the following rules:
  • The first instruction is an instruction that has been determined to take effect when each clock cycle arrives; when the second instruction is received, the second access information for executing the second instruction to access the register is determined; the target clock cycle in which the second instruction writes data to the register is then determined according to the second access information; finally, the first identification information, included in the first access information, of the register to be written when the first instruction is executed within the target clock cycle is compared with the second identification information, included in the second access information, of the register to be written when the second instruction is executed within the target clock cycle.
  • When the first identification information and the second identification information are the same, the first instruction and the second instruction need to write data to the same register in the same clock cycle, which causes a time conflict.
  • the method in the embodiment of the present application maintains a hardware information table in real time to save the status information of each data port and the access information of registers in each clock cycle, and can implement conflict detection in the DEC decoding stage.
  • the implementation logic is simple, the hardware design complexity is low, and the required power consumption is small.
  • the acquired register access information can also be used to quickly seize idle data ports and improve the utilization rate of data ports.
  • FIG. 3 is a schematic flowchart of another method for managing register access timing provided by an embodiment of the present application.
  • the embodiment of the present application is directed to the case where an instruction needs to read the operation result of another instruction from the register R in a certain clock cycle, but the operation result has not been written back to the register R before the arrival of the clock cycle.
  • As shown in the figure, the method in the embodiments of the present application includes:
  • the clock cycle is the basic time unit in the processor, and the length of one clock cycle is equal to the reciprocal of the processor's main frequency.
  • The instructions to be executed in the current clock cycle and in at least one clock cycle after the current clock cycle can be determined, where the instruction that takes effect when a clock cycle arrives is the instruction to be executed in that clock cycle.
  • the clock cycle in which the instruction becomes effective can be determined, so as to determine the instruction effective in each clock cycle.
  • the decoding result also includes access information to access the register during the execution of the instruction.
  • the access information may include identification information of the register to be accessed by executing the instruction effective within the clock cycle.
  • the embodiments of the present application mainly focus on registers to which data needs to be written when the corresponding effective instruction is executed within each clock cycle.
  • the processor may continue to receive new to-be-executed instructions input by the application program or system while executing the instructions whose execution order and execution time have been determined.
  • The second instruction is decoded immediately, without delay, to obtain the second access information of the register accessed by executing the instruction, so the decoding of the second instruction occurs in the current clock cycle.
  • the second access information may include identification information of a register from which data needs to be read during the execution of the second instruction.
  • When the second instruction is in the DEC stage in the current clock cycle and is executed in the normal pipeline order, it should enter the RF stage in the next clock cycle and read its operand from the register in the RF stage.
  • the second access information includes the second identification information of the register to be read by the second instruction in the RF stage.
  • When the second identification information differs from the first identification information in every clock cycle of the at least one clock cycle after the current clock cycle, the data to be read by the second instruction has already been written to the register before the next clock cycle, and the second instruction can read the correct operand from the corresponding register after entering the RF stage in the next clock cycle.
  • The usage status of each data port in each clock cycle of the at least one clock cycle can first be determined according to the port status information in the first access information; each clock cycle in which at least one data port is in the occupied state is then taken as a target clock cycle; finally, it is determined whether the first identification information of the register accessed by executing the first instruction within the target clock cycle is the same as the second identification information of the register accessed by executing the second instruction.
  • When the second identification information is the same as the first identification information corresponding to one or more clock cycles in the at least one clock cycle, the data to be read by the second instruction has not yet been written to the corresponding register before the next clock cycle. If the second instruction entered the RF stage in the next clock cycle, the operand read from the corresponding register would not be the correct operand, resulting in an incorrect execution result of the second instruction. Therefore, the second instruction can be stalled by the hardware mechanism in the current clock cycle, so as to delay sending the second instruction to the RF stage until the conflict is resolved.
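The read-side check described above can be sketched in the same style. The names `read_conflict` and `next_stage` and the table layout (one dict per clock cycle, port name to register number or `None`) are assumptions for illustration. The sketch also assumes that "clock cycles after the current clock cycle" means slots 1 onward, so a write that completes in the current cycle does not block the read.

```python
def read_conflict(table, source_regs):
    """Return True if any source register is still waiting to be written.

    `table[k]` maps each port name to the register written through that port
    in the k-th clock cycle after the current one (None means the port is idle).
    """
    for slot in table[1:]:  # assumption: only cycles after the current one matter
        busy = [reg for reg in slot.values() if reg is not None]
        if not busy:
            continue        # no port is occupied, so nothing is written in this cycle
        if any(reg in busy for reg in source_regs):
            return True     # the operand has not been written back yet
    return False


def next_stage(table, source_regs):
    """Stage the second instruction occupies in the next clock cycle."""
    return "DEC" if read_conflict(table, source_regs) else "RF"


# Assumed example: register 4 is written two clock cycles from now,
# and register 6 is written in the current clock cycle.
table = [{"wp0": None, "wp1": None} for _ in range(7)]
table[2]["wp0"] = 4
table[0]["wp1"] = 6

print(next_stage(table, [4]))   # stalled in DEC
print(next_stage(table, [6]))   # write finishes before the read, so RF
print(next_stage(table, [7]))   # no pending write, so RF
```

Skipping cycles with no occupied port mirrors the text's use of port status information to narrow the comparison to target clock cycles.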
  • the first access information needs to be updated according to the timing relationship determined in the clock cycle and the second access information. For example, if the current clock cycle is T0 and the next clock cycle of T0 is T1, when T0 ends and T1 arrives, the first access information may be updated in T1 according to the timing relationship determined in T0 and the second access information.
  • In the embodiments of the present application, first identification information of the register to which data is written when a first instruction is executed in each clock cycle of multiple clock cycles is acquired, where the first instruction is an instruction that has been determined to take effect when each clock cycle arrives; when a second instruction is received, second identification information of the register from which data is read when the second instruction is executed is determined; it is then determined whether the second identification information is the same as the first identification information corresponding to each clock cycle in at least one clock cycle after the current clock cycle. When the second identification information is the same as the first identification information corresponding to one or more clock cycles, then if the second instruction entered the RF stage in the clock cycle after the current clock cycle, the operand read from the corresponding register would not be the correct operand; that is, there is a time conflict between the read and write operations on the register.
  • the method in the embodiment of the present application maintains a hardware information table in real time to save the status information of each data port and the access information of registers in each clock cycle, and can implement conflict detection in the DEC decoding stage.
  • the implementation logic is simple, the hardware design complexity is low, and the required power consumption is small.
  • FIG. 4 is a schematic structural diagram of a processor according to an embodiment of the present application.
  • the processor in the embodiment of the present application includes:
  • the obtaining module 401 is configured to obtain the first access information for executing the first instruction to access the register in each clock cycle of multiple clock cycles.
  • the first instruction is an instruction that has been determined to take effect when each clock cycle arrives.
  • the clock cycle is the basic time unit in the processor, and the length of one clock cycle is equal to the reciprocal of the processor's main frequency.
  • The instructions to be executed in the current clock cycle and in at least one clock cycle after the current clock cycle can be determined, where the instruction that takes effect when a clock cycle arrives is the instruction to be executed in that clock cycle.
  • the access information may include identification information of a register to be accessed by executing an instruction effective within the clock cycle, and a data port used to access the register.
  • the embodiments of the present application mainly focus on registers to which data needs to be written when the effective instruction is executed within each clock cycle.
  • the decoding module 402 is configured to, when receiving the second instruction, determine the second access information for executing the second instruction to access the register.
  • the processor may continue to receive new to-be-executed instructions input by an application program or system while executing instructions whose execution order and clock cycle have been determined.
  • The second instruction is decoded immediately, without delay, to obtain the second access information for executing the instruction to access the register, so the decoding of the second instruction occurs in the current clock cycle.
  • the second access information may include identification information of a register to which data needs to be written during execution of the second instruction, and time information (for example, after N clock cycles) in which the write operation occurs.
  • the second access information may also include identification information of the register from which data needs to be read during execution of the second instruction.
  • the detection module 403 is configured to determine the timing relationship between executing the first instruction to access the register and executing the second instruction to access the register according to the first access information and the second access information.
  • the first access information includes first identification information of a register accessed by executing the first instruction within each clock cycle, where the register is a register to which data is written during execution of the first instruction.
  • the second access information includes second identification information of the register accessed by executing the second instruction, and time information of accessing the register by executing the second instruction, where the register is also a register to which data is written during execution of the second instruction.
  • The target clock cycle among the multiple clock cycles may first be determined according to the time information in the second access information, where the target clock cycle is the clock cycle in which the second instruction is executed to access the register. For example, if the time information is 3 clock cycles, the current clock cycle is used as a reference to determine that the target clock cycle is the third clock cycle after the current clock cycle.
  • When the first identification information and the second identification information are the same, the first instruction and the second instruction write data back to the same register within the target clock cycle, so the timing relationship between executing the first instruction to access the register and executing the second instruction to access the register is determined to be a time conflict.
  • If the second instruction is decoded in the current clock cycle and is then executed in the normal pipeline order, it will definitely conflict with the first instruction in a clock cycle after the current clock cycle, resulting in an incorrect execution result of the instruction.
  • the first access information also includes port status information of each data port in each clock cycle. Therefore, the usage status of each data port within the target clock cycle can first be determined according to the port status information, and when at least one of the plurality of data ports is in an idle state, the first identification information and the second identification information are compared.
  • The processor may delay execution of the second instruction, where the second instruction may be stalled by a hardware mechanism in the current clock cycle and held in the DEC stage.
  • A data port in the idle state can be selected, according to the port status information for the target clock cycle, to write to the register accessed by executing the second instruction, so that the free port is seized quickly.
  • the first access information may be updated according to the timing relationship determined within the clock cycle and the second access information. For example, if the current clock cycle is T0 and the next clock cycle of T0 is T1, when T0 ends and T1 arrives, the first access information may be updated in T1 according to the timing relationship determined in T0 and the second access information.
  • the second access information includes the second identification information of the register to be read by the second instruction in the RF stage. Therefore, the detection module 403 is also used to:
  • if the second identification information is different from the first identification information in every one of the at least one clock cycle after the current clock cycle, the data to be read by the second instruction has been written into the register before the next clock cycle, and the second instruction can read the correct operand from the corresponding register after entering the RF stage in the next clock cycle.
  • the usage status of each data port in each of the at least one clock cycle can be determined first according to the port status information in the first access information; then each clock cycle in which at least one of the data ports is in the occupied state is taken as a target clock cycle; then it is determined whether the first identification information of the register accessed by executing the first instruction within the target clock cycle is the same as the second identification information of the register accessed by executing the second instruction.
  • if the second identification information is the same as the first identification information corresponding to one or more of the at least one clock cycle, the data to be read by the second instruction has not been written to the corresponding register before the next clock cycle. If the second instruction entered the RF stage in the next clock cycle, the operand read from the corresponding register would not be the correct operand, resulting in an error in the execution result of the second instruction. Therefore, the second instruction can be stalled by the hardware mechanism in the current clock cycle, so as to delay sending the second instruction to the RF stage until the conflict is resolved.
  • the first instruction is an instruction that has been determined to take effect when each clock cycle arrives; then, when the second instruction is received, the second access information for executing the second instruction to access the register is determined; then, according to the first access information and the second access information, the timing relationship between executing the first instruction to access the register and executing the second instruction to access the register is determined.
  • determining this timing relationship can reduce the complexity of processor hardware design and improve the resource utilization of registers.
  • the electronic device may include: at least one processor 501, such as a CPU, at least one communication interface 502, at least one memory 503, and at least one bus 504.
  • the bus 504 is used to implement connection and communication between these components.
  • the communication interface 502 of the electronic device in the embodiment of the present application may be a wired transmission port or a wireless device, for example one including an antenna device, and is used for signaling or data communication with other node devices.
  • the memory 503 may be a high-speed RAM memory or a non-volatile memory, for example, at least one magnetic disk memory.
  • the memory 503 may be at least one storage device located away from the foregoing processor 501.
  • a group of program codes is stored in the memory 503, and the processor 501 is used to call the program codes stored in the memory to perform the following operations:
  • the timing relationship between executing the first instruction to access the register and executing the second instruction to access the register is determined.
  • the processor 501 is also used to perform the following operation steps:
  • the first access information includes first identification information of the register accessed by executing the first instruction in each clock cycle;
  • the second access information includes second identification information of the register accessed by executing the second instruction, and the time information for accessing the register by executing the second instruction;
  • the processor 501 is also used to perform the following operation steps:
  • the first access information further includes port status information of each data port of the plurality of data ports in each clock cycle, and each data port is used to access a register;
  • the processor 501 is also used to perform the following operation steps:
  • the processor 501 is also used to perform the following operation steps:
  • the data port in the idle state is selected to write to the register accessed by executing the second instruction.
  • the first access information includes first identification information of the register accessed by executing the first instruction in each clock cycle
  • the multiple clock cycles include a current clock cycle and at least one clock cycle after the current clock cycle, where the current clock cycle is a clock cycle that occurs when decoding the second instruction;
  • the processor 501 is also used to perform the following operation steps:
  • the first access information further includes port status information of each data port of the plurality of data ports in each clock cycle, and each data port is used to access a register;
  • the processor 501 is also used to perform the following operation steps:
  • the processor 501 is also used to perform the following operation steps:
  • the first access information is updated according to the second access information and the timing relationship.
  • the embodiments of the present application also provide a storage medium, which is used to store an application program, and the application program is used to execute the operations performed by the electronic device in the register access timing management method shown in FIG. 2 and FIG. 3.
  • embodiments of the present application also provide an application program, which is used to execute the operations performed by the electronic device in the register access timing management method shown in FIG. 2 and FIG. 3.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)
  • Executing Machine-Instructions (AREA)

Abstract

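A method for managing register access timing, a processor, an electronic device, and a computer-readable storage medium. The method includes: obtaining first access information about register accesses performed by executing a first instruction in each of a plurality of clock cycles (S201); when a second instruction is received, determining second access information about the register access performed by executing the second instruction (S202); and determining, according to the first access information and the second access information, the timing relationship between the register access performed by executing the first instruction and the register access performed by executing the second instruction (S203). This can reduce the complexity of processor hardware design and improve register resource utilization.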

Description

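Method for managing register access timing, processor, electronic device, and computer-readable storage medium
This application claims priority to Chinese patent application No. 201811417048.7, filed with the Chinese Patent Office on November 26, 2018 and entitled "A method for managing register access timing, processor, electronic device, and computer-readable storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of register resource management, and in particular to a method for managing register access timing, a processor, an electronic device, and a computer-readable storage medium.
Background
At present, a processor based on pipelining usually splits each instruction into multiple stages so that several instructions can be executed in parallel, which improves the execution efficiency of program instructions and lets the processor reach its highest clock frequency. However, when multiple instructions execute in parallel, conflicts in register access time between different instructions are unavoidable, and they cause operands to be read incorrectly and results to be wrong. For example, instruction 2 needs to read, in the 3rd clock cycle, data that instruction 1 writes back to general-purpose register R1, but instruction 1 cannot write that data back to R1 until the 4th clock cycle, so the data instruction 2 reads from R1 in the 3rd clock cycle is not the data it actually needs. As another example, instruction 1 and instruction 2 write data back to the same register in the same clock cycle, which causes an access time conflict.
Therefore, the processor needs to manage the timing with which different instructions access registers in order to avoid register access conflicts. Existing hardware-based methods for managing register access timing are difficult to design and costly to implement. Compiler-based management of register access timing requires the compiler to handle both optimized compilation and the elimination of register access conflicts, which makes the compiler harder to design and affects the overall performance of the processor.
Summary
Embodiments of this application provide a method for managing register access timing, an apparatus, an electronic device, and a computer-readable storage medium, which can reduce the complexity of processor hardware design and improve register resource utilization.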
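A first aspect of the embodiments of this application provides a method for managing register access timing, including:
obtaining first access information about register accesses performed by executing a first instruction in each of a plurality of clock cycles, where the first instruction is an instruction that has been determined to take effect when that clock cycle arrives;
when a second instruction is received, determining second access information about the register access performed by executing the second instruction; and
determining, according to the first access information and the second access information, the timing relationship between the register access performed by executing the first instruction and the register access performed by executing the second instruction.
Correspondingly, a second aspect of the embodiments of this application provides a processor, including:
an obtaining module, configured to obtain first access information about register accesses performed by executing a first instruction in each of a plurality of clock cycles, where the first instruction is an instruction that has been determined to take effect when that clock cycle arrives;
a decoding module, configured to determine, when a second instruction is received, second access information about the register access performed by executing the second instruction; and
a detection module, configured to determine, according to the first access information and the second access information, the timing relationship between the register access performed by executing the first instruction and the register access performed by executing the second instruction.
A third aspect of the embodiments of this application discloses an electronic device, including a processor, a memory, a communication interface, and a bus;
the processor, the memory, and the communication interface are connected through the bus and communicate with one another;
the memory stores executable program code; and
the processor reads the executable program code stored in the memory and runs the program corresponding to that code, so as to perform the operations of the method for managing register access timing disclosed in the first aspect of the embodiments of this application.
Correspondingly, an embodiment of this application provides a storage medium, where the storage medium is used to store an application program, and the application program is used to perform, at runtime, the method for managing register access timing disclosed in the first aspect of the embodiments of this application.
Correspondingly, an embodiment of this application provides an application program, where the application program is used to perform, at runtime, the method for managing register access timing disclosed in the first aspect of the embodiments of this application.
In implementing the embodiments of this application, first access information about register accesses performed by executing a first instruction in each of a plurality of clock cycles is obtained, where the first instruction is an instruction that has been determined to take effect when that clock cycle arrives; when a second instruction is received, second access information about the register access performed by executing the second instruction is determined; and the timing relationship between the two register accesses is determined according to the first access information and the second access information. The access information for register accesses in each clock cycle and the status information of the register access ports can be stored in a hardware information table, so register access timing can be managed simply by maintaining that table. Conflict detection is performed against the table in the instruction decode stage, and if a conflict exists, the instruction is stalled and not sent to the RF stage, which reduces the complexity of processor hardware design. In addition, the hardware information table makes it possible to detect and use idle register resources promptly, which effectively improves register resource utilization.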
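Brief Description of the Drawings
To describe the technical solutions of the embodiments of this application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of this application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a processor provided by an embodiment of this application;
FIG. 2 is a schematic diagram of a method for managing register access timing provided by an embodiment of this application;
FIG. 3 is a schematic diagram of another method for managing register access timing provided by an embodiment of this application;
FIG. 4 is a schematic structural diagram of another processor provided by an embodiment of this application;
FIG. 5 is a schematic diagram of an electronic device provided by an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application without creative effort fall within the scope of protection of this application.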
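Refer to FIG. 1, which is a schematic structural diagram of a processor provided by an embodiment of this application. As shown in the figure, the processor includes an instruction execution pipeline, multiple data ports, and multiple general-purpose registers. For brevity, general-purpose registers are referred to below simply as registers. The data ports include the write-back ports of the registers; multiple data ports can write data to multiple different registers at the same time, but they cannot write data to a single register at the same time. In this processor, the execution of each instruction is split into multiple stages: the instruction decode (DEC) stage, the register read (RF) stage, the execute (EXE) stage, memory access stage 0 (M0), memory access stage 1 (M1), memory access stage 2 (M2), and the write-back (WB) stage. Because the memory access stage is divided into M0, M1 and M2, the instruction execution pipeline has 7 stages. An instruction to be executed passes through each stage of the pipeline in turn. The DEC stage decodes the instruction and can determine the registers that hold its operands. The RF stage reads the operands from the registers determined in the DEC stage. The EXE stage operates on the operands according to the instruction's logic. M0 to M2 apply to memory access instructions, one of the most important instruction types, and cover reading data from memory or writing data to memory. The WB stage writes the result of the instruction back to a register through a data port. Each stage of the pipeline is handled by its own independent circuit. After an instruction enters the pipeline, it moves to the next stage each time it completes a stage, and the circuit of the previous stage can then process another instruction, so multiple instructions are processed in parallel. For example, the processor can simultaneously execute instruction 1 in the M2 stage, instruction 2 in the M1 stage, instruction 3 in the M0 stage, and instruction 4 in the EXE stage. The execution of each instruction may also be split into fetch, decode, execute, memory access, and write-back stages. Regardless of how many stages are used or how they are divided, pipelining is implemented in a similar way. Compared with a processor that does not use pipelining, a pipelined processor can achieve several times the instruction execution efficiency. Based on the above processor, the embodiments of this application propose the following methods for managing register access timing.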
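The stage overlap described above can be illustrated with a minimal sketch. It assumes that an instruction advances exactly one stage per clock cycle and is never stalled; the helper name `stage_at` and the cycle numbering are hypothetical and do not come from the application.

```python
# Stage names follow the seven-stage split described above.
STAGES = ["DEC", "RF", "EXE", "M0", "M1", "M2", "WB"]


def stage_at(entry_cycle, cycle):
    """Return the stage an instruction occupies in `cycle`, or None.

    Assumes the instruction entered DEC in `entry_cycle` and is never stalled.
    """
    offset = cycle - entry_cycle
    if 0 <= offset < len(STAGES):
        return STAGES[offset]
    return None


# Four instructions enter DEC in consecutive cycles 0..3; observe cycle 5.
snapshot = {f"instruction {i + 1}": stage_at(i, 5) for i in range(4)}
```

Under these assumptions, the snapshot places instructions 1 to 4 in M2, M1, M0 and EXE respectively, which is the situation given as the example in the paragraph above.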
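Refer to FIG. 2, which is a schematic flowchart of a method for managing register access timing provided by an embodiment of this application. This embodiment addresses the case in which two instructions write data to the same register in the same clock cycle, and the method is performed by the processor. As shown in the figure, the method in this embodiment includes:
S201: obtain first access information about register accesses performed by executing a first instruction in each of a plurality of clock cycles, where the first instruction is an instruction that has been determined to take effect when that clock cycle arrives.
In a specific implementation, the clock cycle is the basic unit of time in the processor, and the length of one clock cycle equals the reciprocal of the processor's clock frequency. First, in the current clock cycle, the instructions to be executed in the current clock cycle and in at least one clock cycle after it can be determined. The instructions that take effect when a clock cycle arrives are the instructions to be executed in that clock cycle, and there may be one or more of them. The clock cycle in which an instruction takes effect can be determined from the decoding result of that instruction in the DEC stage, which in turn determines the instructions that take effect in each clock cycle. The decoding result also includes the access information for the registers accessed while executing the instruction. For each clock cycle, the access information may include the identification information of the registers to be accessed by the instructions that take effect in that clock cycle and the data ports used to access those registers. The embodiments of this application focus on the registers to which data must be written when the effective instructions are executed in each clock cycle.
For example, suppose the processor includes 2 data ports, wp0 and wp1, and 32 registers (register 1, register 2, ..., register 32), and its instruction execution pipeline has 7 stages. To cover every stage of the pipeline, the instructions that take effect in 7 clock cycles, including the current one, can be determined: F[0] denotes the instructions that take effect when the current clock cycle arrives, F[1] those that take effect when the next clock cycle arrives, F[2] those that take effect two clock cycles from now, ..., and F[6] those that take effect 6 clock cycles from now. Then, from the decoding result of each instruction in F[0], F[1], ..., F[6], the registers to be written in each clock cycle and the data ports used for those writes are determined, and the processor's hardware information table (Table 1) can be generated from this register and data port information.
Here, valid0 and valid1 indicate the usage of data ports wp0 and wp1. For example, valid0=1 means that wp0 is occupied by a data write of the corresponding instruction, and valid1=0 means that wp1 is idle, so valid0 and valid1 also reflect the validity of an instruction (whether it produces data that must be written to a register). rsel0 and rsel1 are register identifiers. As shown in Table 1, executing F[0] writes data to register 15 through wp0 and to register 29 through wp1, executing F[1] writes data to register 18 through wp0, ..., and for F[5], valid0=0 and valid1=0, which means there is no register write.
Table 1. Hardware information table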
Figure PCTCN2019114336-appb-000001
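Because Table 1 is only available as an image here, the following is a minimal software sketch of how such a hardware information table could be modelled. The class name `PortRow` and the helper `empty_table` are hypothetical; only the rows for F[0] and F[1] are filled in, using the values stated in the text, and all other rows are left idle rather than guessed.

```python
from dataclasses import dataclass
from typing import List

PIPELINE_DEPTH = 7  # one row per clock cycle: F[0] (current) .. F[6]


@dataclass
class PortRow:
    """Write-port usage for the instructions taking effect in one cycle."""
    valid0: int = 0  # 1 if write port wp0 is occupied in this cycle
    rsel0: int = 0   # register written through wp0 (meaningful if valid0 == 1)
    valid1: int = 0  # 1 if write port wp1 is occupied in this cycle
    rsel1: int = 0   # register written through wp1 (meaningful if valid1 == 1)


def empty_table(depth: int = PIPELINE_DEPTH) -> List[PortRow]:
    """Build a table in which every port is idle in every cycle."""
    return [PortRow() for _ in range(depth)]


table = empty_table()
table[0] = PortRow(valid0=1, rsel0=15, valid1=1, rsel1=29)  # F[0], from the text
table[1] = PortRow(valid0=1, rsel0=18)                      # F[1], from the text
```

In hardware this would be a small array of flip-flops rather than a Python list; the sketch is only meant to make the four fields per row concrete.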
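S202: when a second instruction is received, determine second access information about the register access performed by executing the second instruction.
In a specific implementation, while the processor executes instructions whose execution order and execution time have already been determined, it can continue to receive new instructions to be executed from an application program or the system. Usually, when a new instruction is received in the current clock cycle, it is DEC-decoded immediately, without delay, to obtain the second access information for its register access, so the decoding of the second instruction takes place in the current clock cycle. The second access information may include the identification information of the register to which data must be written while executing the second instruction and the time information of that write (for example, N clock cycles later).
For example, in the current clock cycle the processor receives instruction 1 while executing F[0]. DEC-decoding instruction 1 in the current clock cycle shows that the instruction needs to write data back to register RW N clock cycles after the current clock cycle. Note that the write-back to register RW happens N clock cycles later only if instruction 1 enters the RF stage in the next clock cycle as normal, that is, only if the instruction is not delayed.
S203: determine, according to the first access information and the second access information, the timing relationship between the register access performed by executing the first instruction and the register access performed by executing the second instruction.
In a specific implementation, the first access information includes the first identification information of the register accessed by executing the first instruction in each clock cycle, where that register is a register to which data is written while executing the first instruction. The second access information includes the second identification information of the register accessed by executing the second instruction and the time information of that access, where that register is likewise a register to which data is written while executing the second instruction.
Therefore, the target clock cycle among the plurality of clock cycles can first be determined from the time information in the second access information, where the target clock cycle is the clock cycle in which the second instruction accesses the register. For example, if the time information is "3 clock cycles later", then, taking the current clock cycle as the reference, the target clock cycle is the 3rd clock cycle after the current one. Next, it is determined whether the first identification information of the register accessed by the first instruction in the target clock cycle is the same as the second identification information of the register accessed by the second instruction. If they are the same, the first instruction and the second instruction write data back to the same register in the target clock cycle, and the timing relationship between the two register accesses is determined to be a time conflict. In other words, if the second instruction were executed in normal pipeline order after being DEC-decoded in the current clock cycle, it would necessarily conflict with the first instruction over register access timing in a later clock cycle, and the instruction's result would be wrong.
Optionally, because data can be written back to a register only through a data port, the second instruction cannot write data back to its register if no data port is idle in the target clock cycle. Therefore, before determining whether the first identification information and the second identification information are the same in the target clock cycle, it can first be determined whether an idle data port is available for the second instruction. The first access information also includes the port status information of each of the multiple data ports in each clock cycle. The usage status of each data port in the target clock cycle can thus be determined first from the port status information, and the first and second identification information are compared only when at least one of the data ports is idle.
With reference to Table 1, the process of determining the timing relationship between the first instruction writing data to register RW and the second instruction writing data to register RW can be expressed by the logical expression in (1). When ww_conf=1, the timing relationship is determined to be a time conflict. When ww_conf=0, the timing relationship is determined to be normal.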
ww_conf = (F[N][valid0] & F[N][valid1]) |
          (F[N][valid0] & (F[N][rsel0] == RW)) |
          (F[N][valid1] & (F[N][rsel1] == RW))   (1)
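Here, "&" denotes logical AND, "|" denotes logical OR, "==" tests whether the values on its two sides are equal, and "F[N][validx]" and "F[N][rselx]" denote the validx and rselx fields of the row F[N] in Table 1, with N=0,1,…,6 and x=0,1.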
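Expression (1) can be written as a small function, shown here as a sketch rather than as the application's hardware logic. Rows are plain dictionaries with the four fields named in the text; the function name `ww_conf` mirrors the signal name, while `idle` and the register number 4 in row F[2] are hypothetical, since Table 1 is not reproduced in text form.

```python
def ww_conf(table, n, rw):
    """Evaluate expression (1) for row F[n] and destination register rw."""
    row = table[n]
    both_ports_busy = row["valid0"] & row["valid1"]    # no free write port
    hit_wp0 = row["valid0"] & int(row["rsel0"] == rw)  # wp0 writes the same register
    hit_wp1 = row["valid1"] & int(row["rsel1"] == rw)  # wp1 writes the same register
    return both_ports_busy | hit_wp0 | hit_wp1


def idle():
    """A row in which both write ports are free."""
    return {"valid0": 0, "rsel0": 0, "valid1": 0, "rsel1": 0}


table = [idle() for _ in range(7)]
table[0] = {"valid0": 1, "rsel0": 15, "valid1": 1, "rsel1": 29}  # from the text
table[1] = {"valid0": 1, "rsel0": 18, "valid1": 0, "rsel1": 0}   # from the text
table[2] = {"valid0": 1, "rsel0": 4, "valid1": 0, "rsel1": 0}    # register 4 is made up
```

With this illustrative table, a write to register 10 two cycles from now gives `ww_conf(table, 2, 10) == 0`, whereas a write to register 18 one cycle from now gives 1 because wp0 already targets register 18 in that cycle.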
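For example, decoding shows that the second instruction needs to write data back to register 10 two clock cycles after the current clock cycle. Substituting the information in the F[2] row of Table 1 into expression (1) gives ww_conf=0, which means that the second instruction has no register write-back conflict with F[0], F[1], ..., F[6], that is, the timing relationship is normal.
Optionally, when the timing relationship between the register access of the first instruction and the register access of the second instruction is determined to be a time conflict, the processor can delay execution of the second instruction. A hardware mechanism can stall the second instruction in the current clock cycle so that it stays in the DEC stage.
Optionally, to improve data port utilization, when the timing relationship is determined not to be a time conflict, an idle data port can be selected, according to the port status information in the target clock cycle, to write to the register accessed by the second instruction, so that idle ports are claimed quickly.
For example, decoding shows that the second instruction needs to write data back to register 10 two clock cycles after the current clock cycle. According to Table 1, wp1 is idle at that time, so the second instruction can claim that port, and F[2][valid1] is set to 1.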
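The port-claiming step in this example can be sketched as follows. The function name `claim_free_port` is hypothetical, and the sketch assumes that the caller has already established that there is no time conflict. Recording the register number in rsel alongside the valid bit is consistent with the update rules given later in the description; the register number 4 on wp0 is made up.

```python
def claim_free_port(table, n, rw):
    """Reserve the first idle write port in row F[n] for register rw.

    Returns the port name ("wp0" or "wp1"), or None if both are occupied.
    """
    row = table[n]
    for port in (0, 1):
        if not row[f"valid{port}"]:
            row[f"valid{port}"] = 1   # mark the port as occupied
            row[f"rsel{port}"] = rw   # remember which register it will write
            return f"wp{port}"
    return None


table = [{"valid0": 0, "rsel0": 0, "valid1": 0, "rsel1": 0} for _ in range(7)]
table[2].update(valid0=1, rsel0=4)    # wp0 busy, wp1 idle; register 4 is made up
port = claim_free_port(table, 2, 10)  # second instruction writes register 10
```

Here `port` is `"wp1"` and F[2][valid1] becomes 1, as in the example above. A further request for the same cycle returns `None` because both ports are then occupied.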
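Optionally, after each clock cycle ends, the first access information can be updated according to the timing relationship determined in that clock cycle and the second access information. For example, if the current clock cycle is T0 and the next clock cycle is T1, then when T0 ends and T1 arrives, the first access information can be updated in T1 according to the timing relationship determined in T0 and the second access information.
The purpose of and reasons for updating the first access information are explained below with reference to Table 1. (1) If the timing relationship is determined to be a time conflict in the current clock cycle, the timing relationship must be checked again in the following clock cycles until the conflict is resolved, at which point the second instruction is sent to the RF stage. Table 1 therefore needs to be updated after the current clock cycle ends. The fact that the second instruction may claim an idle data port in the current clock cycle is another reason Table 1 needs to be updated. (2) If the timing relationship is determined not to be a time conflict in the current clock cycle, the second instruction does not need to be delayed. However, when the current clock cycle ends and the next one arrives, the processor receives another new instruction to be executed. To detect time conflicts for that new instruction, the information from the second access information of the second instruction must be added to Table 1, and the end of the current clock cycle is itself a reason Table 1 needs to be updated.
The update of the first access information (the update of Table 1) covers the following 3 cases:
Case 1: if, after DEC decoding, it is determined that the second instruction does not need to write data back to a register, the table is updated by the following rules regardless of whether the timing relationship is a time conflict:
a) F[M]=F[M+1], M=0,1,2,3,4,5
b) F[M]=0, M=6
Case 2: if, after DEC decoding, it is determined that the second instruction needs to write data back to register RW N clock cycles later and the timing relationship is a time conflict, the table is updated by the following rules:
a) F[M]=F[M+1], M=0,1,2,3,4,5
b) F[M]=0, M=6
Case 3: if, after DEC decoding, it is determined that the second instruction needs to write data back to register RW N clock cycles later and the timing relationship is normal, the table is updated by the following rules:
1) If F[N][valid0]==1, then
a) F[M][valid1]=1, F[M][rsel1]=RW, M=N-1
b) F[M]=F[M+1], M!=N-1, M=0,1,2,3,4,5
c) F[M]=0, M!=N-1, M=6
2) If F[N][valid0]==0, then
a) F[M][valid0]=1, F[M][rsel0]=RW, M=N-1
b) F[M]=F[M+1], M!=N-1, M=0,1,2,3,4,5
c) F[M]=0, M!=N-1, M=6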
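The three update cases above can be sketched as a single shift-and-record step. This is one possible reading, not the application's implementation: it assumes that row N-1 of the new table inherits the other port's fields from the old row F[N], which rule a) does not state explicitly, and it only handles 1 <= N <= 6. The names `advance` and `idle_row`, and the register number 4, are hypothetical.

```python
def idle_row():
    """A row in which both write ports are free."""
    return {"valid0": 0, "rsel0": 0, "valid1": 0, "rsel1": 0}


def advance(table, n=None, rw=None, conflict=False):
    """Return the table for the next clock cycle.

    n and rw describe the second instruction's write-back (n is None if it
    writes no register). Every row shifts down by one and the last row
    becomes idle (cases 1 and 2). In case 3 (a write with no conflict) the
    write is also recorded in row n-1: on wp1 if wp0 was already taken in
    the old F[n], otherwise on wp0.
    """
    shifted = [dict(row) for row in table[1:]] + [idle_row()]
    if n is not None and not conflict:
        if not 1 <= n < len(table):
            raise ValueError("sketch only handles 1 <= n < table depth")
        row = shifted[n - 1]              # this is the old F[n] after the shift
        port = 1 if row["valid0"] else 0
        row[f"valid{port}"] = 1
        row[f"rsel{port}"] = rw
    return shifted


table = [idle_row() for _ in range(7)]
table[1].update(valid0=1, rsel0=18)
table[2].update(valid0=1, rsel0=4)        # register 4 is made up
next_table = advance(table, n=2, rw=10)   # case 3: write register 10 in 2 cycles
stalled_table = advance(table, n=2, rw=10, conflict=True)  # case 2: shift only
```

After the case 3 update, the row that was F[2] has moved to index 1 and carries both the earlier write to register 4 on wp0 and the new write to register 10 on wp1. In the stalled case only the shift happens, so the second instruction can be checked again in the next cycle.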
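In this embodiment of the application, first access information about register accesses performed by executing a first instruction in each of a plurality of clock cycles is obtained, where the first instruction is an instruction that has been determined to take effect when that clock cycle arrives. When a second instruction is received, second access information about the register access performed by executing the second instruction is determined. The target clock cycle in which the second instruction writes data to a register is then determined from the second access information. Finally, the first identification information of the register to be written by the first instruction in the target clock cycle, taken from the first access information, is compared with the second identification information of the register to be written by the second instruction in the target clock cycle, taken from the second access information. When the two are the same, the first instruction and the second instruction need to write data to the same register in the same clock cycle, which is a time conflict. If a time conflict exists, a hardware mechanism stalls the second instruction and does not send it to the RF stage until the conflict is resolved. In summary, the method in this embodiment maintains a hardware information table in real time to store the status information of each data port and the register access information in each clock cycle, so conflict detection can be performed in the DEC stage with simple logic, low hardware design complexity, and low power consumption. In addition, the register access information obtained can be used to quickly claim idle data ports, which improves data port utilization.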
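Refer to FIG. 3, which is a schematic flowchart of another method for managing register access timing provided by an embodiment of this application. This embodiment addresses the case in which one instruction needs to read, in a certain clock cycle, the result of another instruction from register R, but that result has not yet been written back to register R when that clock cycle arrives. The method is performed by the processor. As shown in the figure, the method in this embodiment includes:
S301: obtain first access information about register accesses performed by executing a first instruction in each of the current clock cycle and at least one clock cycle after the current clock cycle, where the first instruction is an instruction that has been determined to take effect when that clock cycle arrives.
In a specific implementation, the clock cycle is the basic unit of time in the processor, and the length of one clock cycle equals the reciprocal of the processor's clock frequency. First, in the current clock cycle, the instructions to be executed in the current clock cycle and in at least one clock cycle after it can be determined. The instructions that take effect when a clock cycle arrives are the instructions to be executed in that clock cycle, and there may be one or more of them. The clock cycle in which an instruction takes effect can be determined from the decoding result of that instruction in the DEC stage, which in turn determines the instructions that take effect in each clock cycle. The decoding result also includes the access information for the registers accessed while executing the instruction. For each clock cycle, the access information may include the identification information of the registers to be accessed by the instructions that take effect in that clock cycle. The embodiments of this application focus on the registers to which data must be written when the corresponding effective instructions are executed in each clock cycle.
S302: when a second instruction is received, determine second access information about the register access performed by executing the second instruction.
In a specific implementation, while the processor executes instructions whose execution order and execution time have already been determined, it can continue to receive new instructions to be executed from an application program or the system. Usually, when a new instruction is received in the current clock cycle, it is DEC-decoded immediately, without delay, to obtain the second access information for its register access, so the decoding of the second instruction takes place in the current clock cycle. The second access information may include the identification information of the registers from which data must be read while executing the second instruction.
S303: determine, according to the second access information, the second identification information of the registers accessed by executing the second instruction in the clock cycle following the current clock cycle.
In a specific implementation, the second instruction is in the DEC stage in the current clock cycle. If it is executed in normal pipeline order, it should enter the RF stage in the clock cycle following the current one and read its operands in the RF stage. For example, for the instruction a=x+y, x and y are the operands. The second access information includes the second identification information of the registers that the second instruction needs to read in the RF stage.
S304: determine whether the first identification information of the register accessed by executing the first instruction in each of the at least one clock cycle is the same as the second identification information of the registers accessed by executing the second instruction. If so, perform S305. If not, determine that the timing relationship between the register access of the first instruction and the register access of the second instruction is normal.
In a specific implementation, the second instruction is in the DEC stage in the current clock cycle, and no instruction reads registers in the DEC stage, so no register write in the current clock cycle can conflict in time with the second instruction's register read in the next clock cycle. Therefore, the first identification information of the registers to be written by the first instruction in the at least one clock cycle after the current clock cycle can be compared with the second identification information of the registers read by the second instruction. If the second identification information differs from the first identification information in every one of those clock cycles, the data the second instruction needs to read has been written to the register before the next clock cycle, and the second instruction can read the correct operands from the corresponding registers after it enters the RF stage in the next clock cycle.
Optionally, when there is no register write-back in a clock cycle, the first instruction in that clock cycle is unrelated to the operands required by the second instruction. Therefore, to make conflict detection more efficient, the usage status of each data port in each of the at least one clock cycle can first be determined from the port status information in the first access information. Next, each clock cycle in which at least one of the data ports is occupied is taken as a target clock cycle. Then it is determined whether the first identification information of the register accessed by the first instruction in the target clock cycle is the same as the second identification information of the registers accessed by the second instruction.
For example, as shown in Table 1, wp0 and wp1 are both idle in the clock cycles corresponding to F[3], F[5] and F[6], so it is only necessary to determine whether the identification information of the registers whose valid bit is 1 in F[1], F[2] and F[4] is the same as the identification information of the registers to be read by the second instruction.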
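The filtering step in this example can be sketched as follows. Only the valid bits follow the text (F[3], F[5] and F[6] idle; F[1], F[2] and F[4] with at least one occupied port); the register numbers other than 15, 29 and 18, and which port is occupied in F[2] and F[4], are made up because Table 1 is not reproduced in text form. The function name `target_cycles` is hypothetical.

```python
def target_cycles(table):
    """Indices k >= 1 of the rows F[k] in which at least one write port is occupied.

    Row F[0] is skipped because, as explained above, writes in the current
    cycle cannot collide with a read that happens in the next cycle.
    """
    return [
        k for k in range(1, len(table))
        if table[k]["valid0"] or table[k]["valid1"]
    ]


def row(valid0=0, rsel0=0, valid1=0, rsel1=0):
    """Convenience constructor for one table row."""
    return {"valid0": valid0, "rsel0": rsel0, "valid1": valid1, "rsel1": rsel1}


table = [
    row(1, 15, 1, 29),  # F[0], from the text
    row(1, 18),         # F[1], from the text
    row(1, 4),          # F[2], register number made up
    row(),              # F[3], both ports idle
    row(0, 0, 1, 21),   # F[4], register number made up
    row(),              # F[5], both ports idle
    row(),              # F[6], both ports idle
]
cycles = target_cycles(table)
```

With this table, `cycles` is `[1, 2, 4]`, so only three rows need a register comparison instead of six.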
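S305: determine that the timing relationship between the register access of the first instruction and the register access of the second instruction is a time conflict.
In a specific implementation, when the second identification information is the same as the first identification information corresponding to one or more of the at least one clock cycle, the data that the second instruction needs to read has not been written to the corresponding register before the next clock cycle. If the second instruction entered the RF stage in the next clock cycle, the operand read from the corresponding register would not be the correct operand, and the result of the second instruction would be wrong. Therefore, a hardware mechanism can stall the second instruction in the current clock cycle to delay sending it to the RF stage until the conflict is resolved.
With reference to Table 1, the process of detecting the timing relationship between the first instruction writing data to a register and the second instruction reading data from registers can be expressed by the logical expressions in (2) to (4). Assume that the register to be written by the first instruction is RW and the registers to be read by the second instruction are RX and RY. When rw_conf=1, the timing relationship is determined to be a time conflict. When rw_conf=0, the timing relationship is determined to be normal.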
rw_conf0 = (F[1][valid0] & (F[1][rsel0]==RX | F[1][rsel0]==RY)) |
           (F[2][valid0] & (F[2][rsel0]==RX | F[2][rsel0]==RY)) |
           (F[3][valid0] & (F[3][rsel0]==RX | F[3][rsel0]==RY)) |
           (F[4][valid0] & (F[4][rsel0]==RX | F[4][rsel0]==RY)) |
           (F[5][valid0] & (F[5][rsel0]==RX | F[5][rsel0]==RY)) |
           (F[6][valid0] & (F[6][rsel0]==RX | F[6][rsel0]==RY))   (2)
rw_conf1 = (F[1][valid1] & (F[1][rsel1]==RX | F[1][rsel1]==RY)) |
           (F[2][valid1] & (F[2][rsel1]==RX | F[2][rsel1]==RY)) |
           (F[3][valid1] & (F[3][rsel1]==RX | F[3][rsel1]==RY)) |
           (F[4][valid1] & (F[4][rsel1]==RX | F[4][rsel1]==RY)) |
           (F[5][valid1] & (F[5][rsel1]==RX | F[5][rsel1]==RY)) |
           (F[6][valid1] & (F[6][rsel1]==RX | F[6][rsel1]==RY))   (3)
rw_conf = rw_conf0 | rw_conf1   (4)
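For example, decoding in the current clock cycle shows that the second instruction needs to read operands from register 7 and register 10. Substituting the information for F[1], F[2], ..., F[6] in Table 1 into expressions (2) and (3) gives rw_conf0=0 and rw_conf1=1, and therefore rw_conf=1. This means that if the second instruction entered the RF stage in the next clock cycle, F[2] would not yet have written the correct operand to register 7, so the operand the second instruction read from register 7 in the next clock cycle would be wrong.
If registers are also read in the DEC stage, it is sufficient to add the information for F[0] to expressions (2) and (3).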
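Expressions (2) to (4) can be sketched as one function. The table contents below are hypothetical and were chosen only so that the call reproduces the result reported in the example (rw_conf0=0, rw_conf1=1, with register 7 still pending in F[2]); they are not the values of Table 1. The `include_current` flag corresponds to the remark about adding F[0] when registers are also read in the DEC stage.

```python
def rw_conf(table, sources, include_current=False):
    """Evaluate expressions (2)-(4) for the registers read by the second instruction.

    sources is the set of registers to be read (RX and RY in the text).
    Returns the tuple (rw_conf0, rw_conf1, rw_conf).
    """
    rows = table[0 if include_current else 1:]
    # Expression (2): a pending write through wp0 targets a source register.
    rw_conf0 = int(any(r["valid0"] and r["rsel0"] in sources for r in rows))
    # Expression (3): the same check for wp1.
    rw_conf1 = int(any(r["valid1"] and r["rsel1"] in sources for r in rows))
    # Expression (4): a conflict on either port stalls the instruction.
    return rw_conf0, rw_conf1, rw_conf0 | rw_conf1


def row(valid0=0, rsel0=0, valid1=0, rsel1=0):
    """Convenience constructor for one table row."""
    return {"valid0": valid0, "rsel0": rsel0, "valid1": valid1, "rsel1": rsel1}


# Hypothetical table: F[2] still has to write register 7 through wp1.
table = [row(1, 15, 1, 29), row(1, 18), row(1, 4, 1, 7), row(), row(), row(), row()]
result = rw_conf(table, {7, 10})
```

With this table, `result` is `(0, 1, 1)`, so the second instruction would be held in the DEC stage. Reading register 15 instead gives no conflict unless `include_current=True`, because the write to register 15 completes in the current cycle.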
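Optionally, after each clock cycle ends, the first access information needs to be updated according to the timing relationship determined in that clock cycle and the second access information. For example, if the current clock cycle is T0 and the next clock cycle is T1, then when T0 ends and T1 arrives, the first access information can be updated in T1 according to the timing relationship determined in T0 and the second access information.
In this embodiment of the application, the first identification information of the register to which data is written when executing a first instruction in each of a plurality of clock cycles is obtained, where the first instruction is an instruction that has been determined to take effect when that clock cycle arrives. When a second instruction is received, the second identification information of the registers from which data is read when executing the second instruction is determined. It is then determined whether the second identification information is the same as the first identification information corresponding to each of the at least one clock cycle after the current clock cycle. When the second identification information is the same as the first identification information corresponding to one or more clock cycles, then if the second instruction entered the RF stage in the clock cycle following the current one, the operand read from the corresponding register would not be the correct operand, that is, the register read and write operations conflict in time. In that case a hardware mechanism stalls the second instruction in the current clock cycle and does not send it to the RF stage until the time conflict is resolved. In summary, the method in this embodiment maintains a hardware information table in real time to store the status information of each data port and the register access information in each clock cycle, so conflict detection can be performed in the DEC stage with simple logic, low hardware design complexity, and low power consumption.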
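Refer to FIG. 4, which is a schematic structural diagram of a processor provided by an embodiment of this application. As shown in the figure, the processor in this embodiment includes:
An obtaining module 401, configured to obtain first access information about register accesses performed by executing a first instruction in each of a plurality of clock cycles, where the first instruction is an instruction that has been determined to take effect when that clock cycle arrives.
In a specific implementation, the clock cycle is the basic unit of time in the processor, and the length of one clock cycle equals the reciprocal of the processor's clock frequency. First, in the current clock cycle, the instructions to be executed in the current clock cycle and in at least one clock cycle after it can be determined. The instructions that take effect when a clock cycle arrives are the instructions to be executed in that clock cycle, and there may be one or more of them. The clock cycle in which an instruction takes effect is determined from the decoding result of that instruction in the DEC stage, which in turn determines the instructions that take effect in each clock cycle, and the decoding result also includes the access information for the registers accessed while executing the instruction. For each clock cycle, the access information may include the identification information of the registers to be accessed by the instructions that take effect in that clock cycle and the data ports used to access those registers. The embodiments of this application focus on the registers to which data must be written when the effective instructions are executed in each clock cycle.
A decoding module 402, configured to determine, when a second instruction is received, second access information about the register access performed by executing the second instruction.
In a specific implementation, while the processor executes instructions whose execution order and clock cycles have already been determined, it can continue to receive new instructions to be executed from an application program or the system. Usually, when a new instruction is received in the current clock cycle, it is DEC-decoded immediately, without delay, to obtain the second access information for its register access, so the decoding of the second instruction takes place in the current clock cycle. The second access information may include the identification information of the register to which data must be written while executing the second instruction and the time information of that write (for example, N clock cycles later). The second access information may also include the identification information of the registers from which data must be read while executing the second instruction.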
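A detection module 403, configured to determine, according to the first access information and the second access information, the timing relationship between the register access performed by executing the first instruction and the register access performed by executing the second instruction.
In a specific implementation, the first access information includes the first identification information of the register accessed by executing the first instruction in each clock cycle, where that register is a register to which data is written while executing the first instruction. The second access information includes the second identification information of the register accessed by executing the second instruction and the time information of that access, where that register is likewise a register to which data is written while executing the second instruction.
Therefore, the target clock cycle among the plurality of clock cycles can first be determined from the time information in the second access information, where the target clock cycle is the clock cycle in which the second instruction accesses the register. For example, if the time information is "3 clock cycles later", then, taking the current clock cycle as the reference, the target clock cycle is the 3rd clock cycle after the current one. Next, it is determined whether the first identification information of the register accessed by the first instruction in the target clock cycle is the same as the second identification information of the register accessed by the second instruction. If they are the same, the first instruction and the second instruction write data back to the same register in the target clock cycle, and the timing relationship between the two register accesses is determined to be a time conflict. In other words, if the second instruction were executed in normal pipeline order after being decoded in the current clock cycle, it would necessarily conflict with the first instruction over register access timing in a later clock cycle, and the instruction's result would be wrong.
Optionally, because data can be written back to a register only through a data port, the second instruction cannot write data back to its register if no data port is idle in the target clock cycle. Therefore, before determining whether the first identification information and the second identification information are the same in the target clock cycle, it can first be determined whether an idle data port is available for the second instruction. The first access information also includes the port status information of each of the multiple data ports in each clock cycle. The usage status of each data port in the target clock cycle can thus be determined first from the port status information, and the first and second identification information are compared only when at least one of the data ports is idle.
Optionally, when the timing relationship between the register access of the first instruction and the register access of the second instruction is determined to be a time conflict, the processor can delay execution of the second instruction. A hardware mechanism can stall the second instruction in the current clock cycle so that it stays in the DEC stage.
Optionally, to improve data port utilization, when the timing relationship is determined not to be a time conflict, an idle data port can be selected, according to the port status information in the target clock cycle, to write to the register accessed by the second instruction, so that idle ports are claimed quickly.
Optionally, after each clock cycle ends, the first access information can be updated according to the timing relationship determined in that clock cycle and the second access information. For example, if the current clock cycle is T0 and the next clock cycle is T1, then when T0 ends and T1 arrives, the first access information can be updated in T1 according to the timing relationship determined in T0 and the second access information.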
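In addition, the second instruction is in the DEC stage in the current clock cycle. If it is executed in normal pipeline order, it should enter the RF stage in the clock cycle following the current one and read its operands in the RF stage. For example, for the instruction a=x+y, x and y are the operands. The second access information includes the second identification information of the registers that the second instruction needs to read in the RF stage. The detection module 403 is therefore further configured to:
First, determine whether the first identification information of the register accessed by executing the first instruction in each of the at least one clock cycle is the same as the second identification information of the registers accessed by executing the second instruction. Specifically, the second instruction is in the DEC stage in the current clock cycle, and no instruction reads registers in the DEC stage, so no register write in the current clock cycle can conflict in time with the second instruction's register read in the next clock cycle. Therefore, the first identification information of the registers to be written by the first instruction in the at least one clock cycle after the current clock cycle can be compared with the second identification information of the registers read by the second instruction. If the second identification information differs from the first identification information in every one of those clock cycles, the data the second instruction needs to read has been written to the register before the next clock cycle, and the second instruction can read the correct operands from the corresponding registers after it enters the RF stage in the next clock cycle.
Optionally, when there is no register write-back in a clock cycle, the first instruction in that clock cycle is unrelated to the operands required by the second instruction. Therefore, to make conflict detection more efficient, the usage status of each data port in each of the at least one clock cycle can first be determined from the port status information in the first access information. Next, each clock cycle in which at least one of the data ports is occupied is taken as a target clock cycle. Then it is determined whether the first identification information of the register accessed by the first instruction in the target clock cycle is the same as the second identification information of the registers accessed by the second instruction.
When the second identification information is the same as the first identification information corresponding to one or more of the at least one clock cycle, the data that the second instruction needs to read has not been written to the corresponding register before the next clock cycle. If the second instruction entered the RF stage in the next clock cycle, the operand read from the corresponding register would not be the correct operand, and the execution result of the second instruction would be wrong. Therefore, a hardware mechanism can stall the second instruction in the current clock cycle to delay sending it to the RF stage until the conflict is resolved.
With reference to Table 1, the process of determining the timing relationship between the first instruction writing data to a register and the second instruction reading data from registers can be expressed by the logical expressions in (2) to (4). Assume that the register to be written by the first instruction is RW and the registers to be read by the second instruction are RX and RY. When rw_conf=1, the timing relationship is determined to be a time conflict. When rw_conf=0, the timing relationship is determined to be normal. If registers are also read in the DEC stage, it is sufficient to add the information for F[0] to expressions (2) and (3).
In this embodiment of the application, first access information about register accesses performed by executing a first instruction in each of a plurality of clock cycles is first obtained, where the first instruction is an instruction that has been determined to take effect when that clock cycle arrives. Then, when a second instruction is received, second access information about the register access performed by executing the second instruction is determined. Finally, the timing relationship between the register access of the first instruction and the register access of the second instruction is determined according to the first access information and the second access information. This can reduce the complexity of processor hardware design and improve register resource utilization.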
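Refer to FIG. 5, which is a schematic structural diagram of an electronic device provided by an embodiment of this application. As shown in the figure, the electronic device may include at least one processor 501 (for example, a CPU), at least one communication interface 502, at least one memory 503, and at least one bus 504. The bus 504 implements connection and communication among these components. The communication interface 502 of the electronic device in this embodiment may be a wired transmission port or a wireless device, for example one that includes an antenna apparatus, and is used for signaling or data communication with other node devices. The memory 503 may be a high-speed RAM or a non-volatile memory, for example at least one disk memory. Optionally, the memory 503 may also be at least one storage apparatus located away from the processor 501. The memory 503 stores a set of program code, and the processor 501 calls the program code stored in the memory to perform the following operations:
obtaining first access information about register accesses performed by executing a first instruction in each of a plurality of clock cycles, where the first instruction is an instruction that has been determined to take effect when that clock cycle arrives;
when a second instruction is received, determining second access information about the register access performed by executing the second instruction;
determining, according to the first access information and the second access information, the timing relationship between the register access performed by executing the first instruction and the register access performed by executing the second instruction.
The processor 501 is further configured to perform the following operation:
when the timing relationship is a time conflict, delaying execution of the second instruction.
The first access information includes the first identification information of the register accessed by executing the first instruction in each clock cycle; the second access information includes the second identification information of the register accessed by executing the second instruction and the time information of the register access performed by executing the second instruction;
The processor 501 is further configured to perform the following operations:
determining, according to the time information, a target clock cycle among the plurality of clock cycles, where the target clock cycle is the clock cycle in which the second instruction accesses the register;
determining whether the first identification information of the register accessed by executing the first instruction in the target clock cycle is the same as the second identification information of the register accessed by executing the second instruction;
when the first identification information is the same as the second identification information, determining that the timing relationship is a time conflict.
The first access information further includes port status information of each of multiple data ports in each clock cycle, where each data port is used to access registers;
The processor 501 is further configured to perform the following operations:
determining, according to the port status information, the usage status of each data port in the target clock cycle;
when at least one of the multiple data ports is idle in the target clock cycle, performing the operation of determining whether the first identification information of the register accessed by executing the first instruction in the target clock cycle is the same as the second identification information of the register accessed by executing the second instruction.
The processor 501 is further configured to perform the following operation:
when the timing relationship is not a time conflict, selecting, according to the port status information, a data port in the idle state to write to the register accessed by executing the second instruction.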
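The first access information includes the first identification information of the register accessed by executing the first instruction in each clock cycle;
the plurality of clock cycles includes the current clock cycle and at least one clock cycle after the current clock cycle, where the current clock cycle is the clock cycle in which the second instruction is decoded;
The processor 501 is further configured to perform the following operations:
determining, according to the second access information, the second identification information of the registers accessed by executing the second instruction in the clock cycle following the current clock cycle;
determining whether the first identification information of the register accessed by executing the first instruction in each of the at least one clock cycle is the same as the second identification information of the registers accessed by executing the second instruction;
when the first identification information is the same as the second identification information, determining that the timing relationship is a time conflict.
The first access information further includes port status information of each of multiple data ports in each clock cycle, where each data port is used to access registers;
The processor 501 is further configured to perform the following operations:
determining, according to the port status information, the usage status of each data port in each of the at least one clock cycle;
taking, as a target clock cycle, each of the at least one clock cycle in which at least one of the multiple data ports is occupied;
determining whether the first identification information of the register accessed by executing the first instruction in the target clock cycle is the same as the second identification information of the registers accessed by executing the second instruction.
The processor 501 is further configured to perform the following operation:
after each clock cycle ends, updating the first access information according to the second access information and the timing relationship.
It should be noted that an embodiment of this application also provides a storage medium used to store an application program, where the application program is used to perform, at runtime, the operations performed by the electronic device in the methods for managing register access timing shown in FIG. 2 and FIG. 3.
It should be noted that an embodiment of this application also provides an application program used to perform, at runtime, the operations performed by the electronic device in the methods for managing register access timing shown in FIG. 2 and FIG. 3.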

Claims (10)

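  1. A method for managing register access timing, characterized in that the method includes:
    obtaining first access information about register accesses performed by executing a first instruction in each of a plurality of clock cycles, where the first instruction is an instruction that has been determined to take effect when that clock cycle arrives;
    when a second instruction is received, determining second access information about the register access performed by executing the second instruction;
    determining, according to the first access information and the second access information, the timing relationship between the register access performed by executing the first instruction and the register access performed by executing the second instruction.
  2. The method according to claim 1, characterized in that, after the determining of the timing relationship between the register access performed by executing the first instruction and the register access performed by executing the second instruction, the method further includes:
    when the timing relationship is a time conflict, delaying execution of the second instruction.
  3. The method according to claim 1, characterized in that the first access information includes first identification information of the register accessed by executing the first instruction in each clock cycle; the second access information includes second identification information of the register accessed by executing the second instruction and time information of the register access performed by executing the second instruction;
    the determining, according to the first access information and the second access information, of the timing relationship between the register access performed by executing the first instruction and the register access performed by executing the second instruction includes:
    determining, according to the time information, a target clock cycle among the plurality of clock cycles, where the target clock cycle is the clock cycle in which the second instruction accesses the register;
    determining whether the first identification information of the register accessed by executing the first instruction in the target clock cycle is the same as the second identification information of the register accessed by executing the second instruction;
    when the first identification information is the same as the second identification information, determining that the timing relationship is a time conflict.
  4. The method according to claim 3, characterized in that the first access information further includes port status information of each of multiple data ports in each clock cycle, where each data port is used to access registers;
    before the determining of whether the first identification information of the register accessed by executing the first instruction in the target clock cycle is the same as the second identification information of the register accessed by executing the second instruction, the method further includes:
    determining, according to the port status information, the usage status of each data port in the target clock cycle;
    when at least one of the multiple data ports is idle in the target clock cycle, performing the operation of determining whether the first identification information of the register accessed by executing the first instruction in the target clock cycle is the same as the second identification information of the register accessed by executing the second instruction.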
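  5. The method according to claim 4, characterized in that the method further includes:
    when the timing relationship is not a time conflict, selecting, according to the port status information, a data port in the idle state to write to the register accessed by executing the second instruction.
  6. The method according to claim 1, characterized in that the first access information includes first identification information of the register accessed by executing the first instruction in each clock cycle;
    the plurality of clock cycles includes the current clock cycle and at least one clock cycle after the current clock cycle, where the current clock cycle is the clock cycle in which the second instruction is decoded;
    the determining, according to the first access information and the second access information, of the timing relationship between the register access performed by executing the first instruction and the register access performed by executing the second instruction includes:
    determining, according to the second access information, the second identification information of the registers accessed by executing the second instruction in the clock cycle following the current clock cycle;
    determining whether the first identification information of the register accessed by executing the first instruction in each of the at least one clock cycle is the same as the second identification information of the registers accessed by executing the second instruction;
    when the first identification information is the same as the second identification information, determining that the timing relationship is a time conflict.
  7. The method according to claim 6, characterized in that the first access information further includes port status information of each of multiple data ports in each clock cycle, where each data port is used to access registers;
    the determining of whether the first identification information of the register accessed by executing the first instruction in each of the at least one clock cycle is the same as the second identification information of the registers accessed by executing the second instruction includes:
    determining, according to the port status information, the usage status of each data port in each of the at least one clock cycle;
    taking, as a target clock cycle, each of the at least one clock cycle in which at least one of the multiple data ports is occupied;
    determining whether the first identification information of the register accessed by executing the first instruction in the target clock cycle is the same as the second identification information of the registers accessed by executing the second instruction.
  8. The method according to any one of claims 1-7, characterized in that the method further includes:
    after each clock cycle ends, updating the first access information according to the second access information and the timing relationship.
  9. A processor, characterized in that the processor includes:
    an obtaining module, configured to obtain first access information about register accesses performed by executing a first instruction in each of a plurality of clock cycles, where the first instruction is an instruction that has been determined to take effect when that clock cycle arrives;
    a decoding module, configured to determine, when a second instruction is received, second access information about the register access performed by executing the second instruction;
    a detection module, configured to determine, according to the first access information and the second access information, the timing relationship between the register access performed by executing the first instruction and the register access performed by executing the second instruction.
  10. An electronic device, characterized by including a processor, a memory, a communication interface, and a bus;
    the processor, the memory, and the communication interface are connected through the bus and communicate with one another;
    the memory stores executable program code;
    the processor reads the executable program code stored in the memory and runs the program corresponding to the executable program code, so as to perform the method for managing register access timing according to any one of claims 1-8.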
PCT/CN2019/114336 2018-11-26 2019-10-30 寄存器访问时序的管理方法、处理器、电子设备及计算机可读存储介质 WO2020108212A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811417048.7A CN111221573B (zh) 2018-11-26 2018-11-26 一种寄存器访问时序的管理方法、处理器、电子设备及计算机可读存储介质
CN201811417048.7 2018-11-26

Publications (1)

Publication Number Publication Date
WO2020108212A1 true WO2020108212A1 (zh) 2020-06-04

Family

ID=70826989

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/114336 WO2020108212A1 (zh) 2018-11-26 2019-10-30 寄存器访问时序的管理方法、处理器、电子设备及计算机可读存储介质

Country Status (2)

Country Link
CN (1) CN111221573B (zh)
WO (1) WO2020108212A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112905995B (zh) * 2021-02-05 2022-08-05 电子科技大学 一种处理器内部寄存器组异常行为实时检测方法及系统
CN117008977B (zh) * 2023-08-08 2024-03-19 上海合芯数字科技有限公司 一种可变执行周期的指令执行方法、系统和计算机设备

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050108715A1 (en) * 2003-09-26 2005-05-19 Tatsunori Kanai Method and system for performing real-time operation
CN1673954A (zh) * 2004-03-24 2005-09-28 华为技术有限公司 多进程下寄存器文件置入操作时保持数据一致性的方法
CN101667448A (zh) * 2008-09-04 2010-03-10 奕力科技股份有限公司 存储器存取控制装置及其相关控制方法
CN102955709A (zh) * 2011-08-18 2013-03-06 富士通株式会社 校正设备、校正方法和计算机产品

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6453370B1 (en) * 1998-11-16 2002-09-17 Infineion Technologies Ag Using of bank tag registers to avoid a background operation collision in memory systems
CN1331053C (zh) * 2004-02-12 2007-08-08 华为技术有限公司 一种旗标寄存器和避免多进程间资源访问冲突的方法
US8725991B2 (en) * 2007-09-12 2014-05-13 Qualcomm Incorporated Register file system and method for pipelined processing
US10678544B2 (en) * 2015-09-19 2020-06-09 Microsoft Technology Licensing, Llc Initiating instruction block execution using a register access instruction
US10871967B2 (en) * 2015-09-19 2020-12-22 Microsoft Technology Licensing, Llc Register read/write ordering
US10031677B1 (en) * 2015-10-14 2018-07-24 Rambus Inc. High-throughput low-latency hybrid memory module
CN106610816B (zh) * 2016-12-29 2018-10-30 山东师范大学 一种risc-cpu中指令集之间冲突的规避方法及系统
CN107589960B (zh) * 2017-08-30 2020-07-24 北京轩宇信息技术有限公司 一种基于寄存器访问冲突检测的dsp指令模拟方法
CN108733415B (zh) * 2018-05-16 2021-03-16 中国人民解放军国防科技大学 支持向量随机访存的方法及装置

Also Published As

Publication number Publication date
CN111221573A (zh) 2020-06-02
CN111221573B (zh) 2022-03-25

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19888663

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19888663

Country of ref document: EP

Kind code of ref document: A1