WO2018040494A1 - Method and device for extending processor instruction set - Google Patents

Publication number
WO2018040494A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
processor core
extended
monitoring module
module
Application number
PCT/CN2017/071776
Other languages
French (fr)
Chinese (zh)
Inventor
李延松
吴求应
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司
Publication of WO2018040494A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30098 Register arrangements
    • G06F9/30101 Special purpose registers
    • G06F9/30181 Instruction operation extension or modification
    • G06F9/30189 Instruction operation extension or modification according to execution mode, e.g. mode flag
    • G06F9/32 Address formation of the next instruction, e.g. by incrementing the instruction counter
    • G06F9/321 Program or instruction counter, e.g. incrementing

Definitions

  • The present invention relates to the field of communications technologies, and in particular to a method and apparatus for extending a processor instruction set.
  • A coprocessor can be used to execute dedicated instructions such as floating-point instructions; that is, the processor sends each dedicated instruction to the coprocessor for execution.
  • The advantage of this approach is that it is simple to program, because software can use the coprocessor instructions directly.
  • The disadvantage is that the processor must support these special instructions.
  • Not all processor cores support this function. For example, a manufacturer that uses a third-party processor core cannot add support for coprocessor operation by designing its own processor.
  • Alternatively, a hardware acceleration module is used as a peripheral of the processor. The processor sends data to the acceleration module through a PCIe (Peripheral Component Interconnect Express) interface, and the acceleration module stores the processing result in memory. When the processor needs the processing result, it reads the result from memory.
  • This approach is more flexible to implement because the acceleration module is decoupled from the processor, but the frequent data exchange between the processor and the acceleration module reduces service processing performance.
  • The present invention provides a method and apparatus for extending a processor instruction set, which can improve the processing speed of the processor without modifying the processor core.
  • The invention provides a method of extending a processor instruction set.
  • The method is for a chip comprising a processor core, a monitoring module, and at least one execution module for executing extended instructions. The monitoring module and the at least one execution module are implemented by programmable logic.
  • The method includes the following. The monitoring module identifies instructions on the on-chip bus, these being the instructions that the processor core loads from memory through the on-chip bus. The monitoring module saves any extended instruction among them to its local memory. After the extended instruction has been loaded from memory over the on-chip bus, the processor core decodes it and generates an undefined instruction exception; the extended instruction is one that has already been stored in the local memory. After the processor core finishes executing the current instruction, it executes an exception handler, which is a program triggered by the undefined instruction exception. While executing the exception handler, the processor core suspends execution of the instructions after the extended instruction and, through the monitoring module, triggers the execution module corresponding to the extended instruction to execute it. The monitoring module then controls the processor core to exit the exception handler so that the processor core continues executing the instructions that follow the extended instruction.
  • Although the processor core cannot execute the extended instruction, each extended instruction has a corresponding execution module, so the extended instruction can still be executed successfully by that execution module. After it has been executed, the processor core is triggered to continue with the instructions after the extended instruction. This is equivalent to expanding the instruction set of the processor core, which improves the processing capability, and thereby the processing speed, of the processor.
  • Each execution module can execute at least one extended instruction, and each extended instruction corresponds to one instruction address. Therefore, to ensure that the extended instruction is determined accurately, the process in which the processor core triggers, through the monitoring module, the execution module corresponding to the extended instruction may be implemented as follows: the processor core sends the monitoring module the content of the program counter at the time the undefined instruction exception was generated, this content being the instruction address of the instruction that follows the extended instruction; the monitoring module determines the instruction address of the extended instruction from the instruction address of that next instruction; and the monitoring module notifies the execution module corresponding to the extended instruction to execute the extended instruction at that instruction address.
  • The instruction address of the extended instruction can thus be calculated accurately from the next-instruction address recorded in the program counter, which ensures that the monitoring module can promptly notify the execution module that is able to execute the extended instruction.
  • Before the monitoring module notifies the execution module corresponding to the extended instruction to execute the extended instruction at the instruction address, the method further includes: the monitoring module determines the execution module corresponding to the extended instruction according to the extended instruction that has been stored in the local memory. Determining the execution module from the extended instruction first means that a unique execution module is found for it, which ensures that the execution module can execute the extended instruction successfully once the monitoring module has notified it.
  • The monitoring module may save the extended instruction to the local memory according to a preset format, where the preset format includes the extended instruction.
  • Storing the extended instructions recognized on the on-chip bus in a preset format makes the stored extended instructions easier to manage.
  • The stored content can be read directly from the local memory to quickly identify the execution module corresponding to the extended instruction and to trigger that execution module to execute it.
  • After the processor core triggers, through the monitoring module, the execution module corresponding to the extended instruction to execute the extended instruction, the processor core temporarily stays in the exception handler. While it stays there, the processor core does not continue executing the instructions after the extended instruction.
  • The processor core may temporarily stay in the exception handler as follows: the processor core accesses a preset reserved memory address that does not correspond to any physical memory unit; after the monitoring module sends a retry response to the processor core, the processor core accesses the reserved memory address again. Because the monitoring module returns a retry response each time, the processor core repeatedly accesses the reserved memory address, which keeps it in the exception handler and prevents it, for the time being, from executing the instruction after the extended instruction.
  • The monitoring module may control the processor core to exit the exception handler as follows: after the monitoring module sends a normal completion response to the processor core, the processor core exits the exception handler. This ensures that the processor core can exit the exception handler when necessary.
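The retry mechanism described above can be modelled in a few lines of Python. This is a minimal behavioural sketch, not the patented hardware: the class and function names, and the idea of counting a fixed number of retries before the execution module finishes, are assumptions made purely for illustration.

```python
from enum import Enum


class Response(Enum):
    RETRY = "retry"              # execution module still busy
    NORMAL_COMPLETION = "done"   # extended instruction finished


class MonitoringModule:
    """Hypothetical monitor that answers accesses to the reserved address."""

    def __init__(self, retries_before_done):
        # Assumption: the accelerator needs this many more polls to finish.
        self.remaining = retries_before_done

    def access_reserved_address(self):
        if self.remaining > 0:
            self.remaining -= 1
            return Response.RETRY
        return Response.NORMAL_COMPLETION


def stay_in_exception_handler(monitor):
    """Repeat the reserved-address access until it completes normally.

    Returns the number of accesses the processor core issued.
    """
    accesses = 1
    while monitor.access_reserved_address() is Response.RETRY:
        accesses += 1
    return accesses
```

The processor core needs no knowledge of the accelerator here: it only repeats an ordinary bus access, and the monitoring module decides when that access is allowed to complete.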
  • Alternatively, after the processor core decodes the extended instruction and generates an undefined instruction exception, the monitoring module generates a hardware signal that controls whether the processor core stays in the exception handler, and sends this hardware signal to the processor core.
  • The processor core may then temporarily stay in the exception handler as follows: while the hardware signal is low, the processor core stays in the exception handler. After the monitoring module generates a low-level hardware signal according to the content of the program counter and sends it to the processor core, the processor core therefore stays in the exception handler and does not, for the time being, execute the instruction after the extended instruction.
  • The monitoring module may control the processor core to exit the exception handler as follows: when the hardware signal is high, the processor core exits the exception handler. The monitoring module can thus control the working state of the processor core by generating hardware signals of different levels.
  • The processor core may continue executing the instructions after the extended instruction as follows: the processor core restores the general-purpose register, program counter, and status register data that were backed up before the exception handler was executed, fetches the instruction at the next-instruction address stored in the program counter, and executes that instruction. After the execution module has finished executing the extended instruction, the processor core can thus restore the breakpoint and directly execute the instruction after the extended instruction.
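The breakpoint restore can be sketched as a pure function over a saved state. The dictionary layout, the key names, and the use of a dictionary as instruction memory are illustrative assumptions; the text only says that the general-purpose registers, program counter, and status register are restored and that the next instruction is fetched from the restored program counter.

```python
def resume_after_exception(saved_breakpoint, memory):
    """Restore the backed-up state and fetch the instruction to run next.

    saved_breakpoint: dict with "registers", "program_counter" and
    "status_register" keys (hypothetical layout).
    memory: dict mapping instruction address -> instruction.
    """
    core = {
        # Copy so that later register writes do not alter the backup.
        "registers": list(saved_breakpoint["registers"]),
        "program_counter": saved_breakpoint["program_counter"],
        "status_register": saved_breakpoint["status_register"],
    }
    next_instruction = memory[core["program_counter"]]
    return core, next_instruction
```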
  • the chip further includes a memory controller for reading instructions and data from the memory, and the monitoring module can be disposed in the memory controller or separated from the memory controller.
  • the invention provides an apparatus for extending a processor instruction set.
  • The apparatus can implement the functions performed in the above method examples by the monitoring module, the execution module, and the processing module (that is, the processor core). These functions can be implemented by hardware, or by hardware executing corresponding software.
  • The hardware or software includes one or more modules corresponding to the above functions.
  • The structure of the apparatus includes a processor and a communication interface. The processor is configured to support the apparatus in performing the corresponding functions of the above methods.
  • The communication interface is used to support communication between the apparatus and other devices.
  • The apparatus can also include a memory that is coupled to the processor and holds the program instructions and data necessary for the apparatus.
  • In the prior art, either a coprocessor executes dedicated instructions such as floating-point instructions, or a hardware acceleration module serves as a peripheral to which the processor sends data for processing through an interface such as PCIe. In contrast, the present invention provides a chip internal structure which ensures that, without modifying the processor core, the monitoring module can dispatch each extended instruction to the execution module corresponding to it, and that, while the execution module executes the extended instruction, the processor core suspends execution of the instructions after the extended instruction so that the instruction execution order is preserved.
  • Although the processor core cannot execute the extended instruction, the execution module can, so the extended instruction is still executed successfully. After it has been executed, the processor core is triggered to continue with the instructions after the extended instruction. This is equivalent to expanding the instruction set of the processor core, which improves the service processing capability and the processing speed of the processor.
  • FIG. 1 is a schematic diagram of an internal structure of a chip according to an embodiment of the present invention.
  • FIG. 2 and FIG. 3 are schematic diagrams showing another internal structure of a chip according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of an embedded system according to an embodiment of the present invention.
  • FIG. 5 is an interaction diagram of a method for extending a processor instruction set according to an embodiment of the present invention.
  • FIG. 6 to FIG. 14 are schematic diagrams of another method for extending a processor instruction set according to an embodiment of the present invention.
  • FIG. 15 is a schematic structural diagram of an apparatus for extending an instruction set of a processor according to an embodiment of the present disclosure.
  • FIG. 16 is a schematic structural diagram of another apparatus for extending a processor instruction set according to an embodiment of the present invention.
  • the embodiment of the present invention can be used in a chip.
  • the chip can be internally provided with a processor core, a monitoring module and at least one execution module for executing the extended instruction, such as the accelerator 1 and the accelerator 2 as shown in FIG. 1 .
  • the monitoring module and the at least one execution module for executing the extended instruction may be implemented by programmable logic (such as FPGA, CPLD, etc.).
  • the chip may further include a memory controller.
  • The memory controller is configured to read instructions and data from the memory. The monitoring module may be disposed inside the memory controller, or may be disposed in the chip separately from the memory controller; this is not limited herein.
  • An on-chip bus is disposed inside the chip. Its function is similar to that of the AHB (Advanced High-performance Bus) in the ARM (Advanced RISC Machine) architecture, and it can be used to connect the functional modules inside the chip.
  • The chip also includes a processor core, such as an ARM core, with instruction fetch, instruction decode, and instruction execution functions. The chip further includes at least one accelerator that accelerates various services, for example encryption, decryption, compression, and decompression. The chip also includes a memory controller, which connects to external SDRAM (synchronous dynamic random access memory) and is used to read instructions and data from the SDRAM or to write data to it.
  • The memory shown in FIG. 1 can be regarded as the above SDRAM.
  • The SDRAM interface may specifically be DDR2, DDR3, or DDR4 (DDR: Double Data Rate).
  • An execution module such as an accelerator, and the memory controller, can be implemented by an FPGA (Field Programmable Gate Array).
  • Modified logic configuration code can be loaded into the FPGA to change the functions that the execution module and/or the memory controller implement. This can be used to correct design flaws or to add new functions to the execution module and/or the memory controller.
  • the monitoring module can be set in the memory controller at this time.
  • the two buses can be connected by providing a bridge module between the two buses.
  • The APB (Advanced Peripheral Bus) can be used to connect low-speed modules, and the AHB can be used to connect high-speed modules such as accelerators. Connecting the high-speed modules and the low-speed modules separately effectively avoids bus blocking and improves bus throughput.
  • In addition to reading and writing memory, the memory controller needs a monitoring function: it monitors the instructions transmitted on the on-chip bus and saves the extended instructions to the local instruction memory, and it also controls the start-up of the corresponding accelerator and receives notification that the acceleration operation has completed.
  • These functions can also be implemented by an independent monitoring module implemented in an FPGA, which can be dynamically modified to support new extended instructions. The original memory controller can then be implemented in hard logic rather than in an FPGA, which helps improve the performance of the memory interface.
  • the memory controller shown in FIG. 3 can be divided into a memory controller and a monitoring module according to the functions.
  • the above-mentioned chips as shown in FIG. 1 to FIG. 3 can be applied to an embedded system.
  • In addition to the chip, the embedded system may include a memory, a BIOS (basic input/output system), a network interface chip, a serial interface chip, and so on. The chip can be regarded as a processor that is connected to the memory, the BIOS, the network interface chip, and the serial interface chip.
  • An embodiment of the present invention provides a method for extending a processor instruction set. As shown in FIG. 5, the method flow is performed by each module in a chip. The method process includes:
  • the processor core loads instructions from memory through an on-chip bus.
  • The machine language programs are stored in the memory, and the processor core needs to continuously read the program codes from the memory, decode and execute them, thereby completing the functions corresponding to the instructions.
  • When the processor core reads program code from the memory, the code passes through the on-chip bus.
  • The monitoring module and the accelerators implemented by the FPGA can therefore recognize the instructions.
  • The HPROT[3:0] signal can be connected to all devices on the on-chip bus, so other devices can use its level to distinguish whether what is currently being transmitted on the on-chip bus is an instruction. Step 102 is performed only when the content transmitted on the on-chip bus is an instruction.
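As a rough illustration of how a bus monitor might use HPROT[3:0], the sketch below filters observed transfers down to instruction fetches. It relies on the AMBA AHB convention that HPROT[0] is 0 for an opcode fetch and 1 for a data access; the tuple layout of a transfer and the function names are assumptions made for this example.

```python
HPROT_DATA_ACCESS = 0b0001  # HPROT[0]: 0 = opcode fetch, 1 = data access


def is_instruction_fetch(hprot):
    """True when the 4-bit HPROT value marks the transfer as an opcode fetch."""
    return (hprot & HPROT_DATA_ACCESS) == 0


def snoop_instruction_fetches(transfers):
    """Keep only instruction fetches from observed bus transfers.

    transfers: iterable of (hprot, address, word) tuples (hypothetical
    representation of what the monitoring module sees on the bus).
    """
    return [
        (address, word)
        for hprot, address, word in transfers
        if is_instruction_fetch(hprot)
    ]
```

Only the transfers that pass this filter would be candidates for the extended-instruction check in the next step; data reads and writes are ignored.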
  • The monitoring module identifies the instructions on the on-chip bus and saves any extended instruction among them to the local memory.
  • The extended instruction can be saved directly to the local memory.
  • The processor core decodes the extended instruction and generates an undefined instruction exception.
  • The extended instruction is one that has already been stored in the local memory.
  • When the processor core decodes the extended instruction, it cannot recognize it, so it generates an "undefined instruction" exception and jumps to the handler corresponding to that exception. Note that the exception is not handled immediately after it is generated; it is handled only after the currently executing instruction has finished. The specific handling is described later and is not detailed here.
  • the processor core executes an exception handler.
  • the exception handler is a program triggered by an undefined instruction exception.
  • The processor core saves the current breakpoint, that is, it backs up the contents of its general-purpose registers, program counter, and status register.
  • The backed-up program counter value equals the content of the current program counter minus 4, that is, the instruction address of the instruction after the extended instruction. This address is used as the return address of the exception handler.
  • The processor core then changes the content of the program counter to the start address of the handler for the undefined instruction exception, for example 0x0000_0004, and loads code from the memory address specified by the program counter. This code is the exception handler.
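The exception entry sequence can be sketched as follows. The dictionary representation of the core state and the function name are assumptions for illustration; the offset of 4 and the example handler address 0x0000_0004 are taken from the text.

```python
UNDEFINED_HANDLER_ADDRESS = 0x0000_0004  # example start address from the text
RETURN_OFFSET = 4                        # backup = current program counter - 4


def enter_undefined_instruction_exception(core):
    """Back up the breakpoint, then redirect the core to the handler.

    core: dict with "registers", "program_counter" and "status_register"
    keys (hypothetical layout). Returns the saved breakpoint.
    """
    saved_breakpoint = {
        "registers": list(core["registers"]),
        # Address of the instruction after the extended instruction,
        # used as the return address of the exception handler.
        "program_counter": core["program_counter"] - RETURN_OFFSET,
        "status_register": core["status_register"],
    }
    core["program_counter"] = UNDEFINED_HANDLER_ADDRESS
    return saved_breakpoint
```

For instance, if the program counter reads 0x1008 when the exception is taken, the saved return address is 0x1004, which under the rule above is the address of the instruction after the extended instruction.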
  • When the processor core executes the exception handler, it suspends execution of the instructions after the extended instruction and notifies the monitoring module, so that the monitoring module can trigger the execution module corresponding to the extended instruction to execute the extended instruction.
  • The completion information may be sent to the monitoring module through the on-chip bus, or the monitoring module may be notified through a handshake signal between the execution module and the monitoring module.
  • the monitoring module triggers an execution module corresponding to the extended instruction to execute the extended instruction.
  • the execution module executes the extended instruction.
  • The execution module sends the monitoring module a message indicating that it has finished executing the extended instruction.
  • the monitoring module controls the processor core to exit the exception handler.
  • the monitoring module can control the processor core to exit the exception processing program, and execute step 109.
  • the specific implementation manner will be described later, and will not be further described herein.
  • the processor core continues to execute instructions after the extended instruction.
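The whole flow, from instruction fetch to resumption, can be simulated at a very coarse level. The sketch below is a behavioural model only: instructions are plain strings, all class and function names are hypothetical, and a fixed 4-byte instruction size is assumed. It is meant to show the division of labour between the processor core, the monitoring module, and the execution modules, not any real hardware interface.

```python
INSTRUCTION_SIZE = 4  # assumption: fixed 4-byte instructions


class ExecutionModule:
    """Stands in for one accelerator implemented in programmable logic."""

    def __init__(self, name):
        self.name = name
        self.executed = []

    def execute(self, instruction):
        self.executed.append(instruction)


class MonitoringModule:
    def __init__(self, modules_by_instruction):
        # Maps each extended instruction to the module that executes it.
        self.modules_by_instruction = modules_by_instruction
        self.local_memory = {}  # instruction address -> extended instruction

    def observe_fetch(self, address, instruction):
        # Save only extended instructions seen on the on-chip bus.
        if instruction in self.modules_by_instruction:
            self.local_memory[address] = instruction

    def handle_exception(self, next_instruction_address):
        # Derive the extended instruction's address and dispatch it.
        address = next_instruction_address - INSTRUCTION_SIZE
        instruction = self.local_memory[address]
        module = self.modules_by_instruction[instruction]
        module.execute(instruction)
        return module.name


def run(program, native_instructions, monitor, base_address=0):
    """Run a straight-line program and record who executed each instruction."""
    trace = []
    for index, instruction in enumerate(program):
        address = base_address + index * INSTRUCTION_SIZE
        monitor.observe_fetch(address, instruction)
        if instruction in native_instructions:
            trace.append(("core", instruction))
        else:
            # Undefined instruction exception: the core waits in the handler
            # while the monitoring module dispatches the extended instruction.
            executor = monitor.handle_exception(address + INSTRUCTION_SIZE)
            trace.append((executor, instruction))
    return trace
```

Calling `run(["ADD", "XCRYPT", "SUB"], {"ADD", "SUB"}, monitor)` with a monitor that maps the made-up instruction `"XCRYPT"` to an accelerator yields a trace in which the accelerator handles `"XCRYPT"` and the core handles the rest, in program order.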
  • A chip internal structure is proposed which ensures that, without modifying the processor core, the monitoring module can dispatch each extended instruction to the execution module corresponding to it, and that, while the execution module executes the extended instruction, the processor core suspends execution of the instructions after the extended instruction so that the instruction execution order is preserved.
  • Although the processor core cannot execute the extended instruction, the execution module can, so the extended instruction is still executed successfully. After it has been executed, the processor core is triggered to continue with the instructions after the extended instruction. This is equivalent to expanding the instruction set of the processor core, which improves the service processing capability and the processing speed of the processor.
  • Each execution module can execute at least one extended instruction, and each extended instruction corresponds to one instruction address. Based on the content of the program counter when the processor core generated the undefined instruction exception, the monitoring module can determine the instruction address of the instruction after the extended instruction, derive the instruction address of the extended instruction from it, and notify the execution module corresponding to the extended instruction to execute the extended instruction. Therefore, on the basis of the implementation shown in FIG. 5, the implementation shown in FIG. 6 is also possible.
  • the step 110 is performed.
  • Step 105, in which the processor core triggers, through the monitoring module, the execution module corresponding to the extended instruction to execute the extended instruction, may be specifically implemented as step 1051 and step 1052:
  • The processor core sends the monitoring module the content of the program counter at the time the processor core generated the undefined instruction exception.
  • The content of the program counter is the instruction address of the instruction after the extended instruction.
  • Before entering the exception handler, the processor core backs up the content of the program counter at the time the undefined instruction exception was generated. Then, within the exception handler, the processor core sends the backed-up content to the monitoring module.
  • the monitoring module determines an instruction address of the extended instruction according to an instruction address of the next instruction.
  • the monitoring module notifies the execution module corresponding to the extended instruction to execute the extended instruction corresponding to the instruction address.
  • A program may contain jump instructions or conditional instructions.
  • For a jump instruction, the program does not execute in order but skips the following several instructions and continues from there. For a conditional instruction, the program determines whether a condition is satisfied before executing the instruction, and if the condition is not met the instruction is not executed. A previously saved extended instruction therefore does not necessarily need to be executed; the program may well skip it and execute a later instruction. For example, as shown in Table 1, a program segment includes five instructions in sequence, and the first three have entered the instruction pipeline. The first instruction is a jump instruction that jumps to the third instruction.
  • Although extended instruction 1 has already been saved to the local memory of the monitoring module and of the corresponding execution module, it is not executed; the instruction that actually needs to be executed is extended instruction 2. Therefore, to prevent the execution module from executing the wrong extended instruction, the monitoring module needs to tell the execution module corresponding to the extended instruction which extended instruction should currently be executed.
  • the processor core executes the extended instruction 2, that is, during the decoding of the extended instruction 2, an undefined instruction exception is generated, and the processor core enters the exception handler.
  • the processor core needs to obtain the saved breakpoint address, that is, the instruction address of the next instruction of the extended instruction, and send the instruction address of the next instruction to the monitoring module, or subtract the 4 and send it to the monitoring module.
  • the monitoring module can determine the address of the extended instruction 2 itself according to the instruction address sent by the processor core.
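The address arithmetic described above can be sketched as follows. This is only an illustration under stated assumptions: it assumes fixed-width 4-byte instructions (consistent with the "subtract 4" remark), and the function name and the `already_adjusted` flag are hypothetical, not part of the patent.

```python
INSTRUCTION_SIZE = 4  # assumption: fixed-width 4-byte instructions


def extended_instruction_address(reported_address: int,
                                 already_adjusted: bool = False) -> int:
    """Recover the address of the extended instruction that raised the exception.

    reported_address is the value the processor core sent to the monitoring
    module. If the core already subtracted 4 (already_adjusted=True), the
    value is used as is; otherwise it is the address of the next instruction
    and the monitoring module steps back by one instruction.
    """
    if already_adjusted:
        return reported_address
    return reported_address - INSTRUCTION_SIZE
```

Either variant yields the same address, so the monitoring module only needs to know which convention the exception handler uses.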
  • In the method for extending a processor instruction set provided by this embodiment of the present invention, when the processor core executes the exception handler, it suspends execution of the instruction after the extended instruction and sends the monitoring module the instruction address of the next instruction, as held in the program counter when the processor core generated the undefined instruction exception. The monitoring module then determines the instruction address of the extended instruction from the instruction address of the next instruction, and notifies the execution module corresponding to the extended instruction to execute the extended instruction at that address.
  • The invention can thus accurately calculate the instruction address of the extended instruction from the next-instruction address recorded in the program counter, ensuring that the monitoring module can promptly notify an execution module capable of executing the extended instruction.
  • Each execution module corresponds to at least one extended instruction that it can execute. Therefore, to ensure that the execution module triggered by the monitoring module is able to execute the extended instruction, in this embodiment of the present invention the monitoring module needs to determine which execution module is to be notified.
  • Therefore, on the basis of the implementation shown in FIG. 6, an implementation as shown in FIG. 7 can also be realized. Before step 1052, in which the monitoring module notifies the execution module corresponding to the extended instruction to execute the extended instruction corresponding to the instruction address, step 111 may be performed:
  • the monitoring module determines an execution module corresponding to the extended instruction according to the extended instruction that has been stored to the local storage.
  • the local memory of the monitoring module stores a correspondence between the extended instruction and the execution module. Therefore, the monitoring module can determine the execution module corresponding to each extended instruction according to the content of the stored extended instruction.
  • In the method for extending the processor instruction set provided by this embodiment of the present invention, before the monitoring module notifies the execution module to execute the extended instruction, the monitoring module determines the execution module corresponding to the extended instruction from the extended instruction itself.
  • By first determining the execution module from the extended instruction, the invention finds the unique execution module for that instruction, which ensures that the execution module can successfully execute the extended instruction once the monitoring module notifies it.
  • The monitoring module may store the extended instructions among the instructions to the local memory in a preset format, so that it can later use the stored content to accurately determine the execution module corresponding to an extended instruction. Therefore, on the basis of the implementation shown in FIG. 5, it can also be implemented as the implementation shown in FIG.
  • Step 102, in which the monitoring module identifies the instruction via the on-chip bus and saves the extended instruction among the instructions to the local memory, may be specifically implemented as step 1021:
  • the monitoring module identifies the instruction by using an on-chip bus, and saves the extended instruction in the instruction to the local memory according to a preset format.
  • the preset format includes an instruction address of the extended instruction, a content of the extended instruction, and an execution module corresponding to the extended instruction.
  • the format in which the extended instruction is saved to the local memory of the monitoring module is as shown in Table 2, in which the instruction address and content of each extended instruction are stored, and the execution module corresponding to each extended instruction is stored.
  • When the monitoring module receives the instruction address sent by the processor core, it can determine from that address and the contents shown in Table 2 which extended instruction is currently being executed and which execution module executes it. The monitoring module then sends the above content to the corresponding execution module.
  • After the accelerator receives the information, it reads data from the specified source data address according to the content of the extended instruction, performs the corresponding acceleration operation, and saves the result to the specified target address.
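A minimal sketch of the preset storage format and the lookup it enables, assuming the three fields the text names for Table 2 (instruction address, instruction content, execution module). The class names, the use of a dictionary keyed by address, and the sample values are illustrative assumptions, not details from the patent.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ExtendedEntry:
    """One stored row: the three fields of the preset format."""
    instruction_address: int
    content: int            # raw instruction word
    execution_module: str   # hypothetical identifier of the execution module


class LocalMemory:
    """Hypothetical model of the monitoring module's local memory."""

    def __init__(self) -> None:
        self._entries: Dict[int, ExtendedEntry] = {}

    def save(self, entry: ExtendedEntry) -> None:
        # Key by instruction address so a later lookup is a single step.
        self._entries[entry.instruction_address] = entry

    def lookup(self, instruction_address: int) -> Optional[ExtendedEntry]:
        # Returns None when no extended instruction was saved at that address.
        return self._entries.get(instruction_address)
```

For example, after `memory.save(ExtendedEntry(0x1004, 0xEE000001, "accelerator_0"))`, a call to `memory.lookup(0x1004)` returns the entry, and its `execution_module` field names the module to notify.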
  • In the method for extending the processor instruction set provided by this embodiment of the present invention, after the monitoring module recognizes an extended instruction on the on-chip bus, it stores the identified extended instruction in a format consisting of the instruction address, the content, and the corresponding execution module.
  • By storing the extended instructions identified on the on-chip bus in a preset format, the monitoring module makes the stored extended instructions easy to manage.
  • The stored content can later be read directly from the local memory to quickly locate the execution module corresponding to the extended instruction, and the monitoring module can then trigger that execution module to execute the extended instruction.
  • To ensure that the processor core suspends execution of the instruction after the extended instruction, the processor core may temporarily stay in the exception handler. This builds on the implementation shown in FIG. 6.
  • After step 1052, in which the monitoring module notifies the execution module corresponding to the extended instruction to execute the extended instruction corresponding to the instruction address, step 112 may be performed:
  • the processor core temporarily stays in the exception handler.
  • Once the processor core has entered the exception handler, it must not exit for the time being; otherwise it would continue executing the instruction after the extended instruction that generated the undefined instruction exception. If that subsequent instruction depends on the result produced by the execution module for the extended instruction, the data would be wrong. Therefore, after the processor core starts executing the exception handler, it can exit only after the execution module has finished executing the extended instruction; until then, it stays in the exception handler. For example, the processor core cannot exit the exception handler until the accelerator completes the acceleration operation.
  • The method for extending a processor instruction set provided by this embodiment ensures that the processor core suspends execution of the instruction after the extended instruction by making the processor core temporarily stay in the exception handler.
  • While the processor core stays in the exception handler, it cannot continue executing the instruction after the extended instruction.
  • the processor core can temporarily stay in the exception handler by repeatedly accessing the preset reserved memory address. Therefore, on the basis of the implementation shown in FIG. 9, an implementation as shown in FIG. 10 can also be implemented.
  • Step 112, in which the processor core temporarily stays in the exception handler, can be specifically implemented as step 1121 to step 1123:
  • the processor core accesses a preset reserved memory address.
  • the reserved memory address does not correspond to any physical memory unit.
  • the monitoring module sends a retry response to the processor core.
  • the processor core accesses the reserved memory address again.
  • When the monitoring module detects that the processor core is accessing the reserved memory address, it can respond to the processor core through the HRESP[1:0] signal and the HREADY signal of the AHB bus, that is, with a retry response.
  • The method for extending a processor instruction set provided by this embodiment ensures that the processor core can temporarily stay in the exception handler by making it repeatedly access a preset reserved memory address.
  • Each time the processor core accesses the preset reserved memory address, the monitoring module returns a retry response, so the processor core accesses the address again. The processor core therefore stays in the exception handler and, for the time being, does not execute the instruction after the extended instruction.
  • To ensure that the processor core can exit the exception handler and then continue executing the instruction after the extended instruction, the monitoring module controls the processor core to exit the exception handler. Therefore, on the basis of the implementation shown in FIG. 10, an implementation as shown in FIG. 11 can also be realized.
  • Step 108, in which the monitoring module controls the processor core to exit the exception handler, may be specifically implemented as step 1081 and step 1082:
  • the monitoring module sends a normal completion response to the processor core.
  • The normal completion response indicates that the processor core's access to the reserved memory address has completed normally.
  • the processor core exits the exception handler.
  • the method for extending the processor instruction set provided by the embodiment of the present invention may enable the processor core to exit the exception handling program after receiving the normal completion response sent by the monitoring module.
  • the present invention ensures that the processor core can exit the exception handler if necessary.
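The retry-then-complete handshake can be modeled with a small simulation. The sketch below is illustrative only: the two-bit response encodings follow the AMBA 2 AHB convention (OKAY = 00, ERROR = 01, RETRY = 10, SPLIT = 11), while the reserved address value, the `busy_cycles` counter standing in for the execution module's progress, and all class and function names are assumptions made for the example.

```python
from enum import IntEnum


class HResp(IntEnum):
    """HRESP[1:0] encodings as used on an AMBA 2 AHB bus."""
    OKAY = 0b00
    ERROR = 0b01
    RETRY = 0b10
    SPLIT = 0b11


RESERVED_ADDRESS = 0xFFFFFF00  # hypothetical: maps to no physical memory unit


class MonitoringModule:
    def __init__(self, busy_cycles: int) -> None:
        # Stand-in for "the execution module has not finished yet".
        self.busy_cycles = busy_cycles

    def respond(self, address: int) -> HResp:
        if address != RESERVED_ADDRESS:
            return HResp.OKAY
        if self.busy_cycles > 0:
            self.busy_cycles -= 1
            return HResp.RETRY   # keep the core inside the exception handler
        return HResp.OKAY        # normal completion: the core may exit


def wait_in_exception_handler(monitor: MonitoringModule) -> int:
    """Access the reserved address until it completes; return the access count."""
    accesses = 1
    while monitor.respond(RESERVED_ADDRESS) == HResp.RETRY:
        accesses += 1
    return accesses
```

With `MonitoringModule(busy_cycles=3)` the core makes four accesses: three are answered with a retry response and the fourth completes normally, at which point the handler can return.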
  • The monitoring module may also send a hardware signal to the processor core, and while the hardware signal is low, the processor core temporarily stays in the exception handler. Therefore, on the basis of the implementation shown in FIG. 9, an implementation as shown in FIG. 12 can also be realized.
  • After step 103 is performed, step 114 and step 115 are performed.
  • Step 112, in which the processor core temporarily stays in the exception handler, may be specifically implemented as step 1124:
  • the monitoring module generates a hardware signal.
  • the hardware signal is used to control whether the processor core stays in the exception handler.
  • the monitoring module sends a hardware signal to the processor core.
  • In the method for extending a processor instruction set provided by this embodiment, after the processor core generates an undefined instruction exception, it sends the backed-up content of the program counter to the monitoring module, and the monitoring module then generates a hardware signal according to the received content and sends it to the processor core.
  • When the hardware signal received by the processor core is low, the processor core is suspended, which is equivalent to the processor core temporarily staying in the exception handler.
  • Once the monitoring module has generated a low-level hardware signal according to the content of the program counter and sent it to the processor core, the processor core stays in the exception handler and, for the time being, does not execute the instruction after the extended instruction.
  • When the processor core needs to continue executing the instruction after the extended instruction, the hardware signal that the monitoring module sends to the processor core may be driven high.
  • The processor core then continues to execute the instructions following the extended instruction. Therefore, on the basis of the implementation shown in FIG. 12, an implementation as shown in FIG. 13 can also be realized.
  • Step 108, in which the monitoring module controls the processor core to exit the exception handler, may be specifically implemented as step 1083:
  • In the method for extending a processor instruction set provided by this embodiment of the present invention, when the hardware signal sent by the monitoring module to the processor core is high, the processor core continues to execute the instruction after the extended instruction.
  • The monitoring module can therefore effectively control the operating state of the processor core by generating hardware signals of different levels.
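The level-based variant can be pictured as a single stall line that the monitoring module drives. The following sketch is a simplified software model, not a hardware description; the method names and the choice of 1 as the idle level are assumptions made for illustration.

```python
from typing import Optional


class MonitoringModule:
    """Drives one hardware signal: 0 holds the core, 1 lets it continue."""

    def __init__(self) -> None:
        self.signal = 1                       # assumption: idle level is high
        self.pending_pc: Optional[int] = None

    def on_program_counter(self, backed_up_pc: int) -> None:
        # The core reported an undefined instruction exception: hold it.
        self.pending_pc = backed_up_pc
        self.signal = 0

    def on_execution_done(self) -> None:
        # The execution module finished the extended instruction: release the core.
        self.pending_pc = None
        self.signal = 1


def core_may_continue(monitor: MonitoringModule) -> bool:
    """The core proceeds past the extended instruction only while the signal is high."""
    return monitor.signal == 1
```

Compared with the retry-based approach, this design needs a dedicated signal into the processor core but avoids repeated bus accesses while the core waits.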
  • The processor core needs to continue executing the instruction following the extended instruction after restoring the breakpoint. Therefore, on the basis of the implementation shown in FIG. 11 or FIG. 13, and taking FIG. 11 as an example, it can also be implemented as the implementation shown in FIG.
  • Step 109, in which the processor core continues to execute the instruction after the extended instruction, may be specifically implemented as step 1091 to step 1093:
  • the processor core restores data of a general-purpose register, a program counter, and a status register that were backed up before executing the exception handler.
  • After the processor core exits the exception handler, it needs to restore the breakpoint before continuing to execute the instruction after the extended instruction. The processor core therefore restores the previously backed-up general-purpose register, program counter, and status register data to the corresponding registers.
  • the processor core fetches an instruction from the next instruction address according to the next instruction address stored in the program counter.
  • the processor core fetches instructions from the corresponding instruction address according to the new value of the program counter. At this time, the processor core has exited the exception handler.
  • the processor core executes an instruction corresponding to the next instruction address.
  • the processor core continues to execute instructions following the extended instruction that produced the undefined instruction exception.
  • In the method for extending a processor instruction set provided by this embodiment of the present invention, the processor core restores the general-purpose register, program counter, and status register data that were backed up before it executed the exception handler, fetches an instruction from the next instruction address stored in the program counter, and then executes the instruction at that address.
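Steps 1091 to 1093 amount to a context restore followed by a fetch. The sketch below models this in software under stated assumptions: the register count of 16, the dictionary used as instruction memory, and all names are hypothetical and chosen only to make the example self-contained.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CoreContext:
    # Assumption: 16 general-purpose registers; the patent does not give a count.
    general_registers: List[int] = field(default_factory=lambda: [0] * 16)
    program_counter: int = 0
    status_register: int = 0


def back_up(core: CoreContext) -> CoreContext:
    """Snapshot the context before the exception handler runs."""
    return copy.deepcopy(core)


def restore_and_fetch(core: CoreContext, backup: CoreContext,
                      instruction_memory: Dict[int, str]) -> str:
    """Restore the breakpoint (step 1091) and fetch the next instruction (step 1092)."""
    core.general_registers = list(backup.general_registers)
    core.program_counter = backup.program_counter
    core.status_register = backup.status_register
    # The restored program counter holds the address after the extended instruction.
    return instruction_memory[core.program_counter]
```

The returned instruction is the one the core executes next (step 1093), which is the instruction following the extended instruction that raised the exception.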
  • Compared with approaches in which a coprocessor executes dedicated instructions such as floating-point instructions, or in which the hardware acceleration module acts as a peripheral of the processor and the processor sends data to it through an interface such as PCIe, the invention extends the instruction set of the processor core without modifying the processor core, thereby improving the processing speed of the processor.
  • To implement the functions described above, each module, such as the monitoring module, the execution module, and the processing module, includes hardware structures and/or software modules corresponding to the respective functions.
  • In combination with the elements and algorithm steps of the examples described in the embodiments disclosed herein, the present invention can be implemented in hardware or in a combination of hardware and computer software. Whether a function is implemented in hardware or by computer software driving hardware depends on the specific application and the design constraints of the solution. A person skilled in the art can use different methods to implement the described functions for each particular application, but such an implementation should not be considered beyond the scope of the present invention.
  • the embodiment of the present invention may perform the division of the function module on the device that expands the processor instruction set according to the foregoing method example.
  • each function module may be divided according to each function, or two or more functions may be integrated into one processing module.
  • the above integrated modules can be implemented in the form of hardware or in the form of software functional modules. It should be noted that the division of the module in the embodiment of the present invention is schematic, and is only a logical function division, and the actual implementation may have another division manner.
  • FIG. 15 is a schematic diagram showing a possible structure of an apparatus for expanding an instruction set of the processor in the foregoing embodiment.
  • The device 20 includes a processing module 21 and a monitoring module 22.
  • The processing module 21 is configured to execute instructions, to suspend execution of the instruction after the extended instruction once the exception handler has been triggered, and to restore the breakpoint at an appropriate time and continue executing the instruction after the extended instruction, for example, processes 101, 103, 104 and 109 in FIG. 5, process 112 in FIG. 9, processes 1121 and 1123 in FIG. 10, process 1082 in FIG. 11, process 1124 in FIG. 12, process 1083 in FIG. 13, and processes 1091 to 1093 in FIG.
  • The monitoring module 22 is configured to identify extended instructions that pass through the on-chip bus, save them in the local memory, trigger the corresponding execution module to execute the extended instruction, and subsequently control the processing module to exit the exception handler, for example, processes 102, 105 and 108 in FIG. 5, processes 1051 and 1052 in FIG. 6, process 111 in FIG. 7, process 1122 in FIG. 10, process 1081 in FIG. 11, and processes 114 and 115 in FIG.
  • The execution module 23 is configured to execute the extended instruction and report back to the monitoring module after completing execution, for example, processes 106 and 107 in FIGS. 5 to 14.
  • the apparatus 20 can also include a storage module 24 for storing associated program code and data. All the related content of the steps involved in the foregoing method embodiments may be referred to the functional descriptions of the corresponding functional modules, and details are not described herein again.
  • The processing module 21 may be a processor or a controller, for example, a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or carry out the various illustrative logical blocks, modules and circuits described in connection with the present disclosure.
  • The processor may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
  • the storage module 24 can be a memory.
  • The apparatus for expanding the processor instruction set according to the embodiment of the present invention may be the device 30 shown in FIG.
  • the apparatus 30 includes a processor 31, a communication interface 32, a memory 33, and a bus 34.
  • the communication interface 32, the processor 31, and the memory 33 are connected to each other through a bus 34.
  • The bus 34 may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus.
  • the bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is shown in Figure 16, but it does not mean that there is only one bus or one type of bus.
  • The steps of a method or algorithm described in connection with the present disclosure may be implemented in hardware, or may be implemented by a processor executing software instructions.
  • The software instructions may be composed of corresponding software modules, which may be stored in random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable hard disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium known in the art.
  • An exemplary storage medium is coupled to the processor to enable the processor to read information from, and write information to, the storage medium.
  • the storage medium can also be an integral part of the processor.
  • the processor and the storage medium can be located in an ASIC. Additionally, the ASIC can be located in a core network interface device.
  • the processor and the storage medium may also exist as discrete components in the core network interface device.
  • the functions described herein can be implemented in hardware, software, firmware, or any combination thereof.
  • the functions may be stored in a computer readable medium or transmitted as one or more instructions or code on a computer readable medium.
  • Computer readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one location to another.
  • a storage medium may be any available media that can be accessed by a general purpose or special purpose computer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

A method and device for extending a processor instruction set, relating to the technical field of communications. The method and device can improve the processing speed of a processor without modifying a processor core. The method comprises: a monitoring module identifies instructions by means of an on-chip bus and stores an extended instruction among the instructions to a local memory (102); after a processor core loads the extended instruction from the memory by means of the on-chip bus, the processor core decodes the extended instruction to generate an undefined instruction abnormality (103); after the processor core finishes executing the current instruction, the processor core executes an abnormality processing procedure (104), and suspends execution of an instruction following the extended instruction; the monitoring module triggers an execution module corresponding to the extended instruction to execute the extended instruction (105); the monitoring module controls the processor core to exit from the abnormality processing procedure (108), so that the processor core continues executing the instruction following the extended instruction (109). The method and the device are applicable to the execution process of an extended instruction.

Description

Method and device for extending a processor instruction set
This application claims priority to Chinese Patent Application No. 201610777425.2, filed with the Chinese Patent Office on August 30, 2016 and entitled "Method and device for extending a processor instruction set", which is incorporated herein by reference in its entirety.
Technical field
The present invention relates to the field of communications technologies, and in particular, to a method and device for extending a processor instruction set.
Background
With the development of communications technology, improving the processing capability of a communication system often requires hardware acceleration modules to handle complex services, for example, floating-point operations, encryption and decryption, and compression and decompression. Currently, complex services can be handled in the following ways:
A coprocessor is used to execute dedicated instructions such as floating-point instructions; that is, the processor sends the dedicated instructions to the coprocessor for execution. The advantage of this approach is that programming is simple, because coprocessor instructions can be used directly. The disadvantage is that the processor must support such dedicated instructions, and not all processor cores do. For example, a vendor that designs its own processor around a third-party processor core cannot support running such a coprocessor.
Alternatively, the hardware acceleration module serves as a peripheral of the processor. The processor sends data to the acceleration module for processing through an interface such as PCIe (Peripheral Component Interconnect Express), the acceleration module stores the processing result to memory via the processor, and the processor reads the result from memory when it needs it. This approach is flexible to implement because the acceleration module is decoupled from the processor, but the frequent data exchange between the processor and the acceleration module reduces service processing performance.
Therefore, an implementation that combines the advantages of the two techniques above is still needed, in order to improve service processing capability and simplify program design.
Summary of the invention
The present invention provides a method and device for extending a processor instruction set, which can improve the processing speed of the processor without modifying the processor core.
To achieve the above objective, the present invention adopts the following technical solutions:
In one aspect, the present invention provides a method for extending a processor instruction set. The method is applied to a chip that includes a processor core, a monitoring module, and at least one execution module for executing extended instructions; the monitoring module and the at least one execution module are implemented by programmable logic. The method includes: the monitoring module identifies instructions via the on-chip bus, the instructions being those that the processor core loads from memory via the on-chip bus; the monitoring module saves the extended instructions among those instructions to a local memory; after the processor core loads an extended instruction from memory via the on-chip bus, the processor core decodes the extended instruction and generates an undefined instruction exception, the extended instruction being one that has already been stored in the local memory; after the processor core finishes executing the current instruction, it executes an exception handler, which is the program triggered by the undefined instruction exception; while executing the exception handler, the processor core suspends execution of the instructions after the extended instruction and, through the monitoring module, triggers the execution module corresponding to the extended instruction to execute it; the monitoring module then controls the processor core to exit the exception handler, so that the processor core continues to execute the instructions after the extended instruction. Thus, although the processor core cannot execute the extended instruction itself, every extended instruction has a corresponding execution module that can. Even if the processor core lacks the ability to execute the extended instruction, the execution module completes its execution, after which the processor core is triggered to continue with the following instructions. This effectively extends the instruction set of the processor core, improving the service processing capability and hence the processing speed of the processor.
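The overall flow can be traced with a toy model. This is a hedged sketch rather than the claimed implementation: the rule for recognizing an extended instruction (top byte equal to 0xEE), the 4-byte instruction width, the straight-line program without jumps, and all names are assumptions introduced purely for illustration. The step numbers in the comments refer to the reference numerals used in the abstract.

```python
from typing import Callable, Dict, List


def is_extended(word: int) -> bool:
    # Hypothetical encoding: a word whose top byte is 0xEE is an extended instruction.
    return (word >> 24) == 0xEE


def run(program: Dict[int, int],
        execute_extended: Callable[[int], None]) -> List[str]:
    """Walk a straight-line program and log which unit handles each step."""
    local_memory: Dict[int, int] = {}   # monitoring module's store: address -> word
    log: List[str] = []
    for address in sorted(program):
        word = program[address]
        if is_extended(word):
            local_memory[address] = word             # 102: snoop the bus and save
            log.append(f"undefined@{address:#x}")     # 103: decode raises the exception
            next_pc = address + 4                     # program counter at the exception
            target = next_pc - 4                      # monitoring module recovers the address
            execute_extended(local_memory[target])    # 105: execution module runs it
            log.append(f"resume@{next_pc:#x}")        # 108, 109: exit handler and continue
        else:
            log.append(f"core@{address:#x}")          # ordinary instruction on the core
    return log
```

Running it on a three-word program with one extended instruction in the middle shows the core handling the first and third words itself, while the second is handed to the callback that stands in for the execution module.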
In one possible design, each execution module can execute at least one extended instruction, and each extended instruction corresponds to one instruction address. To identify the extended instruction accurately, the step in which the processor core triggers, through the monitoring module, the corresponding execution module to execute the extended instruction may be implemented as follows: the processor core sends the monitoring module the content of the program counter at the time the undefined instruction exception was raised, this content being the instruction address of the instruction that follows the extended instruction; the monitoring module determines the instruction address of the extended instruction from the address of that next instruction; and the monitoring module notifies the execution module corresponding to the extended instruction to execute the extended instruction at that address. The invention can thus derive the address of the extended instruction from the next-instruction address recorded in the program counter, so that the monitoring module can promptly notify the execution module capable of executing that extended instruction.
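The address derivation described above can be sketched as follows. This is a minimal illustration, assuming fixed-width 4-byte instructions (as in 32-bit ARM state); the function name and the `INSTRUCTION_SIZE` constant are hypothetical and do not appear in the source.

```python
# Assumption: every instruction is 4 bytes wide, so the extended
# instruction sits exactly one instruction before the saved address.
INSTRUCTION_SIZE = 4


def extended_instruction_address(next_instruction_address: int) -> int:
    """Return the address of the extended instruction, given the address
    of the instruction that follows it (the saved program counter)."""
    if next_instruction_address < INSTRUCTION_SIZE:
        raise ValueError("saved program counter is too small to be valid")
    return next_instruction_address - INSTRUCTION_SIZE


if __name__ == "__main__":
    print(hex(extended_instruction_address(0x8004)))
```

With variable-length encodings the subtraction would not be a constant, which is one reason the sketch is limited to the fixed-width case.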
In one possible design, before the monitoring module notifies the execution module corresponding to the extended instruction to execute the extended instruction at the instruction address, the method further includes: the monitoring module determines the execution module corresponding to the extended instruction from the extended instructions already stored in the local memory. By first determining the execution module from the extended instruction, the invention finds the single execution module responsible for that instruction, which ensures that the execution module can execute the extended instruction successfully once the monitoring module notifies it.
In one possible design, the step in which the monitoring module saves the extended instruction to the local memory may be implemented as follows: the monitoring module saves the extended instruction to the local memory in a preset format, which includes the instruction address of the extended instruction, the content of the extended instruction, and the execution module corresponding to the extended instruction. Storing the extended instructions identified on the on-chip bus in a preset format makes the stored instructions easier to manage. In addition, when the execution module for an extended instruction needs to be determined, the stored content can be read directly from the local memory to locate that execution module quickly, after which the monitoring module triggers it to execute the extended instruction.
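One way to picture the preset format is as a small record keyed by instruction address. The sketch below is illustrative only: the class names, field names, and the use of an integer module identifier are assumptions, not details given in the source.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ExtendedInstructionRecord:
    """One entry in the monitoring module's local memory (hypothetical layout)."""
    address: int    # instruction address of the extended instruction
    content: int    # raw instruction word as seen on the on-chip bus
    module_id: int  # identifier of the execution module that handles it


class LocalInstructionStore:
    """Address-indexed store, so the execution module can be found quickly."""

    def __init__(self) -> None:
        self._records: Dict[int, ExtendedInstructionRecord] = {}

    def save(self, record: ExtendedInstructionRecord) -> None:
        # A later fetch of the same address overwrites the earlier entry.
        self._records[record.address] = record

    def module_for(self, address: int) -> Optional[int]:
        """Return the execution module for the instruction at `address`,
        or None if no extended instruction was recorded there."""
        record = self._records.get(address)
        return record.module_id if record is not None else None
```

Keying the store by address matches the lookup the monitoring module performs later, when it only knows the address derived from the program counter.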
In one possible design, after the processor core has triggered, through the monitoring module, the execution module corresponding to the extended instruction to execute it, the processor core temporarily remains in the exception handler. While the processor core remains in the exception handler, it does not go on to execute the instructions that follow the extended instruction.
In one possible design, the processor core temporarily remaining in the exception handler may be implemented as follows: the processor core accesses a preset reserved memory address that does not correspond to any physical memory unit; after the monitoring module sends a retry response to the processor core, the processor core accesses the reserved memory address again. Because the monitoring module returns a retry response each time the processor core accesses the reserved address, the processor core keeps accessing that address and therefore stays in the exception handler. In other words, the processor core temporarily does not execute the instruction that follows the extended instruction.
In one possible design, the step in which the monitoring module controls the processor core to exit the exception handler may be implemented as follows: after the monitoring module sends a normal completion response to the processor core, the processor core exits the exception handler. The invention thus ensures that the processor core leaves the exception handler when required.
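The retry-then-complete handshake from the two designs above can be modelled in a few lines. This is a behavioural sketch under stated assumptions: the reserved address value, the `"RETRY"` and `"OKAY"` labels, and the `busy_cycles` counter standing in for the execution module's run time are all hypothetical.

```python
RESERVED_ADDRESS = 0xFFFF_FFF0  # hypothetical; maps to no physical memory


class Monitor:
    """Answers accesses to the reserved address while an accelerator runs."""

    def __init__(self, busy_cycles: int) -> None:
        # Number of accesses that will still be answered with a retry.
        self.remaining = busy_cycles

    def respond(self, address: int) -> str:
        if address != RESERVED_ADDRESS:
            raise ValueError("only the reserved address is modelled here")
        if self.remaining > 0:
            self.remaining -= 1
            return "RETRY"  # execution module not finished: core must retry
        return "OKAY"       # normal completion: core may leave the handler


def stall_in_handler(monitor: Monitor) -> int:
    """Access the reserved address until a normal completion response
    arrives; return how many accesses were issued in total."""
    attempts = 0
    while True:
        attempts += 1
        if monitor.respond(RESERVED_ADDRESS) == "OKAY":
            return attempts
```

The core needs no special hardware for this: an ordinary load that the bus keeps retrying is enough to hold it inside the handler.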
In one possible design, after the processor core decodes the extended instruction and raises the undefined instruction exception, the monitoring module generates a hardware signal that controls whether the processor core remains in the exception handler, and sends this hardware signal to the processor core. The processor core temporarily remaining in the exception handler may then be implemented as follows: while the hardware signal is low, the processor core remains in the exception handler. Thus, once the monitoring module has generated a low-level hardware signal according to the content of the program counter and sent it to the processor core, the processor core stays in the exception handler. In other words, the processor core temporarily does not execute the instruction that follows the extended instruction.
In one possible design, the step in which the monitoring module controls the processor core to exit the exception handler may be implemented as follows: when the hardware signal is high, the processor core exits the exception handler. The monitoring module can therefore control the working state of the processor core by driving the hardware signal to different levels.
In one possible design, the processor core continuing with the instructions that follow the extended instruction may be implemented as follows: the processor core restores the general-purpose registers, program counter, and status register that were backed up before the exception handler was executed, and then fetches an instruction from the next-instruction address stored in the program counter; the processor core executes the instruction at that address. Thus, after the execution module has finished executing the extended instruction, the processor core can restore the breakpoint in this way and directly execute the instructions that follow the extended instruction.
In one possible design, the chip further includes a memory controller for reading instructions and data from memory. The monitoring module may be located inside the memory controller, or it may be provided on the chip separately from the memory controller.
In another aspect, the present invention provides an apparatus for extending a processor instruction set. The apparatus can implement the functions performed in the above method examples by the monitoring module, the execution module, and the processing module (that is, the processor core). These functions may be implemented in hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions.
In one possible design, the structure of the apparatus includes a processor and a communication interface. The processor is configured to support the apparatus in performing the corresponding functions of the above method. The communication interface supports communication between the apparatus and other devices. The apparatus may further include a memory, coupled to the processor, that stores the program instructions and data required by the apparatus.
The prior art either uses a coprocessor to execute dedicated instructions such as floating-point instructions, or treats a hardware acceleration module as a peripheral of the processor, with the processor sending data to the acceleration module for processing over an interface such as PCIe. In contrast, the method and apparatus for extending a processor instruction set provided by the present invention propose an internal chip structure in which, without modifying the processor core, the monitoring module dispatches each extended instruction to the execution module corresponding to it, and the processor core suspends execution of the instructions that follow the extended instruction while the execution module is executing it, so that the instruction execution order is preserved. Although the processor core cannot itself execute the extended instruction, the execution module can, so the extended instruction is still executed successfully, and once it has completed, the processor core is triggered to resume with the following instructions. This effectively extends the instruction set of the processor core, which improves the processor's service-processing capability and its processing speed.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an internal structure of a chip according to an embodiment of the present invention;
FIG. 2 and FIG. 3 are schematic diagrams of other internal structures of a chip according to an embodiment of the present invention;
FIG. 4 is an architecture diagram of an embedded system according to an embodiment of the present invention;
FIG. 5 is an interaction diagram of a method for extending a processor instruction set according to an embodiment of the present invention;
FIG. 6 to FIG. 14 are interaction diagrams of other methods for extending a processor instruction set according to an embodiment of the present invention;
FIG. 15 is a schematic structural diagram of an apparatus for extending a processor instruction set according to an embodiment of the present invention;
FIG. 16 is a schematic structural diagram of another apparatus for extending a processor instruction set according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
The embodiments of the present invention may be applied to a chip that contains a processor core, a monitoring module, and at least one execution module for executing extended instructions, for example accelerator 1 and accelerator 2 shown in FIG. 1. The monitoring module and the at least one execution module may be implemented in programmable logic (such as an FPGA or a CPLD). The chip may further include a memory controller, which reads instructions and data from memory. The monitoring module may be located inside the memory controller or provided on the chip separately from it; this is not limited here.
Taking the architecture shown in FIG. 1 as an example, the chip contains an on-chip bus whose function is similar to that of the AHB (Advanced High-performance Bus) in the ARM (Advanced RISC Machine, a RISC processor architecture) architecture and which connects the functional modules inside the chip. The chip also contains a processor core, for example an ARM core, which fetches, decodes, and executes instructions. The chip further contains at least one accelerator that accelerates various services, for example encryption, decryption, compression, and decompression. Finally, the chip contains a memory controller connected to an external SDRAM (Synchronous Dynamic Random Access Memory); the memory controller reads instructions and data from the SDRAM and writes data to it. In the embodiments of the present invention, the memory shown in FIG. 1 can be regarded as this SDRAM. The SDRAM interface may specifically be DDR2 (DDR: Double Data Rate), DDR3, or DDR4. Both the execution modules, such as the accelerators, and the memory controller can be implemented with an FPGA (Field Programmable Gate Array). In practice, modified logic configuration code can be loaded into the FPGA to change the functions that the execution modules and/or the memory controller implement. This can be used either to correct design defects or to add new functions to the execution modules and/or the memory controller. In this arrangement, the monitoring module may be located inside the memory controller.
Various on-chip bus standards exist today. Besides the AHB bus mentioned above, there are CoreConnect, Wishbone, and other custom buses, all of which can connect the functional modules inside a chip and carry data between them. In practice, several buses may also be provided inside the chip to form a hierarchy. In the embodiments of the present invention, as shown in FIG. 2, two buses can be linked by placing a bridge module between them. For example, in addition to the AHB bus, the ARM architecture may include an APB (Advanced Peripheral Bus). The APB bus can connect low-speed modules, while the AHB bus can connect high-speed modules such as accelerators. Connecting high-speed and low-speed modules separately helps to avoid bus congestion and improves bus throughput.
In the architectures shown in FIG. 1 and FIG. 2, the memory controller must provide a monitoring function in addition to reading and writing memory: it monitors the instructions transferred on the on-chip bus, saves extended instructions to a local instruction memory, later starts the corresponding accelerator, and receives the notification that the acceleration operation has completed. In the embodiments of the present invention, these functions may instead be provided by an independent monitoring module implemented in an FPGA, which can be modified dynamically to support new extended instructions. The original memory controller can then be implemented in hard-wired logic rather than in an FPGA, which helps to improve the performance of the memory interface. Taking the architecture shown in FIG. 1 as an example, the memory controller can be split by function into a memory controller and a monitoring module, as shown in FIG. 3.
The chips shown in FIG. 1 to FIG. 3 can all be used in an embedded system. For example, as shown in FIG. 4, the embedded system may include, in addition to the chip, a memory, a BIOS (Basic Input Output System), a network interface chip, a serial interface chip, and so on. The chip can be regarded as the processor and is connected to the memory, the BIOS, the network interface chip, and the serial interface chip.
An embodiment of the present invention provides a method for extending a processor instruction set. As shown in FIG. 5, the method is performed jointly by the modules inside the chip and includes the following steps:
101. The processor core loads instructions from memory over the on-chip bus.
While the system is running, machine-language programs are held in memory. The processor core continually reads this program code from memory, decodes it, and executes it, thereby performing the functions that the instructions specify. Taking the three-stage instruction pipeline of the ARM7 core as an example, while the first instruction is being executed, the second instruction is being decoded and the third instruction is being fetched from memory; at that moment the program counter holds the instruction address of the third instruction.
In the embodiments of the present invention, the program code that the processor core reads from memory passes over the on-chip bus, where the FPGA-implemented monitoring module and accelerators can identify the instructions. To help the monitoring module and the execution modules identify instructions, the embodiments rely on the HPROT[3:0] signal defined by the AHB bus: HPROT[0]=0 indicates that the current bus operation is an instruction fetch, and HPROT[0]=1 indicates that it is a data access. This signal can be routed to every device on the on-chip bus, so other devices can use the level of HPROT[3:0] to tell whether the on-chip bus is currently carrying an instruction or data. Step 102 is performed only when the bus is carrying an instruction.
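The instruction/data distinction described above comes down to testing one bit. The sketch below assumes the HPROT[0] convention exactly as stated in the text (0 for an instruction fetch, 1 for a data access); the function names and the `(hprot, word)` tuple representation of a bus transfer are illustrative assumptions.

```python
from typing import Iterable, List, Tuple


def is_instruction_fetch(hprot: int) -> bool:
    """True when bit 0 of the 4-bit HPROT value is 0 (instruction fetch)."""
    return (hprot & 0x1) == 0


def instructions_on_bus(transfers: Iterable[Tuple[int, int]]) -> List[int]:
    """Keep only the words moved by instruction fetches.

    Each transfer is modelled as (hprot, word); data accesses are dropped,
    mirroring the rule that step 102 runs only for instructions.
    """
    return [word for hprot, word in transfers if is_instruction_fetch(hprot)]
```

For example, a transfer list containing one data access between two fetches yields only the two fetched words, which are the candidates the monitoring module then checks for extended instructions.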
102. The monitoring module identifies the instructions on the on-chip bus and saves any extended instruction among them to the local memory.
When the monitoring module recognizes an extended instruction among the instructions transferred over the on-chip bus, it can save the extended instruction directly to the local memory. An execution module may also save the extended instruction locally without executing it for the time being. That is, when HPROT[0]=0 and the instruction on the on-chip bus is an extended instruction, the monitoring module stores it in its local instruction memory, that is, in the local memory. At the same time, the execution module that recognizes the extended instruction also saves it locally. The execution module does not execute the extended instruction at this point; it waits for the monitoring module to send the message that triggers execution.
103. After the processor core has loaded an extended instruction, it decodes the extended instruction and raises an undefined instruction exception.
Here, the extended instruction is one that has already been stored in the local memory.
When the processor core decodes the extended instruction, it cannot recognize it, so it raises an "undefined instruction" exception and jumps to the handler for that exception. Note that the exception is not handled the moment it is raised; it is handled only after the instruction currently being executed has completed. The handling is described in detail later and is not repeated here.
104. After the processor core finishes executing the current instruction, it executes the exception handler.
Here, the exception handler is the program triggered by the undefined instruction exception.
When it starts to execute the exception handler, the processor core saves the current breakpoint, that is, it backs up the contents of its general-purpose registers, program counter, and status register. For the ARM7 core, the backed-up program counter value equals the current program counter content minus 4, which is the instruction address of the instruction following the extended instruction; this address is used as the return address of the exception handler. The processor core then changes the program counter to the start address of the handler for the undefined instruction exception, for example 0x0000_0004, and loads code from the memory address given by the program counter. That code is the exception handler.
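The arithmetic in the ARM7 example can be checked with a short worked sketch. It assumes 4-byte instructions and a program counter that runs two instructions ahead of the one in the execute stage, consistent with the three-stage pipeline described earlier; the function name and the returned dictionary keys are hypothetical.

```python
INSTRUCTION_SIZE = 4            # assumption: 32-bit fixed-width instructions
UNDEFINED_VECTOR = 0x0000_0004  # handler start address used in the text


def enter_undefined_exception(extended_address: int) -> dict:
    """Model the program counter bookkeeping on exception entry.

    `extended_address` is the address of the extended instruction that
    the core failed to decode.
    """
    # With a three-stage pipeline the program counter points two
    # instructions past the instruction in the execute stage.
    pc_at_exception = extended_address + 2 * INSTRUCTION_SIZE
    # The backed-up value is the current program counter minus 4, which
    # is the address of the instruction after the extended instruction.
    saved_pc = pc_at_exception - INSTRUCTION_SIZE
    return {"saved_pc": saved_pc, "new_pc": UNDEFINED_VECTOR}


if __name__ == "__main__":
    state = enter_undefined_exception(0x8000)
    print(hex(state["saved_pc"]), hex(state["new_pc"]))
```

For an extended instruction at 0x8000, the saved value is 0x8004, the address at which execution resumes once the handler returns.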
While executing the exception handler, the processor core suspends execution of the instructions that follow the extended instruction and informs the monitoring module, so that the monitoring module can trigger the execution module corresponding to the extended instruction to execute it. In the embodiments of the present invention, after the execution module has finished executing the extended instruction, it can send a completion message to the monitoring module over the on-chip bus, or notify the monitoring module through a handshake signal between the execution module and the monitoring module.
105. The monitoring module triggers the execution module corresponding to the extended instruction to execute the extended instruction.
106. The execution module executes the extended instruction.
107. The execution module sends the monitoring module a message indicating that it has finished executing the extended instruction.
108. The monitoring module controls the processor core to exit the exception handler.
After the corresponding execution module has finished executing the extended instruction, the monitoring module can control the processor core to exit the exception handler, and step 109 is performed. The specific implementation is described later and is not repeated here.
109. The processor core continues to execute the instructions that follow the extended instruction.
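Steps 101 to 109 can be tied together in a small behavioural model. This is only a sketch under simplifying assumptions: instructions are represented by mnemonic strings rather than encodings, the program runs straight through with no jumps or conditional instructions, accelerators are plain Python callables, and all names are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

INSTRUCTION_SIZE = 4  # assumption: fixed-width 4-byte instructions


def run(program: List[Tuple[int, str]],
        accelerators: Dict[str, Callable[[], str]],
        core_opcodes: Set[str]) -> List[str]:
    """Execute `program`, a list of (address, opcode) pairs in memory order,
    and return a trace of which unit handled each instruction."""
    # Steps 101-102: the monitoring module snoops the fetches and stores
    # every extended instruction, indexed by its address.
    local_store = {addr: op for addr, op in program if op in accelerators}

    trace: List[str] = []
    for address, opcode in program:
        if opcode in core_opcodes:
            trace.append(f"core:{opcode}")
            continue
        # Steps 103-104: decoding fails, the undefined instruction
        # exception is taken, and the next-instruction address is saved.
        saved_pc = address + INSTRUCTION_SIZE
        # The monitoring module recovers the extended instruction address
        # from the saved program counter and looks up the stored entry.
        # An opcode unknown to both core and accelerators raises KeyError.
        ext_opcode = local_store[saved_pc - INSTRUCTION_SIZE]
        # Steps 105-107: the matching execution module runs and reports.
        trace.append(accelerators[ext_opcode]())
        # Steps 108-109: the handler exits; the loop resumes at saved_pc.
    return trace
```

A program of `ADD`, a hypothetical extended `XCRYPT`, and `SUB` at consecutive addresses produces a trace in which the core handles the first and last instructions and the accelerator handles the middle one, with program order preserved.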
The present invention thus proposes an internal chip structure in which, without modifying the processor core, the monitoring module dispatches each extended instruction to the execution module corresponding to it, and the processor core suspends execution of the instructions that follow the extended instruction while the execution module is executing it, so that the instruction execution order is preserved. Although the processor core cannot itself execute the extended instruction, the execution module can, so the extended instruction is still executed successfully, and once it has completed, the processor core is triggered to resume with the following instructions. This effectively extends the instruction set of the processor core, which improves the processor's service-processing capability and its processing speed.
To determine accurately which execution module can execute the current extended instruction, in one implementation of the embodiments of the present invention each execution module can execute at least one extended instruction and each extended instruction corresponds to one instruction address. The monitoring module obtains the instruction address of the instruction following the extended instruction from the content of the program counter at the time the processor core raised the undefined instruction exception, derives from it the instruction address of the extended instruction, and notifies the corresponding execution module to execute that extended instruction. The implementation shown in FIG. 5 can therefore be refined into the implementation shown in FIG. 6, in which step 110 is performed after step 104, and step 105, in which the monitoring module triggers the execution module corresponding to the extended instruction to execute it, is implemented as steps 1051 and 1052:
110. The processor core sends the monitoring module the content of the program counter at the time the processor core raised the undefined instruction exception.
Here, the content of the program counter is the instruction address of the instruction that follows the extended instruction.
Before entering the exception handler, the processor core first backs up the content of the program counter at the time the undefined instruction exception was raised. Then, while executing the exception handler, the processor core sends the backed-up content to the monitoring module.
1051. The monitoring module determines the instruction address of the extended instruction according to the instruction address of the next instruction.
1052. The monitoring module notifies the execution module corresponding to the extended instruction to execute the extended instruction corresponding to the instruction address.
Program instructions may include jump instructions or conditional instructions. With a jump instruction, the program does not execute instructions in sequence but skips a number of subsequent instructions and continues from there. With a conditional instruction, the program first checks whether a specified condition is satisfied before executing the instruction, and does not execute the instruction if the condition is not satisfied. Therefore, a previously saved extended instruction does not necessarily need to be executed; the program may well skip the extended instruction and execute later instructions. For example, as shown in Table I, a program segment includes five instructions in sequence, and the first three have already entered the instruction pipeline. The first instruction is a jump instruction that jumps to the third instruction. As a result, although extended instruction 1 has already been saved in advance to the local memory of the monitoring module and of the corresponding execution module, it is not executed; the instruction that actually needs to be executed is extended instruction 2. Therefore, to prevent the execution module from executing the wrong extended instruction, the monitoring module needs to tell the execution module corresponding to the extended instruction which extended instruction should currently be executed.
Table I
Instruction number    Instruction type
First instruction     Jump instruction
Second instruction    Extended instruction 1
Third instruction     Ordinary instruction 1
Fourth instruction    Extended instruction 2
When the processor core executes extended instruction 2, an undefined instruction exception is generated during the decoding of extended instruction 2, and the processor core enters the exception handler. The processor core then first obtains the saved breakpoint address, that is, the instruction address of the instruction following the extended instruction, and either sends this address to the monitoring module as is or subtracts 4 from it before sending it. Because each instruction is 4 bytes long, the monitoring module can determine the address of extended instruction 2 itself from the instruction address sent by the processor core.
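The address derivation described above can be sketched as follows. This is a minimal illustration, not the patented hardware: the base address `0x1000`, the instruction labels, and the function name `extended_instruction_address` are assumptions introduced for the example; only the 4-byte instruction length and the four rows of Table I come from the text.

```python
INSTRUCTION_SIZE = 4  # bytes per instruction, as stated in the text


def extended_instruction_address(saved_pc: int) -> int:
    """Return the address of the faulting extended instruction.

    saved_pc is the breakpoint address backed up by the processor core,
    i.e. the address of the instruction following the extended instruction.
    """
    return saved_pc - INSTRUCTION_SIZE


# Hypothetical memory layout for the four instructions listed in Table I.
BASE = 0x1000  # assumed base address, for illustration only
program = ["jump", "ext1", "normal1", "ext2"]
addresses = {name: BASE + i * INSTRUCTION_SIZE for i, name in enumerate(program)}

# The jump skips ext1, so the undefined instruction exception is raised while
# decoding ext2; the saved program counter points just past ext2.
saved_pc = addresses["ext2"] + INSTRUCTION_SIZE
faulting = extended_instruction_address(saved_pc)
print(hex(faulting))
```

Under these assumptions the derived address identifies extended instruction 2 rather than the skipped extended instruction 1, which is the distinction the monitoring module has to make.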
In the method for extending a processor instruction set provided in this embodiment of the present invention, when the processor core executes the exception handler, the processor core suspends execution of the instructions following the extended instruction and sends, to the monitoring module, the instruction address of the next instruction held in the program counter at the time the processor core generated the undefined instruction exception. The monitoring module then determines the instruction address of the extended instruction according to the instruction address of the next instruction, and notifies the execution module corresponding to the extended instruction to execute the extended instruction corresponding to that instruction address. Based on the instruction address of the next instruction recorded in the obtained program counter, the present invention can accurately derive the instruction address of the extended instruction, thereby ensuring that the monitoring module can promptly notify the execution module capable of executing the extended instruction to execute it.
A chip may contain a plurality of execution modules, each corresponding to at least one extended instruction that it can execute. Therefore, to ensure that the execution module triggered by the monitoring module can actually execute the extended instruction, in an implementation of this embodiment of the present invention, before notifying an execution module to execute the extended instruction, the monitoring module first determines which execution module is to be notified. Therefore, on the basis of the implementation shown in FIG. 6, the implementation shown in FIG. 7 is also possible. Before step 1052, in which the monitoring module notifies the execution module corresponding to the extended instruction to execute the extended instruction corresponding to the instruction address, step 111 may further be performed:
111. The monitoring module determines the execution module corresponding to the extended instruction according to the extended instructions already stored in the local memory.
The local memory of the monitoring module stores the correspondence between extended instructions and execution modules. Therefore, the monitoring module can determine the execution module corresponding to each extended instruction according to the content of the stored extended instructions.
In the method for extending a processor instruction set provided in this embodiment of the present invention, before notifying an execution module to execute the extended instruction, the monitoring module determines the execution module corresponding to the extended instruction according to the extended instruction. By first determining the execution module from the extended instruction, the present invention can find the unique execution module used to execute the extended instruction, which ensures that the execution module can successfully execute the extended instruction after the monitoring module notifies it to do so.
To manage all extended instructions stored in the local memory in a unified manner, in an implementation of this embodiment of the present invention, the monitoring module may store the extended instructions among the instructions to the local memory in a preset format, so that the execution module corresponding to an extended instruction can later be accurately determined from the stored content. Therefore, on the basis of the implementation shown in FIG. 5, the implementation shown in FIG. 8 is also possible. Step 102, in which the monitoring module identifies instructions through the on-chip bus and saves the extended instructions among them to the local memory, may be specifically implemented as step 1021:
1021. The monitoring module identifies instructions through the on-chip bus, and saves the extended instructions among them to the local memory in a preset format.
The preset format includes the instruction address of the extended instruction, the content of the extended instruction, and the execution module corresponding to the extended instruction.
Table II shows the format in which extended instructions are saved in the local memory of the monitoring module: the instruction address and content of each extended instruction, and the execution module corresponding to each extended instruction. When the monitoring module receives an instruction address from the processor core, it can use the instruction address and the content shown in Table II to find out which extended instruction is about to be executed and which execution module that extended instruction corresponds to, and then send this information to the corresponding execution module.
Table II
Instruction address of the extended instruction    Content of the extended instruction    Execution module corresponding to the extended instruction
Instruction address 1                              Extended instruction 1                 Accelerator 1
Instruction address 2                              Extended instruction 2                 Accelerator 1
Instruction address 3                              Extended instruction 3                 Accelerator 2
Instruction address 4                              Extended instruction 4                 Accelerator 2
After receiving the information, the accelerator reads data from the specified source data address according to the content of the extended instruction, performs the corresponding acceleration operation, and then saves the result to the specified target address.
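The storage format and lookup described for Table II can be sketched as a table keyed by instruction address. This is only an illustration under stated assumptions: the concrete addresses, the names `ExtendedEntry`, `store` and `lookup`, and the use of strings for instruction content are hypothetical and do not come from the source.

```python
from typing import Dict, NamedTuple, Optional


class ExtendedEntry(NamedTuple):
    content: str   # content of the extended instruction
    executor: str  # execution module corresponding to the extended instruction


# Models the local memory of the monitoring module (hypothetical structure).
table: Dict[int, ExtendedEntry] = {}


def store(address: int, content: str, executor: str) -> None:
    """Save one extended instruction in the preset format (step 1021)."""
    table[address] = ExtendedEntry(content, executor)


def lookup(address: int) -> Optional[ExtendedEntry]:
    """Find the extended instruction and its execution module by address."""
    return table.get(address)


# Rows mirroring Table II; the numeric addresses are assumed for illustration.
store(0x100, "extended instruction 1", "accelerator 1")
store(0x104, "extended instruction 2", "accelerator 1")
store(0x108, "extended instruction 3", "accelerator 2")
store(0x10C, "extended instruction 4", "accelerator 2")

entry = lookup(0x108)
print(entry.content, "->", entry.executor)
```

Keying the table by instruction address reflects the fact that the address is the only piece of information the processor core sends to the monitoring module.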
In the method for extending a processor instruction set provided in this embodiment of the present invention, after identifying an extended instruction through the on-chip bus, the monitoring module stores the identified extended instruction in the following format: instruction address, content, and corresponding execution module. By storing the extended instructions identified on the on-chip bus in a preset format, the monitoring module makes the stored extended instructions easier to manage. In addition, when the execution module corresponding to an extended instruction needs to be determined, the stored content can be read directly from the local memory to quickly locate that execution module, and the monitoring module then triggers the execution module to execute the extended instruction.
To ensure that the processor core can suspend execution of the instructions following the extended instruction, in an implementation of this embodiment of the present invention, the processor core may temporarily stay in the exception handler. Therefore, on the basis of the implementation shown in FIG. 6, the implementation shown in FIG. 9 is also possible. After step 1052, in which the monitoring module notifies the execution module corresponding to the extended instruction to execute the extended instruction corresponding to the instruction address, step 112 may further be performed:
112. The processor core temporarily stays in the exception handler.
After entering the exception handler, the processor core must not exit immediately; otherwise it would continue executing the instructions that follow the extended instruction that generated the undefined instruction exception. If those instructions depend on the execution result of the execution module corresponding to that extended instruction, the data would be wrong. Therefore, once the processor core has started executing the exception handler, it may exit only after the execution module has finished executing the extended instruction; until then, it stays in the exception handler. For example, the processor core cannot exit the exception handler before the accelerator completes the acceleration operation, that is, the processor core needs to temporarily stay in the exception handler.
In the method for extending a processor instruction set provided in this embodiment of the present invention, keeping the processor core temporarily in the exception handler ensures that the processor core suspends execution of the instructions following the extended instruction. In the present invention, while the processor core temporarily stays in the exception handler, it does not continue executing the instructions following the extended instruction.
To ensure that the processor core can temporarily stay in the exception handler, in an implementation of this embodiment of the present invention, the processor core may keep itself in the exception handler by repeatedly accessing a preset reserved memory address. Therefore, on the basis of the implementation shown in FIG. 9, the implementation shown in FIG. 10 is also possible. Step 112, in which the processor core temporarily stays in the exception handler, may be specifically implemented as step 1121 to step 1123:
1121. The processor core accesses the preset reserved memory address.
The reserved memory address does not correspond to any physical memory unit.
1122. The monitoring module sends a retry response to the processor core.
1123. The processor core accesses the reserved memory address again.
When the processor core accesses the preset reserved memory address, because the reserved memory address does not correspond to any physical memory unit, the monitoring module, on detecting the access, can respond to the processor core through the HRESP[1:0] and HREADY signals of the AHB bus with a retry response. HRESP[1:0]=10 together with HREADY=0 indicates a RETRY response, upon which the processor core retries repeatedly until the access succeeds. It should be noted that as long as HRESP[1:0]=10 and HREADY=0, the processor core keeps re-executing the instruction that accesses the reserved memory address and does not proceed to the instructions following the extended instruction.
In the method for extending a processor instruction set provided in this embodiment of the present invention, having the processor core repeatedly access the preset reserved memory address ensures that the processor core can temporarily stay in the exception handler. In the present invention, when the processor core accesses the preset reserved memory address, the monitoring module returns a retry response to the processor core, so the processor core accesses the reserved memory address again and again and thus stays in the exception handler. This means that the processor core temporarily does not execute the instruction following the extended instruction.
To ensure that the processor core can exit the exception handler when necessary and continue executing the instructions following the extended instruction, in an implementation of this embodiment of the present invention, the processor core exits the exception handler after receiving a normal completion response sent by the monitoring module. Therefore, on the basis of the implementation shown in FIG. 10, the implementation shown in FIG. 11 is also possible. Step 108, in which the monitoring module controls the processor core to exit the exception handler, may be specifically implemented as step 1081 and step 1082:
1081. The monitoring module sends a normal completion response to the processor core.
It should be noted that the normal completion response indicates that, after the processor core accessed the reserved memory address, the access completed normally.
1082. The processor core exits the exception handler.
Once the monitoring module knows that the execution module has finished executing the extended instruction, the next time the processor core initiates a retry, the monitoring module returns HRESP[1:0]=00 and HREADY=1 to the processor core. HRESP[1:0]=00 indicates an OKAY response, so the processor core completes the current instruction, exits the exception handler, and continues executing the instructions following the extended instruction.
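The retry and normal completion responses can be modelled in software as a polling loop. This is a behavioural sketch only, not bus-accurate AHB timing: the `Monitor` class, the `cycles_until_done` latency parameter and the `max_attempts` guard are assumptions for the example; the HRESP[1:0] and HREADY values are the ones quoted in the text.

```python
OKAY, RETRY = 0b00, 0b10  # HRESP[1:0] encodings quoted in the text


class Monitor:
    """Toy model of the monitoring module's response to reserved-address accesses."""

    def __init__(self, cycles_until_done: int):
        # Hypothetical number of accesses before the execution module finishes.
        self.remaining = cycles_until_done

    def respond(self):
        """Return (HRESP, HREADY) for one access to the reserved memory address."""
        if self.remaining > 0:
            self.remaining -= 1
            return RETRY, 0   # execution module still busy: retry response
        return OKAY, 1        # execution module finished: normal completion


def stall_in_handler(monitor: Monitor, max_attempts: int = 1000) -> int:
    """Repeat the access until OKAY is returned; return the number of accesses."""
    for attempt in range(1, max_attempts + 1):
        hresp, hready = monitor.respond()
        if hresp == OKAY and hready == 1:
            return attempt    # the core may now exit the exception handler
    raise TimeoutError("execution module never finished")


attempts = stall_in_handler(Monitor(cycles_until_done=3))
print(attempts)
```

In this model the core makes three accesses that receive RETRY and a fourth that receives OKAY, after which it would leave the exception handler.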
In the method for extending a processor instruction set provided in this embodiment of the present invention, the processor core exits the exception handler after receiving the normal completion response sent by the monitoring module. The present invention thus ensures that the processor core can exit the exception handler when necessary.
In an implementation of this embodiment of the present invention, besides having the processor core repeatedly access the preset reserved memory address, the processor core can also be kept in the exception handler by a hardware signal that the monitoring module sends to the processor core: while the hardware signal is at a low level, the processor core temporarily stays in the exception handler. Therefore, on the basis of the implementation shown in FIG. 9, the implementation shown in FIG. 12 is also possible. After step 103 is performed, step 114 and step 115 may further be performed; step 112, in which the processor core temporarily stays in the exception handler, may be specifically implemented as step 1124:
114. The monitoring module generates a hardware signal.
The hardware signal is used to control whether the processor core stays in the exception handler.
115. The monitoring module sends the hardware signal to the processor core.
1124. When the hardware signal is at a low level, the processor core temporarily stays in the exception handler.
In the method for extending a processor instruction set provided in this embodiment of the present invention, after the processor core generates the undefined instruction exception, the processor core sends the backed-up content of the program counter to the monitoring module, and the monitoring module generates a hardware signal according to the received content and sends it to the processor core. When the hardware signal received by the processor core is at a low level, the processor core suspends operation, which is equivalent to the processor core temporarily staying in the exception handler. In the present invention, after the monitoring module generates a low-level hardware signal according to the content of the program counter and sends it to the processor core, the processor core can temporarily stay in the exception handler. This means that the processor core temporarily does not execute the instruction following the extended instruction.
After the execution module finishes executing the extended instruction, the processor core needs to continue executing the instructions following the extended instruction. In an implementation of this embodiment of the present invention, the processor core continues executing the instructions following the extended instruction when the hardware signal sent by the monitoring module to the processor core is at a high level. Therefore, on the basis of the implementation shown in FIG. 12, the implementation shown in FIG. 13 is also possible. Step 108, in which the monitoring module controls the processor core to exit the exception handler, may be specifically implemented as step 1083:
1083. When the hardware signal is at a high level, the processor core exits the exception handler.
In the method for extending a processor instruction set provided in this embodiment of the present invention, when the hardware signal sent by the monitoring module to the processor core is at a high level, the processor core continues executing the instructions following the extended instruction. In the present invention, the monitoring module can effectively control the working state of the processor core by generating hardware signals of different levels.
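The level-controlled variant can be illustrated with a small cycle-by-cycle model. This is a sketch under assumptions: the function name `exit_cycle` and the representation of the hardware signal as a per-cycle list of 0 (low) and 1 (high) are hypothetical; the source only states that a low level keeps the core in the exception handler and a high level lets it exit.

```python
from typing import Optional, Sequence

LOW, HIGH = 0, 1


def exit_cycle(signal_levels: Sequence[int]) -> Optional[int]:
    """Return the first cycle at which the core may exit the exception handler.

    The core stays in the handler while the signal is low (step 1124) and
    exits once it samples a high level (step 1083). Returns None if the
    signal never goes high within the observed cycles.
    """
    for cycle, level in enumerate(signal_levels):
        if level == HIGH:
            return cycle
    return None


# The monitoring module holds the signal low while the execution module runs,
# then drives it high once the extended instruction has completed.
levels = [LOW, LOW, LOW, HIGH]
print(exit_cycle(levels))
```

Compared with the retry-based approach, this variant needs a dedicated signal between the monitoring module and the processor core but generates no bus traffic while the core waits.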
To ensure that the processor core can smoothly execute the instruction following the extended instruction, in an implementation of this embodiment of the present invention, the processor core restores the breakpoint before continuing with the instruction following the extended instruction. Therefore, on the basis of the implementation shown in FIG. 11 or FIG. 13, and taking FIG. 11 as an example, the implementation shown in FIG. 14 is also possible. Step 109, in which the processor core continues executing the instructions following the extended instruction, may be specifically implemented as step 1091 to step 1093:
1091. The processor core restores the data of the general-purpose registers, the program counter, and the status register that was backed up before the exception handler was executed.
After exiting the exception handler, the processor core must restore the breakpoint before it can continue with the instruction following the extended instruction. Therefore, the processor core restores the previously backed-up general-purpose registers, program counter, and status register of the processor core to the corresponding registers.
1092. The processor core fetches an instruction from the next instruction address stored in the program counter.
The processor core fetches the instruction from the instruction address given by the new value of the program counter. At this point, the processor core has exited the exception handler.
1093. The processor core executes the instruction corresponding to the next instruction address.
The processor core continues executing the instructions that follow the extended instruction that generated the undefined instruction exception.
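Steps 1091 and 1092 can be sketched as a backup-and-restore of the core state. This is a simplified software model, not the core's actual mechanism: the `CoreState` class, the helper names `back_up` and `restore_and_fetch`, the dictionary used as instruction memory, and all numeric values are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CoreState:
    gpr: List[int]  # general-purpose registers
    pc: int         # program counter
    status: int     # status register


def back_up(core: CoreState) -> CoreState:
    """Copy the state that is saved before the exception handler runs."""
    return CoreState(list(core.gpr), core.pc, core.status)


def restore_and_fetch(core: CoreState, saved: CoreState,
                      memory: Dict[int, str]) -> str:
    """Restore the breakpoint (step 1091), then fetch at the restored PC (step 1092)."""
    core.gpr = list(saved.gpr)
    core.pc = saved.pc
    core.status = saved.status
    return memory[core.pc]


# Hypothetical instruction memory: 0x1010 holds the instruction that follows
# the extended instruction.
memory = {0x1010: "instruction after the extended instruction"}
core = CoreState(gpr=[1, 2, 3], pc=0x1010, status=0x10)
saved = back_up(core)

# While in the exception handler, the live registers are overwritten.
core.gpr, core.pc, core.status = [0, 0, 0], 0x0004, 0x1B

next_instruction = restore_and_fetch(core, saved, memory)
print(next_instruction)
```

After the restore, the program counter again holds the address of the instruction following the extended instruction, so execution resumes there as described in step 1093.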
In the method for extending a processor instruction set provided in this embodiment of the present invention, the processor core restores the data of the general-purpose registers, the program counter, and the status register that was stored before the exception handler was executed, fetches an instruction from the next instruction address stored in the program counter, and then executes the instruction corresponding to the next instruction address. In the prior art, a coprocessor executes dedicated instructions such as floating-point instructions, or a hardware acceleration module serves as a peripheral of the processor and the processor sends data to it through an interface such as PCIe for processing. In contrast, the present invention can extend the instruction set of the processor core without modifying the processor core, thereby improving the processing speed of the processor.
The foregoing mainly describes the solutions provided in the embodiments of the present invention from the perspective of interaction between the modules inside the chip. It can be understood that, to implement the foregoing functions, each module, such as the monitoring module, the execution module, and the processing module, includes corresponding hardware structures and/or software modules for performing each function. A person skilled in the art should readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, the present invention can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementation should not be considered to be beyond the scope of the present invention.
In the embodiments of the present invention, the apparatus for extending a processor instruction set may be divided into functional modules according to the foregoing method examples. For example, each functional module may correspond to one function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. It should be noted that the division into modules in the embodiments of the present invention is illustrative and is merely a division by logical function; other divisions are possible in actual implementation.
When each functional module corresponds to one function, FIG. 15 shows a possible schematic structural diagram of the apparatus for extending a processor instruction set involved in the foregoing embodiments. The apparatus 20 includes a processing module 21, a monitoring module 22, and an execution module 23. The processing module 21 is configured to execute instructions, or, after the exception handler is triggered, to suspend execution of the instructions following the extended instruction and, at an appropriate time, restore the breakpoint and continue executing the instructions following the extended instruction, for example, processes 101, 103, 104, and 109 in FIG. 5, process 112 in FIG. 9, processes 1121 and 1123 in FIG. 10, process 1082 in FIG. 11, process 1124 in FIG. 12, process 1083 in FIG. 13, and processes 1091 to 1093 in FIG. 14. The monitoring module 22 is configured to identify extended instructions passing over the on-chip bus, save them in the local memory, trigger the corresponding execution module to execute the extended instruction, and subsequently control the processing module to exit the exception handler, for example, processes 102, 105, and 108 in FIG. 5, processes 1051 and 1052 in FIG. 6, process 111 in FIG. 7, process 1122 in FIG. 10, process 1081 in FIG. 11, and processes 114 and 115 in FIG. 12. The execution module 23 is configured to execute the extended instruction and, after completing the execution, report back to the monitoring module, for example, processes 106 and 107 in FIG. 5 to FIG. 14. The apparatus 20 may further include a storage module 24 configured to store related program code and data. All related content of the steps involved in the foregoing method embodiments may be cited in the functional descriptions of the corresponding functional modules, and details are not described herein again.
The processing module 21 may be a processor or a controller, for example, a central processing unit (Central Processing Unit, CPU), a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application-Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the disclosure of the present invention. The processor may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor. The storage module 24 may be a memory.
When the processing module 21 and the execution module 23 are processors, the storage module 24 is a memory, and data is transferred between the modules through a communication interface, the apparatus for extending a processor instruction set involved in the embodiments of the present invention may be the apparatus 30 shown in FIG. 16.
Referring to FIG. 16, the apparatus 30 includes a processor 31, a communication interface 32, a memory 33, and a bus 34. The communication interface 32, the processor 31, and the memory 33 are connected to one another through the bus 34. The bus 34 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is drawn as a single thick line in FIG. 16, which does not mean that there is only one bus or only one type of bus.
The steps of the methods or algorithms described in connection with the disclosure of the present invention may be implemented in hardware, or by a processor executing software instructions. The software instructions may consist of corresponding software modules, which may be stored in random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), a register, a hard disk, a removable hard disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium well known in the art. An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. The storage medium may also be an integral part of the processor. The processor and the storage medium may be located in an ASIC, and the ASIC may be located in a core network interface device. The processor and the storage medium may also exist as discrete components in the core network interface device.
Those skilled in the art will appreciate that, in one or more of the foregoing examples, the functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
The specific embodiments described above further describe the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the foregoing is merely specific embodiments of the present invention and is not intended to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made on the basis of the technical solutions of the present invention shall fall within the protection scope of the present invention.

Claims (22)

  1. A method for extending a processor instruction set, wherein the method is applied to a chip, the chip comprises a processor core, a monitoring module, and at least one execution module for executing extended instructions, the monitoring module and the at least one execution module for executing extended instructions are implemented by programmable logic, and the method comprises:
    the monitoring module identifies instructions through an on-chip bus, the instructions being instructions that the processor core loads from a memory through the on-chip bus;
    the monitoring module saves an extended instruction among the instructions to a local memory;
    after the processor core loads the extended instruction from the memory through the on-chip bus, the processor core decodes the extended instruction and generates an undefined instruction exception, the extended instruction being the extended instruction already stored in the local memory;
    after the processor core finishes executing a current instruction, the processor core executes an exception handler, the exception handler being a program triggered by the undefined instruction exception;
    when the processor core executes the exception handler, the processor core suspends execution of the instructions that follow the extended instruction, and triggers, through the monitoring module, the execution module corresponding to the extended instruction to execute the extended instruction; and
    the monitoring module controls the processor core to exit the exception handler, so that the processor core continues to execute the instructions that follow the extended instruction.
  2. The method according to claim 1, wherein each execution module is capable of executing at least one extended instruction, each extended instruction corresponds to one instruction address, and the triggering, by the processor core through the monitoring module, of the execution module corresponding to the extended instruction to execute the extended instruction comprises:
    the processor core sends, to the monitoring module, the content of the program counter at the time the processor core generated the undefined instruction exception, the content of the program counter being the instruction address of the instruction that follows the extended instruction;
    the monitoring module determines the instruction address of the extended instruction according to the instruction address of the next instruction; and
    the monitoring module notifies the execution module corresponding to the extended instruction to execute the extended instruction corresponding to the instruction address.
  3. The method according to claim 2, wherein before the monitoring module notifies the execution module corresponding to the extended instruction to execute the extended instruction corresponding to the instruction address, the method comprises:
    the monitoring module determines the execution module corresponding to the extended instruction according to the extended instruction already stored in the local memory.
  4. The method according to any one of claims 1 to 3, wherein the saving, by the monitoring module, of the extended instruction among the instructions to the local memory comprises:
    the monitoring module saves the extended instruction among the instructions to the local memory in a preset format, the preset format comprising the instruction address of the extended instruction, the content of the extended instruction, and the execution module corresponding to the extended instruction.
  5. The method according to claim 2, wherein after the processor core triggers, through the monitoring module, the execution module corresponding to the extended instruction to execute the extended instruction, the method comprises:
    the processor core temporarily stays in the exception handler.
  6. The method according to claim 5, wherein the processor core temporarily staying in the exception handler comprises:
    the processor core accesses a preset reserved memory address, the reserved memory address not corresponding to any physical memory unit; and
    after the monitoring module sends a retry response to the processor core, the processor core accesses the reserved memory address again.
  7. The method according to claim 6, wherein the controlling, by the monitoring module, of the processor core to exit the exception handler comprises:
    after the monitoring module sends a normal completion response to the processor core, the processor core exits the exception handler.
  8. The method according to claim 5, wherein after the processor core decodes the extended instruction and generates the undefined instruction exception, the method comprises:
    the monitoring module generates a hardware signal, the hardware signal being used to control whether the processor core stays in the exception handler;
    the monitoring module sends the hardware signal to the processor core; and
    the processor core temporarily staying in the exception handler comprises:
    when the hardware signal is at a low level, the processor core temporarily stays in the exception handler.
  9. The method according to claim 8, wherein the controlling, by the monitoring module, of the processor core to exit the exception handler comprises:
    when the hardware signal is at a high level, the processor core exits the exception handler.
  10. The method according to claim 7 or 9, wherein the processor core continuing to execute the instructions that follow the extended instruction comprises:
    the processor core restores the data of the general-purpose registers, the program counter, and the status register that was backed up before the exception handler was executed;
    the processor core fetches an instruction from the next instruction address according to the next instruction address stored in the program counter; and
    the processor core executes the instruction corresponding to the next instruction address.
  11. The method according to claim 1, wherein the chip further comprises a memory controller, the memory controller is configured to read instructions and data from the memory, and the monitoring module can be disposed in the memory controller or disposed in the chip separately from the memory controller.
  12. An apparatus for extending a processor instruction set, wherein the apparatus is applied to a chip, the chip comprises a processing module, a monitoring module, and at least one execution module for executing extended instructions, the monitoring module and the at least one execution module for executing extended instructions are implemented by programmable logic, and the apparatus comprises:
    the monitoring module, configured to identify instructions through an on-chip bus, the instructions being instructions that the processing module loads from a memory through the on-chip bus;
    the monitoring module, further configured to save an extended instruction among the instructions to a local memory;
    the processing module, configured to, after the processing module loads the extended instruction from the memory through the on-chip bus, decode the extended instruction and generate an undefined instruction exception, the extended instruction being the extended instruction already stored in the local memory;
    the processing module, further configured to execute an exception handler after the processing module finishes executing a current instruction, the exception handler being a program triggered by the undefined instruction exception;
    the processing module, further configured to, when the processing module executes the exception handler, suspend execution of the instructions that follow the extended instruction, and trigger, through the monitoring module, the execution module corresponding to the extended instruction to execute the extended instruction; and
    the monitoring module, configured to control the processing module to exit the exception handler, so that the processing module continues to execute the instructions that follow the extended instruction.
  13. The apparatus according to claim 12, wherein each execution module is capable of executing at least one extended instruction, each extended instruction corresponds to one instruction address, and the processing module is specifically configured to send, to the monitoring module, the content of the program counter at the time the processing module generated the undefined instruction exception, the content of the program counter being the instruction address of the instruction that follows the extended instruction;
    the monitoring module is specifically configured to determine the instruction address of the extended instruction according to the instruction address of the next instruction; and
    the monitoring module is further specifically configured to notify the execution module corresponding to the extended instruction to execute the extended instruction corresponding to the instruction address.
  14. The apparatus according to claim 13, wherein the monitoring module is further configured to determine the execution module corresponding to the extended instruction according to the extended instruction already stored in the local memory.
  15. The apparatus according to any one of claims 12 to 14, wherein the monitoring module is specifically configured to save the extended instruction among the instructions to the local memory in a preset format, the preset format comprising the instruction address of the extended instruction, the content of the extended instruction, and the execution module corresponding to the extended instruction.
  16. The apparatus according to claim 13, wherein the processing module is further configured to temporarily stay in the exception handler.
  17. The apparatus according to claim 16, wherein the processing module is specifically configured to access a preset reserved memory address, the reserved memory address not corresponding to any physical memory unit; and
    after the monitoring module sends a retry response to the processing module, access the reserved memory address again.
  18. The apparatus according to claim 17, wherein the processing module is further configured to exit the exception handler after the monitoring module sends a normal completion response to the processing module.
  19. The apparatus according to claim 16, wherein the monitoring module is further configured to generate a hardware signal, the hardware signal being used to control whether the processing module stays in the exception handler;
    the monitoring module is further configured to send the hardware signal to the processing module; and
    the processing module is specifically configured to temporarily stay in the exception handler when the hardware signal is at a low level.
  20. The apparatus according to claim 19, wherein the processing module is further configured to exit the exception handler when the hardware signal is at a high level.
  21. The apparatus according to claim 18 or 20, wherein the processing module is specifically configured to restore the data of the general-purpose registers, the program counter, and the status register that was backed up before the exception handler was executed;
    fetch an instruction from the next instruction address according to the next instruction address stored in the program counter; and
    execute the instruction corresponding to the next instruction address.
  22. The apparatus according to claim 12, wherein the chip further comprises a memory controller, the memory controller is configured to read instructions and data from the memory, and the monitoring module can be disposed in the memory controller or disposed in the chip separately from the memory controller.
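As an illustration of the preset storage format recited in claims 4 and 15 and of the address lookup recited in claims 2, 3, 13, and 14, the following Python sketch stores one record per extended instruction and resolves it from the reported program counter. It is a sketch under stated assumptions only: the claims do not say how the extended instruction address is derived from the next instruction address, so subtracting a fixed 4-byte instruction width is an assumption, and the names `ExtendedInstructionRecord`, `LocalMemory`, and `lookup_by_next_pc` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

INSTR_WIDTH = 4  # assumption: fixed 4-byte instructions


@dataclass(frozen=True)
class ExtendedInstructionRecord:
    """One entry in the preset format: address, content, execution module."""
    address: int           # instruction address of the extended instruction
    content: int           # raw content (instruction word) of the extended instruction
    execution_module: int  # identifier of the corresponding execution module


class LocalMemory:
    """Hypothetical model of the monitoring module's local memory."""

    def __init__(self) -> None:
        self._records: Dict[int, ExtendedInstructionRecord] = {}

    def save(self, record: ExtendedInstructionRecord) -> None:
        # Records are keyed by instruction address so they can be found later.
        self._records[record.address] = record

    def lookup_by_next_pc(self, next_pc: int) -> Optional[ExtendedInstructionRecord]:
        # The core reports the address of the instruction after the extended one;
        # step back one instruction width (assumed) to find the stored record.
        return self._records.get(next_pc - INSTR_WIDTH)
```

Keying the records by instruction address means a single lookup yields both the instruction content and the execution module to notify, which is why the format keeps all three fields together.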
PCT/CN2017/071776 2016-08-30 2017-01-19 Method and device for extending processor instruction set WO2018040494A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610777425.2A CN106371807B (en) 2016-08-30 2016-08-30 A kind of method and device of extensible processor instruction set
CN201610777425.2 2016-08-30

Publications (1)

Publication Number Publication Date
WO2018040494A1 true WO2018040494A1 (en) 2018-03-08

Family

ID=57902478

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/071776 WO2018040494A1 (en) 2016-08-30 2017-01-19 Method and device for extending processor instruction set

Country Status (2)

Country Link
CN (1) CN106371807B (en)
WO (1) WO2018040494A1 (en)

Also Published As

Publication number Publication date
CN106371807B (en) 2019-03-19
CN106371807A (en) 2017-02-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17844805

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17844805

Country of ref document: EP

Kind code of ref document: A1