WO2021128249A1 - Processor, task response method, movable platform, and camera - Google Patents

Publication number
WO2021128249A1
WO2021128249A1 PCT/CN2019/129100 CN2019129100W WO2021128249A1 WO 2021128249 A1 WO2021128249 A1 WO 2021128249A1 CN 2019129100 W CN2019129100 W CN 2019129100W WO 2021128249 A1 WO2021128249 A1 WO 2021128249A1
Authority
WO
WIPO (PCT)
Prior art keywords
processor
response
dedicated
task
module
Prior art date
Application number
PCT/CN2019/129100
Inventor
雍振强
董岚
杨富强
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司
Priority to CN201980050197.0A (published as CN112513809A)
Priority to PCT/CN2019/129100 (published as WO2021128249A1)
Publication of WO2021128249A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098 Register arrangements
    • G06F9/30105 Register structure
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098 Register arrangements
    • G06F9/3012 Organisation of register space, e.g. banked or distributed register file

Definitions

  • This application relates to the field of computer technology, and in particular to a processor, a task response method, a movable platform, a camera, and a computer-readable storage medium.
  • CPU: Central Processing Unit
  • dedicated modules: co-processor units or acceleration processing units with specific functions (hereinafter collectively referred to as dedicated modules)
  • embodiments of the present invention provide a processor, a task response method, a movable platform, a camera, and a computer-readable storage medium.
  • a processor including a bus interface and a control register.
  • the bus interface can be coupled to a plurality of dedicated modules outside the processor, and is used to receive a plurality of task completion requests related to the plurality of dedicated modules.
  • the control register is used to store a plurality of response modes of the processor in response to a plurality of the dedicated modules; wherein, a plurality of the response modes are different.
  • the processor is coupled to the control register, and is configured to respond to the multiple task completion requests of the multiple dedicated modules according to the corresponding response mode in the control register.
  • a task response method, including: receiving multiple task completion requests corresponding to multiple dedicated modules that are coupled to a bus interface and located outside the processor; obtaining multiple response modes with which the processor responds to the multiple dedicated modules, and storing the multiple response modes in a control register, wherein the multiple response modes are different; and responding to the multiple task completion requests of the multiple dedicated modules according to the corresponding response modes in the control register.
  • a computer-readable storage medium storing computer instructions.
  • when the computer instructions are executed by one or more processors, the one or more processors perform actions including the following: receiving multiple task completion requests corresponding to multiple dedicated modules that are coupled to the bus interface and located outside the processor; acquiring multiple response modes with which the processor responds to the multiple dedicated modules, and storing the multiple response modes in a control register, wherein the multiple response modes are different; and responding to the multiple task completion requests according to the corresponding response modes in the control register.
  • a movable platform includes the processor mentioned in the first aspect of the embodiments of the present application.
  • a camera is provided.
  • the camera includes the processor mentioned in the first aspect of the embodiments of the present application.
  • the processor, task response method, movable platform, camera, and computer-readable storage medium according to the embodiments of the present invention can improve system efficiency.
  • Fig. 1 is a schematic diagram of a processor and a dedicated module according to an embodiment of the present invention;
  • Fig. 2 is a flowchart of the interrupt response mode of the processor shown in Fig. 1;
  • Fig. 3 is a schematic diagram of a processor and multiple dedicated modules according to another embodiment of the present invention;
  • Fig. 4 is a schematic diagram of the configuration of the control register shown in Fig. 1;
  • Fig. 5 is a schematic diagram of a query instruction of the processor shown in Fig. 1;
  • Fig. 6a is a flowchart of the processor shown in Fig. 1 responding to a dedicated module in a query mode according to an embodiment of the present invention;
  • Fig. 6b is a flowchart of the processor shown in Fig. 1 responding to the dedicated module in the query mode according to an embodiment of the present invention;
  • Fig. 7a is a schematic diagram of pseudo code in which the processor shown in Fig. 1 uses a query instruction according to an embodiment of the present invention;
  • Fig. 7b is a schematic diagram of pseudo code in which the processor shown in Fig. 1 uses a query instruction according to another embodiment of the present invention;
  • Fig. 8 is a schematic diagram of a detection unit in the processor shown in Fig. 1;
  • Fig. 9 is a schematic diagram of a detection module in the processor shown in Fig. 1;
  • Fig. 10 is a flowchart of a task response method according to an embodiment of the present invention.
  • Fig. 1 is a schematic diagram of a processor and a dedicated module according to an embodiment of the present invention.
  • the processor 106 (for example, CPU) exchanges data with the memory 102 through the system bus 104.
  • the processor 106 exchanges data with various dedicated modules through the internal bus 110. Structurally, these dedicated modules are connected to the processor through the internal bus 110 so that they can receive instructions or configuration signals from the processor. Furthermore, each dedicated module can send a completion request to the processor after completing a specific task.
  • these dedicated modules are also connected to each other through the internal bus 110.
  • these dedicated modules can be a graphics processing module (Graphics Processing Unit, GPU) 120, a vector computing module (Vector Processing Unit) 112, a floating-point processing module (Floating-Point Processing Unit, FPU) 114, a direct memory access module (Direct Memory Access, DMA) 108, or a similar dedicated processing unit capable of processing a special computing task.
  • GPU: graphics processing module (Graphics Processing Unit)
  • Vector Processing Unit: vector computing module
  • FPU: floating-point processing module
  • DMA: Direct Memory Access
  • AI: artificial intelligence
  • FFT: fast Fourier transform
  • the present invention is not limited to this, and may also include dedicated modules for realizing other functions.
  • a group of DMA 108 is added between the processor 106 and the memory 102 as a dedicated module.
  • the processor 106 sends configuration information to the DMA 108 to configure the configuration registers in the DMA 108, informing the DMA 108 of details such as the address in the memory 102 of the data to be transferred and the size of that data.
  • the DMA 108 transfers data to the memory 102 through the system bus 104 according to the configuration information of the processor 106.
  • the processor 106 does not intervene in the process of transferring data by the DMA 108, so other tasks can be processed.
  • when the DMA 108 completes the task of transferring data, it issues an interrupt request to the processor 106. After the processor 106 receives the interrupt request sent by the DMA 108, it stops the other tasks being processed and responds to the interrupt request. In the interrupt service function corresponding to the interrupt request, the processor 106 processes the memory data transported back by the DMA 108, and then continues to execute the interrupted task with the result and status of this operation.
  • FIG. 2 is a flowchart of the interrupt response mode of the processor shown in Fig. 1.
  • the dedicated module receives the configuration signal from the processor and configures the configuration register in the dedicated module, and the dedicated module receives other instructions to perform operations related to these instructions.
  • the processor sends a configuration signal to the dedicated module (for example, DMA) for configuring the configuration register in the dedicated module.
  • in step S204, the processor performs other tasks.
  • the dedicated module starts to work after configuring the configuration register. For example, the dedicated module starts to perform operations or processing related to configuration signals or other instructions.
  • in step S206, although the processor is executing other tasks, once the processor receives the task completion request sent by the dedicated module, the tasks being executed by the processor are interrupted. At this time, the processor enters the interrupt service routine.
  • in step S208, after the interrupt service routine is executed, the processor resumes the tasks that were interrupted.
  • in step S210, after finishing the interrupted tasks, the processor enters an idle state or executes a new task.
  • when the dedicated module completes its task, if there is no dependency between the task processing result of the dedicated module and the task currently processed by the processor, interrupting the task being processed by the processor serves no purpose.
  • for example, when a DMA is used to transfer data and the processor responds to the interrupt request of the DMA at this time, the response is meaningless for the task being processed by the processor.
  • moreover, responding to the interrupt request of the dedicated module reduces the real-time performance of the task being processed by the processor.
  • the embodiments of the present application provide a processor, a task response method, a movable platform, a camera, and a computer-readable storage medium, which can improve system efficiency.
  • the embodiments of the present application provide a processor, a task response method, and a computer-readable storage medium, which are applicable to any system with an acceleration unit or a co-processing unit.
  • the processor and the task response method can be applied to networked devices, such as mobile phones, computers, tablets, or PDAs (personal digital assistants).
  • the processor and task response method can also be applied to mobile platforms (such as unmanned aerial vehicles, unmanned vehicles or unmanned ships), cameras, sweeping robots, or smart speakers and other equipment.
  • a processor which includes a bus interface and a control register.
  • the bus interface can be coupled to multiple dedicated modules outside the processor, and is used to receive multiple task completion requests related to the multiple dedicated modules. The control register is used to store multiple response modes with which the processor responds to the multiple dedicated modules, wherein the multiple response modes are different. The processor responds to the multiple task completion requests of the multiple dedicated modules according to the corresponding response modes in the control register.
  • a movable platform includes the above-mentioned processor.
  • a camera is provided.
  • the camera includes the above-mentioned processor.
  • the processor may further include a controller, which is coupled to the control register and is configured to respond to the multiple task completion requests of the multiple dedicated modules according to the corresponding response modes in the control register.
  • Fig. 3 is a schematic diagram of a processor and multiple dedicated modules according to another embodiment of the present invention.
  • the processor 300 includes a controller 302, a control register 304, and a bus interface 306.
  • the bus interface 306 can be coupled to multiple dedicated modules outside the processor 300 for receiving multiple task completion requests related to these dedicated modules.
  • the control register 304 is used to store multiple response modes of the processor 300 in response to these dedicated modules. These multiple response modes are different from each other.
  • the controller 302 is coupled to the control register 304, and is configured to respond to multiple task completion requests of these dedicated modules according to the corresponding response mode in the control register 304.
  • a processor 300 (for example, a CPU) is connected to N specific function units (SFU) through an internal bus.
  • the system numbers N dedicated modules and ensures that the numbers of all dedicated modules are not repeated.
  • the N dedicated modules are dedicated module SFU 1 308, dedicated module SFU 2 310, dedicated module SFU 3 312... and dedicated module SFU N 314. It can be seen from FIG. 3 that in this embodiment, there is no need to change the connection mode of the internal bus.
  • the system can classify the N dedicated modules. For example, the dedicated modules are classified according to their functions and uses, and are numbered according to the classification results. For example, if there are two kinds of dedicated modules with different functions, the dedicated modules with different functions are numbered with different codes.
  • for example, the dedicated modules A1, A2, and A3 are coded as 000, 001, and 010, and the dedicated modules B1, B2, and B3 are coded as 100, 101, and 110. That is, 3 bits are used to encode the dedicated modules A1, A2, and A3 and the dedicated modules B1, B2, and B3.
  • in this coding, when the highest bit of the number of a dedicated module is "0", the dedicated module has function A; when the highest bit is "1", the dedicated module has function B.
  • alternatively, the assignment can be reversed: when the highest bit of the number of a dedicated module is "1", the dedicated module has function A; when the highest bit is "0", the dedicated module has function B.
  • the highest bit values of dedicated modules with different functions are different. For example, if there are multiple graphics processing modules and multiple artificial intelligence accelerators, the highest bit of the multiple graphics processing modules is coded as "0", and the highest bit of the multiple artificial intelligence accelerators is coded as "1".
  • the present invention is not limited to this. Other coding methods that can realize the coding and classification of dedicated modules all fall into the protection scope of the present invention.
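  • the numbering scheme above can be illustrated with a minimal sketch. It assumes the six codes listed in the text (000 to 110) and the variant in which a highest bit of "0" marks function A; the names `MODULE_CODES`, `function_class`, and `codes_are_unique` are hypothetical and do not come from the application.

```python
# Illustrative sketch only: names and layout are assumptions, not the patent's design.
CODE_BITS = 3  # assumption: width of the codes 000..110 listed in the text

# Codes for the two function classes, as given in the example.
MODULE_CODES = {
    "A1": 0b000, "A2": 0b001, "A3": 0b010,
    "B1": 0b100, "B2": 0b101, "B3": 0b110,
}


def function_class(code: int) -> str:
    """Return 'A' when the highest bit of the code is 0, 'B' when it is 1."""
    highest_bit = (code >> (CODE_BITS - 1)) & 1
    return "B" if highest_bit else "A"


def codes_are_unique(codes: dict) -> bool:
    """Check that no two dedicated modules share a number."""
    return len(set(codes.values())) == len(codes)
```

  • under these assumptions, `function_class(MODULE_CODES["A2"])` yields "A" and `function_class(MODULE_CODES["B3"])` yields "B"; the reversed assignment would only swap the two return values.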
  • the processor responds to the N dedicated modules in multiple response modes.
  • multiple response modes include interrupt mode and query mode.
  • in the query mode, the processor (for example, CPU) checks whether the dedicated module has completed the task by executing a query instruction. If the dedicated module has not completed the task, the processor enters the suspended state; if the dedicated module is found to have completed the task, the processor executes subsequent instructions. Because the point at which the processor executes the query instruction is determined by the algorithm or by system requirements, it can be ensured that, when the query instruction is executed, either the processor is idle or the processor urgently needs the calculation result of the dedicated module.
  • a set of control registers (CR) 304 is added, and the bit width of the control register 304 is N bits.
  • Each bit on the control register 304 has a one-to-one correspondence with N dedicated modules. That is, each bit on the control register 304 corresponds to the dedicated module SFU 1 308, the dedicated module SFU 2 310, the dedicated module SFU 3 312... and the dedicated module SFU N 314, respectively.
  • the control register 304 includes a first bit.
  • the first bit is used to store the first response mode with which the processor 300 responds to a first dedicated module among the plurality of dedicated modules (e.g., one of the dedicated module SFU 1 308, the dedicated module SFU 2 310, the dedicated module SFU 3 312, ... and the dedicated module SFU N 314).
  • in one embodiment, when the first bit is "0", the first response mode is the interrupt mode; when the first bit is "1", the first response mode is the query mode.
  • in another embodiment, when the first bit is "1", the first response mode is the interrupt mode; when the first bit is "0", the first response mode is the query mode.
  • the bit values of the bits corresponding to the two response modes are different.
  • each bit of the control register 304 can be configured as a logic 1 or a logic 0.
  • in one embodiment, when a bit is configured as logic 1, the response mode with which the processor 300 responds to the corresponding dedicated module is the query mode. In other words, the bit is configured in the query mode.
  • in that embodiment, when a bit is configured as logic 0, the response mode with which the processor 300 responds to the corresponding dedicated module is the interrupt mode. In other words, the bit is configured in the interrupt mode.
  • in another embodiment, the mapping is reversed: a bit configured as logic 0 selects the query mode, and a bit configured as logic 1 selects the interrupt mode.
  • the interrupt mode and the query mode can coexist in N dedicated modules.
  • in the control register 304, the bit corresponding to each dedicated module must be configured to the corresponding logical value.
  • the processor 300 determines the response mode for each of the N dedicated modules according to the logic value configured in the corresponding bit of the control register 304.
  • in one embodiment, when a bit of the control register is configured as logic 1, the response mode with which the processor responds to the corresponding dedicated module is query mode, and when the bit is configured as logic 0, the response mode is interrupt mode.
  • in that case, if the processor 300 responds to the dedicated module SFU 1 308 and the dedicated module SFU 3 312 in query mode, and responds to the dedicated module SFU 2 310 and the dedicated module SFU N 314 in interrupt mode, the bits corresponding to the dedicated module SFU 1 308 and the dedicated module SFU 3 312 are configured as logic 1, and the bits corresponding to the dedicated module SFU 2 310 and the dedicated module SFU N 314 are configured as logic 0.
  • in another embodiment, when a bit of the control register is configured as logic 0, the response mode with which the processor responds to the corresponding dedicated module is query mode, and when the bit is configured as logic 1, the response mode is interrupt mode.
  • in that case, if the processor 300 responds to the dedicated module SFU 1 308 and the dedicated module SFU 3 312 in query mode, and responds to the dedicated module SFU 2 310 and the dedicated module SFU N 314 in interrupt mode, the bits corresponding to the dedicated module SFU 1 308 and the dedicated module SFU 3 312 are configured as logic 0, and the bits corresponding to the dedicated module SFU 2 310 and the dedicated module SFU N 314 are configured as logic 1.
  • Fig. 4 is a schematic diagram of the control register configuration shown in Fig. 1.
  • the dedicated module SFU 1 420, the dedicated module SFU 2 422 and the dedicated module SFU 4 426 correspond to bit 402, bit 404, and bit 408 in the control register, respectively, and the logical values of bit 402, bit 404, and bit 408 are all 1.
  • the dedicated module SFU 3 424 and the dedicated module SFU 5 428 correspond to bit 406 and bit 410 in the control register, respectively, and the logical values of bit 406 and bit 410 are both 0.
  • in one embodiment, the response modes with which the processor responds to the dedicated module SFU 1 420, the dedicated module SFU 2 422 and the dedicated module SFU 4 426 are all query modes, and the response modes with which the processor responds to the dedicated module SFU 3 424 and the dedicated module SFU 5 428 are all interrupt modes. That is, the processor interacts with SFU 1 420, SFU 2 422, and SFU 4 426 in the query mode, and with SFU 3 424 and SFU 5 428 in the interrupt mode.
  • in another embodiment, the response modes with which the processor responds to the dedicated module SFU 1 420, the dedicated module SFU 2 422 and the dedicated module SFU 4 426 are all interrupt modes, and the response modes with which the processor responds to the dedicated module SFU 3 424 and the dedicated module SFU 5 428 are all query modes. That is, the processor interacts with SFU 1 420, SFU 2 422, and SFU 4 426 in the interrupt mode, and with SFU 3 424 and SFU 5 428 in the query mode.
  • therefore, configuring through the control register the response mode with which the CPU responds to the N dedicated modules is the prerequisite and basis for ensuring that the system has a flexible task response mode.
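  • a minimal sketch of such a control register follows, using the five-module configuration described for Fig. 4. It assumes the embodiment in which logic 1 selects the query mode, and it assumes that bit position i-1 of an integer holds the mode for SFU i; the bit ordering and the names `ResponseMode`, `build_control_register`, and `response_mode` are hypothetical.

```python
from enum import Enum


class ResponseMode(Enum):
    INTERRUPT = 0  # assumption: logic 0 selects interrupt mode
    QUERY = 1      # assumption: logic 1 selects query mode


def build_control_register(query_modules, n_modules: int) -> int:
    """Return an N-bit value in which bit (i - 1) is 1 when SFU i uses query mode."""
    reg = 0
    for sfu in query_modules:
        if not 1 <= sfu <= n_modules:
            raise ValueError(f"SFU number {sfu} is outside 1..{n_modules}")
        reg |= 1 << (sfu - 1)
    return reg


def response_mode(reg: int, sfu: int) -> ResponseMode:
    """Look up the response mode stored for SFU number `sfu`."""
    return ResponseMode((reg >> (sfu - 1)) & 1)


# SFU 1, SFU 2 and SFU 4 in query mode; SFU 3 and SFU 5 in interrupt mode.
cr = build_control_register({1, 2, 4}, n_modules=5)
```

  • because each module owns exactly one bit, switching a single module between modes is a one-bit update and leaves the other modules untouched, which is what lets interrupt mode and query mode coexist.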
  • the control register may include not only bits indicating the response mode with which the processor responds to each dedicated module, but also other bits that implement other functions of the processor in responding to the dedicated modules.
  • the control register may include a plurality of bits that are used to store a response time with which the processor responds to at least one of the plurality of task completion requests.
  • the response time is a preset time.
  • the response time can be adjusted in real time according to the task completion status and task priority of the processor. For example, if the task currently processed by the processor is the highest-priority task, then even if the processor responds to a dedicated module in interrupt mode, the controller does not respond immediately to the task completion request of this dedicated module. Instead, it delays the response to the task completion request.
  • the delay time is set according to the completion time of the currently processed task as estimated by the system. In other words, the delay time can be adjusted in real time according to the completion time of the task. In this embodiment, the processor is more flexible.
  • the processor can both ensure that the highest-priority task is processed in time and maintain its efficiency in responding to the dedicated module.
  • the response time can also be set in advance. That is, the response time is a preset time. In one embodiment, the response time is preset according to the longest time required to complete a task. It should be noted that the above embodiments are only used to explain the present invention, not to limit it.
  • in one embodiment, the controller in the processor does not respond to a dedicated module immediately after receiving the task completion request of this dedicated module.
  • instead, the controller in the processor delays its response to the task completion request, and the delay time is the response time.
  • the response time can be a preset response time.
  • in another embodiment, the controller in the processor responds to the dedicated module immediately upon receiving the task completion request of this dedicated module. In this case, the delay time and the response time are both 0.
  • the time point for responding to the dedicated module can be determined according to the completion of the task currently processed by the processor. In this way, the task currently processed by the processor and the task completion request of the dedicated module can be taken into consideration.
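  • the delay rule described above can be sketched as a single function. This is an illustration under assumptions, not the application's implementation: the function name `response_delay`, its parameters, and the use of abstract time units are all hypothetical, and the rule shown is only the example given in the text (delay until the estimated completion of a highest-priority task, otherwise use the preset response time).

```python
def response_delay(current_is_highest_priority: bool,
                   estimated_remaining: float,
                   preset_delay: float = 0.0) -> float:
    """Return how long the controller waits before answering a completion request.

    Assumed rule: if the task in progress has the highest priority, wait for
    its estimated remaining time; otherwise fall back to the preset response
    time, where 0.0 means an immediate response.
    """
    if current_is_highest_priority:
        # Never return a negative delay if the estimate has already elapsed.
        return max(estimated_remaining, 0.0)
    return preset_delay
```

  • with this rule, a preset delay of 0 reproduces the immediate-response case, while a highest-priority task pushes the response back to its estimated completion time.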
  • the control register includes some other bits. These bits are used to store the response conditions under which the processor responds to the multiple task completion requests.
  • when the response mode with which the processor responds to a dedicated module is interrupt mode, the controller responds to the task completion request of the dedicated module only if the response condition is met.
  • the response condition is that the processor has received the task completion request of the dedicated module through the bus interface and the processor has finished the currently executing task.
  • in this implementation, it can be ensured that the task currently processed by the processor is not disturbed by the task completion request of the dedicated module. This implementation is suitable for situations where there is no correlation between any task currently processed by the processor and the task completed by the dedicated module.
  • in the query mode, the controller issues at least one query instruction concerning the dedicated module to determine whether the dedicated module has issued a task completion request.
  • the at least one query instruction includes at least one of the instruction type, the code of the number of the dedicated module to be queried, the first address of the dedicated modules, the number of dedicated modules, and a scalable flag, wherein the scalable flag is used to indicate whether the size of the at least one query instruction is scalable.
  • the scalable flag indicates whether the size of the query instruction can change according to the number of dedicated modules that the query instruction covers.
  • Fig. 5 is a schematic diagram of the query instruction of the processor shown in Fig. 1. In the encoding method 500, the query instruction includes only one dedicated module number, numi, to be queried.
  • in the encoding method 502, the query instruction can encode more dedicated module numbers (for example, the dedicated module numbers numi, numj, numk, nump, numq) at one time. The encoding method 502 can make full use of the reserved space (reserved) in the instruction encoding.
  • for example, the query instruction itself requires 7 bits to encode the instruction type (represented by "opcode" in FIG. 5), and the query instruction can encode the IDs of up to 5 dedicated modules.
  • in the encoding method 504, the query instruction encodes the first address of the dedicated modules to be queried (represented by "start_id" in FIG. 5). The numbers of the dedicated modules to be queried (for example, the addresses of the dedicated modules) that the encoding method 504 can support are start_id, start_id+1, start_id+2, ..., start_id+length.
  • the present invention is not limited to this, and it is also possible to use only any one of the above-mentioned three query instruction encoding methods, or any combination of the above-mentioned three query instruction encoding methods, to increase the flexibility of using the query instruction.
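  • the multi-number and range encodings can be sketched as bit packing. The text states only the 7-bit opcode and the limit of 5 module IDs; everything else here is an assumption made for illustration: a 32-bit instruction word, 5-bit IDs (so that 7 + 5 × 5 = 32), the field order, the opcode value, and the names `encode_multi`, `decode_multi`, and `range_targets`.

```python
OPCODE_BITS = 7            # stated in the text
ID_BITS = 5                # assumption: 7 + 5 * 5 = 32 bits in total
MAX_IDS = 5                # stated in the text
QUERY_OPCODE = 0b1010101   # hypothetical opcode value


def encode_multi(ids) -> int:
    """Pack up to MAX_IDS module numbers above the opcode (in the style of method 502)."""
    if not 1 <= len(ids) <= MAX_IDS:
        raise ValueError("a query instruction holds between 1 and 5 module numbers")
    word = QUERY_OPCODE
    for slot, module_id in enumerate(ids):
        if not 0 <= module_id < (1 << ID_BITS):
            raise ValueError(f"module number {module_id} does not fit in {ID_BITS} bits")
        word |= module_id << (OPCODE_BITS + slot * ID_BITS)
    return word


def decode_multi(word: int, count: int) -> list:
    """Recover the first `count` module numbers from an encoded word."""
    mask = (1 << ID_BITS) - 1
    return [(word >> (OPCODE_BITS + slot * ID_BITS)) & mask for slot in range(count)]


def range_targets(start_id: int, length: int) -> list:
    """Modules covered by a range query: start_id .. start_id+length inclusive."""
    return list(range(start_id, start_id + length + 1))
```

  • a single-number query in the style of method 500 is the special case `encode_multi([numi])`, which leaves the remaining bits of the word as reserved zeros.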
  • Fig. 6a is a flowchart of the processor shown in Fig. 1 responding to a dedicated module in a query mode according to an embodiment of the present invention.
  • Fig. 6b is a flowchart of the processor shown in Fig. 1 responding to the dedicated module in the query mode according to an embodiment of the present invention.
  • the processor dynamically decides when to insert query instructions based on algorithm requirements, the idle state of the CPU, and the progress of algorithm execution. For example, when the task being executed by the processor requires the processing result of the task executed by the first dedicated module, the controller issues the at least one query instruction. Alternatively, when the processor is in an idle state, the controller issues the at least one query instruction.
  • after the processor allocates tasks to multiple dedicated modules, it can perform other tasks first.
  • the processor (for example, CPU) configures a dedicated module (for example, DMA or another dedicated module).
  • the dedicated module receives the configuration signal from the processor and configures the configuration register in the dedicated module, and the dedicated module receives other instructions to perform operations related to these instructions.
  • the processor sends a configuration signal to a dedicated module (e.g., DMA) for configuring the configuration register in the dedicated module.
  • in step S604, the processor performs other tasks.
  • the dedicated module starts to work after configuring the configuration register. For example, the dedicated module starts to perform operations or processing related to configuration signals or other instructions.
  • the dedicated module sends a task completion request to the processor.
  • in step S606, the processor continues to execute other tasks until it completes the task being executed and enters an idle state.
  • in step S608, after entering the idle state, the processor executes the query instruction.
  • if the processor detects a completion request issued by the dedicated module, the processor processes the task completed by the dedicated module.
  • in step S610, after processing the tasks completed by the dedicated module, the processor executes a new task.
  • alternatively, after processing the tasks completed by the dedicated module, the processor enters an idle state.
  • the processor configures a dedicated module (for example, DMA), and sends configuration signals or other instructions related to the dedicated module to the dedicated module.
  • a dedicated module for example, DMA
  • the dedicated module receives the configuration signal from the processor and configures the configuration register in the dedicated module, and the dedicated module receives other instructions to perform operations related to these instructions.
  • the processor sends a configuration signal to the dedicated module (for example, DMA) for configuring the configuration register in the dedicated module.
  • step S634 the processor performs other tasks.
  • the dedicated module starts to work after configuring the configuration register.
  • the dedicated module starts to perform operations or processing related to configuration signals or other instructions.
  • In step S636, if the processor finds that the task being executed requires the calculation result of the dedicated module, the processor executes the query instruction, suspends the task currently being executed, and waits for the task completion request of the dedicated module. As shown in step S654, after completing the operation or processing related to the configuration signal or other instructions sent by the processor, the dedicated module sends a task completion request to the processor.
  • In step S638, after receiving the task completion request issued by the dedicated module, the processor continues to execute the other tasks of step S634 (that is, the task that was suspended in step S636). In one embodiment, the processor enters an idle state after executing these tasks.
  • In step S640, the processor executes a new task.
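The flow in which the processor queries only when it needs the result (steps S634 to S638) can be sketched as a tick-based simulation. This is a sketch under stated assumptions, not the application's implementation: the function name, the one-unit-of-work-per-tick model, and the assumption that the dedicated module is configured at tick 0 and reports completion at tick `module_latency` are all hypothetical.

```python
def simulate_blocking_query(work_before_query, work_after_query, module_latency):
    """Return a per-tick trace of the processor for the flow S634-S638.

    Assumption: the dedicated module is configured at tick 0 and issues its
    task completion request at tick `module_latency`.
    """
    trace = []
    tick = 0
    for _ in range(work_before_query):  # S634: perform other tasks
        trace.append("work")
        tick += 1
    while tick < module_latency:  # S636: query instruction, task suspended
        trace.append("stall")
        tick += 1
    for _ in range(work_after_query):  # S638: resume the suspended task
        trace.append("work")
        tick += 1
    return trace


print(simulate_blocking_query(work_before_query=2, work_after_query=1, module_latency=4))
```

When `module_latency` is no greater than `work_before_query`, the trace contains no `stall` entries, which corresponds to the case where the dedicated module has already finished by the time the query instruction executes.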
  • The dedicated module is one of a graphics processing module, a vector calculation module, a floating point processing module, a direct memory access module, an artificial intelligence accelerator, and a fast Fourier transform module.
  • Other dedicated modules that interact with the processor can also be used.
  • If the dedicated module has already completed its task when the query instruction is executed, the processor (for example, the CPU) does not enter the suspended state.
  • The processor needs to use the calculation result of the dedicated module in another current task; therefore, the processor executes the query instruction and waits for the task completion request of the dedicated module.
  • The processor executes the query instruction only when it enters the idle state after completing other tasks, or when it urgently needs the calculation result of the dedicated module. The query mode can therefore improve the execution efficiency of the system.
  • When the processor and the first dedicated module cooperate to complete the same task and the processor responds to that dedicated module in the query mode, the time point at which the processor needs the processing result of the first dedicated module is estimated, and from that time point the time at which the controller sends at least one query instruction regarding the first dedicated module is determined, so as to reduce the waiting time of the processor. This method can also improve the computational efficiency of the processor.
  • The processor may further include a process status register for marking the status of the process being handled by the processor. According to the status marked in the process status register, the controller dynamically adjusts the response modes with which the processor responds to the multiple dedicated modules and sends the adjusted response modes to the control register. In this embodiment, the processor can adjust its response modes for the multiple dedicated modules in real time.
  • For example, if the task currently processed by the processor has the highest priority, or has a higher priority than the multiple dedicated modules, the value of the flag status register is set to "1".
  • In that case, the processor does not respond to any dedicated module.
  • The above scheme can also be realized with the logical value "0": if the task currently processed by the processor has the highest priority, or has a higher priority than the multiple dedicated modules, the value of the flag status register is set to "0", and the processor does not respond to any dedicated module.
  • The value of the flag status register is set to "1" if the task currently processed by the processor has the lowest priority, or has a lower priority than the multiple dedicated modules.
  • In that case, the processor immediately responds to the dedicated module.
  • The above scheme can also be realized with the logical value "0": if the task currently processed by the processor has the lowest priority, or has a lower priority than the multiple dedicated modules, the value of the flag status register is set to "0". As long as the processor receives a task completion request from a dedicated module, the processor immediately responds to that dedicated module.
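The priority-based adjustment described above can be sketched as a single decision function. This is a hedged illustration only: the function name `adjust_response`, the returned policy strings, the convention that larger numbers mean higher priority, and the fallback for the in-between case (which the text does not describe) are all assumptions.

```python
def adjust_response(task_priority, module_priorities):
    """Pick a response policy from the priority of the current task.

    Larger numbers mean higher priority (an assumption of this sketch).
    """
    if task_priority > max(module_priorities):
        # Highest-priority task: do not respond to any dedicated module now.
        return "defer"
    if task_priority < min(module_priorities):
        # Lowest-priority task: respond as soon as a request arrives.
        return "respond_immediately"
    # In-between case is not described in the text; assumption of the sketch.
    return "keep_configured_mode"


print(adjust_response(task_priority=9, module_priorities=[1, 2, 3]))
```

In hardware terms, the returned policy would correspond to the value written to the flag status register, with either "1" or "0" chosen as the active value.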
  • Fig. 7a is a schematic diagram of the processor shown in Fig. 1 using a pseudo code of a query instruction according to an embodiment of the present invention.
  • Fig. 7b is a schematic diagram of the processor shown in Fig. 1 using a pseudo code of a query instruction according to another embodiment of the present invention.
  • The pseudo code 700 executes query instructions non-consecutively, or executes a single query instruction.
  • The dedicated module i through the dedicated module k are configured first.
  • In code line 708, the processor performs other tasks.
  • In code line 710, the processor executes query instruction 1.
  • After executing the code line 710, the processor executes some other code (not shown). In code line 712, the processor executes query instruction 2. After executing the code line 712, the processor executes some other code (shown in the figure). In code line 714, the processor executes other instructions.
  • The pseudo code 720 executes multiple query instructions consecutively.
  • The dedicated module i through the dedicated module k are configured first.
  • The processor performs other tasks.
  • The processor executes query instruction 1.
  • The processor executes the code line 732.
  • The processor executes query instruction 2.
  • The processor executes some other code (shown in the figure).
  • The processor executes other instructions. As can be seen from the above, query instruction 1 and query instruction 2 are two query instructions that are executed consecutively. In one embodiment, because the processor uses multiple query instructions consecutively, the processor can execute subsequent instructions only after all dedicated modules specified by the query instructions have reported their task completion requests.
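The effect of consecutive query instructions can be sketched with a toy interpreter. It is not the pseudo code 700 or 720 from the figures, which are not reproduced here; the program encoding, the function name `run_program`, and the `finish_tick` table of module completion times are hypothetical and chosen only for illustration.

```python
def run_program(program, finish_tick):
    """Execute `program` and return (kind, arg, tick_after) per instruction.

    `program` is a list of ("op", name) and ("query", module_id) pairs;
    `finish_tick[module_id]` is the tick at which that module reports its
    task completion request. Both are hypothetical inputs for illustration.
    """
    tick = 0
    trace = []
    for kind, arg in program:
        if kind == "query":
            # The query instruction stalls until the module has reported.
            tick = max(tick, finish_tick[arg])
        else:
            tick += 1  # every other instruction takes one tick here
        trace.append((kind, arg, tick))
    return trace


consecutive = [("op", "other tasks"), ("query", "i"), ("query", "k"), ("op", "next")]
print(run_program(consecutive, {"i": 3, "k": 5}))
```

Because the two query instructions are adjacent, the instruction after them cannot start before the later of the two modules has reported, which matches the behavior described for consecutive queries.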
  • The same query instruction can be used for multiple dedicated modules.
  • In this query instruction, it is not necessary to indicate the ID of the dedicated module (for example, the dedicated module code), address information, or other attribute information.
  • The same query instruction can also be used for multiple dedicated modules of the same type. In this case, identification information about dedicated modules of that type must be added to the query instruction, for example, the codes of these dedicated modules or attribute flags.
  • When the processor executes the query instruction, it records the number of the dedicated module to be queried in the pipeline and uses an independent register in the processor to record whether that dedicated module has issued a task completion request.
  • The processor pipeline can be divided into three stages: fetch, decode, and execute. After the query instruction flows to the execute stage, the number of the dedicated module to be queried is recorded at the execute stage. Different encodings of the query instruction require different methods of recording the dedicated modules.
  • The processor further includes one or more detection units.
  • The number of detection units can be determined according to the hardware design scheme.
  • The processor includes a detection module.
  • The detection module includes at least one detection unit.
  • The detection unit includes a comparator and a data register.
  • The data register is used to store the first dedicated module number corresponding to the first dedicated module among the plurality of dedicated modules. When the detection unit receives a second task completion request, it determines the second dedicated module number of the second task completion request and compares the second dedicated module number with the first dedicated module number in the data register through the comparator, to determine the dedicated module corresponding to the second task completion request.
  • The detection unit further includes a result register. The detection unit stores multiple comparison results in the result register. When the controller sends the query instruction, the controller responds to the second task completion request according to the value in the result register.
  • The detection module further includes an AND gate for receiving a plurality of first comparison results from a plurality of first detection units. The AND gate performs an AND operation on the plurality of first comparison results to determine whether the plurality of dedicated modules have all issued their corresponding task completion requests.
  • Fig. 8 is a schematic diagram of a detection unit in the processor shown in Fig. 1.
  • The processor further includes a detection unit 800.
  • The detection unit 800 includes a comparator 802 (for example, a comparison circuit), a data register 804, and a result register 806.
  • The comparator in the detection unit 800 detects whether the dedicated module number corresponding to the task completion request is consistent with the dedicated module number num stored in the data register 804 of the detection unit 800. If the dedicated module number stored in the data register 804 is consistent with the dedicated module number corresponding to the task completion request, the detection result is valid and is recorded in the result register 806. If the dedicated module number stored in the data register 804 is inconsistent with the dedicated module number corresponding to the task completion request, the detection result is invalid.
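The behavior of a single detection unit can be modeled in a few lines. This is a software sketch of the described hardware, under assumptions: the class name `DetectionUnit`, the method name, and the use of a Boolean for the result register are illustrative choices, not taken from the application.

```python
class DetectionUnit:
    """One detection unit: a data register, a comparator, a result register."""

    def __init__(self, module_number):
        self.data_register = module_number  # number of the module to query
        self.result_register = False  # becomes True once a match is seen

    def on_completion_request(self, module_number):
        """Compare an incoming request's module number with the stored one."""
        match = module_number == self.data_register  # the comparator
        if match:
            self.result_register = True  # record the valid detection result
        return match


unit = DetectionUnit(module_number=3)
unit.on_completion_request(5)  # different module: result stays invalid
unit.on_completion_request(3)  # matching module: result becomes valid
print(unit.result_register)
```

A mismatching request leaves the result register untouched, so an earlier valid result is not cleared by later requests from other modules.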
  • The processor further includes a flag register which, when the response mode with which the processor responds to the first dedicated module of the plurality of dedicated modules is the query mode, stores a record of whether the first dedicated module has issued a first task completion request. For example, the processor includes a flag register A.
  • The flag register A corresponds to the dedicated module SFU 0. If a task completion request from the dedicated module SFU 0 is detected, the value of the flag register A is set to "1"; otherwise, the value of the flag register A is "0". Alternatively, if a task completion request from the dedicated module SFU 0 is detected, the value of the flag register A is set to "0"; otherwise, the value of the flag register A is "1".
  • The flag register A may be the result register 806. However, the flag register may also be another register capable of storing a record of whether the dedicated module has issued a task completion request.
  • The controller sends query instructions regarding the multiple dedicated modules to determine whether the multiple dedicated modules have issued multiple task completion requests. If all task completion requests have been issued, the processor executes the tasks related to the multiple dedicated modules.
  • Fig. 9 is a schematic diagram of a detection module in the processor shown in Fig. 1.
  • The processor further includes a detection module 900 for detecting whether multiple dedicated modules (for example, dedicated modules SFU 0, SFU 1, ..., SFU N) have all issued task completion requests.
  • The detection module 900 includes a detection unit 906 to a detection unit 914.
  • The detection unit 906 to the detection unit 914 each include a comparator (for example, a comparison circuit), a data register, and a result register.
  • The data register is used to store the code of the dedicated module to be queried.
  • The result register is used to store the comparison result of the comparator.
  • The detection unit 906 includes a comparator 904, a data register 916, and a result register 920.
  • The data register 916 is used to store the dedicated module code numi. It should be noted that the detection unit 908 to the detection unit 914 have a structure similar to that of the detection unit 906; for brevity, the description is not repeated.
  • Each comparator in the detection unit 906 to the detection unit 914 detects whether the dedicated module number corresponding to the task completion request is consistent with the dedicated module number stored in the data register of its detection unit. If the dedicated module number stored in the data register of at least one of the detection units 906 to 914 is consistent with the dedicated module number corresponding to the task completion request, the detection result is valid and is recorded in the corresponding result register.
  • Otherwise, the detection result is invalid.
  • If the result registers of all detection units are valid, all the dedicated modules to be queried in the detection module have completed their calculations.
  • In that case, the output of the AND gate 902 is valid.
  • The processor then outputs the query instruction completion signal, the query instruction leaves the execute stage, and the processor can continue to execute subsequent instructions.
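The detection module with its AND gate can be sketched in software as follows. This is an illustrative model under assumptions, not the circuit of Fig. 9: the class name `DetectionModule`, the list-based registers, and the method names are hypothetical.

```python
class DetectionModule:
    """Several detection units whose results feed one AND gate."""

    def __init__(self, module_numbers):
        # One data register and one result register per detection unit.
        self.data_registers = list(module_numbers)
        self.result_registers = [False] * len(self.data_registers)

    def on_completion_request(self, module_number):
        """Let every comparator check the incoming module number."""
        for index, stored in enumerate(self.data_registers):
            if stored == module_number:
                self.result_registers[index] = True

    def query_complete(self):
        """AND gate: valid only when every result register is valid."""
        return all(self.result_registers)


module = DetectionModule(module_numbers=[0, 1, 2])
for number in (2, 0):
    module.on_completion_request(number)
print(module.query_complete())  # module 1 has not reported yet
module.on_completion_request(1)
print(module.query_complete())
```

Only when `query_complete()` becomes true would the query instruction leave the execute stage and the following instructions proceed.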
  • Fig. 10 is a flowchart of a task response method according to an embodiment of the present invention.
  • The task response method can be applied to the interaction between a processor and dedicated modules.
  • The dedicated module can be a graphics processing module (Graphics Processing Unit, GPU), a vector computing module (Vector Processing Unit), a floating point processing module (Floating-Point Unit, FPU), a direct memory access module (Direct Memory Access, DMA), or a similar dedicated processing unit that handles a special computing task, such as an artificial intelligence (AI) accelerator or a fast Fourier transform (FFT) module.
  • These dedicated modules are connected to the processor through an internal bus and are interconnected through the internal bus.
  • These dedicated modules receive instructions or configuration signals from the CPU, and can also send completion requests to the CPU after completing specific tasks.
  • the task response method includes step S1002 to step S1006.
  • In step S1002, multiple task completion requests corresponding to multiple dedicated modules that are external to the processor and coupled to the bus interface are received.
  • In step S1004, multiple response modes with which the processor responds to the multiple dedicated modules are obtained and stored in the control register, wherein the multiple response modes are different.
  • In step S1006, the multiple task completion requests of the multiple dedicated modules are responded to according to the corresponding response modes in the control register.
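Steps S1002 to S1006 can be sketched as a dispatch over a per-module table of response modes. This is a minimal sketch under assumptions: the `ResponseMode` enum, the dictionary standing in for the control register, the module names, and the returned pair are hypothetical, and the sketch only separates requests handled at once (interrupt mode) from requests merely recorded for a later query instruction (query mode).

```python
from enum import Enum


class ResponseMode(Enum):
    INTERRUPT = "interrupt"
    QUERY = "query"


def respond(control_register, requests):
    """Split incoming task completion requests by stored response mode.

    `control_register` maps a module id to its ResponseMode (S1004);
    `requests` is the sequence of module ids that issued requests (S1002).
    """
    respond_now = []  # interrupt mode: the processor responds right away
    recorded = set()  # query mode: only record that the request arrived
    for module_id in requests:
        mode = control_register[module_id]  # S1006: look up the mode
        if mode is ResponseMode.INTERRUPT:
            respond_now.append(module_id)
        else:
            recorded.add(module_id)
    return respond_now, recorded


modes = {"dma": ResponseMode.INTERRUPT, "fft": ResponseMode.QUERY}
print(respond(modes, ["fft", "dma"]))
```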
  • The multiple response modes include an interrupt mode and a query mode. The task response method includes: when the response mode with which the processor responds to the first dedicated module of the plurality of dedicated modules is the query mode, storing a record of whether the first dedicated module has issued a first task completion request.
  • Multiple bits are used to store the response time of the processor corresponding to at least one task completion request of the multiple task completion requests, the response time being a preset time. When the response mode with which the processor responds to the first dedicated module is the interrupt mode, after at least one task completion request is received from the first dedicated module through the bus interface, the response to the at least one task completion request is delayed, and the delay time is the response time.
  • A plurality of bits are used to store a response condition under which the processor responds to the multiple task completion requests. The response condition is defined in terms of the task completion requests that the processor receives from the first dedicated module of the plurality of dedicated modules through the bus interface. When the processor responds to the first dedicated module in the interrupt mode and the response condition is met, the processor responds to the task completion request of the first dedicated module.
  • The first bit is used to store the first response mode with which the processor responds to the first dedicated module among the plurality of dedicated modules. When the first bit is 0, the first response mode is the interrupt mode, and when the first bit is 1, the first response mode is the query mode; alternatively, when the first bit is 1, the first response mode is the interrupt mode, and when the first bit is 0, the first response mode is the query mode.
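One possible bit layout for a control register entry, combining the mode bit with the preset response time, can be sketched as follows. The layout is an assumption made for illustration: the text only says that a first bit stores the mode and that multiple bits store the response time, so the 7-bit delay width, the bit positions, and all function names here are hypothetical. The sketch uses the first encoding from the text (0 for interrupt mode, 1 for query mode).

```python
MODE_MASK = 0x1  # bit 0: 0 = interrupt mode, 1 = query mode
DELAY_SHIFT = 1  # assumed position of the response-time field
DELAY_MASK = 0x7F  # assumed 7-bit response time


def pack_entry(query_mode, delay):
    """Build a control register entry from a mode flag and a delay."""
    return ((delay & DELAY_MASK) << DELAY_SHIFT) | int(query_mode)


def unpack_entry(word):
    """Return (query_mode, delay) decoded from a control register entry."""
    return bool(word & MODE_MASK), (word >> DELAY_SHIFT) & DELAY_MASK


def interrupt_response_tick(word, arrival_tick):
    """Tick at which a request is answered, or None in query mode."""
    query_mode, delay = unpack_entry(word)
    if query_mode:
        return None  # answered only when a query instruction is executed
    return arrival_tick + delay  # interrupt mode: delayed by the preset time


entry = pack_entry(query_mode=False, delay=5)
print(interrupt_response_tick(entry, arrival_tick=100))
```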
  • When the response mode with which the processor responds to the first dedicated module is the query mode, at least one query instruction regarding the first dedicated module is sent to determine whether the first dedicated module has issued at least one first task completion request.
  • When the task being executed by the processor requires the processing result of the task executed by the first dedicated module, the processor (for example, the controller in the processor) issues at least one query instruction; or, when the processor is in the idle state, the processor (for example, the controller in the processor) issues at least one query instruction.
  • The at least one query instruction includes at least one of the instruction type, the code of the dedicated module number, the first address of the dedicated module, the number of dedicated modules, and a scalable flag, wherein the scalable flag indicates whether the size of the at least one query instruction is scalable.
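The fields listed for the query instruction can be collected in a small record type. This is a sketch under assumptions: the class and field names are hypothetical, no bit widths are implied, and the interpretation that the first address plus the number of modules selects a contiguous range of dedicated modules is an assumption of this sketch, not something the text states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryInstruction:
    """Hypothetical in-memory view of the query instruction fields."""

    instruction_type: int  # identifies the instruction as a query
    first_module: int  # first dedicated module to query
    module_count: int  # number of dedicated modules to query
    scalable: bool = False  # whether the instruction size is scalable

    def targets(self):
        """Module numbers covered, assuming a contiguous range."""
        return list(range(self.first_module, self.first_module + self.module_count))


query = QueryInstruction(instruction_type=1, first_module=2, module_count=3)
print(query.targets())
```

The list returned by `targets()` is what a detection module would load into its data registers before waiting for the matching task completion requests.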
  • The first dedicated module number corresponding to the first dedicated module of the multiple dedicated modules is stored in the data register. When a second task completion request is received by the detection unit, the detection unit determines the second dedicated module number of the second task completion request, and the second dedicated module number is compared with the first dedicated module number in the data register through a comparator to determine the dedicated module corresponding to the second task completion request.
  • A plurality of comparison results are stored in a result register. When the processor (for example, the controller in the processor) sends a query instruction, the second task completion request is responded to according to the values in the result register.
  • A plurality of comparison results are received, and the plurality of comparison results are ANDed to determine whether a plurality of dedicated modules respectively issue corresponding task completion requests.
  • When the multiple response modes with which the processor responds to multiple dedicated modules are all the query mode, query instructions regarding the multiple dedicated modules are sent to determine whether the multiple dedicated modules have issued multiple task completion requests. If all task completion requests have been issued, the tasks related to the multiple dedicated modules are executed.
  • When the processor and the first dedicated module of the plurality of dedicated modules cooperate to complete the same task and the response mode with which the processor responds to the first dedicated module is the query mode, the time point at which the processor needs to use the processing result of the first dedicated module is estimated, and according to that time point, the time at which the processor (for example, the controller in the processor) sends at least one query instruction regarding the first dedicated module is determined, so as to reduce the waiting time of the processor.
  • The process status of the process processed by the processor is marked. The response modes with which the processor responds to multiple dedicated modules are dynamically adjusted according to the process status marked in the process status register, and the adjusted response modes are stored.
  • The embodiments of the present application also provide a computer-readable storage medium that stores a computer program. The computer program includes program instructions, and a processor executes the program instructions to perform the following actions:
  • The memory is used to store a computer program, and can be configured to store various other data to support operations on the device where it is located.
  • The processor can execute the computer program stored in the memory to realize the corresponding control logic.
  • The memory can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.
  • The processor may be any hardware processing device that can execute the foregoing method logic.
  • The processor may be a central processing unit (CPU), a graphics processing unit (GPU), or a microcontroller unit (MCU). It may also be a programmable device such as a field-programmable gate array (FPGA), a programmable array logic device (PAL), a generic array logic device (GAL), or a complex programmable logic device (CPLD). It may also be an advanced reduced instruction set computing (RISC) processor (Advanced RISC Machines, ARM) or a system on chip (SoC).
  • The embodiments of the present invention can be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing equipment to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing equipment produce a device that realizes the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device. The instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing. The instructions executed on the computer or other programmable equipment thus provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • The computing device includes one or more processors (CPU), an input/output interface, a network interface, and memory.
  • The memory may include, among computer-readable media, non-permanent memory, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM).
  • Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be realized by any method or technology.
  • The information can be computer-readable instructions, data structures, program modules, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by computing devices. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.


Abstract

Provided are a processor (300), a task response method, a movable platform, a camera, and a computer-readable storage medium. The processor (300) comprises a bus interface (306), a control register (304), and a controller (302). The bus interface (306) can be coupled to a plurality of dedicated modules (308, 310, 312, 314) external to the processor (300) and is used to receive a plurality of task completion requests of the plurality of dedicated modules (308, 310, 312, 314). The control register (304) is used to store a plurality of response modes with which the processor (300) responds to the plurality of dedicated modules (308, 310, 312, 314), the plurality of response modes being different. The processor (300) responds to the plurality of task completion requests of the plurality of dedicated modules (308, 310, 312, 314) according to the corresponding response modes in the control register (304). The described processor, task response method, movable platform, camera, and computer-readable storage medium can improve system efficiency.

Description

Processor, task response method, movable platform, and camera
Technical field
This application relates to the field of computer technology, and in particular to a processor, a task response method, a movable platform, a camera, and a computer-readable storage medium.
Background
With the continuous development of technology, the instructions or functions that a central processing unit (CPU) can process and execute are limited. Relying entirely on the CPU for certain specific tasks makes it impossible to complete those tasks efficiently. Therefore, to improve the execution efficiency of the CPU and extend system functions, coprocessor units or acceleration units with specific functions (hereinafter collectively referred to as dedicated modules) are added around the CPU so that the CPU and the dedicated modules work in parallel, improving the efficiency of the overall system. However, when the CPU interacts with a dedicated module, the task that the CPU is processing may be interrupted, which reduces system efficiency.
Summary of the invention
In view of this, embodiments of the present invention provide a processor, a task response method, a movable platform, a camera, and a computer-readable storage medium.
According to a first aspect of the embodiments of the present application, a processor is provided, including a bus interface and a control register. The bus interface can be coupled to a plurality of dedicated modules external to the processor and is used to receive a plurality of task completion requests regarding the plurality of dedicated modules. The control register is used to store a plurality of response modes with which the processor responds to the plurality of dedicated modules, wherein the plurality of response modes are different. The processor is coupled to the control register and is configured to respond to the plurality of task completion requests of the plurality of dedicated modules according to the corresponding response modes in the control register.
According to a second aspect of the embodiments of the present application, a task response method is provided, including: receiving a plurality of task completion requests corresponding to a plurality of dedicated modules that are coupled to a bus interface and located outside a processor; obtaining a plurality of response modes with which the processor responds to the plurality of dedicated modules, and storing the plurality of response modes in a control register, wherein the plurality of response modes are different; and responding to the plurality of task completion requests of the plurality of dedicated modules according to the corresponding response modes in the control register.
According to a third aspect of the embodiments of the present application, a computer-readable storage medium storing computer instructions is provided. When the computer instructions are executed by one or more processors, the one or more processors perform actions including: receiving a plurality of task completion requests corresponding to a plurality of dedicated modules that are coupled to a bus interface and located outside a processor; obtaining a plurality of response modes with which the processor responds to the plurality of dedicated modules, and storing the plurality of response modes in a control register, wherein the plurality of response modes are different; and responding to the plurality of task completion requests according to the corresponding response modes in the control register.
According to a fourth aspect of the embodiments of the present application, a movable platform is provided. The movable platform includes the processor of the first aspect of the embodiments of the present application.
According to a fifth aspect of the embodiments of the present application, a camera is provided. The camera includes the processor of the first aspect of the embodiments of the present application.
The processor, task response method, movable platform, camera, and computer-readable storage medium according to the embodiments of the present invention can improve system efficiency.
附图说明Description of the drawings
此处所说明的附图用来提供对本申请的进一步理解,构成本申请的一部分,本申请的示意性实施例及其说明用于解释本申请,并不构成对本申请的不当限定。在附图中:The drawings described here are used to provide a further understanding of the application and constitute a part of the application. The exemplary embodiments and descriptions of the application are used to explain the application, and do not constitute an improper limitation of the application. In the attached picture:
Fig. 1 is a schematic diagram of a processor and dedicated modules according to an embodiment of the present invention;
Fig. 2 is a flowchart of the interrupt response mode of the processor shown in Fig. 1;
Fig. 3 is a schematic diagram of a processor and multiple dedicated modules according to another embodiment of the present invention;
Fig. 4 is a schematic diagram of the configuration of the control register shown in Fig. 1;
Fig. 5 is a schematic diagram of a query instruction of the processor shown in Fig. 1;
Fig. 6a is a flowchart of the processor shown in Fig. 1 responding to a dedicated module in the query mode according to an embodiment of the present invention;
Fig. 6b is a flowchart of the processor shown in Fig. 1 responding to a dedicated module in the query mode according to an embodiment of the present invention;
Fig. 7a is a schematic diagram of pseudocode in which the processor shown in Fig. 1 uses a query instruction according to an embodiment of the present invention;
Fig. 7b is a schematic diagram of pseudocode in which the processor shown in Fig. 1 uses a query instruction according to another embodiment of the present invention;
Fig. 8 is a schematic diagram of a detection unit in the processor shown in Fig. 1;
Fig. 9 is a schematic diagram of a detection module in the processor shown in Fig. 1;
Fig. 10 is a flowchart of a task response method according to an embodiment of the present invention.
Detailed Description
To make the purpose, technical solutions, and advantages of the present application clearer, the technical solutions of the present application are described clearly and completely below with reference to specific embodiments and the corresponding drawings. The described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this application, without creative effort, fall within the protection scope of this application.
As technology continues to develop, the instructions and functions that a central processing unit (Central Processing Unit, hereinafter CPU) can execute on its own are limited. If certain specific tasks are left entirely to the CPU, they cannot be completed efficiently. Therefore, to improve the execution efficiency of the CPU and extend system functions, common systems today add coprocessor units or acceleration units with specific functions (hereinafter collectively referred to as dedicated modules) around the CPU, so that the CPU and the dedicated modules work in parallel and the efficiency of the overall system improves.
See Fig. 1, which is a schematic diagram of a processor and dedicated modules according to an embodiment of the present invention. The processor 106 (for example, a CPU) exchanges data with the memory 102 through the system bus 104. In addition, the processor 106 exchanges data with various dedicated modules through the internal bus 110. Structurally, these dedicated modules are connected to the processor through the internal bus 110 and receive instructions or configuration signals from the processor. After completing a specific task, a dedicated module can send a completion request to the processor. The dedicated modules are also connected to one another through the internal bus 110. The dedicated modules may be a graphics processing module (Graphics Processing Unit, GPU) 120, a vector computing module (Vector Processing Unit) 112, a floating-point processing module (Floating-Point Unit, FPU) 114, a direct memory access module (Direct Memory Access, DMA) 108, or a similar dedicated processing unit that handles a particular computing task, for example an artificial intelligence (AI) accelerator 118 or a fast Fourier transform (FFT) module 116. These dedicated modules can extend the processing capabilities of the processor through an extended instruction set or by providing configuration registers. However, the present invention is not limited to these, and may also include dedicated modules that implement other functions.
Take data transfer between the processor 106 and the memory 102 as an example. A DMA 108 is added between the processor 106 and the memory 102 as a dedicated module. The processor 106 sends configuration information to the DMA 108 to configure the configuration registers in the DMA 108, informing the DMA 108 of the address in the memory 102 of the data to be transferred, the size of the data to be transferred, and similar information. After configuration, the DMA 108 carries out the data transfer on the memory 102 through the system bus 104 according to the configuration information from the processor 106. While the DMA 108 is transferring data, the processor 106 does not take part in the transfer and can therefore process other tasks. When the DMA 108 finishes the transfer, it issues an interrupt request to the processor 106. On receiving the interrupt request, the processor 106 suspends the task it is processing and responds to the request. In the interrupt service function corresponding to the interrupt request, the processor 106 processes the memory data fetched by the DMA 108, and then resumes the interrupted task with the result and status of this operation.
The way the processor 106 responds to the DMA 108 after the DMA 108 completes its task, as described above, is called the interrupt response mode. See Fig. 2, which is a flowchart of the interrupt response mode of the processor shown in Fig. 1. In the interrupt response mode, as shown in step S202, the processor (for example, a CPU) configures a dedicated module (for example, a DMA) and sends configuration signals or other instructions for the dedicated module to the dedicated module. At this time, as shown in step S222, the dedicated module receives the configuration signal from the processor and configures its configuration register, and also receives the other instructions so as to perform the operations related to them. In one embodiment, the processor sends a configuration signal to the dedicated module (for example, a DMA) to configure the configuration register in the dedicated module.
In step S204, the processor performs other tasks. Meanwhile, as shown in step S224, the dedicated module starts working after its configuration register has been configured. For example, the dedicated module starts to perform the operations or processing related to the configuration signal or other instructions. As shown in step S226, after completing the operations or processing related to the configuration signal or the other instructions sent by the processor, the dedicated module sends a task completion request to the processor.
In step S206, although the processor is executing other tasks, as soon as it receives the task completion request from the dedicated module, those tasks are interrupted and the processor enters the interrupt service routine.
As shown in step S208, after the interrupt service routine has finished, the processor resumes the tasks that were interrupted.
As shown in step S210, after finishing the previously interrupted tasks, the processor enters an idle state or executes a new task.
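The processor-side sequence of Fig. 2 can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name `run_interrupt_flow`, its parameters, and the event strings are hypothetical and do not appear in the source; the point is simply that the completion request (S226) preempts the processor's other task at whatever step it arrives.

```python
# Hypothetical simulation of steps S202-S210; all names here are illustrative.
def run_interrupt_flow(main_task_steps, module_done_at):
    """Return the ordered event log of a processor working with one dedicated module.

    main_task_steps: number of steps in the processor's other task (S204).
    module_done_at:  step index at which the module's completion request arrives (S226).
    """
    log = ["S202: configure module"]
    serviced = False
    for step in range(main_task_steps):
        if step == module_done_at:
            # S206: the request interrupts whatever the processor is doing.
            log.append("S206: interrupt service routine")
            serviced = True
        # S204 before the interrupt, S208 (resumed task) after it.
        log.append(f"S204/S208: main task step {step}")
    if not serviced:
        # The request arrived after the other task had already finished.
        log.append("S206: interrupt service routine")
    log.append("S210: idle or new task")
    return log


events = run_interrupt_flow(main_task_steps=3, module_done_at=1)
```

In this run the interrupt service routine appears between main task steps 0 and 1, which is exactly the disturbance to the running task that the following paragraphs discuss.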
It should be pointed out that interacting with dedicated modules in the interrupt response mode has the following shortcomings:
First, when a dedicated module completes its task and issues an interrupt request, the processor must respond to that request regardless of what it is currently doing, which means the task the processor is currently processing is always interrupted. To some extent, this way of interacting and responding disturbs the task the processor is working on. Taking DMA data transfer as an example, if each transfer is small but transfers are frequent, the task being processed by the processor is interrupted again and again, which reduces system efficiency.
Second, when a dedicated module completes its task, if there is no dependency between the module's result and the task the processor is currently processing, interrupting the processor's current task serves no purpose. Taking DMA data transfer as an example, if the task being processed does not need the data transferred by the DMA, then responding to the DMA's interrupt request at that moment is of no use to that task. Moreover, if the task being processed has strict real-time requirements, responding to the dedicated module's interrupt request actually degrades the real-time performance of that task.
In view of the above problems, the embodiments of the present application provide a processor, a task response method, a movable platform, a camera, and a computer-readable storage medium that can improve system efficiency. It should be noted that the processor, task response method, and computer-readable storage medium provided by the embodiments of the present application are applicable to any system that has an acceleration unit or a coprocessing unit. For example, the processor and the task response method can be applied to network-capable devices such as mobile phones, computers, tablets, or PDAs (Personal Digital Assistants). Alternatively, the processor and the task response method can be applied to movable platforms (such as unmanned aerial vehicles, unmanned vehicles, or unmanned ships), cameras, robot vacuum cleaners, smart speakers, and similar devices.
According to an embodiment of the present invention, a processor is provided that includes a bus interface and a control register. The bus interface can be coupled to a plurality of dedicated modules outside the processor and is used to receive a plurality of task completion requests from the plurality of dedicated modules. The control register is used to store a plurality of response modes in which the processor responds to the plurality of dedicated modules, wherein the plurality of response modes are different. The processor responds to the plurality of task completion requests of the plurality of dedicated modules according to the corresponding response modes in the control register.
According to another embodiment of the present invention, a movable platform is provided. The movable platform includes the above-mentioned processor.
According to another embodiment of the present invention, a camera is provided. The camera includes the above-mentioned processor.
It should be noted that the present invention is not limited to this. The processor may further include a controller, coupled to the control register, that responds to the plurality of task completion requests of the plurality of dedicated modules according to the corresponding response modes in the control register. See Fig. 3, which is a schematic diagram of a processor and multiple dedicated modules according to another embodiment of the present invention. The processor 300 includes a controller 302, a control register 304, and a bus interface 306. The bus interface 306 can be coupled to a plurality of dedicated modules outside the processor 300 and is used to receive a plurality of task completion requests from these dedicated modules. The control register 304 is used to store a plurality of response modes in which the processor 300 responds to these dedicated modules; these response modes differ from one another. The controller 302 is coupled to the control register 304 and responds to the task completion requests of these dedicated modules according to the corresponding response modes in the control register 304.
In Fig. 3, the processor 300 (for example, a CPU) is connected to N dedicated modules (Specific Function Units, SFUs) through an internal bus. The system numbers the N dedicated modules and ensures that no two dedicated modules share a number. As shown in Fig. 3, the N dedicated modules are dedicated module SFU 1 308, dedicated module SFU 2 310, dedicated module SFU 3 312, ..., and dedicated module SFU N 314. As Fig. 3 shows, this embodiment requires no change to the way the internal bus is connected. In another embodiment, the system can classify the N dedicated modules, for example according to their functions and uses, and number the dedicated modules according to the classification result. For example, if there are dedicated modules with two different functions, modules with different functions are numbered with different codes. If there are dedicated modules A1, A2, and A3 with function A and dedicated modules B1, B2, and B3 with function B, then A1, A2, and A3 are encoded as 000, 001, and 010, and B1, B2, and B3 are encoded as 100, 101, and 110. That is, 3 bits are used to encode dedicated modules A1, A2, and A3 and dedicated modules B1, B2, and B3. When the most significant bit of a module's number is "0", the module has function A; when it is "1", the module has function B. Alternatively, a most significant bit of "1" may indicate function A and "0" may indicate function B. In other words, it is sufficient that dedicated modules with different functions have different most significant bits. For example, if there are multiple graphics processing modules and multiple artificial intelligence accelerators, the most significant bit of the graphics processing modules is encoded as "0" and that of the artificial intelligence accelerators as "1". However, the present invention is not limited to this. Other encodings that classify dedicated modules by their codes all fall within the protection scope of the present invention.
According to an embodiment of the present invention, the processor responds to the N dedicated modules using multiple response modes, for example an interrupt mode and a query mode. In the query mode, the processor (for example, a CPU) executes a query instruction to check whether a dedicated module has completed its task. If the dedicated module has not completed the task, the processor enters a suspended state; if the dedicated module is found to have completed the task, the processor can execute subsequent instructions. Because the point at which the processor executes a query instruction is determined by the algorithm or by system requirements, it can be ensured that, when the query instruction is executed, the processor is either idle or urgently needs the dedicated module's computation result.
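The query mode described above amounts to a blocking check on the module's completion status. The sketch below models it in software; the class `DedicatedModule`, its `tick` method, and the `max_cycles` guard are hypothetical stand-ins, since the source does not specify how the status is exposed or how the suspended state is implemented.

```python
class DedicatedModule:
    """Hypothetical stand-in for a dedicated module with a completion flag."""

    def __init__(self, cycles_needed):
        self.remaining = cycles_needed

    def tick(self):
        """Advance the module's work by one cycle."""
        if self.remaining > 0:
            self.remaining -= 1

    @property
    def done(self):
        return self.remaining == 0


def query(module, max_cycles=1000):
    """Model of a query instruction: wait until the module is done.

    Returns the number of cycles the processor spent suspended.
    The max_cycles guard is an addition for the sketch, not from the source.
    """
    stalled = 0
    while not module.done:
        if stalled >= max_cycles:
            raise TimeoutError("module did not complete in time")
        module.tick()      # one cycle passes while the processor is suspended
        stalled += 1
    return stalled         # subsequent instructions may now execute


stall_busy = query(DedicatedModule(cycles_needed=3))
stall_ready = query(DedicatedModule(cycles_needed=0))
```

When the module has already finished, the query returns at once, which is the case the text aims for by placing the query instruction where the result is actually needed.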
A control register (Control Register, CR) 304 with a width of N bits is added to the processor 300. The bits of the control register 304 correspond one-to-one to the N dedicated modules. That is, the bits of the control register 304 correspond respectively to dedicated module SFU 1 308, dedicated module SFU 2 310, dedicated module SFU 3 312, ..., and dedicated module SFU N 314.
In one embodiment, the control register 304 includes a first bit. The first bit stores a first response mode in which the processor 300 responds to a first dedicated module among the plurality of dedicated modules (for example, one of dedicated module SFU 1 308, dedicated module SFU 2 310, dedicated module SFU 3 312, ..., and dedicated module SFU N 314). When the first bit is "0", the first response mode is the interrupt mode; when the first bit is "1", the first response mode is the query mode. Alternatively, when the first bit is "1", the first response mode is the interrupt mode; when the first bit is "0", the first response mode is the query mode. In other words, it is sufficient that the two response modes correspond to different bit values.
According to an embodiment of the present invention, each bit of the control register 304 can be configured as logic 1 or logic 0. When a bit of the control register 304 is configured as logic 1, the processor 300 responds to the corresponding dedicated module in the query mode; that is, the bit is configured for the query mode. When a bit of the control register 304 is configured as logic 0, the processor 300 responds to the corresponding dedicated module in the interrupt mode; that is, the bit is configured for the interrupt mode.
In other embodiments, the polarity is reversed. When a bit of the control register 304 is configured as logic 0, the processor 300 responds to the corresponding dedicated module in the query mode; that is, the bit is configured for the query mode. When a bit of the control register 304 is configured as logic 1, the processor 300 responds to the corresponding dedicated module in the interrupt mode; that is, the bit is configured for the interrupt mode.
According to an embodiment of the present invention, the interrupt mode and the query mode can coexist among the N dedicated modules. When the control register 304 is configured, the bit corresponding to each dedicated module must be set to the appropriate logic value. The processor 300 then determines, from the logic value of each bit in the control register 304, the response mode to use for the corresponding dedicated module. In one embodiment, a bit configured as logic 1 selects the query mode for the corresponding dedicated module, and a bit configured as logic 0 selects the interrupt mode. If the processor 300 responds to dedicated module SFU 1 308 and dedicated module SFU 3 312 in the query mode, and to dedicated module SFU 2 310 and dedicated module SFU N 314 in the interrupt mode, then the bits corresponding to SFU 1 308 and SFU 3 312 are configured as logic 1, and the bits corresponding to SFU 2 310 and SFU N 314 are configured as logic 0. In another embodiment, a bit configured as logic 0 selects the query mode and a bit configured as logic 1 selects the interrupt mode. In that case, for the same assignment of response modes, the bits corresponding to SFU 1 308 and SFU 3 312 are configured as logic 0, and the bits corresponding to SFU 2 310 and SFU N 314 are configured as logic 1.
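The per-module configuration can be written as a pair of helper functions. This is a sketch under assumptions: the source fixes only the one-to-one correspondence between bits and modules, so mapping SFU i to bit i-1 (least significant bit first) is a choice made here, and the names `build_control_register`, `response_mode`, and `query_level` are hypothetical.

```python
def build_control_register(modes, query_level=1):
    """Build the N-bit control register value.

    modes:       list of "query" / "interrupt"; entry i describes SFU i+1.
    query_level: logic value that selects the query mode (1 or 0), covering
                 both polarities described in the text.
    Assumption: SFU i maps to bit i-1 of the register.
    """
    value = 0
    for bit, mode in enumerate(modes):
        if mode not in ("query", "interrupt"):
            raise ValueError(f"unknown response mode: {mode}")
        level = query_level if mode == "query" else 1 - query_level
        value |= level << bit
    return value


def response_mode(cr, module_number, query_level=1):
    """Look up how the processor responds to SFU `module_number` (1-based)."""
    bit = (cr >> (module_number - 1)) & 1
    return "query" if bit == query_level else "interrupt"


# SFU 1 and SFU 3 in query mode, SFU 2 and SFU 4 in interrupt mode (N = 4 here).
modes = ["query", "interrupt", "query", "interrupt"]
cr_query_is_1 = build_control_register(modes)                  # 0b0101
cr_query_is_0 = build_control_register(modes, query_level=0)   # 0b1010
```

Both register values describe the same assignment of response modes; only the polarity convention differs, which is why the lookup takes the same `query_level` argument.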
See Fig. 4, which is a schematic diagram of the configuration of the control register shown in Fig. 1. As shown in Fig. 4, dedicated module SFU 1 420, dedicated module SFU 2 422, and dedicated module SFU 4 426 correspond to bit 402, bit 404, and bit 408 of the control register, respectively, and the logic values of bit 402, bit 404, and bit 408 are all 1. Dedicated module SFU 3 424 and dedicated module SFU 5 428 correspond to bit 406 and bit 410 of the control register, respectively, and the logic values of bit 406 and bit 410 are both 0. If logic 1 denotes the query mode and logic 0 denotes the interrupt mode, the processor responds to SFU 1 420, SFU 2 422, and SFU 4 426 in the query mode and to SFU 3 424 and SFU 5 428 in the interrupt mode. That is, the processor interacts with SFU 1 420, SFU 2 422, and SFU 4 426 by query and with SFU 3 424 and SFU 5 428 by interrupt. Conversely, if logic 0 denotes the query mode and logic 1 denotes the interrupt mode, the processor responds to SFU 1 420, SFU 2 422, and SFU 4 426 in the interrupt mode and to SFU 3 424 and SFU 5 428 in the query mode. That is, the processor interacts with SFU 1 420, SFU 2 422, and SFU 4 426 by interrupt and with SFU 3 424 and SFU 5 428 by query. Configuring the CPU's response mode for each of the N dedicated modules separately through the control register is therefore the prerequisite and basis for giving the system a flexible task response scheme.
It should be noted that the control register may include not only the bits that indicate the response mode in which the processor responds to each dedicated module, but also other bits that implement further functions of the processor's response to dedicated modules. For example, the control register may include a plurality of bits that store a response time of the processor for at least one of the plurality of task completion requests. When the processor responds to a dedicated module in the interrupt mode, after the processor receives the at least one task completion request from that module through the bus interface, the controller in the processor delays its response to the request, and the delay equals the response time.
In one embodiment, the response time is a preset time. In another embodiment, the response time can be adjusted in real time according to the processor's task completion status and task priority. For example, if the task the processor is currently processing has the highest priority, then even if the processor responds to a dedicated module in the interrupt mode, the controller does not respond immediately after receiving that module's task completion request. Instead, it delays its response to the request. The delay is set according to the system's estimate of the completion time of the currently processed task; that is, the delay can be adjusted in real time according to the task's completion time. In this embodiment the processor is more flexible: the highest-priority task is processed in time, and the efficiency with which the processor responds to dedicated modules is preserved. Alternatively, to simplify computation, the response time can be set in advance, that is, the response time is a preset time. In one embodiment, the response time is preset according to the longest time required to complete a task. It should be noted that the above embodiments are intended only to explain the present invention, not to limit it.
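The choice between a preset and a dynamically adjusted response time can be expressed as one small function. This is a sketch, not the source's implementation: the function name, its parameters, and the rule that a non-highest-priority task is interrupted immediately (delay 0) are assumptions made to complete the example.

```python
def response_delay(current_priority, highest_priority, estimated_remaining,
                   preset_delay=None):
    """Delay before servicing a completion request in interrupt mode.

    preset_delay:        if given, the fixed response time is used as-is.
    estimated_remaining: system's estimate of the time left on the current task.
    Assumption: tasks below the highest priority are interrupted at once.
    """
    if preset_delay is not None:
        return preset_delay
    if current_priority == highest_priority:
        # Let the highest-priority task finish before responding.
        return estimated_remaining
    return 0


delay_top = response_delay(current_priority=3, highest_priority=3, estimated_remaining=40)
delay_low = response_delay(current_priority=1, highest_priority=3, estimated_remaining=40)
delay_fixed = response_delay(3, 3, 40, preset_delay=10)
```

The preset variant trades accuracy for simplicity: no run-time estimate is needed, at the cost of a delay that may be longer or shorter than the current task really requires.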
In another embodiment, if the task currently processed by the processor is about to be completed, then even if the response mode of the processor for a dedicated module is the interrupt mode, the controller in the processor does not respond to that dedicated module immediately after receiving its task completion request. Instead, the controller delays its response to the task completion request, where the delay time is the response time. The response time may be a preset response time. However, if the task currently processed by the processor has only just started, and the response mode of the processor for a dedicated module is the interrupt mode, the controller responds to that dedicated module immediately after receiving its task completion request. In this case, the delay time and the response time are both 0. In this embodiment, the point in time at which the processor responds to the dedicated module can be determined according to how far the task currently being processed has progressed, so that both the processor's current task and the dedicated module's task completion request are taken into account.
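The delay rules described in the preceding paragraphs can be sketched as a small decision function. This is only an illustrative sketch: the names `CurrentTask`, `response_delay`, `preset_delay` and `dynamic`, the use of cycles as the time unit, and the choice to fold the priority-based and progress-based embodiments into one function are assumptions made for the example, not part of the described processor.

```python
from dataclasses import dataclass


@dataclass
class CurrentTask:
    # All three fields are hypothetical inputs the controller is assumed to know.
    is_highest_priority: bool   # current task has the highest priority
    nearly_done: bool           # current task is about to complete
    estimated_remaining: int    # estimated time to completion, in cycles


def response_delay(task: CurrentTask, preset_delay: int, dynamic: bool = True) -> int:
    """Return how long the controller waits before answering a completion request."""
    if task.is_highest_priority:
        # Dynamic variant: wait for the estimated completion of the current task.
        # Preset variant: wait for the fixed response time instead.
        return task.estimated_remaining if dynamic else preset_delay
    if task.nearly_done:
        # Let the almost-finished task complete; use the preset response time.
        return preset_delay
    # The task has only just started: respond at once (delay is 0).
    return 0
```

Under these assumptions, a highest-priority task with 120 cycles remaining yields a delay of 120 in the dynamic variant, while a task that has just started yields a delay of 0.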
According to yet another embodiment of the present invention, the control register includes some other bits, which are used to store a response condition that applies to the processor's handling of the plurality of task completion requests. When the response mode of the processor for a dedicated module is the interrupt mode, the controller responds to the dedicated module's task completion request if the response condition is satisfied. The response condition is that the processor has received the dedicated module's task completion request through the bus interface and the processor has finished the task it is currently executing. This embodiment ensures that the task currently being processed by the processor is completed without being disturbed by the dedicated module's task completion request. It is suitable for situations in which the task currently processed by the processor is unrelated to the task completed by the dedicated module.
In one embodiment, if the response mode of the processor for a dedicated module is the query mode, the controller sends at least one query instruction concerning the dedicated module to determine whether the dedicated module has issued a task completion request. According to an embodiment of the present invention, the at least one query instruction includes an instruction type and at least one of: an encoding of the number of the dedicated module to be queried, a start address of the dedicated modules, a quantity of dedicated modules, and a scalable flag. The scalable flag indicates whether the size of the at least one query instruction is scalable. In one embodiment, the scalable flag indicates whether the size of the query instruction can change according to the number of dedicated modules to which the query instruction corresponds.
In one embodiment, assuming that the number of dedicated modules is N, log2(N) bits are needed to encode the number of a dedicated module. For example, if N = 32, 5 bits are needed to encode a dedicated module number. A CPU with a 32-bit instruction encoding width can therefore accommodate one or more dedicated module numbers in a single instruction. According to embodiments of the present invention, three query instruction encoding schemes are provided. See FIG. 5. FIG. 5 is a schematic diagram of the query instructions of the processor shown in FIG. 1. In encoding scheme 500, the query instruction contains only one number, numi, of a dedicated module to be queried. In encoding scheme 502, one query instruction can encode several dedicated module numbers at once (for example, dedicated module numbers numi, numj, numk, nump, numq). Unlike encoding scheme 500, encoding scheme 502 makes full use of the reserved space ("reserved") in the instruction encoding. In this example, assuming that the query instruction itself needs 7 bits to encode the instruction type (denoted "opcode" in FIG. 5), 25 bits of encoding space remain. Therefore, in this embodiment, one query instruction can encode the IDs of at most 5 dedicated modules. In encoding scheme 504, the start address of the dedicated modules to be queried (denoted "start_id" in FIG. 5) and the quantity of dedicated modules to be queried (denoted "length" in FIG. 5) are defined. The dedicated module numbers (for example, dedicated module addresses) that can be queried under encoding scheme 504 are therefore start_id, start_id+1, start_id+2, ..., start_id+length. It should be noted that the three encoding schemes can be used together in one embodiment, in which case the processor (for example, a CPU) decodes each query instruction according to its instruction type and determines which dedicated module numbers are to be queried. However, the present invention is not limited to this. Any one of the three encoding schemes, or any combination of them, may be used to increase the flexibility with which query instructions can be used.
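The bit budget of the multi-number scheme and the range scheme can be illustrated with a short sketch. The field widths (7-bit opcode, 5-bit module number, 32-bit word) come from the example above; everything else is an assumption made for illustration: the function names, the placement of the opcode in the low bits, passing the number of encoded IDs to the decoder explicitly, and treating the range start_id ... start_id+length as inclusive at both ends.

```python
WORD_BITS = 32     # instruction encoding width from the example
OPCODE_BITS = 7    # bits used for the instruction type ("opcode")
ID_BITS = 5        # log2(32) bits per dedicated module number

# 25 remaining bits hold at most five 5-bit module numbers.
MAX_IDS = (WORD_BITS - OPCODE_BITS) // ID_BITS


def encode_multi(opcode: int, module_ids: list[int]) -> int:
    """Pack an opcode and up to MAX_IDS module numbers into one 32-bit word.

    Assumed layout (not taken from FIG. 5): opcode in bits 0..6,
    module numbers in consecutive 5-bit slots above it.
    """
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("opcode does not fit in 7 bits")
    if not 1 <= len(module_ids) <= MAX_IDS:
        raise ValueError("between 1 and 5 module numbers are supported")
    word = opcode
    for slot, module_id in enumerate(module_ids):
        if not 0 <= module_id < (1 << ID_BITS):
            raise ValueError("module number does not fit in 5 bits")
        word |= module_id << (OPCODE_BITS + slot * ID_BITS)
    return word


def decode_multi(word: int, count: int) -> list[int]:
    """Recover `count` module numbers from a word built by encode_multi."""
    mask = (1 << ID_BITS) - 1
    return [(word >> (OPCODE_BITS + slot * ID_BITS)) & mask for slot in range(count)]


def expand_range(start_id: int, length: int) -> list[int]:
    """List the module numbers covered by the start_id/length scheme.

    The range is taken as start_id ... start_id+length inclusive, as written
    in the text; this reading is an assumption.
    """
    return list(range(start_id, start_id + length + 1))
```

With these assumed layouts, `decode_multi(encode_multi(0x2B, [3, 7, 31, 0, 12]), 5)` returns the original five module numbers, and `expand_range(4, 3)` lists modules 4 through 7.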
See FIG. 6a and FIG. 6b. FIG. 6a is a flowchart of the processor shown in FIG. 1 responding to a dedicated module in the query mode according to an embodiment of the present invention. FIG. 6b is a flowchart of the processor shown in FIG. 1 responding to a dedicated module in the query mode according to an embodiment of the present invention. In the query mode, the processor schedules dynamically, based on the requirements of the algorithm, the idle state of the CPU, and the progress of the algorithm, to determine when to insert a query instruction. For example, the controller issues the at least one query instruction when the task being executed by the processor needs the processing result of the task executed by a first dedicated module. Alternatively, the controller issues the at least one query instruction when the processor is idle.
After assigning tasks to a plurality of dedicated modules, the processor may first execute other tasks. In FIG. 6a, as shown in step S602, the processor (for example, a CPU) configures a dedicated module (for example, a DMA module or another dedicated module) and sends configuration signals or other instructions for the dedicated module to the dedicated module. At this time, as shown in step S620, the dedicated module receives the configuration signal from the processor and configures its configuration register, and the dedicated module receives the other instructions in order to perform the operations associated with them. In one embodiment, the processor sends a configuration signal to the dedicated module (for example, a DMA module) for configuring the configuration register in the dedicated module.
In step S604, the processor executes other tasks. At this time, as shown in step S622, the dedicated module starts working after its configuration register has been configured. For example, the dedicated module starts to perform the operations or processing associated with the configuration signal or the other instructions. As shown in step S624, after completing the operations or processing associated with the configuration signal or the other instructions sent by the processor, the dedicated module issues a task completion request to the processor.
In step S606, the processor continues to execute the other tasks until it completes the task being executed and enters the idle state.
As shown in step S608, after entering the idle state, the processor executes the query instruction. In one embodiment, when the processor detects the completion request issued by the dedicated module, the processor processes the task completed by the dedicated module.
As shown in step S610, after processing the task completed by the dedicated module, the processor executes a new task. In another embodiment, after processing the task completed by the dedicated module, the processor enters the idle state.
In FIG. 6b, as shown in step S632, the processor (for example, a CPU) configures a dedicated module (for example, a DMA module) and sends configuration signals or other instructions for the dedicated module to the dedicated module. At this time, as shown in step S650, the dedicated module receives the configuration signal from the processor and configures its configuration register, and the dedicated module receives the other instructions in order to perform the operations associated with them. In one embodiment, the processor sends a configuration signal to the dedicated module (for example, a DMA module) for configuring the configuration register in the dedicated module.
In step S634, the processor executes other tasks. At this time, as shown in step S652, the dedicated module starts working after its configuration register has been configured. For example, the dedicated module starts to perform the operations or processing associated with the configuration signal or the other instructions.
As shown in step S636, if the processor finds that the task being executed needs the calculation result of the dedicated module, the processor executes the query instruction, suspends the task currently being executed, and waits for the dedicated module's task completion request. As shown in step S654, after completing the operations or processing associated with the configuration signal or the other instructions sent by the processor, the dedicated module issues a task completion request to the processor.
In step S638, after receiving the task completion request issued by the dedicated module, the processor resumes the other task from step S634 (that is, the task suspended in step S636). In one embodiment, the processor enters the idle state after finishing that task.
As shown in step S640, the processor executes a new task.
It should be noted that, in the above embodiments, the dedicated module is one of a graphics processing module, a vector calculation module, a floating-point processing module, a direct memory access module, an artificial intelligence accelerator, and a fast Fourier transform module. Other dedicated modules that interact with the processor may also be used. In FIG. 6a, the dedicated module has already completed its task by the time the query instruction is used, so the processor (for example, a CPU) does not enter a suspended state. In FIG. 6b, however, it is known from the algorithm that the processor needs the calculation result of the dedicated module in its current task. The processor therefore executes the query instruction and waits for the dedicated module's task completion request. Unlike in the interrupt mode, in the query mode the processor executes the query instruction only when it has completed its other tasks and entered the idle state, or when it urgently needs the calculation result of the dedicated module. The query mode can therefore substantially improve the execution efficiency of the system.
In one embodiment, when the processor and a dedicated module cooperate to complete the same task and the response mode of the processor for the dedicated module is the query mode, the point in time at which the processor will need the processing result of the first dedicated module is estimated, and the point in time at which the controller sends the at least one query instruction concerning the first dedicated module is determined from that estimate, so as to reduce the waiting time of the processor. This method also improves the computational efficiency of the processor.
The processor may further include a process status register for marking the status of the process being handled by the processor. According to the process status marked in the process status register, the controller dynamically adjusts the response modes in which the processor responds to the plurality of dedicated modules and sends the adjusted response modes to the control register. In this embodiment, the processor can adjust its response modes for the plurality of dedicated modules in real time.
In one embodiment, if the task currently processed by the processor is the highest-priority task, or has a higher priority than the plurality of dedicated modules, the value of the flag status register is set to "1". When the value of the flag status register is set to "1", the processor does not respond to any dedicated module. The same scheme can also be implemented with the logical value "0": if the task currently processed by the processor is the highest-priority task, or has a higher priority than the plurality of dedicated modules, the value of the flag status register is set to "0", and the processor does not respond to any dedicated module.
In another embodiment, if the task currently processed by the processor is the lowest-priority task, or has a lower priority than the plurality of dedicated modules, the value of the flag status register is set to "1". As soon as the processor receives a task completion request from a dedicated module, it responds to that dedicated module immediately. The same scheme can also be implemented with the logical value "0": if the task currently processed by the processor is the lowest-priority task, or has a lower priority than the plurality of dedicated modules, the value of the flag status register is set to "0", and as soon as the processor receives a task completion request from a dedicated module, it responds to that dedicated module immediately.
See FIG. 7a and FIG. 7b. FIG. 7a is a schematic diagram of pseudo code in which the processor shown in FIG. 1 uses query instructions according to an embodiment of the present invention. FIG. 7b is a schematic diagram of pseudo code in which the processor shown in FIG. 1 uses query instructions according to another embodiment of the present invention. Multiple query instructions may be used either non-consecutively or consecutively. As shown in FIG. 7a, pseudo code 700 executes query instructions non-consecutively, or one at a time. Specifically, in code lines 702 to 706, dedicated module i to dedicated module k are configured first. In code line 708, the processor executes other tasks. In code line 710, the processor executes query instruction 1. After executing code line 710, the processor executes some other code (not shown). In code line 712, the processor executes query instruction 2. After executing code line 712, the processor executes some other code (not shown). In code line 714, the processor executes other instructions.
As shown in FIG. 7b, pseudo code 720 executes multiple query instructions consecutively. Specifically, in code lines 722 to 726, dedicated module i to dedicated module k are configured first. In code line 728, the processor executes other tasks. In code line 730, the processor executes query instruction 1. After executing code line 730, the processor executes code line 732, in which it executes query instruction 2. After executing code line 732, the processor executes some other code (not shown). In code line 734, the processor executes other instructions. Query instruction 1 and query instruction 2 are thus two query instructions executed back to back. In one embodiment, because the processor uses multiple query instructions consecutively, it can execute the subsequent instructions only after all the dedicated modules specified by those query instructions have reported their task completion requests.
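The difference between non-consecutive and consecutive query instructions can be illustrated with a toy timing model. This is a sketch under stated assumptions, not the pseudo code of FIG. 7a or FIG. 7b: the program representation, the `run` function, the cycle counts, and the rule that a query stalls the processor until the named module has reported are all hypothetical.

```python
def run(program, done_at):
    """Walk a toy program and report the finish time and per-query stalls.

    program: list of ("work", cycles) or ("query", module_name) steps.
    done_at: maps each module name to the cycle at which it reports completion.
    A query is assumed to stall the processor until that module has reported.
    """
    clock = 0
    stalls = []
    for kind, arg in program:
        if kind == "work":
            clock += arg                      # ordinary instructions take `arg` cycles
        elif kind == "query":
            wait = max(0, done_at[arg] - clock)
            stalls.append(wait)               # cycles spent waiting on this query
            clock += wait
        else:
            raise ValueError(f"unknown step kind: {kind}")
    return clock, stalls


# Hypothetical completion times for two dedicated modules.
done_at = {"module_i": 8, "module_j": 20}

# Non-consecutive use: other code runs between the two queries.
non_consecutive = [("work", 10), ("query", "module_i"),
                   ("work", 5), ("query", "module_j"), ("work", 3)]

# Consecutive use: later code starts only after both modules have reported.
consecutive = [("work", 10), ("query", "module_i"),
               ("query", "module_j"), ("work", 8)]
```

In this model the non-consecutive program overlaps five cycles of useful work with module_j's execution, whereas the consecutive program reaches its remaining work only once both modules have reported.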
According to an embodiment of the present invention, a single query instruction may be used for multiple dedicated modules. Such a query instruction need not specify the ID of a dedicated module (for example, the dedicated module code), address information, or other attribute information. In this case, the instruction queries whether any one of the multiple dedicated modules has issued a task completion request. If any dedicated module has issued a task completion request, the processor responds to that request. This reduces the number of instruction types and saves instruction resources and the space needed to store query instructions. A single query instruction may also be used for multiple dedicated modules of the same type. In this case, identification information for the dedicated modules of that type, such as their codes or attribute flags, must be added to the query instruction. When multiple dedicated modules of the same type process certain tasks in parallel, only one instruction needs to be sent, which improves instruction utilization. In another embodiment, multiple query instructions may be used, so that different query instructions are applied to different types of dedicated modules. However, the present invention is not limited to this. In another embodiment, the above ways of using query instructions may be combined to make their use more flexible.
When executing a query instruction, the processor records the number of the dedicated module to be queried in its pipeline and uses a separate register in the processor to record whether the dedicated module to be queried has issued a task completion request. Taking a processor (for example, a CPU) with a three-stage pipeline as an example, the pipeline is divided into fetch, decode, and execute stages. When the query instruction reaches the execute stage, the number of the dedicated module to be queried is recorded at that stage. Different query instruction encoding schemes require different methods of recording the dedicated modules.
According to an embodiment of the present invention, the processor further includes one or more detection units; the number of detection units can be chosen according to the hardware design. According to an embodiment of the present invention, the processor includes a detection module, and the detection module includes at least one detection unit. The detection unit includes a comparator and a data register. The data register stores a first dedicated module number corresponding to a first dedicated module among the plurality of dedicated modules. When the detection unit receives a second task completion request, it determines the second dedicated module number of the second task completion request and uses the comparator to compare the second dedicated module number with the first dedicated module number in the data register, so as to determine the dedicated module to which the second task completion request corresponds. The detection unit further includes a result register, in which it stores the comparison results. When the controller sends a query instruction, the controller responds to the second task completion request according to the value in the result register.
The detection module further includes an AND gate. The AND gate receives a plurality of first comparison results from a plurality of first detection units and performs an AND operation on them to determine whether each of the plurality of dedicated modules has issued its corresponding task completion request.
See FIG. 8. FIG. 8 is a schematic diagram of a detection unit in the processor shown in FIG. 1. As shown in FIG. 8, the processor further includes a detection unit 800. The detection unit 800 includes a comparator 802 (for example, a comparison circuit), a data register 804, and a result register 806. After a dedicated module issues a task completion request, the comparator 802 in the detection unit 800 checks whether the dedicated module number corresponding to the task completion request matches the dedicated module number num stored in the data register 804 of the detection unit 800. If the dedicated module number stored in the data register 804 matches the dedicated module number corresponding to the task completion request, the detection result is valid and is recorded in the result register 806. If the dedicated module number stored in the data register 804 does not match the dedicated module number corresponding to the task completion request, the detection result is invalid.
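The behavior of a single detection unit can be modeled in a few lines. This is a behavioral sketch only: the class and method names are hypothetical, and the assumption that the result register stays set once a matching request has been seen (rather than being cleared by a later mismatch) is made for illustration.

```python
class DetectionUnit:
    """Behavioral model of one detection unit (comparator + two registers)."""

    def __init__(self, module_number: int):
        self.data_register = module_number   # number of the module being queried
        self.result_register = False         # True once a matching request is seen

    def on_completion_request(self, requester_number: int) -> bool:
        """Compare an incoming request's module number with the stored one."""
        if requester_number == self.data_register:   # comparator output is valid
            self.result_register = True
        # Assumption: a mismatch leaves a previously recorded result untouched.
        return self.result_register
```

For example, a unit created with `DetectionUnit(3)` ignores a completion request from module 5 and records a valid result when module 3 reports.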
According to another embodiment of the present invention, the processor further includes a flag register which, when the response mode of the processor for a first dedicated module among the plurality of dedicated modules is the query mode, stores a record of whether the first dedicated module has issued a first task completion request. For example, the processor includes a flag register A corresponding to the dedicated module SFU0. If the dedicated module SFU0 is detected to have issued a task completion request, the value of flag register A is set to "1"; otherwise, the value of flag register A is "0". Alternatively, if the dedicated module SFU0 is detected to have issued a task completion request, the value of flag register A is set to "0"; otherwise, the value of flag register A is "1". Flag register A may be the result register 806. However, the flag register may also be any other register capable of storing a record of whether a dedicated module has issued a task completion request.
According to an embodiment of the present invention, if the response modes of the processor for a plurality of dedicated modules are the query mode, the controller sends a query instruction concerning the plurality of dedicated modules to determine whether the plurality of dedicated modules have issued their task completion requests. If all the task completion requests have been issued, the processor executes the tasks related to the plurality of dedicated modules.
See FIG. 9. FIG. 9 is a schematic diagram of a detection module in the processor shown in FIG. 1. The processor further includes a detection module 900 for detecting whether a plurality of dedicated modules (for example, dedicated modules SFU0, SFU1, ..., SFUN) have all issued task completion requests. The detection module 900 includes detection units 906 to 914. Each of the detection units 906 to 914 includes a comparator (for example, a comparison circuit), a data register, and a result register. The data register stores the code of the dedicated module to be queried, and the result register stores the result of the comparison made by the comparator. For example, the detection unit 906 includes a comparator 904, a data register 916, and a result register 920, where the data register 916 stores the dedicated module code numi. The detection units 908 to 914 have a structure similar to that of the detection unit 906 and, for brevity, are not described again.
After any one of the dedicated modules SFU0, SFU1, SFU2, ..., SFUN issues a task completion request, the comparator in each of the detection units 906 to 914 checks whether the dedicated module number corresponding to the task completion request matches the dedicated module number stored in the data register of its detection unit. If the dedicated module number stored in at least one data register of the detection units 906 to 914 matches the dedicated module number corresponding to the task completion request, the detection result is valid and is recorded in the corresponding result register. If none of the dedicated module numbers stored in the data registers of the detection units 906 to 914 matches the dedicated module number corresponding to the task completion request, the detection result is invalid. When the result registers of all the detection units are valid, all the dedicated modules to be queried by the detection module have completed their calculations. At this point, the output of the AND gate 902 is valid. That is, the processor outputs a query instruction completion signal, the query instruction leaves the execute stage, and the processor can continue with the subsequent instructions.
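The detection module with its AND gate can be sketched as a behavioral model. The class and method names below are hypothetical, the model is a software illustration rather than the hardware of FIG. 9, and it assumes that each result register stays set once its module has reported.

```python
class DetectionModule:
    """Behavioral model: several detection units feeding one AND gate."""

    def __init__(self, module_numbers):
        # One data register and one result register per detection unit.
        self.data_registers = list(module_numbers)
        self.result_registers = [False] * len(self.data_registers)

    def on_completion_request(self, requester_number: int) -> bool:
        """Broadcast a request to every unit; return True if any unit matched."""
        matched = False
        for index, stored_number in enumerate(self.data_registers):
            if stored_number == requester_number:   # this unit's comparator fires
                self.result_registers[index] = True
                matched = True
        return matched

    def query_complete(self) -> bool:
        """AND gate: valid only when every queried module has reported."""
        return all(self.result_registers)
```

In this model a query over modules 0, 1 and 2 completes only after all three have issued their completion requests, in whatever order they arrive; a request from a module that is not being queried matches no unit and changes nothing.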
Refer to FIG. 10, which is a flowchart of a task response method according to an embodiment of the present invention. The method can be applied to the interaction between a processor and dedicated modules. A dedicated module may be a graphics processing unit (GPU), a vector processing unit, a floating-point unit (FPU), a direct memory access (DMA) module, or a similar dedicated processing unit that handles a particular computing task, such as an artificial intelligence (AI) accelerator or a fast Fourier transform (FFT) module. Structurally, these dedicated modules are connected to the processor, and interconnected with one another, through an internal bus. They receive instructions or configuration signals from the CPU and can send a completion request to the CPU after finishing a specific task.
As shown in FIG. 10, the task response method includes steps S1002 to S1006.
In step S1002, multiple task completion requests corresponding to multiple dedicated modules are received, the dedicated modules being located outside the processor and coupled to the bus interface;
In step S1004, multiple response modes in which the processor responds to the multiple dedicated modules are obtained and stored in the control register, wherein the multiple response modes are different; and
In step S1006, the multiple task completion requests of the multiple dedicated modules are responded to according to the corresponding response modes in the control register.
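Steps S1002 to S1006 can be illustrated with a small software model. This is a sketch under stated assumptions rather than the claimed hardware: the control register is modeled as a dictionary, the record of pending completions as a set, and all names (`Processor`, `configure`, `receive_completion`, `query`) are hypothetical.

```python
from enum import Enum


class ResponseMode(Enum):
    INTERRUPT = "interrupt"
    QUERY = "query"


class Processor:
    """Hypothetical model of a processor that responds per dedicated module."""

    def __init__(self):
        self.control_register = {}  # module id -> ResponseMode (step S1004)
        self.flag_register = set()  # modules with a recorded completion request
        self.log = []

    def configure(self, module_id: int, mode: ResponseMode) -> None:
        # Step S1004: store the response mode for one dedicated module.
        self.control_register[module_id] = mode

    def receive_completion(self, module_id: int) -> None:
        # Step S1002: a task completion request arrives over the bus interface.
        # Step S1006: respond according to the stored response mode.
        if self.control_register[module_id] is ResponseMode.INTERRUPT:
            self.log.append(("interrupt_handled", module_id))
        else:
            # Query mode: only record that the request was issued.
            self.flag_register.add(module_id)

    def query(self, module_id: int) -> bool:
        # Query instruction: check, then clear, the recorded completion.
        done = module_id in self.flag_register
        if done:
            self.flag_register.discard(module_id)
            self.log.append(("query_hit", module_id))
        return done


cpu = Processor()
cpu.configure(0, ResponseMode.INTERRUPT)
cpu.configure(1, ResponseMode.QUERY)
cpu.receive_completion(0)  # handled at once in interrupt mode
cpu.receive_completion(1)  # only recorded in query mode
```

Clearing the record after a successful query is a design choice of this sketch; the source only states that a record of the request is stored.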
In an embodiment, the multiple response modes include an interrupt mode and a query mode, and the task response method includes: when the response mode in which the processor responds to a first dedicated module among the multiple dedicated modules is the query mode, storing a record of whether the first dedicated module has issued a first task completion request.
In an embodiment, multiple bits are used to store a response time of the processor for at least one task completion request among the multiple task completion requests, the response time being a preset time; and when the first response mode in which the processor responds to the first dedicated module is the interrupt mode, after the at least one task completion request is received from the first dedicated module through the bus interface, the response to the at least one task completion request is delayed, the delay being equal to the response time.
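The delayed interrupt response can be sketched as a cycle counter. The example below assumes that the preset response time is expressed in clock cycles and fits in a 4-bit field; both the unit and the field width are assumptions, and `DelayedInterruptResponder` is a hypothetical name.

```python
class DelayedInterruptResponder:
    """Hypothetical model: respond to a completion request after a preset delay."""

    def __init__(self, response_time: int, field_width: int = 4):
        # The response time must fit in the bits reserved for it (assumed width).
        if not 0 <= response_time < (1 << field_width):
            raise ValueError("response time does not fit in the register field")
        self.response_time = response_time
        self._remaining = {}  # module id -> cycles left before responding
        self.serviced = []    # module ids in the order they were responded to

    def receive(self, module_id: int) -> None:
        # A request received in interrupt mode starts the delay countdown.
        if self.response_time == 0:
            self.serviced.append(module_id)
        else:
            self._remaining[module_id] = self.response_time

    def tick(self) -> None:
        # Advance one cycle; respond to every request whose delay has elapsed.
        for module_id in list(self._remaining):
            self._remaining[module_id] -= 1
            if self._remaining[module_id] == 0:
                del self._remaining[module_id]
                self.serviced.append(module_id)


responder = DelayedInterruptResponder(response_time=3)
responder.receive(1)
responder.tick()
responder.tick()
early = list(responder.serviced)  # still empty after two cycles
responder.tick()
late = list(responder.serviced)   # responded on the third cycle
```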
In an embodiment, multiple bits are used to store a response condition of the processor for the multiple task completion requests. The response condition is that the processor has received, through the bus interface, the task completion request of the first dedicated module among the multiple dedicated modules and the processor has finished the task it is currently executing; and when the first response mode in which the processor responds to the first dedicated module is the interrupt mode, the task completion request of the first dedicated module is responded to if the response condition is satisfied.
In an embodiment, a first bit is used to store the first response mode in which the processor responds to the first dedicated module among the multiple dedicated modules; when the first bit is 0, the first response mode is the interrupt mode, and when the first bit is 1, the first response mode is the query mode; or, alternatively, when the first bit is 1, the first response mode is the interrupt mode, and when the first bit is 0, the first response mode is the query mode.
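The one-bit encoding of the response mode can be shown with plain bit operations. The sketch below assumes one bit per dedicated module at bit position `module_index` of an integer-valued control register; that layout and the function names are hypothetical. The `interrupt_bit` parameter selects which of the two polarities described above is in use.

```python
INTERRUPT = "interrupt"
QUERY = "query"


def mode_of(control_register: int, module_index: int, interrupt_bit: int = 0) -> str:
    """Decode the response mode of one module from its mode bit."""
    bit = (control_register >> module_index) & 1
    return INTERRUPT if bit == interrupt_bit else QUERY


def set_mode(control_register: int, module_index: int, mode: str,
             interrupt_bit: int = 0) -> int:
    """Return a new register value with the module's mode bit updated."""
    bit = interrupt_bit if mode == INTERRUPT else 1 - interrupt_bit
    cleared = control_register & ~(1 << module_index)
    return cleared | (bit << module_index)


# With the default polarity, 0 means interrupt mode and 1 means query mode.
register = set_mode(0, module_index=2, mode=QUERY)
```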
In an embodiment, if the response mode in which the processor responds to the first dedicated module is the query mode, at least one query instruction regarding the first dedicated module is sent to determine whether the first dedicated module has issued at least one first task completion request.
In an embodiment, the processor (for example, a controller in the processor) issues the at least one query instruction when the task the processor is executing requires the processing result of the task executed by the first dedicated module, or when the processor is idle.
In an embodiment, the at least one query instruction includes an instruction type and at least one of the following: an encoding of the dedicated module number, a first address of the dedicated modules, a number of dedicated modules, and a scalable flag, wherein the scalable flag indicates whether the size of the at least one query instruction is scalable.
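One possible way to pack these fields into an instruction word is sketched below. The source does not specify field order or widths, so the 32-bit layout here (6-bit type, 1-bit scalable flag, 5-bit module count, 20-bit first address) is purely an assumption for illustration, as is the `QueryInstruction` name. The sketch does not validate that field values fit their widths.

```python
from dataclasses import dataclass

# Hypothetical field widths in bits; the source does not define a layout.
TYPE_BITS, FLAG_BITS, COUNT_BITS, ADDR_BITS = 6, 1, 5, 20


@dataclass(frozen=True)
class QueryInstruction:
    instr_type: int     # instruction type
    scalable: bool      # scalable flag: whether the instruction size can grow
    module_count: int   # number of dedicated modules to query
    first_address: int  # first address of the dedicated modules

    def encode(self) -> int:
        # Pack fields from most significant (type) to least significant (address).
        word = self.instr_type
        word = (word << FLAG_BITS) | int(self.scalable)
        word = (word << COUNT_BITS) | self.module_count
        word = (word << ADDR_BITS) | self.first_address
        return word

    @staticmethod
    def decode(word: int) -> "QueryInstruction":
        # Unpack in the reverse order of encode().
        first_address = word & ((1 << ADDR_BITS) - 1)
        word >>= ADDR_BITS
        module_count = word & ((1 << COUNT_BITS) - 1)
        word >>= COUNT_BITS
        scalable = bool(word & 1)
        word >>= FLAG_BITS
        instr_type = word & ((1 << TYPE_BITS) - 1)
        return QueryInstruction(instr_type, scalable, module_count, first_address)


query = QueryInstruction(instr_type=0x2A, scalable=True, module_count=3,
                         first_address=0x1000)
word = query.encode()
```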
In an embodiment, a first dedicated module number corresponding to the multiple dedicated modules is stored in a data register; and when the detection unit receives a second task completion request, the detection unit determines a second dedicated module number of the second task request, and the comparator compares the second dedicated module number with the first dedicated module number in the data register to determine the dedicated module corresponding to the second task completion request.
In an embodiment, multiple comparison results are stored in result registers; and when the processor (for example, a controller in the processor) sends a query instruction, the second task request is responded to according to the values in the multiple result registers.
In an embodiment, multiple comparison results are received and ANDed together to determine whether each of the multiple dedicated modules has issued its corresponding task completion request.
In an embodiment, if the multiple response modes in which the processor responds to the multiple dedicated modules are the query mode, a query instruction regarding the multiple dedicated modules is sent to determine whether the multiple dedicated processing modules have issued multiple task completion requests; if all task completion requests have been issued, the tasks related to the multiple dedicated processing modules are executed.
In an embodiment, when the processor and the first dedicated module among the multiple dedicated modules cooperate to complete the same task, and the response mode in which the processor responds to the first dedicated module is the query mode, the time point at which the processor needs the processing result of the first dedicated module is estimated, and the time point at which the processor (for example, a controller in the processor) sends the at least one query instruction regarding the first dedicated module is determined from that estimate, so as to reduce the waiting time of the processor.
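The benefit of timing the query can be illustrated with simple arithmetic. The sketch below assumes that a query instruction stalls the processor until the dedicated module has finished, and that the processor has a known amount of independent work before it needs the result; the function names and cycle counts are hypothetical.

```python
def plan_query_time(now: int, independent_work_cycles: int) -> int:
    """Issue the query at the estimated point where the result is first needed."""
    return now + independent_work_cycles


def stall_cycles(query_time: int, module_finish_time: int) -> int:
    """Cycles the processor waits if it queries before the module finishes."""
    return max(0, module_finish_time - query_time)


now = 100
module_finish_time = 150  # assumed finish time of the first dedicated module
planned = plan_query_time(now, independent_work_cycles=40)

stall_if_immediate = stall_cycles(now, module_finish_time)    # query right away
stall_if_planned = stall_cycles(planned, module_finish_time)  # query when needed
```

Under these assumed numbers, querying at the planned time leaves only the portion of the wait that the independent work could not cover.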
In an embodiment, the process state of the process handled by the processor is marked; and according to the process state marked in the process status register, the response mode in which the processor responds to the multiple dedicated modules is dynamically adjusted, and the adjusted response mode is stored.
An embodiment of the present application further provides a computer-readable storage medium storing a computer program. The computer program includes program instructions, and when the processor executes the program instructions, the following actions are performed:
receiving multiple task completion requests corresponding to multiple dedicated modules that are located outside the processor and coupled to the bus interface;
obtaining multiple response modes in which the processor responds to the multiple dedicated modules, and storing the multiple response modes in the control register, wherein the multiple response modes are different; and
responding to the multiple task completion requests according to the corresponding response modes in the control register.
In the embodiments of the present application, the memory is used to store a computer program and may be configured to store various other data to support operations on the device in which it resides. The processor may execute the computer program stored in the memory to implement the corresponding control logic. The memory may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
In the embodiments of the present application, the processor may be any hardware processing device capable of executing the logic of the above method. Optionally, the processor may be a central processing unit (CPU), a graphics processing unit (GPU), or a microcontroller unit (MCU); a programmable device such as a field-programmable gate array (FPGA), a programmable array logic (PAL) device, a generic array logic (GAL) device, or a complex programmable logic device (CPLD); or an Advanced RISC Machines (ARM) processor or a system on chip (SoC), but is not limited thereto.
It should be noted that terms such as "first" and "second" herein are used to distinguish different messages, devices, modules, and the like. They do not indicate an order, nor do they require that the "first" and "second" items be of different types.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
In a typical configuration, a computing device includes one or more processors (CPUs), an input/output interface, a network interface, and memory.
The memory may include non-permanent memory in computer-readable media, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
It should also be noted that the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, commodity, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of additional identical elements in the process, method, commodity, or device that includes the element.
The above descriptions are merely embodiments of the present application and are not intended to limit it. Those skilled in the art may make various modifications and changes to the present application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.

Claims (33)

  1. A processor, characterized by comprising:
    a bus interface, capable of being coupled to a plurality of dedicated modules outside the processor, and configured to receive a plurality of task completion requests concerning the plurality of dedicated modules; and
    a control register, configured to store a plurality of response modes in which the processor responds to the plurality of dedicated modules, wherein the plurality of response modes are different;
    wherein the processor responds to the plurality of task completion requests of the plurality of dedicated modules according to the corresponding response modes in the control register.
  2. The processor according to claim 1, characterized in that:
    the plurality of response modes include an interrupt mode and a query mode; and
    the processor further comprises a flag register configured to store, when the response mode in which the processor responds to a first dedicated module among the plurality of dedicated modules is the query mode, a record of whether the first dedicated module has issued a first task completion request.
  3. The processor according to claim 1, characterized in that:
    the control register comprises a plurality of bits, wherein the plurality of bits are used to store a response time of the processor for at least one task completion request among the plurality of task completion requests; and
    when a first response mode in which the processor responds to a first dedicated module is an interrupt mode, after the processor receives the at least one task completion request from the first dedicated module through the bus interface, the processor delays responding to the at least one task completion request, wherein the delay time is the response time.
  4. The processor according to claim 1, characterized in that:
    the control register comprises a plurality of bits, wherein the plurality of bits are used to store a response condition of the processor for the plurality of task completion requests; and
    when a first response mode in which the processor responds to the first dedicated module is an interrupt mode, the processor responds to the task completion request of the first dedicated module if the response condition is satisfied;
    wherein the response condition is that the processor has received, through the bus interface, the task completion request of a first dedicated module among the plurality of dedicated modules and the processor has finished the task it is currently executing.
  5. The processor according to claim 1, characterized in that:
    the control register comprises a first bit, the first bit being used to store a first response mode in which the processor responds to a first dedicated module among the plurality of dedicated modules; and
    wherein, when the first bit is 0, the first response mode is an interrupt mode, and when the first bit is 1, the first response mode is a query mode; or
    when the first bit is 1, the first response mode is the interrupt mode, and when the first bit is 0, the first response mode is the query mode.
  6. The processor according to claim 1, characterized in that:
    if the response mode in which the processor responds to a first dedicated module is a query mode, the processor sends at least one query instruction regarding the first dedicated module to determine whether the first dedicated module has issued a first task completion request.
  7. The processor according to claim 6, characterized in that:
    when the task being executed by the processor requires the processing result of the task executed by the first dedicated module, the processor issues the at least one query instruction; or
    when the processor is in an idle state, the processor issues the at least one query instruction.
  8. The processor according to claim 6, characterized in that:
    the at least one query instruction includes an instruction type and at least one of: an encoding of a dedicated module number, a first address of the dedicated modules, a number of dedicated modules, and a scalable flag; wherein the scalable flag is used to indicate whether the size of the at least one query instruction is scalable.
  9. The processor according to claim 1, characterized by further comprising:
    a detection module comprising a detection unit, wherein the detection unit comprises a comparator and a data register, the data register being used to store a first dedicated module number corresponding to a first dedicated module among the plurality of dedicated modules; and
    when the detection unit receives a second task completion request, the detection unit determines a second dedicated module number of the second task request, and the comparator compares the second dedicated module number with the first dedicated module number in the data register to determine the dedicated module corresponding to the second task completion request.
  10. The processor according to claim 9, characterized in that:
    the detection unit further comprises a result register; and
    the detection unit stores the comparison result in the result register, and
    when the processor sends a query instruction, the processor responds to the second task request according to the value in the result register.
  11. The processor according to claim 9, characterized in that:
    the detection module further comprises an AND gate, the AND gate being configured to receive a plurality of first comparison results from a plurality of first detection units and to AND the plurality of first comparison results together, so as to determine whether each of the plurality of dedicated modules has issued its corresponding task completion request.
  12. The processor according to claim 1, characterized in that:
    if the plurality of response modes in which the processor responds to the plurality of dedicated modules are a query mode, the processor sends a query instruction regarding the plurality of dedicated modules to determine whether the plurality of dedicated processing modules have issued a plurality of task completion requests;
    if all task completion requests have been issued, the processor executes the tasks related to the plurality of dedicated processing modules.
  13. The processor according to claim 1, characterized in that:
    when the processor and a first dedicated module among the plurality of dedicated modules cooperate to complete the same task, if the response mode in which the processor responds to the first dedicated module is a query mode, the time point at which the processor needs to use the processing result of the first dedicated module is estimated, and the time point at which the processor sends at least one query instruction regarding the first dedicated module is determined according to that time point, so as to reduce the waiting time of the processor.
  14. The processor according to claim 1, characterized by further comprising:
    a process status register, configured to mark the process state of the process handled by the processor; and
    according to the process state marked in the process status register, the processor dynamically adjusts the response mode in which the processor responds to the plurality of dedicated modules, and sends the adjusted response mode to the control register.
  15. The processor according to claim 1, characterized in that:
    each of the plurality of dedicated modules is one of a graphics processing module, a vector computing module, a floating-point processing module, a direct memory access module, an artificial intelligence accelerator, and a fast Fourier transform module.
  16. A task response method, characterized by comprising:
    receiving a plurality of task completion requests corresponding to a plurality of dedicated modules that are located outside a processor and coupled to a bus interface;
    obtaining a plurality of response modes in which the processor responds to the plurality of dedicated modules, and storing the plurality of response modes in a control register, wherein the plurality of response modes are different; and
    responding to the plurality of task completion requests of the plurality of dedicated modules according to the corresponding response modes in the control register.
  17. The task response method according to claim 16, characterized in that:
    the plurality of response modes include an interrupt mode and a query mode; and
    the task response method comprises:
    when the response mode in which the processor responds to a first dedicated module among the plurality of dedicated modules is the query mode, storing a record of whether the first dedicated module has issued a first task completion request.
  18. The task response method according to claim 16, characterized in that:
    a plurality of bits are used to store a response time of the processor for at least one task completion request among the plurality of task completion requests, the response time being a preset time; and
    when a first response mode in which the processor responds to a first dedicated module is an interrupt mode, after the at least one task completion request is received from the first dedicated module through the bus interface, the response to the at least one task completion request is delayed, wherein the delay time is the response time.
  19. The task response method according to claim 16, characterized in that:
    a plurality of bits are used to store a response condition of the processor for the plurality of task completion requests, the response condition being that the processor has received, through the bus interface, the task completion request of a first dedicated module among the plurality of dedicated modules and the processor has finished the task it is currently executing; and
    when a first response mode in which the processor responds to the first dedicated module is an interrupt mode, the task completion request of the first dedicated module is responded to if the response condition is satisfied.
  20. The task response method according to claim 16, characterized in that:
    a first bit is used to store a first response mode in which the processor responds to a first dedicated module among the plurality of dedicated modules; and
    wherein, when the first bit is 0, the first response mode is an interrupt mode, and when the first bit is 1, the second response mode is a query mode; or
    when the first bit is 1, the first response mode is the interrupt mode, and when the first bit is 0, the second response mode is the query mode.
  21. The task response method according to claim 16, characterized in that:
    if the response mode in which the processor responds to a first dedicated module is a query mode, at least one query instruction regarding the first dedicated module is sent to determine whether the first dedicated module has issued at least one first task completion request.
  22. The task response method according to claim 21, wherein:
    when the task being executed by the processor requires the processing result of the task executed by the first dedicated module, the processor issues the at least one query instruction; or
    when the processor is in an idle state, the processor issues the at least one query instruction.
  23. The task response method according to claim 21, wherein:
    the at least one query instruction includes an instruction type, and at least one of an encoding of a dedicated module number, a dedicated module first address, a number of dedicated modules, and a scalable flag; wherein the scalable flag indicates whether the size of the at least one query instruction is scalable.
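The fields listed in claim 23 can be pictured as a packed instruction word. The sketch below is only an assumption-laden illustration: the claim does not give field widths or ordering, so the 32-bit layout, the constants `TYPE_BITS`, `FLAG_BITS`, `FIRST_BITS`, `COUNT_BITS`, and the class `QueryInstruction` are all hypothetical. It also treats the "first address" as the index of the first module in a consecutive range, which is one possible reading.

```python
from dataclasses import dataclass

# Hypothetical field widths, least significant field first; they sum to 32 bits.
TYPE_BITS, FLAG_BITS, FIRST_BITS, COUNT_BITS = 7, 1, 12, 12


@dataclass(frozen=True)
class QueryInstruction:
    instruction_type: int   # opcode identifying a query instruction
    scalable: bool          # scalable flag from claim 23
    first_module: int       # assumed: index of the first module queried
    module_count: int       # number of consecutive modules queried

    def encode(self) -> int:
        """Pack the fields into one integer word, checking each width."""
        fields = (
            (self.instruction_type, TYPE_BITS),
            (int(self.scalable), FLAG_BITS),
            (self.first_module, FIRST_BITS),
            (self.module_count, COUNT_BITS),
        )
        word, shift = 0, 0
        for value, width in fields:
            if not 0 <= value < (1 << width):
                raise ValueError(f"value {value} does not fit in {width} bits")
            word |= value << shift
            shift += width
        return word

    @classmethod
    def decode(cls, word: int) -> "QueryInstruction":
        """Unpack a word produced by encode()."""
        values = []
        for width in (TYPE_BITS, FLAG_BITS, FIRST_BITS, COUNT_BITS):
            values.append(word & ((1 << width) - 1))
            word >>= width
        return cls(values[0], bool(values[1]), values[2], values[3])

    def modules(self) -> range:
        """Module indices covered by this query."""
        return range(self.first_module, self.first_module + self.module_count)
```

A first-address-plus-count pair lets a single instruction cover several modules at once, which fits the multi-module query described in claim 27.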
  24. The task response method according to claim 16, wherein:
    first dedicated module numbers corresponding to the plurality of dedicated modules are stored in a data register; and
    when a detection unit receives a second task completion request, the detection unit determines a second dedicated module number of the second task request, and a comparator compares the second dedicated module number with the first dedicated module numbers in the data register to determine the dedicated module corresponding to the second task completion request.
  25. The task response method according to claim 24, wherein:
    a plurality of comparison results are stored in a result register; and
    when the processor sends a query instruction, the second task request is responded to according to the values in the plurality of result registers.
  26. The task response method according to claim 25, wherein:
    the plurality of comparison results are received and ANDed together to determine whether each of the plurality of dedicated modules has issued its corresponding task completion request.
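Claims 24 to 26 describe a data register, a comparator, result registers, and an AND stage. A small software model of that data path might look like the sketch below. It is a behavioral approximation under stated assumptions, not the hardware design: the class `CompletionDetector` and its method names are hypothetical, and one result bit per data-register slot is assumed.

```python
class CompletionDetector:
    """Behavioral model of the detection path in claims 24 to 26."""

    def __init__(self, expected_module_ids):
        # Data register: the module numbers the processor is waiting on.
        self.data_register = list(expected_module_ids)
        # Result register: one comparison result per data-register slot.
        self.result_register = [False] * len(self.data_register)

    def on_completion_request(self, module_id: int) -> bool:
        """Compare an incoming module number against every stored number.

        Returns True if the request came from a module being tracked.
        """
        matched = False
        for slot, expected in enumerate(self.data_register):
            if expected == module_id:          # comparator
                self.result_register[slot] = True
                matched = True
        return matched

    def all_done(self) -> bool:
        """AND of all comparison results, as in claim 26."""
        return all(self.result_register)

    def clear(self) -> None:
        """Reset the result register before the next batch of tasks."""
        self.result_register = [False] * len(self.data_register)
```

When the processor issues a query instruction it would only need to read `all_done()`, rather than interrogating each dedicated module separately.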
  27. The task response method according to claim 16, wherein:
    if the plurality of response modes in which the processor responds to the plurality of dedicated modules are the query mode, a query instruction concerning the plurality of dedicated modules is sent to determine whether the plurality of dedicated processing modules have issued a plurality of task completion requests;
    if all the task completion requests have been issued, the tasks related to the plurality of dedicated processing modules are executed.
  28. The task response method according to claim 16, wherein:
    when the processor and a first dedicated module among the plurality of dedicated modules cooperate to complete the same task, if the response mode in which the processor responds to the first dedicated module is the query mode, the time point at which the processor needs to use the processing result of the first dedicated module is estimated, and, according to that time point, the time point at which the processor sends at least one query instruction concerning the first dedicated module is determined, so as to reduce the waiting time of the processor.
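One simple way to read the scheduling step in claim 28 is to issue the query just early enough that its answer arrives when the result is needed. The sketch below illustrates that reading only; the claim does not specify the estimation method, and the function name `plan_query_time` and the `query_latency` parameter (the assumed round-trip time of a query instruction, in cycles) are hypothetical.

```python
def plan_query_time(now: int, cycles_until_result_needed: int, query_latency: int) -> int:
    """Pick the cycle at which to send the query instruction.

    now: current cycle count.
    cycles_until_result_needed: estimated cycles of independent work the
        processor can do before it needs the dedicated module's result.
    query_latency: assumed round-trip latency of one query, in cycles.
    """
    needed_at = now + cycles_until_result_needed
    # Send the query early enough for the answer to be ready at needed_at,
    # but never schedule it in the past.
    return max(now, needed_at - query_latency)
```

With `now=100`, 50 cycles of independent work, and a 5-cycle query latency, the query is planned for cycle 145; if only 3 cycles of work remain, it is sent immediately at cycle 100.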
  29. The task response method according to claim 16, wherein:
    the process state of the process handled by the processor is marked; and
    according to the process state marked by the process status register, the response modes in which the processor responds to the plurality of dedicated modules are dynamically adjusted, and the adjusted response modes are stored.
  30. The task response method according to claim 16, wherein:
    each of the plurality of dedicated modules is one of a graphics processing module, a vector calculation module, a floating-point processing module, a direct memory access module, an artificial intelligence accelerator, and a fast Fourier transform module.
  31. A computer-readable storage medium storing computer instructions, wherein, when the computer instructions are executed by one or more processors, the one or more processors perform actions comprising:
    receiving a plurality of task completion requests corresponding to a plurality of dedicated modules that are coupled to a bus interface and located outside a processor;
    acquiring a plurality of response modes in which the processor responds to the plurality of dedicated modules, and storing the plurality of response modes in a control register, wherein the plurality of response modes are different; and
    responding to the plurality of task completion requests according to the corresponding response modes in the control register.
  32. A movable platform, wherein the movable platform comprises the processor according to claims 1 to 15.
  33. A camera, wherein the camera comprises the processor according to claims 1 to 15.
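The overall flow recited in the method and storage-medium claims (receive completion requests, look up the per-module response mode, respond accordingly) can be sketched in software as follows. This is a simplified behavioral model under assumptions, not the claimed hardware: the class `TaskResponder`, the string mode values, and the dictionary standing in for the control register are all hypothetical.

```python
class TaskResponder:
    """Toy model: per-module response modes decide how completions are handled."""

    def __init__(self, modes):
        # Stand-in for the control register: module id -> "interrupt" or "query".
        self.control_register = dict(modes)
        self.pending = set()   # completions latched until the processor queries
        self.handled = []      # order in which completions were responded to

    def on_completion_request(self, module_id: int) -> None:
        """Called when a dedicated module signals that its task is complete."""
        if self.control_register[module_id] == "interrupt":
            # Interrupt mode: respond as soon as the request arrives.
            self.handled.append(module_id)
        else:
            # Query mode: remember the request until the processor asks.
            self.pending.add(module_id)

    def query(self, module_id: int) -> bool:
        """Processor-issued query: respond if the module has already finished."""
        if module_id in self.pending:
            self.pending.remove(module_id)
            self.handled.append(module_id)
            return True
        return False
```

In this model a module in query mode never disturbs the processor's current work; its completion is only acted on when the processor chooses to ask.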
PCT/CN2019/129100 2019-12-27 2019-12-27 Processor, task response method, movable platform, and camera WO2021128249A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980050197.0A CN112513809A (en) 2019-12-27 2019-12-27 Processor, task response method, movable platform and camera
PCT/CN2019/129100 WO2021128249A1 (en) 2019-12-27 2019-12-27 Processor, task response method, movable platform, and camera

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/129100 WO2021128249A1 (en) 2019-12-27 2019-12-27 Processor, task response method, movable platform, and camera

Publications (1)

Publication Number Publication Date
WO2021128249A1 (en) 2021-07-01

Family

ID=74924085

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/129100 WO2021128249A1 (en) 2019-12-27 2019-12-27 Processor, task response method, movable platform, and camera

Country Status (2)

Country Link
CN (1) CN112513809A (en)
WO (1) WO2021128249A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1342940A (en) * 2000-09-06 2002-04-03 国际商业机器公司 Coprocessor with multiple logic interface
CN101567078A (en) * 2009-03-27 2009-10-28 西安交通大学 Dual-bus visual processing chip architecture
CN101980149A (en) * 2010-10-15 2011-02-23 无锡中星微电子有限公司 Main processor and coprocessor communication system and communication method
CN102141904A (en) * 2011-03-31 2011-08-03 杭州中天微系统有限公司 Data processor supporting interrupt shielding instruction
CN103019835A (en) * 2011-09-26 2013-04-03 同方股份有限公司 System and method for optimizing interruption resources in multi-core processor
CN205899270U (en) * 2016-06-23 2017-01-18 陕西宝成航空仪表有限责任公司 Two redundant ARINC429 bus interface systems of high reliability
US20180143935A1 (en) * 2016-11-23 2018-05-24 Infineon Technologies Austria Ag Bus Device with Programmable Address

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5796984A (en) * 1996-01-26 1998-08-18 Dell Usa, L.P. Operating system independent apparatus and method for eliminating peripheral device functions
US7774563B2 (en) * 2007-01-09 2010-08-10 International Business Machines Corporation Reducing memory access latency for hypervisor- or supervisor-initiated memory access requests
CN104951412B (en) * 2015-06-06 2018-01-02 华为技术有限公司 A kind of storage device accessed by rambus

Also Published As

Publication number Publication date
CN112513809A (en) 2021-03-16

Similar Documents

Publication Publication Date Title
US10877766B2 (en) Embedded scheduling of hardware resources for hardware acceleration
CN107046508B (en) Message receiving method and network equipment
US8250164B2 (en) Query performance data on parallel computer system having compute nodes
JP6387571B2 (en) Apparatus, method, system, program, and computer-readable recording medium
JP6776696B2 (en) Parallel information processing equipment, information processing methods, and programs
US8478926B1 (en) Co-processing acceleration method, apparatus, and system
US20210103818A1 (en) Neural network computing method, system and device therefor
US20120089761A1 (en) Apparatus and method for processing an interrupt
US20170068571A1 (en) Fine-Grained Heterogeneous Computing
US9152587B2 (en) Virtualized interrupt delay mechanism
US9715392B2 (en) Multiple clustered very long instruction word processing core
TWI754310B (en) System and circuit of pure functional neural network accelerator
WO2021128249A1 (en) Processor, task response method, movable platform, and camera
CN112527559B (en) Internet of things data backup method and device
CN111078587B (en) Memory allocation method and device, storage medium and electronic equipment
CN112799598A (en) Data processing method, processor and electronic equipment
CN112084023A (en) Data parallel processing method, electronic equipment and computer readable storage medium
CN105718990A (en) Cell array calculation system and communication method between cells
WO2021179222A1 (en) Scheduling device, scheduling method, accelerating system and unmanned aerial vehicle
CN111340202B (en) Operation method, device and related product
CN113626080B (en) Data processing device and related product
CN116257471A (en) Service processing method and device
US20130238877A1 (en) Core system for processing an interrupt and method for transmission of vector register file data therefor
CN105718993A (en) Cell array calculation system and communication method therein
US20120047285A1 (en) Interrupt-based command processing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19957118

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19957118

Country of ref document: EP

Kind code of ref document: A1