CN111666103A - Instruction processing method and device - Google Patents

Instruction processing method and device

Info

Publication number
CN111666103A
CN111666103A (Application CN202010381861.4A)
Authority
CN
China
Prior art keywords
task
instruction
cpu
execution result
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010381861.4A
Other languages
Chinese (zh)
Inventor
刘浩楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New H3C Semiconductor Technology Co Ltd
Original Assignee
New H3C Semiconductor Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by New H3C Semiconductor Technology Co Ltd filed Critical New H3C Semiconductor Technology Co Ltd
Priority to CN202010381861.4A priority Critical patent/CN111666103A/en
Publication of CN111666103A publication Critical patent/CN111666103A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/3004Arrangements for executing specific machine instructions to perform operations on memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • G06F9/3889Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5017Task decomposition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5018Thread allocation

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

The application provides an instruction processing method and device, wherein the method is applied to a hardware acceleration module and comprises the following steps: receiving an acceleration instruction sent by a CPU, wherein the acceleration instruction comprises attribute information of a task to be executed; constructing a first task instruction for completing the task to be executed according to the attribute information of the task to be executed, wherein the first task instruction comprises a task target and a data storage position; and sending the constructed first task instruction to a corresponding task completion module so that the task completion module completes a corresponding task to be executed according to the task target, and sending an execution result and the data storage position to a storage control unit, wherein the storage control unit stores the execution result to a corresponding storage unit according to the data storage position so that the CPU obtains the execution result from the storage unit.

Description

Instruction processing method and device
Technical Field
The present application relates to the field of communications technologies, and in particular, to a method and an apparatus for processing an instruction.
Background
In a computer system, a program usually has a CPU issuing a certain instruction as a start-stop boundary. Most programs can be viewed as an ordered set of instructions. For some programs with lengthy flow or complicated operations, the relevant instructions occupy a significant portion of the entire instruction set space in order to complete the corresponding operations.
If a program has a relatively fixed instruction issuing sequence, that is, a complex workflow is completely fixed on a frame (for example, a special register is called every time a certain program is executed, and only data read from the register is different every time), a CPU generally performs a large amount of repetitive work on such programs or instructions (for example, the CPU needs to issue a plurality of instructions to complete a table lookup, and the construction of each instruction needs a plurality of instructions to complete steps of logical operation, logical judgment, address calculation, and the like), so that great waste is caused on the performance of instruction set space, CPU working efficiency, and the like.
Disclosure of Invention
In view of this, the present application provides an instruction processing method and apparatus, so as to solve the problem in the prior art that, for some fixed flows, a CPU performs a large amount of repetitive work on the fixed flows, which causes great waste on performance such as instruction set space and CPU operating efficiency.
In a first aspect, the present application provides an instruction processing method, where the method is applied to a hardware acceleration module, and the method includes:
receiving an acceleration instruction sent by a CPU, wherein the acceleration instruction comprises an acceleration flag and attribute information of a task to be executed;
constructing, according to the acceleration flag and the attribute information of the task to be executed, a first task instruction for completing the task to be executed, wherein the first task instruction comprises a task target and a data storage position;
and sending the constructed first task instruction to a corresponding task completion module so that the task completion module completes a corresponding task to be executed according to the task target, and sending an execution result and the data storage position to a storage control unit, wherein the storage control unit stores the execution result to a corresponding storage unit according to the data storage position so that the CPU obtains the execution result from the storage unit.
In a second aspect, the present application provides an instruction processing apparatus, the apparatus comprising:
a receiving unit, used for receiving an acceleration instruction sent by a CPU, wherein the acceleration instruction comprises attribute information of a task to be executed;
the construction unit is used for constructing a first task instruction for completing the task to be executed according to the attribute information of the task to be executed, and the first task instruction comprises a task target and a data storage position;
and the sending unit is used for sending the constructed first task instruction to a corresponding task completion module so that the task completion module completes a corresponding task to be executed according to the task target, and sending an execution result and the data storage position to the storage control unit, and the storage control unit stores the execution result to the corresponding storage unit according to the data storage position so that the CPU obtains the execution result from the storage unit.
Therefore, by applying the instruction processing method and apparatus provided by the present application, after the hardware acceleration module receives the acceleration instruction sent by the CPU, it constructs, according to the acceleration flag included in the acceleration instruction and the attribute information of the task to be executed, a first task instruction for completing the task to be executed. The task instruction comprises a task target and a data storage position. The hardware acceleration module sends the task instruction to the corresponding task completion module, so that the task completion module completes the corresponding task to be executed according to the task target and sends the execution result and the data storage position to the storage control unit; the storage control unit stores the execution result to the corresponding storage unit according to the data storage position, and the CPU obtains the execution result from the storage unit.
This approach solves the problem in the prior art that, for certain fixed flows, the CPU performs a large amount of repetitive work, which wastes instruction set space, reduces CPU working efficiency, and otherwise degrades performance. The goal of simplifying the instruction set is achieved, the number of instructions for specific flows is reduced, and the working efficiency of the CPU is improved.
Drawings
Fig. 1 is a flowchart of an instruction processing method according to an embodiment of the present application;
fig. 2 is a structural diagram of an instruction processing apparatus according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the corresponding listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "at the time of", "when", or "in response to a determination", depending on the context.
The following describes the instruction processing method provided in the embodiment of the present application in detail. Referring to fig. 1, fig. 1 is a flowchart of an instruction processing method according to an embodiment of the present application. The method is applied to a hardware acceleration module and specifically comprises the following steps.
Step 110, receiving an acceleration instruction sent by the CPU, wherein the acceleration instruction comprises an acceleration flag and attribute information of a task to be executed.
Specifically, in the embodiment of the present application, the CPU supports a multi-threaded operating mode. When a thread in an active state in the CPU is about to enter a task-fetching or task-completing flow, the CPU generates an acceleration instruction according to the current task, wherein the acceleration instruction comprises an acceleration flag and attribute information of the task to be executed.
The CPU sends the acceleration instruction to the hardware acceleration module. The hardware acceleration module obtains the acceleration flag from the acceleration instruction and starts itself according to the acceleration flag.
The attribute information of the task to be executed specifically refers to information indicating characteristics of the task to be executed, for example, information indicating characteristics of an execution operation such as reading, writing, and comparison.
In the embodiment of the present application, the acceleration instruction further includes a task target, a task address, control information, and the like. The control information specifically indicates the number of task instructions that the hardware acceleration module will subsequently construct. For example, if the control information is 3, the hardware acceleration module needs to construct 3 task instructions subsequently and complete 3 acceleration passes.
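To make the composition of such an acceleration instruction more concrete, below is a minimal C sketch of one possible layout. The field names, widths, and operation codes are assumptions made for illustration; they are not the encoding defined by this application.

#include <stdint.h>

/* Assumed operation kinds derived from the attribute information
 * ("characteristics of an execution operation such as reading,
 * writing, and comparison"). */
typedef enum { OP_READ = 0, OP_WRITE = 1, OP_COMPARE = 2 } op_kind_t;

/* Illustrative layout of an acceleration instruction: acceleration flag,
 * thread number, attribute information of the task to be executed, task
 * target, task address, data storage position, and control information
 * (how many task instructions the module will construct). */
typedef struct {
    uint8_t   accel_flag;    /* non-zero: handled by the hardware acceleration module */
    uint8_t   thread_no;     /* thread the module takes over from the CPU */
    op_kind_t op;            /* attribute information of the task to be executed */
    uint32_t  task_target;   /* what the task completion module must achieve */
    uint64_t  task_addr;     /* address the task operates on */
    uint64_t  result_addr;   /* data storage position for the execution result */
    uint8_t   control_info;  /* number of task instructions to construct (e.g. 3) */
} accel_instr_t;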
Step 120, constructing, according to the acceleration flag and the attribute information of the task to be executed, a first task instruction for completing the task to be executed, wherein the first task instruction comprises a task target and a data storage position.
Specifically, after receiving the acceleration instruction, the hardware acceleration module obtains the acceleration flag and the attribute information of the task to be executed, and constructs, according to the attribute information of the task to be executed, a first task instruction for completing the task to be executed. The first task instruction comprises a task target and a data storage position.
Further, if the acceleration instruction does not include the acceleration flag, the hardware acceleration module executes the acceleration instruction according to the existing flow for completing an instruction issued by the CPU; this is similar to the existing instruction-completion process and is not repeated here.
Still further, the acceleration instruction in step 110 further includes a thread number. According to the thread number, the hardware acceleration module takes over the thread described in step 110 from the CPU before constructing a task instruction for completing the task to be executed, and then performs a series of fixed operations (e.g., Load/Store, DMA, etc.) in place of the CPU. After the thread is released, the CPU switches to other threads and thus remains in a working state. In this way, the software processing flow is simplified and the computing performance of the CPU is optimized.
Step 130, sending the constructed first task instruction to a corresponding task completion module, so that the task completion module completes a corresponding task to be executed according to the task target, and sending an execution result and the data storage position to a storage control unit, where the storage control unit stores the execution result to a corresponding storage unit according to the data storage position, so that the CPU obtains the execution result from the storage unit.
Specifically, after the hardware acceleration module constructs the first task instruction in place of the CPU, it sends the task instruction to the corresponding task completion module. After receiving the task instruction, the task completion module obtains the task target from the task instruction, executes the task to be executed according to the task target, and, after the execution is finished, sends the execution result (which may specifically include a final execution result, intermediate process data, and the like) and the data storage position to the storage control unit.
And after receiving the execution result and the data storage position, the storage control unit stores the execution result into the corresponding storage unit according to the data storage position, so that the CPU acquires the execution result from the storage unit.
Further, after the hardware acceleration module sends the first task instruction to the corresponding task completion module, it periodically scans the storage control unit to check whether an execution result is present there. According to the thread number, the hardware acceleration module can judge whether the execution results stored by the storage control unit include an execution result corresponding to the thread number. If so, the hardware acceleration module acquires the execution result corresponding to the thread number. The execution result comprises the attribute information of the task to be executed that is required for constructing a second task instruction, and the first task instruction and the second task instruction belong to the same task.
It can be understood that, after obtaining the execution result corresponding to the first task instruction, the hardware acceleration module obtains the attribute information of the task to be executed again from the execution result. According to the acquired attribute information, the hardware acceleration module constructs a second task instruction, and steps 120-130 are repeated until the number of constructed task instructions equals the value indicated by the control information.
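Steps 120 and 130, together with the periodic scanning described above, can be pictured as the loop below. It reuses the accel_instr_t sketch given earlier; the helper functions and stand-in types are assumptions that model hardware behavior, not the application's actual circuitry.

#include <stdbool.h>
#include <stdint.h>

/* Minimal stand-in types; the real formats are defined by the hardware. */
typedef struct { op_kind_t op; uint64_t addr; } task_attr_t;
typedef struct { task_attr_t attr; uint32_t target; uint64_t result_addr; } task_instr_t;
typedef struct { task_attr_t next_task_attrs; uint64_t data; } exec_result_t;

/* Hypothetical helpers standing in for hardware behavior. */
extern task_attr_t  extract_attrs(const accel_instr_t *ai);
extern task_instr_t build_task_instr(const task_attr_t *a, uint64_t result_addr);
extern void         send_to_completion_module(const task_instr_t *ti);
extern bool         scan_storage_control_unit(uint8_t thread_no, exec_result_t *out);
extern void         wait_one_cycle(void);
extern void         notify_cpu_done(uint8_t thread_no);

/* One acceleration pass per forged task instruction, repeated until the
 * number of constructed instructions equals the control information. */
void run_acceleration(const accel_instr_t *ai)
{
    task_attr_t attrs = extract_attrs(ai);          /* from the acceleration instruction */

    for (uint8_t n = 0; n < ai->control_info; n++) {
        task_instr_t ti = build_task_instr(&attrs, ai->result_addr);
        send_to_completion_module(&ti);             /* same format as a CPU-issued instruction */

        exec_result_t res;
        while (!scan_storage_control_unit(ai->thread_no, &res))
            wait_one_cycle();                       /* periodic scan for this thread's result */

        attrs = res.next_task_attrs;                /* used to construct the next instruction */
    }
    notify_cpu_done(ai->thread_no);                 /* prompt the CPU to end the acceleration flow */
}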
Further, the task completion module may specifically refer to a generic name of a module for completing a task to be executed, and may include a task target analysis module, a task target forwarding module, and a task execution module. The task target analysis module and the task target forwarding module can be arranged in the thread management module.
Further, before the foregoing step 110, the method further includes a process in which the hardware acceleration module negotiates with the CPU a storage address, in the storage unit, for storing the execution result. Through this process, the hardware acceleration module only needs to track the completion of the acceleration instruction and prompt the CPU to finish the acceleration flow; it does not need to provide, in a register, an address where the execution result is stored, and the CPU will autonomously obtain all the execution results from the agreed address.
The CPU and the hardware acceleration module agree on a storage address for the execution result; this address can be an address designated by the CPU in a storage unit that the CPU can access directly. The CPU notifies the hardware acceleration module of the designated address so that the hardware acceleration module treats it as the address where the execution result is stored.
In the embodiment of the present application, the hardware acceleration module resides in the thread management module, and its most important function is to imitate the instructions of the CPU. Besides adding the hardware acceleration module, the present application modifies the original logic circuits of the thread management module as little as possible (for example, the modules involved in the hardware acceleration process need to check the progress of the current acceleration flow to confirm whether they need to participate in the next step, and the newly added module completes the tracking of the acceleration flow). Related or similar behaviors are usually implemented in the same module, and forged instructions sacrifice the ability to select branches among those behaviors; therefore, a hardware tracking mechanism is needed to ensure that, in a flow the CPU cannot participate in, the hardware acceleration module always knows exactly which behavior needs to be performed. The tracking function is implemented by a dedicated status register: data processing is needed throughout the flow, and the analyzing module can read the state of the current thread to judge the progress of the acceleration flow. The status register is updated when a forged instruction is issued and when the instruction's reply is written back.
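As a rough illustration of this tracking mechanism, the per-thread status register could be modeled as below. The state names, the thread count, and the update points are assumptions made for readability; the application only states that the register is updated when a forged instruction is issued and when the instruction's reply is written back.

#include <stdint.h>

#define MAX_THREADS 16   /* assumed number of hardware threads */

/* Hypothetical per-thread acceleration status that the analyzing module can
 * read to judge how far the acceleration flow has progressed. */
typedef enum {
    ST_IDLE,           /* no acceleration in progress for this thread    */
    ST_INSTR_ISSUED,   /* a forged task instruction has been issued      */
    ST_REPLY_WRITTEN,  /* the instruction's reply has been written back  */
    ST_FLOW_DONE       /* all task instructions of the flow are complete */
} accel_state_t;

static accel_state_t status_reg[MAX_THREADS];

/* Update points named in the description. */
static void on_forged_instr_issued(uint8_t thread_no) { status_reg[thread_no] = ST_INSTR_ISSUED; }
static void on_reply_written_back(uint8_t thread_no)  { status_reg[thread_no] = ST_REPLY_WRITTEN; }

/* Other modules check the register to decide whether to take the next step. */
static accel_state_t current_progress(uint8_t thread_no) { return status_reg[thread_no]; }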
Hardware instruction acceleration, rather than acceleration by dedicated function modules, is adopted so that specific functions such as memory access and memory comparison can be completed without adding corresponding function modules. The CPU-peripheral task completion modules that execute each acceleration step still recognize the original instruction format; the hardware acceleration module only needs to construct instructions whose format is consistent with the format of the instructions issued by the CPU, substituting for the information, such as the specific task target and the data storage position, that the CPU originally provided to these task completion modules.
It should be noted that the instructions involved in the acceleration flow have certain dependencies; the result of executing each instruction participates in constructing the next instruction, otherwise instruction acceleration would lose its meaning. All data related to the execution results is written back in the conventional way agreed between the CPU and the hardware acceleration module. The hardware acceleration module only needs to track the completion of the acceleration instruction and prompt the CPU to finish the acceleration flow; it does not need to provide, in a register, an address where the execution result is stored, and the CPU spontaneously acquires all the execution results from the agreed address. With this prior arrangement, the CPU can access any storage address more freely, without being limited by the instruction format, when retrieving the execution results, and hardware logic can be used to change the instruction reply format and delete or add information not required by software.
Therefore, by applying the instruction processing method and apparatus provided by the present application, after the hardware acceleration module receives the acceleration instruction sent by the thread management module, it constructs at least one task instruction for completing the task to be executed according to the attribute information of the task to be executed included in the acceleration instruction. Each task instruction comprises a task target and a data storage position. The hardware acceleration module sends the task instruction to the corresponding task completion module, so that the task completion module completes the corresponding task to be executed according to the task target and stores the execution result at the address corresponding to the data storage position. After receiving a first notification message sent by the task completion module, the hardware acceleration module stores the execution result held at the data storage position into the RAM; the CPU can directly access the RAM and obtain the execution result from it.
This approach solves the problem in the prior art that, for certain fixed flows, the CPU performs a large amount of repetitive work, which wastes instruction set space, lengthens the CPU's instruction execution cycle, and otherwise degrades performance. The goal of simplifying the instruction set is achieved, the number of instructions for specific flows is reduced, and the execution time of the CPU is reduced.
Optionally, after the foregoing step 130, the method further includes a process in which the hardware acceleration module sends a notification message to the CPU; through this process, the CPU is prompted to finish the acceleration flow.
Specifically, after completing all task instructions of the acceleration flow, the hardware acceleration module generates a notification message and sends it to the CPU.
After receiving the notification message, the CPU determines that the hardware acceleration module has constructed all task instructions of the acceleration flow. The CPU then accesses the address, in the storage unit, agreed in advance with the hardware acceleration module, and obtains the execution results from that address.
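From the CPU's side, the end of the flow might look like the fragment below. The notification check and the agreed address are assumptions standing in for whatever signaling and memory map the real design uses.

#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

/* Assumed helpers on the CPU side. */
extern bool accel_notification_arrived(uint8_t thread_no);
extern void consume_result(uint64_t value);

/* The CPU waits for the module's notification, then fetches the execution
 * results directly from the storage address agreed before step 110; no
 * register has to carry the result address. */
void cpu_finish_acceleration(uint8_t thread_no,
                             volatile const uint64_t *agreed_addr,
                             size_t n_results)
{
    while (!accel_notification_arrived(thread_no))
        ;   /* the CPU keeps running other threads in the meantime */

    for (size_t i = 0; i < n_results; i++)
        consume_result(agreed_addr[i]);   /* results sit at the pre-agreed address */
}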
Optionally, this embodiment of the present application further includes a process in which the hardware acceleration module yields to other instructions issued by the CPU.
Specifically, in the embodiment of the present application, a high priority is set for instructions issued by the CPU (the software or microcode layer). When the CPU issues a high-priority instruction while the hardware acceleration module is about to issue a forged task instruction, the hardware acceleration module yields to the instruction issued by the CPU. Therefore, the storage unit is also required to store the task instruction currently forged by the hardware acceleration module, and the hardware acceleration module waits for a gap between instructions issued by the CPU before issuing the forged task instruction to the corresponding task completion module.
Further, the hardware acceleration module receives a pause indication. According to the pause indication, the hardware acceleration module stores the currently constructed task instruction into a storage space reserved, in the storage unit, for the thread (the thread indicated by the thread number included in the acceleration instruction). After receiving a start instruction, the hardware acceleration module sends the currently constructed task instruction to the corresponding task completion module.
It can be understood that the storage unit provides an independent storage space for each thread; under this condition, the CPU can request from the hardware acceleration module a service in which multiple threads simultaneously perform task-fetching/task-completing instruction acceleration.
It should be noted that the module sending the pause indication and the start instruction may specifically be any one of the task target analysis module, the task target forwarding module, and the task execution module, that is, any module other than the hardware acceleration module that executes a step of the acceleration flow.
When such a task module determines that an instruction issued by the CPU is currently pending, or that no instruction issued by the CPU is currently pending, it can be triggered to generate a pause indication or a start instruction, respectively, and send it to the hardware acceleration module.
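The pause/start interaction can be sketched as follows. The per-thread holding area and the function names are assumptions used only to illustrate how a forged instruction yields to CPU-issued instructions; task_instr_t and MAX_THREADS follow the earlier sketches.

#include <stdint.h>
#include <stdbool.h>

/* Per-thread holding area (in the storage unit) for a forged task instruction
 * that must wait while the CPU issues its own higher-priority instructions. */
static task_instr_t pending_instr[MAX_THREADS];
static bool         has_pending[MAX_THREADS];

void on_pause_indication(uint8_t thread_no, const task_instr_t *ti)
{
    pending_instr[thread_no] = *ti;   /* park the currently constructed instruction */
    has_pending[thread_no]   = true;
}

void on_start_instruction(uint8_t thread_no)
{
    if (has_pending[thread_no]) {
        /* issue in a gap between CPU-issued instructions */
        send_to_completion_module(&pending_instr[thread_no]);
        has_pending[thread_no] = false;
    }
}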
Optionally, in this embodiment of the present application, after the hardware acceleration module completes all task instructions of the acceleration flow, it returns the thread to the CPU. When the CPU subsequently switches to this thread, it can process the data directly, so that multiple accesses to remote memory are avoided and the latency caused by memory reads/writes and routing is saved.
It should be noted that, in the embodiment of the present application, if there is no correlation among a plurality of instructions in a program, the hardware acceleration module cannot determine the data or execution results required by the CPU, and thus cannot issue forged task instructions in place of the CPU. Therefore, in the embodiment of the present application, the information required by the hardware acceleration module to forge the next task instruction is stored uniformly in the on-chip memory control unit. When forging a task instruction, the hardware acceleration module uniformly acquires this information from the on-chip memory control unit and forges the task instruction using the acquired information.
As can be seen from the above, the hardware acceleration module needs to have the functions of address calculation, instruction encoding and decoding, data processing, and the like, like the CPU, in order to achieve the predetermined target.
By way of example and not limitation, the information carried by the acceleration instruction is shown in Table 1 below.
[Table 1 (presented in the original publication as image BDA0002482446290000091): information carried by the acceleration instruction]
In one example, consider a data comparison task. In the existing flow, two instructions are needed to read data from two address spaces back into an adjacent on-chip memory that the CPU can access quickly, the CPU then initiates a data comparison instruction, and finally the CPU writes the comparison result into a certain register or storage address with a write instruction. After the instruction processing method provided by the embodiment of the present application is adopted, the CPU only needs to carry the three addresses, at one time, in a single acceleration instruction containing the acceleration flag. After receiving the acceleration instruction, the hardware acceleration module actively forges and issues two task instructions in sequence according to the first two addresses (in this example, these task instructions are read instructions). When the hardware acceleration module scans the storage control unit and finds the corresponding execution results, it forges and issues another task instruction (in this example, a comparison instruction). Finally, after the hardware acceleration module again scans the storage control unit and finds the corresponding execution result, the execution result is written back according to the third address, and the CPU is informed that the task has been completed.
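The comparison example can be restated as the short sketch below, built on the illustrative accel_instr_t layout and assumed helpers (current_thread, set_task_addresses, send_acceleration_instruction). It only shows the CPU side handing the three addresses over in one instruction.

#include <stdint.h>

/* Assumed helpers. */
extern uint8_t current_thread(void);
extern void    set_task_addresses(accel_instr_t *ai,
                                  uint64_t addr_a, uint64_t addr_b, uint64_t addr_result);
extern void    send_acceleration_instruction(const accel_instr_t *ai);

/* Old flow: two read instructions, one compare instruction, one write
 * instruction, each built by the CPU. New flow: the CPU carries all three
 * addresses once, in a single acceleration instruction with the flag set. */
void cpu_issue_compare(uint64_t addr_a, uint64_t addr_b, uint64_t addr_result)
{
    accel_instr_t ai = {
        .accel_flag   = 1,                /* let the hardware acceleration module take over */
        .thread_no    = current_thread(),
        .op           = OP_COMPARE,       /* overall task characteristic; the module derives
                                             the two forged reads from it */
        .control_info = 3,                /* two forged reads + one forged compare */
    };
    set_task_addresses(&ai, addr_a, addr_b, addr_result);  /* pack the three addresses */
    send_acceleration_instruction(&ai);

    /* The CPU may now switch to another thread; once notified, it reads the
     * comparison result back from addr_result. */
}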
Based on the same inventive concept, the embodiment of the present application further provides an instruction processing apparatus corresponding to the instruction processing method described in fig. 1. Referring to fig. 2, fig. 2 is a structural diagram of an instruction processing apparatus according to an embodiment of the present application, where the apparatus includes:
a receiving unit 210, configured to receive an acceleration instruction sent by a CPU, where the acceleration instruction includes attribute information of a task to be executed;
a constructing unit 220, configured to construct a first task instruction for completing the task to be executed according to the attribute information of the task to be executed, where the first task instruction includes a task target and a data storage location;
the sending unit 230 is configured to send the constructed first task instruction to a corresponding task completion module, so that the task completion module completes a corresponding task to be executed according to the task target, and sends an execution result and the data storage location to a storage control unit, and the storage control unit stores the execution result to a corresponding storage unit according to the data storage location, so that the CPU obtains the execution result from the storage unit.
Optionally, the apparatus further comprises: a negotiation unit (not shown in the figure) for negotiating with the CPU a memory address for storing the execution result in the memory unit.
Optionally, the acceleration instruction further comprises a thread number; the device further comprises: and a monitoring unit (not shown in the figure) for monitoring the thread indicated by the thread number according to the thread number.
Optionally, the apparatus further comprises: a scanning unit (not shown in the figure) for periodically scanning the storage control unit;
a judging unit (not shown in the figure) for judging whether an execution result corresponding to the thread number exists in the execution results stored by the storage control unit according to the thread number;
and an obtaining unit (not shown in the figure), configured to obtain, if such an execution result exists, the execution result corresponding to the thread number, where the execution result includes attribute information of a task to be executed required for constructing a second task instruction, and the first task instruction and the second task instruction belong to the same task.
Optionally, the sending unit 230 is further configured to send a notification message to the CPU, where the notification message is used to enable the CPU to determine that the task to be executed has been completed.
Optionally, the receiving unit 210 is further configured to receive a pause indication;
the device further comprises: a storage unit (not shown in the figure) for storing a currently constructed task instruction into a storage space reserved for the thread indicated by the thread number in the storage unit according to the suspension indication;
the sending unit 230 is further configured to send the currently constructed task instruction to the corresponding task completion module after receiving the start instruction.
Therefore, by applying the instruction processing device provided by the present application, after receiving the acceleration instruction sent by the CPU, the device constructs, according to the acceleration flag included in the acceleration instruction and the attribute information of the task to be executed, a first task instruction for completing the task to be executed. The task instruction comprises a task target and a data storage position. The device sends the task instruction to the corresponding task completion module, so that the task completion module completes the corresponding task to be executed according to the task target and sends the execution result and the data storage position to the storage control unit; the storage control unit stores the execution result to the corresponding storage unit according to the data storage position, and the CPU obtains the execution result from the storage unit.
This approach solves the problem in the prior art that, for certain fixed flows, the CPU performs a large amount of repetitive work, which wastes instruction set space, reduces CPU working efficiency, and otherwise degrades performance. The goal of simplifying the instruction set is achieved, the number of instructions for specific flows is reduced, and the working efficiency of the CPU is improved.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement it without inventive effort.
For the embodiment of the instruction processing apparatus, since the content of the related method is substantially similar to that of the foregoing method embodiment, the description is relatively simple, and for the relevant points, reference may be made to part of the description of the method embodiment.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.

Claims (12)

1. An instruction processing method, applied to a hardware acceleration module, the method comprising:
receiving an acceleration instruction sent by a CPU, wherein the acceleration instruction comprises an acceleration flag and attribute information of a task to be executed;
constructing, according to the acceleration flag and the attribute information of the task to be executed, a first task instruction for completing the task to be executed, wherein the first task instruction comprises a task target and a data storage position;
and sending the constructed first task instruction to a corresponding task completion module so that the task completion module completes a corresponding task to be executed according to the task target, and sending an execution result and the data storage position to a storage control unit, wherein the storage control unit stores the execution result to a corresponding storage unit according to the data storage position, so that the CPU obtains the execution result from the storage unit.
2. The method of claim 1, wherein before the hardware acceleration module receives an acceleration instruction sent by a CPU, the method further comprises:
and negotiating with the CPU about a storage address used for storing the execution result in the storage unit.
3. The method of claim 1, wherein the acceleration instruction further comprises a thread number;
before the constructing completes the first task instruction of the task to be executed, the method further comprises:
and monitoring the thread indicated by the thread number according to the thread number.
4. The method of claim 3, further comprising:
periodically scanning the storage control unit;
judging whether an execution result corresponding to the thread number exists in the execution results stored by the storage control unit according to the thread number;
and if the first task instruction and the second task instruction belong to the same task, acquiring an execution result corresponding to the thread number, wherein the execution result comprises attribute information of a task to be executed required by constructing the second task instruction, and the first task instruction and the second task instruction belong to the same task.
5. The method of claim 1, further comprising:
and sending a notification message to the CPU, wherein the notification message is used for enabling the CPU to determine that the task to be executed is executed and completed.
6. The method of claim 3, further comprising:
receiving a pause indication;
according to the pause indication, storing a currently constructed task instruction into a storage space reserved for the thread indicated by the thread number in the storage unit;
and after receiving a starting instruction, sending the currently constructed task instruction to a corresponding task completion module.
7. An instruction processing apparatus, characterized in that the apparatus comprises:
a receiving unit, used for receiving an acceleration instruction sent by a CPU, wherein the acceleration instruction comprises attribute information of a task to be executed;
the construction unit is used for constructing a first task instruction for completing the task to be executed according to the attribute information of the task to be executed, and the first task instruction comprises a task target and a data storage position;
and the sending unit is used for sending the constructed first task instruction to a corresponding task completion module so that the task completion module completes a corresponding task to be executed according to the task target, and sending an execution result and the data storage position to the storage control unit, and the storage control unit stores the execution result to the corresponding storage unit according to the data storage position so that the CPU obtains the execution result from the storage unit.
8. The apparatus of claim 7, further comprising:
and the negotiation unit is used for negotiating with the CPU about a storage address used for storing the execution result in the storage unit.
9. The apparatus of claim 7, wherein the acceleration instruction further comprises a thread number; the device further comprises:
and the monitoring unit is used for monitoring the thread indicated by the thread number according to the thread number.
10. The apparatus of claim 9, further comprising:
a scanning unit for periodically scanning the storage control unit;
a judging unit, configured to judge whether an execution result corresponding to the thread number exists in the execution results stored in the storage control unit according to the thread number;
and an obtaining unit, configured to obtain, if such an execution result exists, the execution result corresponding to the thread number, wherein the execution result comprises attribute information of a task to be executed required for constructing a second task instruction, and the first task instruction and the second task instruction belong to the same task.
11. The apparatus according to claim 7, wherein the sending unit is further configured to send a notification message to the CPU, and the notification message is configured to enable the CPU to determine that the task to be executed has been completed.
12. The apparatus of claim 9, wherein the receiving unit is further configured to receive a pause indication;
the device further comprises: the storage unit is used for storing a currently constructed task instruction into a storage space reserved for the thread indicated by the thread number in the storage unit according to the suspension indication;
and the sending unit is also used for sending the currently constructed task instruction to the corresponding task completion module after receiving the starting instruction.
CN202010381861.4A 2020-05-08 2020-05-08 Instruction processing method and device Pending CN111666103A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010381861.4A CN111666103A (en) 2020-05-08 2020-05-08 Instruction processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010381861.4A CN111666103A (en) 2020-05-08 2020-05-08 Instruction processing method and device

Publications (1)

Publication Number Publication Date
CN111666103A true CN111666103A (en) 2020-09-15

Family

ID=72383113

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010381861.4A Pending CN111666103A (en) 2020-05-08 2020-05-08 Instruction processing method and device

Country Status (1)

Country Link
CN (1) CN111666103A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114003182A (en) * 2022-01-04 2022-02-01 苏州浪潮智能科技有限公司 Instruction interaction method and device, storage equipment and medium

Similar Documents

Publication Publication Date Title
US6996821B1 (en) Data processing systems and method for batching tasks of the same type in an instruction cache
US7716396B1 (en) Multi-reader multi-writer circular buffer memory
RU2004126679A (en) AGENT, METHOD AND COMPUTER SYSTEM FOR MATCHING IN A VIRTUAL ENVIRONMENT
CN109831520A (en) A kind of timed task dispatching method and relevant apparatus
CN109308213B (en) Multi-task breakpoint debugging method based on improved task scheduling mechanism
US20230274129A1 (en) Method for execution of computational graph in neural network model and apparatus thereof
CN102934102A (en) Multiprocessor system, execution control method and execution control program
CN112083882B (en) SRAM (static random Access memory) dead point processing method, system and device and computer equipment
CN106815080A (en) Distributed diagram data treating method and apparatus
CN111666103A (en) Instruction processing method and device
CN1713134B (en) Virtual machine control structure decoder
CN110706108B (en) Method and apparatus for concurrently executing transactions in a blockchain
CN102214094B (en) Operation is performed via asynchronous programming model
CN111310638A (en) Data processing method and device and computer readable storage medium
CN103988462A (en) A register renaming data processing apparatus and method for performing register renaming
CN108062224B (en) Data reading and writing method and device based on file handle and computing equipment
CN109408532A (en) Data capture method, device, computer equipment and storage medium
CN111258653B (en) Atomic access and storage method, storage medium, computer equipment, device and system
CN112306420A (en) Data read-write method, device and equipment based on storage pool and storage medium
CN102346681B (en) General-purpose simulator
CN111045959A (en) Complex algorithm variable mapping method based on storage optimization
CN112181893A (en) Communication method and system between multi-core processor cores in vehicle controller
EP3291096B1 (en) Storage system and device scanning method
CN105279103A (en) Data management method and apparatus
US7849164B2 (en) Configuring a device in a network via steps

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination