CN116243983A - Processor, integrated circuit chip, instruction processing method, electronic device, and medium


Info

Publication number
CN116243983A
Authority
CN
China
Prior art keywords
instruction; target; instructions; coprocessor; processor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310341078.9A
Other languages
Chinese (zh)
Inventor
代亚东
王京
Current Assignee
Kunlun Core Beijing Technology Co ltd
Original Assignee
Kunlun Core Beijing Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Kunlun Core Beijing Technology Co ltd
Priority to CN202310341078.9A
Publication of CN116243983A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38: Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F 9/3877: Concurrent instruction execution using a slave processor, e.g. coprocessor
    • G06F 9/30094: Condition code generation, e.g. Carry, Zero flag
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The present disclosure provides a processor, an integrated circuit chip, an instruction processing method, an electronic device, a storage medium, and a program product, and relates to the field of computer technology, in particular to the fields of chip technology and processor technology. A specific implementation is as follows: the processor includes a plurality of coprocessors; at least one processor core configured to generate a plurality of first instructions and a plurality of second instructions; and an instruction scheduling unit configured to sequentially send the plurality of first instructions to the plurality of coprocessors and, upon determining that at least one target first instruction has finished executing, to send a target second instruction to a target coprocessor, where the target coprocessor is the coprocessor that executed the at least one target first instruction.

Description

Processor, integrated circuit chip, instruction processing method, electronic device, and medium
Technical Field
The present disclosure relates to the field of computer technology, and in particular, to the field of chip technology and the field of processor technology.
Background
Because there are dependencies among the multiple instructions of a single program, the operations of the multiple coprocessors in a processor that process those instructions may likewise be interdependent.
To avoid breaking these dependencies when multiple programs await processing, processing of the next program may be started only once it is determined that the coprocessors have completed the previous program. This, however, reduces coprocessor utilization.
Disclosure of Invention
The present disclosure provides a processor, an integrated circuit chip, an instruction processing method, an electronic device, a storage medium, and a program product.
According to an aspect of the present disclosure, there is provided a processor including: a plurality of coprocessors; at least one processor core configured to generate a plurality of first instructions and a plurality of second instructions; and an instruction scheduling unit configured to sequentially send the plurality of first instructions to the plurality of coprocessors and, upon determining that at least one target first instruction has finished executing, to send a target second instruction to a target coprocessor, where the target coprocessor is the coprocessor that executed the at least one target first instruction.
According to another aspect of the present disclosure, there is provided an integrated circuit chip including the processor shown in the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided an instruction processing method including: generating a plurality of first instructions and a plurality of second instructions; sequentially sending the plurality of first instructions to a plurality of coprocessors; and, upon determining that at least one target first instruction has finished executing, sending a target second instruction to a target coprocessor, where the target coprocessor is the coprocessor that executed the at least one target first instruction.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the methods shown in the embodiments of the present disclosure.
According to another aspect of the disclosed embodiments, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method shown in the disclosed embodiments.
According to another aspect of the disclosed embodiments, there is provided a computer program product comprising a computer program/instructions which, when executed by a processor, implement the steps of the method shown in the embodiments of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 schematically illustrates a structural schematic of a processor according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a structural diagram of a processor according to another embodiment of the present disclosure;
FIG. 3 schematically illustrates a schematic diagram of an example instruction execution process;
FIG. 4A schematically illustrates a schematic diagram of an instruction scheduling process according to an embodiment of the present disclosure;
FIG. 4B schematically illustrates a schematic diagram of an instruction execution process according to an embodiment of the present disclosure;
FIG. 5A schematically illustrates a schematic diagram of an instruction scheduling process according to another embodiment of the present disclosure;
FIG. 5B schematically illustrates a schematic diagram of an instruction execution process according to another embodiment of the present disclosure;
FIG. 6 schematically illustrates a structural schematic of an integrated circuit chip according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a flow chart of an instruction processing method according to an embodiment of the disclosure; and
FIG. 8 schematically illustrates a block diagram of an example electronic device that may be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The present disclosure provides a processor comprising: a plurality of coprocessors; at least one processor core configured to generate a plurality of first instructions and a plurality of second instructions; and an instruction scheduling unit configured to sequentially send the plurality of first instructions to the plurality of coprocessors and, upon determining that at least one target first instruction has finished executing, to send a target second instruction to a target coprocessor, where the target coprocessor is the coprocessor that executed the at least one target first instruction.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and other handling of users' personal information comply with the relevant laws and regulations and do not violate public order and good morals.
In the technical solution of the present disclosure, the user's authorization or consent is obtained before the user's personal information is obtained or collected.
The structure of the processor will be described below with reference to fig. 1.
Fig. 1 schematically illustrates a structural schematic of a processor according to an embodiment of the present disclosure.
As shown in fig. 1, processor 100 includes a plurality of coprocessors 110, at least one processor core 120, and an instruction scheduling unit 130. The plurality of coprocessors 110 includes a coprocessor 110_1, a coprocessor 110_2, and a coprocessor 110_3. In an embodiment of the present disclosure, the number of coprocessors included in the plurality of coprocessors 110 shown in fig. 1 and the number of processor cores included in the at least one processor core 120 are both schematically illustrated.
For example, processor 100 may be a heterogeneous processor, with the plurality of coprocessors 110 and processor cores 120 forming the heterogeneous processor.
For example, each of the plurality of coprocessors 110 may assist the main processor in handling a particular task, and data exchange among the plurality of coprocessors 110 may be implemented through a particular address space.
For example, the at least one processor core 120 may be a central processing unit (Central Processing Unit, CPU) core or a microcontroller unit (Microcontroller Unit, MCU) core. The at least one processor core 120 may form the main processor of processor 100. In the case where the at least one processor core 120 comprises one processor core, the main processor of processor 100 is a single-core processor. In the case where the at least one processor core 120 includes multiple processor cores, the main processor of processor 100 is a multi-core processor. The main processor may call a coprocessor so that the coprocessor assists in handling a particular task.
According to an embodiment of the present disclosure, at least one processor core 120 generates a plurality of first instructions and a plurality of second instructions. For example, the plurality of first instructions may be a plurality of instructions of a first program, the plurality of second instructions may be a plurality of instructions of a second program, and the first program and the second program may be two programs independent of each other. For example, the first program and the second program may be two computing tasks, respectively, and the two computing tasks are independent from each other.
For example, the at least one processor core 120 may generate a first program based on a first computing request, the first program including the plurality of first instructions, and may also generate a second program based on a second computing request, the second program including the plurality of second instructions. In the case where the at least one processor core 120 includes a plurality of processor cores, the plurality of processor cores may respectively generate the plurality of first instructions based on the first computing request, and may likewise respectively generate the plurality of second instructions based on the second computing request. For example, the first computing request may include an image compression request, the first program may include an image compression task, and the plurality of first instructions may include instructions that direct the processing of an image. The second computing request may include a speech recognition request, the second program may include a speech recognition task, and the plurality of second instructions may include instructions that direct the parsing of speech. The execution of the first program and the execution of the second program are independent of each other.
The execution of each of the plurality of second instructions is independent of the execution of each of the plurality of first instructions. For example, the plurality of second instructions may be executed without depending on the execution process or the execution results of the plurality of first instructions.
For example, the main processor may be an accelerator or a processor for deep learning. The plurality of coprocessors 110 may respectively serve as one or more of a convolution module, a fully connected module, a pooling module, and an activation module, and the main processor may process neural-network-based computing tasks by invoking these modules. The main processor may also be an accelerator or a processor for big data processing. In that case, the plurality of coprocessors 110 may respectively serve as one or more of a filtering module, a connecting module, a sorting module, and an aggregation module, and the main processor may process big-data-based computing tasks by invoking them.
According to an embodiment of the present disclosure, the instruction scheduling unit 130 sequentially sends the plurality of first instructions to the plurality of coprocessors 110 and, in the case where it is determined that at least one target first instruction has finished executing, sends a target second instruction to the target coprocessor. For example, the at least one target first instruction is the at least one first instruction of the plurality of first instructions that is preferentially executed, the target second instruction is the second instruction of the plurality of second instructions that is preferentially executed, and the target coprocessor is the coprocessor that executes the at least one target first instruction.
For example, the plurality of first instructions may include first instruction 0, first instruction 1, and first instruction 2. The instruction scheduling unit 130 may send first instruction 0, first instruction 1, and first instruction 2 to coprocessors 110_1, 110_2, and 110_3, respectively. According to the dependencies among first instruction 0, first instruction 1, and first instruction 2, the execution process of the plurality of first instructions may proceed as follows: coprocessor 110_1 executes first instruction 0, producing first execution result 0; coprocessor 110_2 executes first instruction 1 based on first execution result 0, producing first execution result 1; and coprocessor 110_3 executes first instruction 2 based on first execution result 1, producing first execution result 2. Because the execution of first instruction 1 and first instruction 2 depends on the execution result of first instruction 0, first instruction 0 can be regarded as the target first instruction, i.e., the instruction among the plurality of first instructions to be preferentially executed. In the case where the plurality of first instructions includes more than one instance of first instruction 0, the at least one target first instruction includes each of those instances.
For example, the plurality of second instructions may include second instruction 0, second instruction 1, and second instruction 2. According to the dependency relationship among the second instruction 0, the second instruction 1 and the second instruction 2, the execution process of the plurality of second instructions may include the coprocessor 110_1 executing the second instruction 0, to obtain a second execution result 0. The coprocessor 110_2 executes the second instruction 1 based on the second execution result 0, resulting in the second execution result 1. The coprocessor 110_3 executes the second instruction 2 based on the second execution result 1, resulting in the second execution result 2. At this time, since the execution of the second instruction 1 and the second instruction 2 is realized based on the execution result of the second instruction 0, the second instruction 0 can be regarded as the target second instruction to be preferentially executed among the plurality of second instructions.
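The dependency chains above can be summarized in a small sketch. The following Python fragment is illustrative only; the function name and the dictionary representation of dependencies are assumptions, not part of the patent. It identifies a program's target instruction as the instruction with no unmet dependencies:

```python
def target_instructions(deps):
    """deps maps an instruction to the set of instructions whose
    execution results it needs before it can run."""
    return {instr for instr, needs in deps.items() if not needs}

# Dependencies from the example: instruction 1 needs result 0,
# instruction 2 needs result 1, and instruction 0 needs nothing.
first_deps = {0: set(), 1: {0}, 2: {1}}
second_deps = {0: set(), 1: {0}, 2: {1}}

targets_first = target_instructions(first_deps)    # {0}: first instruction 0
targets_second = target_instructions(second_deps)  # {0}: second instruction 0
```

Under this representation, the instruction returned for each program is exactly the one the description calls the target (preferentially executed) instruction.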
Since coprocessor 110_1 is configured to execute both first instruction 0 and second instruction 0, coprocessor 110_1 may serve as the target coprocessor.
In the case where it is determined that first instruction 0 of the plurality of first instructions has finished executing, only coprocessor 110_2 and coprocessor 110_3 continue to participate in the subsequent processing of the first program; coprocessor 110_1 does not participate in the execution of first instruction 1 or first instruction 2. The instruction scheduling unit 130 therefore regards coprocessor 110_1 as having entered an idle state with respect to the first program, and the dependency between coprocessor 110_1 and coprocessors 110_2 and 110_3 for the first program is released. To increase coprocessor utilization, the instruction scheduling unit 130 may send second instruction 0 to coprocessor 110_1. Since coprocessor 110_1 no longer participates in the subsequent processing of the first program, its execution of second instruction 0 does not break the dependencies among the plurality of first instructions or the dependency between coprocessor 110_2 and coprocessor 110_3, thereby ensuring that the first program can still be processed.
According to the embodiments of the present disclosure, while the processor 100 processes the first program, the instruction scheduling unit determines the execution states of the plurality of first instructions in the first program and the running states of the plurality of coprocessors. Without affecting the processing of the first program, a coprocessor in the idle state can begin processing the plurality of second instructions of the second program in advance, which improves coprocessor utilization. In addition, because there is no need to wait for the first program to finish before the second program begins, the processing progress of the second program is also accelerated.
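This early-dispatch policy can be modelled in a rough sketch. The class below is a toy model under assumed data structures (the names `done`, `busy`, and the (program, instruction) pair encoding are illustrative, not the patent's hardware logic): a second-program instruction is released only when its coprocessor is idle and all its dependencies have completed.

```python
class SchedulerSketch:
    """Toy model of the instruction scheduling unit's bookkeeping."""

    def __init__(self):
        self.done = set()   # completed (program, instruction) pairs
        self.busy = {}      # coprocessor id -> program it currently serves

    def mark_done(self, coproc, program, instr):
        # Record completion and release the coprocessor from the program.
        self.done.add((program, instr))
        self.busy.pop(coproc, None)

    def can_dispatch(self, coproc, deps):
        # A second-program instruction may go to this coprocessor only if
        # the coprocessor is idle and every dependency has completed.
        return coproc not in self.busy and deps <= self.done


sched = SchedulerSketch()
sched.busy["110_1"] = "first program"
before = sched.can_dispatch("110_1", {("first program", 0)})  # still busy
sched.mark_done("110_1", "first program", 0)                  # instruction 0 done
after = sched.can_dispatch("110_1", {("first program", 0)})   # idle, deps met
```

Here `before` is False and `after` is True, mirroring the moment at which the scheduling unit may hand second instruction 0 to coprocessor 110_1.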
According to an embodiment of the present disclosure, before sending the target second instruction to the target coprocessor, the instruction scheduling unit 130 may further, in the case where it is determined that the at least one target first instruction has finished executing, obtain a first memory address related to a subsequent instruction and a second memory address related to the target second instruction, and send the target second instruction to the target coprocessor in the case where it is determined that the first memory address and the second memory address are different. The subsequent instruction is a first instruction, among the plurality of first instructions, that follows the at least one target first instruction.
According to an embodiment of the present disclosure, in a case where it is determined that the first memory address and the second memory address have the same address, the instruction scheduling unit 130 modifies the second memory address so that the second memory address is different from the first memory address.
For example, in the event that it is determined that a first instruction 0 of the plurality of first instructions has been executed to completion, instruction dispatch unit 130 may fetch memory addresses associated with first instruction 1, first instruction 2, and second instruction 0. For example, the first memory address 1 associated with the first instruction 1 includes a read address of the first instruction 1, a read address of an operand corresponding to the first instruction 1, and a write address of a first execution result 1 of the first instruction 1. The first memory address 2 associated with the first instruction 2 includes a read address of the first instruction 2, a read address of an operand corresponding to the first instruction 2, and a write address of a first execution result 2 of the first instruction 2. The second access address 0 associated with the second instruction 0 includes a read address of the second instruction 0, a read address of an operand corresponding to the second instruction 0, and a write address of a second execution result 0 of the second instruction 0.
For example, in the case that it is determined that the first address 1 and the first address 2 are different from the second address 0, the instruction scheduling unit 130 considers that the first address 1 and the first address 2 do not conflict with the second address 0, and the execution of the second instruction 0 does not affect the execution of the first instruction 1 and the first instruction 2. In this case, the instruction scheduling unit 130 may send the second instruction 0 to the coprocessor 110_1, thereby avoiding that the execution of the second instruction 0 affects the processing of the first program.
For example, in the case that it is determined that the first address 1 and the second address 0 have the same address and/or the first address 2 and the second address 0 have the same address, the instruction scheduling unit 130 considers that the first address 1 and/or the first address 2 have address conflicts with the second address 0, and the execution of the second instruction 0 affects the execution of the first instruction 1 and/or the first instruction 2. In this case, the instruction scheduling unit 130 may modify the second address 0 to make the second address 0 different from the first address 1 and the first address 2, so as to avoid that the execution process of the second instruction 0 affects the execution processes of the first instruction 1 and the first instruction 2.
For example, suppose first memory address 1 includes addresses 0-100 and second memory address 0 includes addresses 50-150. Since first memory address 1 and second memory address 0 share addresses 50-100, executing second instruction 0 could read the wrong operand or write its execution result to the wrong location, and executing first instruction 1 could likewise read the wrong operand or write the wrong execution result. Since first instruction 1 may be in the middle of execution, the instruction scheduling unit 130 may determine a free memory area in the memory and offset second memory address 0 into that free area. For example, when the area of addresses 101-201 is determined to be a free memory area, second memory address 0 is offset to addresses 101-201, and the execution of second instruction 0 then proceeds based on addresses 101-201. The memory may be an internal memory, a cache, or an external memory, and a memory address is the address of a storage area within the memory.
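The address-conflict handling just described can be sketched as follows. This is an illustrative fragment: the inclusive (start, end) range representation and the function names are assumptions, not the patent's implementation.

```python
def ranges_overlap(a, b):
    """Inclusive address ranges (start, end) overlap iff each range
    starts no later than the other one ends."""
    return a[0] <= b[1] and b[0] <= a[1]

def resolve_conflict(second_addr, first_addrs, free_region):
    """Offset the second instruction's address range into a free memory
    region when it conflicts with any in-flight first instruction's range;
    otherwise leave it unchanged."""
    if any(ranges_overlap(second_addr, fa) for fa in first_addrs):
        size = second_addr[1] - second_addr[0]
        return (free_region[0], free_region[0] + size)
    return second_addr

first_address_1 = (0, 100)     # first memory address 1
second_address_0 = (50, 150)   # second memory address 0, shares 50-100
free_area = (101, 201)         # the free memory area from the example

new_second = resolve_conflict(second_address_0, [first_address_1], free_area)
# new_second is (101, 201): the same-size range offset into the free area
```

With the example's numbers this reproduces the offset to addresses 101-201; a non-overlapping second address would be returned unchanged.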
According to the embodiment of the present disclosure, to avoid affecting first instructions that are still executing, before sending the second instruction to the target coprocessor the instruction scheduling unit compares the memory address of the second instruction with the memory addresses of the first program's instructions that are executing or not yet executed, and modifies any second-instruction memory address that has an address conflict. The coprocessor can then execute the second instruction based on the modified memory address, improving coprocessor utilization.
The structure of the processor provided by the present disclosure will be described below with reference to fig. 2.
Fig. 2 schematically illustrates a structural diagram of a processor according to another embodiment of the present disclosure.
As shown in fig. 2, processor 200 includes a plurality of coprocessors 210, a plurality of processor cores 220, an instruction scheduling unit 230, and a memory 240. The plurality of coprocessors 210 and the plurality of processor cores 220 are similar to the plurality of coprocessors 110 and the at least one processor core 120, respectively, shown in fig. 1; for brevity, the similar parts are not repeated here.
According to an embodiment of the present disclosure, the plurality of coprocessors 210 includes coprocessor 0, coprocessor 1, and coprocessor 2. The plurality of processor cores 220 includes processor core 0, processor core 1, and processor core 2. In an embodiment of the present disclosure, the number of coprocessors included in the plurality of coprocessors 210 and the number of processor cores included in the plurality of processor cores 220 shown in fig. 2 are both schematically illustrated.
According to an embodiment of the present disclosure, instruction dispatch unit 230 includes at least one buffer 231 and a prefetch unit 232.
According to an embodiment of the present disclosure, the at least one buffer 231 stores the plurality of first instructions and the plurality of second instructions from the plurality of processor cores 220. According to the execution order of the plurality of first instructions, the prefetch unit 232 sequentially sends the first instructions in the at least one buffer 231 to the corresponding coprocessors among the plurality of coprocessors 210; obtains the execution states of the plurality of first instructions in the plurality of coprocessors 210; and, according to those execution states, sends the target second instruction to the target coprocessor in the case where it is determined that the target coprocessor has finished executing the at least one target first instruction.
As shown in fig. 2, in the case where the at least one buffer 231 includes a plurality of buffers, the at least one buffer 231 may include buffer 0, buffer 1, and buffer 2. The number of buffers shown in fig. 2 is schematic. The number of buffers may be the same as the number of processor cores, with the buffers corresponding one-to-one to the processor cores. The number of buffers may also be the same as the number of coprocessors.
For example, buffer 0, buffer 1, and buffer 2 may be in one-to-one correspondence with processor core 0, processor core 1, and processor core 2, respectively. Buffer 0 may store instructions generated by processor core 0, buffer 1 may store instructions generated by processor core 1, and buffer 2 may store instructions generated by processor core 2.
For example, the execution order of the plurality of first instructions may characterize the dependencies among them. For example, the plurality of first instructions includes first instruction 0, first instruction 1, and first instruction 2, and according to the dependencies among them the execution order is first instruction 0 → first instruction 1 → first instruction 2.
The processor core 0, the processor core 1 and the processor core 2 may sequentially generate a first instruction 0, a first instruction 1 and a first instruction 2, and sequentially write the first instruction 0, the first instruction 1 and the first instruction 2 into the corresponding buffer 0, the buffer 1 and the buffer 2.
Following the execution order, prefetch unit 232 sends first instruction 0 in buffer 0 to coprocessor 0, which may be used to execute the first-executed instruction of the plurality of instructions. In the case where it is determined that coprocessor 0 has executed first instruction 0 and produced first execution result 0, prefetch unit 232 sends first instruction 1 in buffer 1 to coprocessor 1, which may be used to execute the second-executed instruction. In the case where it is determined that coprocessor 1 has executed first instruction 1 and produced first execution result 1, prefetch unit 232 sends first instruction 2 in buffer 2 to coprocessor 2, which may be used to execute the third-executed instruction.
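The prefetch unit's in-order dispatch can be modelled with per-core instruction queues. This is a simplified sketch under assumed names; real hardware would signal completion asynchronously rather than run serially, and the callable coprocessors are stand-ins for illustration.

```python
from collections import deque

def prefetch_dispatch(buffers, coprocessors):
    """Send the instruction at each buffer's queue exit to its coprocessor,
    in execution order, forwarding the previous execution result."""
    result = None
    for buf, cp in zip(buffers, coprocessors):
        instr = buf.popleft()       # read from the instruction queue's exit
        result = cp(instr, result)  # sent only after the predecessor is done
    return result

# Buffers 0-2 hold first instructions 0-2; coprocessors are callables.
buffers = [deque([3]), deque([4]), deque([5])]
coprocessors = [
    lambda i, r: i,        # coprocessor 0 produces first execution result 0
    lambda i, r: r + i,    # coprocessor 1 depends on result 0
    lambda i, r: r * i,    # coprocessor 2 depends on result 1
]
final_result = prefetch_dispatch(buffers, coprocessors)   # (3 + 4) * 5 = 35
```

After the loop, each buffer's queue has advanced past its first instruction, which is what later leaves second instruction 0 at the exit of buffer 0's queue.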
For example, first execution result 0, first execution result 1, and first execution result 2 may be written by the corresponding coprocessors to the memory 240. For example, the memory 240 may be a static random-access memory (Static Random-Access Memory, SRAM).
In accordance with an embodiment of the present disclosure, in the event that a plurality of first instructions are determined to be each written to at least one buffer 231, at least one processor core 220 generates a plurality of second instructions and writes the plurality of second instructions to at least one buffer 231.
For example, the plurality of second instructions include a second instruction 0, a second instruction 1, and a second instruction 2, and in the case where it is determined that the first instruction 0, the first instruction 1, and the first instruction 2 are sequentially written into the corresponding buffer 0, buffer 1, and buffer 2, the processor core 0, the processor core 1, and the processor core 2 may sequentially generate the second instruction 0, the second instruction 1, and the second instruction 2, and sequentially write the second instruction 0, the second instruction 1, and the second instruction 2 into the corresponding buffer 0, buffer 1, and buffer 2.
For example, in the case where it is determined that the plurality of first instructions have all been written into the corresponding buffers, the processor cores start to generate the plurality of second instructions. This ensures that, when the first program is processed, the plurality of first instructions included in the first program can all be sent to the corresponding coprocessors, and also makes it easier for the prefetch unit 232 to obtain the execution states of all the first instructions included in the first program.
For example, the execution order of the plurality of second instructions may be second instruction 0 → second instruction 1 → second instruction 2.
For example, each of the buffers 0, 1, and 2 may include an instruction queue for storing instructions generated by the corresponding processor core. In the case where the first instruction 0 in the instruction queue of the buffer 0 has been sent to the coprocessor 0, the second instruction 0 is located at the head of the instruction queue of the buffer 0 and is in a readable state.
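For illustration only, the per-buffer instruction queue described above may be modeled as a simple FIFO; all names here are hypothetical and not part of the claimed embodiment:

```python
from collections import deque

class Buffer:
    """Minimal model of one buffer's instruction queue (illustrative names)."""
    def __init__(self):
        self.queue = deque()

    def write(self, instruction):
        self.queue.append(instruction)        # processor core writes at the tail

    def head(self):
        # Next readable instruction, i.e. the one at the head of the queue.
        return self.queue[0] if self.queue else None

    def pop(self):
        return self.queue.popleft()           # prefetch unit sends the head

buf0 = Buffer()
buf0.write("first_instruction_0")
buf0.write("second_instruction_0")
sent = buf0.pop()                             # first instruction 0 sent to coprocessor 0
assert sent == "first_instruction_0"
assert buf0.head() == "second_instruction_0"  # second instruction 0 now readable
```

Once the head instruction is popped and dispatched, the next instruction in the queue becomes readable, which is exactly the state described above for the second instruction 0.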
At this time, the coprocessor 0 corresponding to the second instruction 0 is the target coprocessor, and the prefetch unit 232 may obtain the execution states of the first instructions in the plurality of coprocessors 210 and determine whether the coprocessor 0 has completed execution of the first instruction 0.
In the event that it is determined that the first instruction 0 has been executed by the coprocessor 0 and that only the first instruction 1 and the first instruction 2 remain unexecuted among the plurality of first instructions, the prefetch unit 232 considers that the coprocessor 0 enters an idle state for the first program. In the case that it is determined that the memory address associated with the first instruction 1 and the memory address associated with the first instruction 2 are different from the memory address associated with the second instruction 0, in order to improve the utilization rate of the coprocessor, the prefetch unit 232 may send the second instruction 0 of the second program to the coprocessor 0, so that the coprocessor 0 starts to execute the second instruction 0.
According to the embodiment of the present disclosure, during processing of the first program and without affecting that processing, the prefetch unit 232 causes the coprocessor 0, which is in an idle state for the first program, to start executing the second instruction 0 of the second program in advance, which can accelerate the processing of the second program and improve the utilization rate of the coprocessor.
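The prefetch unit's early-dispatch decision can be sketched as a simple predicate: the target coprocessor must be idle for the first program, and the memory addresses of the second instruction must not overlap those of the unexecuted first instructions. This is an illustrative model only; the function name and address values are hypothetical:

```python
def can_dispatch_early(coprocessor_idle, pending_first_addrs, second_instr_addrs):
    """Return True when the second program's head instruction may be sent early:
    the coprocessor is idle for the first program, and the second instruction's
    memory addresses do not conflict with those of the unexecuted first
    instructions (hypothetical check mirroring the prefetch unit's decision)."""
    if not coprocessor_idle:
        return False
    return not (set(second_instr_addrs) & set(pending_first_addrs))

# Coprocessor 0 is idle; the unexecuted first instructions 1 and 2 touch
# 0x200 and 0x300, while second instruction 0 touches 0x100: no conflict.
assert can_dispatch_early(True, [0x200, 0x300], [0x100])
# An overlapping address blocks early dispatch.
assert not can_dispatch_early(True, [0x200, 0x300], [0x200])
```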
Memory 240 may store at least one execution result of at least one target first instruction, according to embodiments of the present disclosure. The coprocessor deletes the at least one execution result from the memory 240 in the event that it is determined that the at least one execution result is not associated with a subsequent instruction of the at least one target first instruction.
For example, in the case where the coprocessor 0 executes the first instruction 0 to obtain the first execution result 0, the coprocessor 0 may write the first execution result 0 into the memory 240. The address of the first execution result 0 written into the memory 240 is the address of the memory associated with the first instruction 0.
Since the execution of the first instruction 1 involves the first execution result 0, the address at which the first execution result 0 is written into the memory 240 is also a memory address associated with the first instruction 1. At this time, in the case where the coprocessor 0 completes execution of the first instruction 0, the coprocessor 0 may delete the first instruction 0 and its corresponding operands from the memory 240, thereby releasing the memory space they occupy, while the first execution result 0 is retained for the first instruction 1.
In the case that it is determined that the coprocessor 1 completes execution of the first instruction 1 and the execution process of the first instruction 2 does not use the first execution result 0, the coprocessor 1 may delete the first execution result 0 from the memory 240, thereby releasing the memory space occupied by the first execution result 0.
For example, the prefetch unit 232 may delay sending the second instruction 0 to the coprocessor 0 upon determining that the write address of the first execution result 0 is the same as the memory address associated with the second instruction 0. In the case where it is determined that the memory space occupied by the first execution result 0 has been released, the prefetch unit 232 sends the second instruction 0 to the coprocessor 0.
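The delay described above amounts to waiting until the conflicting address is released. A minimal sketch, with hypothetical addresses and names:

```python
def ready_to_send(second_addr, occupied_addrs):
    """The prefetch unit holds back the second instruction while its associated
    memory address is still occupied by an earlier execution result
    (illustrative model; not the claimed implementation)."""
    return second_addr not in occupied_addrs

occupied = {0x100}                           # first execution result 0 lives at 0x100
assert not ready_to_send(0x100, occupied)    # delay: address still in use
occupied.discard(0x100)                      # result released after instruction 1 completes
assert ready_to_send(0x100, occupied)        # second instruction 0 may now be sent
```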
The instructions generated by the processor cores are cached in the buffers, and the prefetch unit monitors the execution states of the instructions. In the case where a coprocessor among the plurality of coprocessors is in an idle state and there is no memory-address conflict between the second instruction executed first in the second program and the unexecuted first instructions of the first program, the idle coprocessor executes that second instruction in a timely manner, accelerating the execution of the second program and improving the utilization rate of the coprocessors. In addition, the hardware resource overhead of the buffers and the prefetch unit is small and does not affect the operation performance of the coprocessors and the processor cores.
The instruction execution process of the processor provided by the present disclosure will be described below with reference to fig. 3, 4A, 4B, 5A, and 5B.
It should be noted that the first program executed by the processor is program 0, and the second program executed by the processor is program 1. The first program includes a first instruction 0, a first instruction 1, and a first instruction 2, which are an instruction i0_0, an instruction i0_1, and an instruction i0_2, respectively. The second program includes a second instruction 0, a second instruction 1, and a second instruction 2, which are an instruction i1_0, an instruction i1_1, and an instruction i1_2, respectively. The plurality of buffers includes buffer 0, buffer 1, and buffer 2, and the plurality of coprocessors includes coprocessor 0, coprocessor 1, and coprocessor 2. The correspondence between the plurality of instructions, the plurality of buffers, and the plurality of coprocessors is similar to the embodiment shown in fig. 2, and for brevity, this disclosure will not repeat it here.
FIG. 3 schematically illustrates a schematic diagram of an example instruction execution process.
As shown in fig. 3, the instruction execution process 300 includes:
the processor core sends the instruction i0_0 of the program 0 to the coprocessor 0, when the coprocessor 0 completes execution of the instruction i0_0, the coprocessor 0 sends a corresponding synchronous instruction s0_0 to the processor core, and the synchronous instruction s0_0 may instruct the processor core to control the coprocessor 1 having a dependency relationship with the coprocessor 0 to start executing the instruction i0_1.
The processor core sends the instruction i0_1 of the program 0 to the coprocessor 1, when the coprocessor 1 completes execution of the instruction i0_1, the coprocessor 1 sends a corresponding synchronous instruction s0_1 to the processor core, and the synchronous instruction s0_1 may instruct the processor core to control the coprocessor 2 having a dependency relationship with the coprocessor 1 to start executing the instruction i0_2.
The processor core sends the instruction i0_2 of the program 0 to the coprocessor 2, and when the coprocessor 2 completes execution of the instruction i0_2, the coprocessor 2 sends a corresponding synchronous instruction s0_2 to the processor core, where the synchronous instruction s0_2 may indicate that the coprocessor completes processing of the program 0.
When the processor core sends the instruction i0_2 of the program 0 to the coprocessor 2, the processor core determines that the last instruction i0_2 of the program 0 is being executed by the coprocessor 2. In this case, provided that the memory addresses related to the program 1 differ from the memory addresses related to the program 0, the processor core may start processing the program 1. The processor core sends the instruction i1_0 of the program 1 to the coprocessor 0; when the coprocessor 0 completes execution of the instruction i1_0, the coprocessor 0 sends a corresponding synchronous instruction s1_0 to the processor core, and the synchronous instruction s1_0 may instruct the processor core to control the coprocessor 1, which has a dependency relationship with the coprocessor 0, to start executing the instruction i1_1.
The subsequent execution of procedure 1 is similar to that of procedure 0, and for brevity, this disclosure is not repeated here.
In the instruction execution process shown in fig. 3, since there is a dependency relationship among the instruction i0_0, the instruction i0_1, and the instruction i0_2, the processor core cannot send the instruction i0_2 to the corresponding coprocessor 2 immediately after generating it, which delays the time at which the processor core sends the instruction i1_0. As shown in fig. 3, while coprocessor 1 executes the instruction i0_1, a bubble 310 appears in the running process of coprocessor 0, meaning that coprocessor 0 is in an idle state for program 0, and thus the utilization rate of coprocessor 0 decreases.
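The effect of the bubble can be illustrated with a unit-time model in which each instruction occupies one cycle and each program is a three-stage dependency chain. The cycle counts below are illustrative assumptions, not measurements from the embodiment:

```python
def finish_time(start_offsets):
    """Each program is a 3-stage dependency chain; a chain finishes three
    unit-time stages after its first instruction starts (illustrative model)."""
    return max(start + 3 for start in start_offsets)

# Fig. 3: program 1 cannot start until the core has issued all of program 0,
# so coprocessor 0 sits idle (the bubble) during cycle 1 and program 1 starts at t=2.
serialized = finish_time([0, 2])
# Embodiment: coprocessor 0 picks up instruction i1_0 one cycle earlier,
# while coprocessor 1 is still executing instruction i0_1 (program 1 starts at t=1).
overlapped = finish_time([0, 1])
assert serialized == 5
assert overlapped == 4   # the idle cycle of coprocessor 0 is eliminated
```

Under these assumptions, overlapping the two programs removes one idle cycle per program pair, which is the utilization gain the disclosure attributes to the prefetch-based scheduling.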
Fig. 4A schematically illustrates a schematic diagram of an instruction scheduling process according to an embodiment of the present disclosure.
As shown in fig. 4A, in instruction dispatch process 400a, instruction i0_0 is stored in instruction queue 410 of buffer 0, instruction i0_1 is stored in instruction queue 420 of buffer 1, and instruction i0_2 is stored in instruction queue 430 of buffer 2.
When coprocessor 0 completes execution of instruction i0_0, coprocessor 0 writes synchronization instruction s0_0 into instruction queue 410 and the instruction dispatch unit synchronizes synchronization instruction s0_0 into instruction queue 420. In response to the synchronous instruction s0_0, coprocessor 1 begins executing instruction i0_1 from instruction queue 420.
At this time, the instruction scheduling unit may also acquire the instructions being executed and the instructions not yet executed in the instruction queues 410, 420, and 430. In the case where it is determined that none of instruction queue 410, instruction queue 420, and instruction queue 430 includes instruction i0_0 and that there is no address conflict between instruction i1_0 and instructions i0_1 and i0_2, the instruction dispatch unit may send instruction i1_0 in instruction queue 410 to coprocessor 0, and coprocessor 0 begins executing instruction i1_0, thereby beginning to process program 1.
When coprocessor 1 completes execution of instruction i0_1, coprocessor 1 writes synchronization instruction s0_1 into instruction queue 420 and the instruction dispatch unit synchronizes synchronization instruction s0_1 into instruction queue 430. In response to the synchronous instruction s0_1, coprocessor 2 begins executing instruction i0_2 from instruction queue 430.
At this time, when the coprocessor 0 completes execution of the instruction i1_0, the instruction scheduling unit may also fetch the instructions being executed and the instructions not being executed in the instruction queues 410, 420, and 430. In the event that it is determined that none of instruction queue 410, instruction queue 420, and instruction queue 430 includes instruction i0_1 and that there is no memory address conflict between instruction i1_1 and instruction i0_2, the instruction dispatch unit may send instruction i1_1 in instruction queue 410 to coprocessor 1, and coprocessor 1 begins executing instruction i1_1.
The subsequent execution of procedure 1 is similar to that of procedure 0, and for brevity, this disclosure is not repeated here.
Fig. 4B schematically illustrates a schematic diagram of an instruction execution process according to an embodiment of the present disclosure. FIG. 4B illustrates the execution flow of the instructions in the instruction queue of FIG. 4A.
As shown in fig. 4B, the instruction execution process 400B includes:
the processor core sends program 0 to the instruction dispatch unit, program 0 including instruction i0_0, instruction i0_1, and instruction i0_2. In the case where the processor core has sent instructions i0_0, i0_1, and i0_2 of program 0 to the instruction dispatch unit, the processor core may send program 1 to the instruction dispatch unit, program 1 including instructions i1_0, i1_1, and i1_2.
The instruction dispatch unit sends instruction i0_0 to coprocessor 0, beginning to process program 0. When coprocessor 0 completes execution of instruction i0_0, coprocessor 0 sends corresponding synchronization instruction s0_0 to the instruction dispatch unit. In response to the synchronization instruction s0_0, the instruction dispatch unit sends instruction i0_1 to coprocessor 1. When coprocessor 1 completes execution of instruction i0_1, coprocessor 1 sends corresponding synchronization instruction s0_1 to the instruction dispatch unit.
While the instruction scheduling unit is sending the instruction i0_1 to the coprocessor 1, in the case that it is determined that the instruction i0_0 is processed and that there is no address conflict between the instruction i1_0 and the instructions i0_1 and i0_2, the instruction scheduling unit may send the instruction i1_0 to the coprocessor 0, and the coprocessor 0 starts to execute the instruction i1_0 and starts to process the program 1. When coprocessor 0 completes execution of instruction i1_0, coprocessor 0 sends corresponding synchronization instruction s1_0 to the instruction dispatch unit.
In response to the synchronization instruction s0_1, the instruction scheduling unit sends the instruction i0_2 to the coprocessor 2, and when the coprocessor 2 completes execution of the instruction i0_2, the coprocessor 2 sends the corresponding synchronization instruction s0_2 to the instruction scheduling unit.
While the instruction dispatch unit is sending the instruction i0_2 to the coprocessor 2, in response to synchronizing the instruction s1_0, the instruction dispatch unit may send the instruction i1_1 to the coprocessor 1, and the coprocessor 1 starts executing the instruction i1_1 in case it is determined that the instruction i0_1 is processed and that there is no address conflict between the instruction i1_1 and the instruction i0_2. When coprocessor 1 completes execution of instruction i1_1, coprocessor 1 sends corresponding synchronization instruction s1_1 to the instruction dispatch unit.
The subsequent execution of procedure 1 is similar to that of procedure 0, and for brevity, this disclosure is not repeated here.
Fig. 5A schematically illustrates a schematic diagram of an instruction scheduling process according to another embodiment of the present disclosure.
As shown in fig. 5A, in instruction scheduling process 500a, instruction queue 510 of buffer 0 stores instruction i0_0, instruction queue 520 of buffer 1 stores instruction i0_0 and instruction i0_1, and instruction queue 530 of buffer 2 stores instruction i0_2.
At the same time that coprocessor 0 completes execution of instruction i0_0 in instruction queue 510, the instruction dispatch unit acquires the instructions being executed and the instructions not yet executed in instruction queue 510, instruction queue 520, and instruction queue 530. In the case where it is determined that instruction queue 520 includes instruction i0_0, the instruction dispatch unit sends instruction i0_0 in instruction queue 520 to coprocessor 0, and coprocessor 0 executes instruction i0_0.
While coprocessor 0 is executing instruction i0_0 in instruction queue 520, the instruction dispatch unit may send instruction i0_1 in instruction queue 520 to coprocessor 1, and coprocessor 1 begins executing instruction i0_1.
When coprocessor 0 completes processing instruction i0_0 in instruction queue 520, the instruction dispatch unit may acquire the instructions being executed and the instructions not yet executed in instruction queue 510, instruction queue 520, and instruction queue 530. In the case where it is determined that none of instruction queue 510, instruction queue 520, and instruction queue 530 includes instruction i0_0 and that instruction i1_0 has no address conflict with instructions i0_1 and i0_2, the instruction dispatch unit may send instruction i1_0 in instruction queue 510 to coprocessor 0, and coprocessor 0 begins executing instruction i1_0, thereby beginning to process program 1.
The subsequent execution of program 0 and the subsequent execution of program 1 are similar to the embodiments shown in fig. 4A and 4B, and for brevity, this disclosure will not be repeated here.
Fig. 5B schematically illustrates a schematic diagram of an instruction execution process according to another embodiment of the present disclosure. FIG. 5B illustrates the execution flow of the instructions in the instruction queue of FIG. 5A.
As shown in fig. 5B, the instruction execution process 500B includes:
the instruction dispatch unit sends instruction i0_0 to coprocessor 0, beginning to process program 0. When coprocessor 0 completes execution of instruction i0_0, coprocessor 0 sends corresponding synchronization instruction s0_0 to the instruction dispatch unit. In response to the synchronization instruction s0_0, the instruction dispatch unit sends instruction i0_1 to coprocessor 1. When coprocessor 1 completes execution of instruction i0_1, coprocessor 1 sends corresponding synchronization instruction s0_1 to the instruction dispatch unit.
At the same time that the instruction scheduling unit sends the instruction i0_1 to the coprocessor 1, in the case where it is determined that the instruction i0_0 in instruction queue 520 has not yet been executed, the instruction scheduling unit may send that not-yet-executed instruction i0_0 to the coprocessor 0 again, and the coprocessor 0 executes the instruction i0_0.
In the case where it is determined that this second occurrence of instruction i0_0 has been processed and that there is no address conflict between instruction i1_0 and instruction i0_2, the instruction scheduling unit may send instruction i1_0 to coprocessor 0, and coprocessor 0 begins executing instruction i1_0, thereby improving the utilization rate of coprocessor 0.
The subsequent execution of program 0 and the subsequent execution of program 1 are similar to the embodiments shown in fig. 4A and 4B, and for brevity, this disclosure will not be repeated here.
According to the embodiment of the present disclosure, the instruction scheduling unit caches the instructions sent by the processor core and, according to the execution states of the plurality of instructions of program 0, schedules the unexecuted instructions of program 0 and the plurality of instructions of program 1. Coprocessors in an idle state are fully utilized to execute unexecuted instructions, so that once coprocessor 0 completes instruction i0_0, coprocessor 0 can execute the next instruction concurrently with coprocessor 1, reducing the idle time of the coprocessors, improving the utilization rate of the coprocessors, and accelerating the processing of the programs.
The integrated circuit chip provided by the present disclosure will be described below in connection with fig. 6.
Fig. 6 schematically illustrates a structural schematic of an integrated circuit chip according to an embodiment of the present disclosure.
The integrated circuit chip 600 includes a processor 610 according to any of the embodiments of the present disclosure. For example, integrated circuit chip 600 may include processor 100 as shown in fig. 1.
Any of the multiple modules in integrated circuit chip 600 may be combined in one module, or any of the modules may be split into multiple modules, according to embodiments of the present disclosure. Alternatively, at least some of the functionality of one or more of these modules may be combined with at least some of the functionality of other modules and implemented in one module. According to embodiments of the present disclosure, at least one of the modules in integrated circuit chip 600 may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on chip, a system on a substrate, a system on a package, or an Application Specific Integrated Circuit (ASIC), or as hardware or firmware in any other reasonable manner of integrating or packaging circuits, or as any one of, or a suitable combination of, software, hardware, and firmware implementations. Alternatively, at least one of the modules in integrated circuit chip 600 may be at least partially implemented as a computer program module which, when executed, performs the corresponding function.
The instruction processing method provided by the present disclosure will be described below with reference to fig. 7.
Fig. 7 schematically illustrates a flow chart of an instruction processing method according to an embodiment of the disclosure.
As shown in fig. 7, the instruction processing method 700 includes operations S710 to S730.
In operation S710, a plurality of first instructions and a plurality of second instructions are generated.
Then, the plurality of first instructions are sequentially transmitted to the plurality of coprocessors in operation S720.
Then, in operation S730, in the case where it is determined that the at least one target first instruction has been executed, the target second instruction is sent to the target coprocessor.
According to an embodiment of the present disclosure, the at least one target first instruction is the at least one first instruction, among the plurality of first instructions, that is executed first; the target second instruction is the second instruction, among the plurality of second instructions, that is executed first; and the target coprocessor is the coprocessor that executes the at least one target first instruction.
According to an embodiment of the present disclosure, operation S710 may be performed by at least one processor core 120 in the embodiment shown in fig. 1, and operations S720 through S730 may be performed by the instruction scheduling unit 130 in the embodiment shown in fig. 1.
In accordance with an embodiment of the present disclosure, operation S730, sending the target second instruction to the target coprocessor in the case where it is determined that the at least one target first instruction has been executed, includes: acquiring a first memory address associated with a subsequent instruction and a second memory address associated with the target second instruction in the case where it is determined that the at least one target first instruction has been executed, wherein the subsequent instruction is at least one first instruction, among the plurality of first instructions, subsequent to the at least one target first instruction; and sending the target second instruction to the target coprocessor in the case where it is determined that the first memory address and the second memory address are different.
In accordance with an embodiment of the present disclosure, operation S730, sending the target second instruction to the target coprocessor in the case where it is determined that the at least one target first instruction has been executed, further includes: in the case where the first memory address and the second memory address contain the same address, modifying the second memory address so that the second memory address differs from the first memory address.
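One possible way to realize the address modification, sketched under the assumption that any free address not used by the first instructions may be chosen (the disclosure only requires that the two addresses end up different):

```python
def resolve_conflict(second_addr, first_addrs, free_addrs):
    """If the target second instruction's memory address collides with an
    address of a subsequent first instruction, relocate it to a free,
    non-conflicting address (hypothetical relocation policy; returns the
    original address unchanged when no suitable free address exists)."""
    if second_addr in first_addrs:
        for candidate in free_addrs:
            if candidate not in first_addrs:
                return candidate
    return second_addr

# 0x100 conflicts with a first instruction's address; 0x200 is also taken,
# so the second instruction is relocated to 0x300.
assert resolve_conflict(0x100, {0x100, 0x200}, [0x200, 0x300]) == 0x300
# No conflict: the address is left as-is.
assert resolve_conflict(0x400, {0x100, 0x200}, [0x300]) == 0x400
```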
According to an embodiment of the present disclosure, the plurality of first instructions and the plurality of second instructions may be stored in at least one buffer. Operation S730, sending the target second instruction to the target coprocessor in the case where it is determined that the at least one target first instruction has been executed, further includes: sequentially sending, according to the execution order of the plurality of first instructions, the plurality of first instructions in the at least one buffer to the corresponding coprocessors among the plurality of coprocessors; acquiring the execution states of the plurality of first instructions in the plurality of coprocessors; and sending, according to the execution states, the target second instruction to the target coprocessor in the case where it is determined that the at least one target first instruction has been executed by the target coprocessor.
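Operations S710 through S730 can be summarized in a short sketch. The dictionary-based bookkeeping and all names here are illustrative assumptions, not the claimed implementation:

```python
def process(first_instrs, second_instrs, executed, addr):
    """Sketch of operations S720-S730: dispatch the first instructions in
    order; once the target first instruction (the one executed first) is
    complete and the target second instruction's addresses do not conflict
    with those of the subsequent first instructions, dispatch it too.
    `executed` marks completed first instructions; `addr` maps each
    instruction to its memory addresses (hypothetical bookkeeping)."""
    dispatched = list(first_instrs)            # S720: send first instructions in order
    target_first = first_instrs[0]             # the first instruction executed first
    target_second = second_instrs[0]           # the second instruction executed first
    followers = first_instrs[1:]               # subsequent first instructions
    if executed.get(target_first):             # S730: target first instruction done
        follower_addrs = {a for f in followers for a in addr[f]}
        if not follower_addrs & set(addr[target_second]):
            dispatched.append(target_second)   # send target second instruction early
    return dispatched

addr = {"i0_0": [1], "i0_1": [2], "i0_2": [3], "i1_0": [4]}
out = process(["i0_0", "i0_1", "i0_2"], ["i1_0", "i1_1"], {"i0_0": True}, addr)
assert out == ["i0_0", "i0_1", "i0_2", "i1_0"]   # i1_0 dispatched early
```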
For example, in the event that it is determined that the plurality of first instructions are each written to at least one buffer, the at least one processor core generates a plurality of second instructions and writes the plurality of second instructions to the at least one buffer.
According to an embodiment of the present disclosure, at least one execution result of at least one target first instruction may be stored through a memory. And deleting the at least one execution result from the memory under the condition that the at least one execution result is not related to the subsequent instruction of the at least one target first instruction.
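The deletion condition, that an execution result may be removed from the memory once no subsequent instruction references it, can be sketched as follows (all names are hypothetical):

```python
def release_unused(results, dependencies):
    """Keep only the execution results that some subsequent instruction still
    depends on; everything else is deleted to free its memory space
    (illustrative reference-tracking model)."""
    needed = set()
    for deps in dependencies.values():
        needed.update(deps)
    return {name: value for name, value in results.items() if name in needed}

results = {"result_0": 42, "result_1": 7}
# Only the subsequent instruction i0_2 remains, and it uses result_1 alone,
# so result_0 is deleted and its memory space is released.
kept = release_unused(results, {"i0_2": ["result_1"]})
assert kept == {"result_1": 7}
```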
According to an embodiment of the present disclosure, the execution of each of the plurality of second instructions and the execution of each of the plurality of first instructions are independent of each other.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
FIG. 8 schematically illustrates a block diagram of an example electronic device that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the electronic device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Various components in device 800 are connected to I/O interface 805, including: an input unit 806 such as a keyboard, mouse, etc.; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, etc.; and a communication unit 809, such as a network card, modem, wireless communication transceiver, or the like. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 801 performs the respective methods and processes described above, such as the instruction processing method. For example, in some embodiments, the instruction processing method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 800 via ROM 802 and/or communication unit 809. When a computer program is loaded into RAM 803 and executed by computing unit 801, one or more steps of the instruction processing method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the instruction processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here above can be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include being implemented in one or more computer programs, which may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system or a server combined with a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.
The above detailed description should not be construed as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations, and alternatives are possible depending on design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (13)

1. A processor, comprising:
a plurality of coprocessors;
at least one processor core configured to generate a plurality of first instructions and a plurality of second instructions; and
an instruction scheduling unit configured to sequentially send the plurality of first instructions to the plurality of coprocessors, and to send a target second instruction to a target coprocessor in a case where it is determined that at least one target first instruction has been executed to completion;
wherein the at least one target first instruction is at least one first instruction, among the plurality of first instructions, that is to be executed first; the target second instruction is a second instruction, among the plurality of second instructions, that is to be executed first; and the target coprocessor is the coprocessor that executes the at least one target first instruction.
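The scheduling policy of claim 1 can be pictured in code. The following is a minimal, hypothetical sketch, not the patented implementation: names such as `Coprocessor` and `schedule` are illustrative, and completion is modeled as instantaneous so the overlap between the two instruction streams is visible in the dispatch log.

```python
from collections import deque

class Coprocessor:
    """Toy coprocessor that records the instructions it has run."""
    def __init__(self, cid):
        self.cid = cid
        self.completed = []

    def execute(self, instr):
        self.completed.append(instr)  # model instant completion


def schedule(first_instrs, second_instrs, coprocessors):
    """Dispatch first instructions in order, round-robin across the
    coprocessors; as soon as the earliest (target) first instruction
    completes, dispatch the earliest (target) second instruction to
    the coprocessor that executed it, without waiting for the
    remaining first instructions to finish."""
    dispatch_log = []
    second_queue = deque(second_instrs)
    for i, instr in enumerate(first_instrs):
        cp = coprocessors[i % len(coprocessors)]
        cp.execute(instr)
        dispatch_log.append((instr, cp.cid))
        if i == 0 and second_queue:  # target first instruction is done
            dispatch_log.append((second_queue.popleft(), cp.cid))
    return dispatch_log
```

On this model the target second instruction appears in the log immediately after the target first instruction, ahead of the remaining first instructions, which is the interleaving the claim describes.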
2. The processor of claim 1, wherein the instruction dispatch unit is further configured to:
acquiring, in a case where it is determined that the at least one target first instruction has been executed to completion, a first memory address related to a subsequent instruction and a second memory address related to the target second instruction, wherein the subsequent instruction is at least one first instruction, among the plurality of first instructions, subsequent to the at least one target first instruction; and
sending the target second instruction to the target coprocessor in a case where the first memory address and the second memory address are different.
3. The processor of claim 2, wherein the instruction scheduling unit is further configured to:
modifying, in a case where the first memory address and the second memory address include a same address, the second memory address such that the second memory address is different from the first memory address.
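The address check in claims 2 and 3 amounts to a conflict test with a fallback remap. Below is a hedged sketch under illustrative names (`resolve_second_address`, `spare_addrs` are not from the patent): the second instruction's address is kept when it does not collide with any address the remaining first instructions will touch, and is otherwise remapped to a non-colliding spare address.

```python
def resolve_second_address(subsequent_addrs, second_addr, spare_addrs):
    """Return the memory address under which the target second
    instruction may be sent: unchanged when it does not collide with
    any address used by the subsequent first instructions (claim 2),
    otherwise remapped to a spare, non-colliding address (claim 3)."""
    if second_addr not in subsequent_addrs:
        return second_addr          # no conflict: send as-is
    for addr in spare_addrs:        # conflict: pick a free address
        if addr not in subsequent_addrs:
            return addr
    raise RuntimeError("no conflict-free address available")
```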
4. The processor of claim 1, wherein the instruction scheduling unit comprises:
at least one buffer configured to store the plurality of first instructions and the plurality of second instructions from the at least one processor core, respectively; and
a prefetch unit configured to:
sequentially send the first instructions in the at least one buffer to corresponding ones of the plurality of coprocessors according to an execution order of the first instructions;
acquire execution states of the first instructions in the plurality of coprocessors; and
send, according to the execution states, the target second instruction to the target coprocessor in a case where it is determined that the target coprocessor has completed execution of the at least one target first instruction.
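The buffer-plus-prefetch arrangement of claim 4 can be sketched as follows. This is an illustrative model only (`PrefetchUnit`, `run`, and the instant-completion assumption are mine, not the patent's): the unit drains the first-instruction buffer in program order, tracks execution state, and releases the target second instruction to the coprocessor that ran the target first instruction once that instruction is observed complete.

```python
from collections import deque

class PrefetchUnit:
    """Toy prefetch unit: drains a first-instruction buffer in program
    order, tracks per-instruction execution state, and releases the
    target second instruction once the target first instruction has
    completed on its coprocessor."""
    def __init__(self, num_coprocessors):
        self.num_coprocessors = num_coprocessors
        self.exec_state = {}  # instruction -> "done"

    def run(self, first_buffer, second_buffer):
        dispatched = []
        target_cp = None
        i = 0
        while first_buffer:
            instr = first_buffer.popleft()
            cp_id = i % self.num_coprocessors
            self.exec_state[instr] = "done"   # poll: model instant completion
            dispatched.append((instr, cp_id))
            if target_cp is None:             # first = target first instruction
                target_cp = cp_id
                if second_buffer:
                    dispatched.append((second_buffer.popleft(), target_cp))
            i += 1
        return dispatched
```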
5. The processor of claim 4, wherein the at least one processor core is further configured to:
generating the plurality of second instructions in case it is determined that the plurality of first instructions are all written to the at least one buffer; and
writing the plurality of second instructions to the at least one buffer.
6. The processor of claim 1, further comprising:
a memory configured to store at least one execution result of the at least one target first instruction;
wherein the target coprocessor deletes the at least one execution result from the memory in a case where the at least one execution result is unrelated to instructions subsequent to the at least one target first instruction.
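The early-reclamation rule of claim 6 reduces to filtering stored results by whether any subsequent instruction still reads them. A minimal sketch, under the assumption that results are keyed by memory address (the function name and data layout are illustrative):

```python
def prune_results(results, addrs_needed_by_subsequent):
    """results: dict mapping a result's memory address to its value.
    Keep only results that subsequent first instructions still read;
    everything else is deleted from memory, freeing it early."""
    return {addr: val for addr, val in results.items()
            if addr in addrs_needed_by_subsequent}
```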
7. The processor of any one of claims 1 to 6, wherein execution of each of the plurality of second instructions is independent of execution of each of the plurality of first instructions.
8. An integrated circuit chip, comprising:
the processor of any one of claims 1 to 7.
9. An instruction processing method, comprising:
generating a plurality of first instructions and a plurality of second instructions;
sequentially sending the first instructions to a plurality of coprocessors; and
sending a target second instruction to a target coprocessor in a case where it is determined that at least one target first instruction has been executed to completion;
wherein the at least one target first instruction is at least one first instruction, among the plurality of first instructions, that is to be executed first; the target second instruction is a second instruction, among the plurality of second instructions, that is to be executed first; and the target coprocessor is the coprocessor that executes the at least one target first instruction.
10. The method of claim 9, wherein sending the target second instruction to the target coprocessor in the case where it is determined that the at least one target first instruction has been executed to completion comprises:
acquiring, in a case where it is determined that the at least one target first instruction has been executed to completion, a first memory address related to a subsequent instruction and a second memory address related to the target second instruction, wherein the subsequent instruction is at least one first instruction, among the plurality of first instructions, subsequent to the at least one target first instruction; and
sending the target second instruction to the target coprocessor in a case where the first memory address and the second memory address are different.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 9 to 10.
12. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 9-10.
13. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the method of any of claims 9 to 10.
CN202310341078.9A 2023-03-31 2023-03-31 Processor, integrated circuit chip, instruction processing method, electronic device, and medium Pending CN116243983A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310341078.9A CN116243983A (en) 2023-03-31 2023-03-31 Processor, integrated circuit chip, instruction processing method, electronic device, and medium

Publications (1)

Publication Number Publication Date
CN116243983A true CN116243983A (en) 2023-06-09

Family

ID=86631472

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310341078.9A Pending CN116243983A (en) 2023-03-31 2023-03-31 Processor, integrated circuit chip, instruction processing method, electronic device, and medium

Country Status (1)

Country Link
CN (1) CN116243983A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117171075A (en) * 2023-10-27 2023-12-05 上海芯联芯智能科技有限公司 Electronic equipment and task processing method
CN117171075B (en) * 2023-10-27 2024-02-06 上海芯联芯智能科技有限公司 Electronic equipment and task processing method

Similar Documents

Publication Publication Date Title
CN112540806B (en) Method and device for rendering small program page, electronic equipment and storage medium
CN106095604A (en) The communication method between cores of a kind of polycaryon processor and device
CN115481058A (en) Execution method, device, access module and system of memory atomic operation instruction
US20210365285A1 (en) Voice data procession method, apparatus, device and storage medium
CN114564435A (en) Inter-core communication method, device and medium for heterogeneous multi-core chip
CN114936173B (en) Read-write method, device, equipment and storage medium of eMMC device
CN116662038B (en) Industrial information detection method, device, equipment and medium based on shared memory
CN110851276A (en) Service request processing method, device, server and storage medium
CN116243983A (en) Processor, integrated circuit chip, instruction processing method, electronic device, and medium
US20200371827A1 (en) Method, Apparatus, Device and Medium for Processing Data
CN114968567A (en) Method, apparatus and medium for allocating computing resources of a compute node
US20230359483A1 (en) Method for applet page rendering, electronic device and storage medium
US10802874B1 (en) Cloud agnostic task scheduler
CN108062224B (en) Data reading and writing method and device based on file handle and computing equipment
CN115237574A (en) Scheduling method and device of artificial intelligence chip and electronic equipment
CN114386577A (en) Method, apparatus, and storage medium for executing deep learning model
CN113439260A (en) I/O completion polling for low latency storage devices
CN116579914B (en) Execution method and device of graphic processor engine, electronic equipment and storage medium
CN116185670B (en) Method and device for exchanging data between memories, electronic equipment and storage medium
US11941722B2 (en) Kernel optimization and delayed execution
CN115334159B (en) Method, apparatus, device and medium for processing stream data
US11112999B2 (en) Optimizing I/O latency by software stack latency reduction in a cooperative thread processing model
CN116107763B (en) Data transmission method, device, equipment and storage medium
CN115098165B (en) Data processing method, device, chip, equipment and medium
CN113641404A (en) Program running method and device, processor chip, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination