CN114637536A - Task processing method, computing coprocessor, chip and computer equipment - Google Patents

Info

Publication number
CN114637536A
Authority
CN
China
Prior art keywords
command
list
module
data processing
control information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210304928.3A
Other languages
Chinese (zh)
Inventor
马亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Denglin Technology Co ltd
Original Assignee
Shanghai Denglin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Denglin Technology Co ltd
Priority to CN202210304928.3A
Publication of CN114637536A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3877 Concurrent instruction execution, e.g. pipeline or look ahead using a slave processor, e.g. coprocessor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098 Register arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3851 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • G06F9/3887 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Advance Control (AREA)

Abstract

The application belongs to the technical field of chips and discloses a task processing method, a computing coprocessor, a chip and computer equipment. A hardware command queue module receives command submission control information sent by a command queue management module, reads a command submission list from the memory based on the command submission control information, and sends the command execution control information contained in the command submission list to a programmable command processing module. The programmable command processing module reads a command execution list from the memory based on the received command execution control information and distributes the commands contained in the command execution list to the data processing units through a unit management module. In this way, commands can be distributed flexibly and efficiently.

Description

Task processing method, computing coprocessor, chip and computer equipment
Technical Field
The application relates to the technical field of chips, in particular to a task processing method, a computing coprocessor, a chip and computer equipment.
Background
In application scenarios involving complex technologies such as deep neural networks, a computer device usually uses a computing coprocessor to handle computing tasks characterized by large data volumes, heavy computation and many computation types. Computing coprocessors were developed to assist the central processing unit with work that it cannot perform or performs inefficiently. A computing coprocessor comprises a front-end control engine and data processing units. Based on task instructions from the host processor, the front-end control engine distributes the commands of a complex computing task to the data processing units in the computing coprocessor, and the data processing units execute the distributed commands.
However, front-end control engines often lack flexibility when distributing the commands of complex computing tasks.
Disclosure of Invention
The embodiments of the application aim to provide a task processing method, a computing coprocessor, a chip and computer equipment, so that commands can be distributed flexibly when the commands of complex computing tasks are distributed through a front-end control engine.
In one aspect, a task processing method is provided, which is applied to a front-end control engine in a computing coprocessor. The computing coprocessor further includes at least one data processing unit. The front-end control engine includes a hardware command queue module, which executes commands using fixed-function hardware, and a programmable command processing module, which has a programmable function. The method includes:
the hardware command queue module reads a command submission list from the memory based on the received command submission control information corresponding to the task to be processed, and sends command execution control information contained in the command submission list to the programmable command processing module;
and the programmable command processing module reads the command execution list from the memory based on the received command execution control information, and distributes the commands contained in the command execution list to the matched data processing units in the at least one data processing unit so that each data processing unit can execute the distributed commands.
In this implementation, the programmable command processing module is highly flexible, so combining the hardware command queue module with it improves the flexibility with which the front-end control engine processes commands. The combined front-end control engine has both the high parallelism of the hardware command queue module and the programmability of the programmable command processing module, without requiring complex programming design or a costly microprocessor design. The flexibility of command processing is thus improved while the parallelism and efficiency of task processing are maintained.
In one aspect, a front-end control engine applied to a computing coprocessor is provided, the computing coprocessor further comprises at least one data processing unit, and the front-end control engine comprises a hardware command queue module and a programmable command processing module;
the hardware command queue module is used for reading a command submission list from the memory based on the received command submission control information corresponding to the task to be processed and sending command execution control information contained in the command submission list to the programmable command processing module;
and the programmable command processing module is used for reading the command execution list from the memory based on the received command execution control information and distributing the commands contained in the command execution list to the matched data processing units in the at least one data processing unit so that each data processing unit can execute the distributed commands.
In one aspect, a chip is provided, comprising a front-end control engine, the chip having at least one data processing unit, the front-end control engine comprising: a hardware command queue module and a programmable command processing module;
the hardware command queue module is used for reading a command submission list from the memory based on the received command submission control information corresponding to the task to be processed and sending command execution control information contained in the command submission list to the programmable command processing module;
and the programmable command processing module is used for reading the command execution list from the memory based on the received command execution control information and distributing the commands contained in the command execution list to the matched data processing units in the at least one data processing unit so that each data processing unit can execute the distributed commands.
In one aspect, a computing coprocessor is provided, comprising a front-end control engine and at least one data processing unit. The front-end control engine is used for distributing commands to the at least one data processing unit by executing the steps of the method provided in any of the alternative implementations of task processing described above, and the at least one data processing unit is used for executing the distributed commands.
In one aspect, a chip is provided, which includes a computing coprocessor and a memory, where the memory stores computer-readable instructions that, when executed by the computing coprocessor, perform the steps of the method provided in any of the various alternative implementations of task processing.
In one aspect, a computer device is provided comprising a computing coprocessor and a memory, the memory storing computer-readable instructions which, when executed by the computing coprocessor, perform the steps of the method provided in any of the various alternative implementations of task processing described above.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered as limiting its scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.
Fig. 1 is a schematic structural diagram of a computing coprocessor according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a front-end control engine according to an embodiment of the present disclosure;
Fig. 3 is a flowchart illustrating an implementation of a task processing method 300 according to an embodiment of the present disclosure;
Fig. 4 is a diagram illustrating a structure of a two-level list according to an embodiment of the present disclosure;
Fig. 5 is a diagram illustrating a structure of a three-level list according to an embodiment of the present disclosure;
Fig. 6 is a flowchart of an implementation of a task switching method 600 according to an embodiment of the present application;
Fig. 7 is a block diagram of a front-end control engine 700 according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. The components of the embodiments of the present application, as generally described in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
First, some terms referred to in the embodiments of the present application will be described to facilitate understanding by those skilled in the art.
Terminal device: may be a mobile terminal, a fixed terminal, or a portable terminal, such as a mobile handset, station, unit, device, multimedia computer, multimedia tablet, internet node, communicator, desktop computer, laptop computer, notebook computer, netbook computer, tablet computer, personal communication system device, personal navigation device, personal digital assistant, audio/video player, digital camera/camcorder, positioning device, television receiver, radio broadcast receiver, electronic book device, gaming device, or any combination thereof, including the accessories and peripherals of these devices. It is also contemplated that the terminal device can support any type of user interface (e.g., a wearable device).
Server: may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, and big data and artificial intelligence platforms.
Graphics Processing Unit (GPU): also called a display core, visual processor or display chip; a microprocessor dedicated to image operations on personal computers, workstations, game consoles and some mobile devices (such as tablet computers and smartphones).
Artificial Intelligence (AI): the theory, methods, technology and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain the best result. Artificial intelligence is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies.
In order to better execute more various and complex tasks and meet the requirements of various task scenes, the embodiment of the application provides a front-end control engine of a coprocessor, and also provides a task processing method, a computing coprocessor, a chip and computer equipment, wherein related commands of complex computing tasks can be flexibly distributed through the front-end control engine.
Fig. 1 is a schematic structural diagram of a computing coprocessor according to an embodiment of the present application. The computing coprocessor includes a front-end control engine, a command network, one or more data processing units, and a memory interface.
Computing coprocessor: also referred to simply as a coprocessor; a processor in the computer device other than the host processor (i.e., the host CPU), used to execute task commands issued by the host CPU.
For example, the computing coprocessor may be a heterogeneous computing processor, a GPU, a general-purpose graphics processing unit (GPGPU), or the like.
Optionally, the computer device may be a server or a terminal device, which is not limited herein. The computing coprocessor may have a variety of dedicated hardware computing units that handle computing tasks associated with complex technologies such as artificial intelligence. The computing coprocessor may execute commands from the host CPU synchronously or asynchronously.
In an application scenario of synchronous operation of a computing coprocessor and a host CPU, the computing coprocessor needs to wait for input data of a certain data processing task prepared by the host CPU, or the host CPU needs to wait for a certain operation result of the computing coprocessor.
Front-end control engine: used to communicate with the host CPU, to read and parse the task instruction issued by the host CPU for the task to be processed, and to distribute the commands for executing the task to be processed to the data processing units according to the task instruction.
Data processing unit: connected to the front-end control engine through the command network and used to process the commands distributed by the front-end control engine; execution results can be returned to the front-end control engine.
Optionally, there may be one or more data processing units. The types of the plurality of data processing units may be the same or different. As in Fig. 1, the data processing units include data processing unit 0, data processing unit 1, ..., and data processing unit N, where N is a natural number.
Optionally, a data processing unit may be a pure computing unit (such as convolution or matrix multiplication acceleration hardware for artificial intelligence operations), a programmable Single Instruction Multiple Data (SIMD) processor, a pure data-transfer unit, or an application-specific data processing unit (e.g., a video codec or picture processor).
In an application scenario where multiple data processing units must complete the same computing task synchronously, the front-end control engine can synchronize command execution across those data processing units. These synchronization operations may be accomplished by the front-end control engine executing special wait commands or atomic operations.
A memory interface: for reading and writing the memory.
Fig. 2 is a schematic structural diagram of a front-end control engine according to an embodiment of the present disclosure. The structure of the front-end control engine in Fig. 1 is described in detail below with reference to Fig. 2. As shown in Fig. 2, the front-end control engine may include a host interface module, a hardware command queue module, and a programmable command processing module. Optionally, the front-end control engine may further include a command queue management module and/or a unit management module.
Host interface module: used to implement communication between the host CPU and the front-end control engine and to receive commands issued by the host CPU, such as command list updates or task suspension.
Optionally, the host CPU may be in communication connection with the command queue management module through the host interface module, or may be in communication connection with the hardware command queue module directly through the host interface module. The host interface module may be a host interface register.
Command queue management module: used for scheduling the hardware command queue modules, controlling the switching of computing tasks, and the like. The command queue management module may be programmable or non-programmable.
In one embodiment, the command queue management module receives commands such as command list update or task suspension issued by the host CPU through the host interface module, and allocates command submission control information for reading each command submission list to each idle hardware command queue module based on the commands issued by the host CPU.
The command submission control information at least comprises a storage address of the command submission list.
In one embodiment, the host CPU sets the host interface register when it determines that the command list has been updated. The command queue management module reads the register information (i.e., the command issued by the host CPU) from the host interface register. When it determines from the register information that a command list has been updated, it reads the command management list from the memory based on the register information, acquires the command submission control information contained in the command management list, and allocates the command submission control information corresponding to each command submission list to idle hardware command queue modules in order of the priority of each command submission list.
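The priority-ordered allocation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names SubmitControlInfo, HardwareQueue and allocate are hypothetical, the priority field is assumed to be an integer where a larger value means higher priority, and a queue is treated as idle when nothing is assigned to it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SubmitControlInfo:
    """Command submission control information (hypothetical layout)."""
    list_address: int  # storage address of the command submission list
    priority: int      # assumption: larger value means higher priority


@dataclass
class HardwareQueue:
    """Stand-in for one hardware command queue module."""
    queue_id: int
    assigned: Optional[SubmitControlInfo] = None

    @property
    def idle(self) -> bool:
        return self.assigned is None


def allocate(management_list, queues):
    """Give entries to idle queues in priority order; return the entries left pending."""
    pending = sorted(management_list, key=lambda info: info.priority, reverse=True)
    for queue in queues:
        if not pending:
            break
        if queue.idle:
            queue.assigned = pending.pop(0)
    return pending


queues = [HardwareQueue(0), HardwareQueue(1)]
entries = [
    SubmitControlInfo(list_address=0x1000, priority=1),
    SubmitControlInfo(list_address=0x2000, priority=3),
    SubmitControlInfo(list_address=0x3000, priority=2),
]
left_over = allocate(entries, queues)
```

With two idle queues and three entries, the two highest-priority entries are assigned and the lowest-priority one stays pending until a queue becomes idle again.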
Hardware command queue module: executes commands using fixed-function hardware. Based on the allocated command submission control information, it reads and parses the command execution control information in the command submission list and sends that command execution control information to the programmable command processing module.
In one embodiment, the hardware command queue module receives command submission control information sent by the command queue management module.
In one embodiment, the hardware command queue module reads register information written by a host CPU into a host interface register, acquires command submission control information based on the register information, and sends each command execution control information in a command submission list to the programmable command processing module.
The front-end control engine comprises one or more hardware command queue modules. The hardware command queue modules can run in parallel, and each is connected to one or more independent programmable command processing modules. Because the programmable command processing module is highly flexible, combining it with the hardware command queue module improves the flexibility with which the front-end control engine processes commands, and multiple hardware command queue modules can distribute commands in parallel.
In one embodiment, each hardware command queue module is correspondingly connected with an independent programmable command processing module, and sends command execution control information to the connected programmable command processing module.
In practical applications, the programmable command processing modules connected to different hardware command queue modules may be the same or different, and are not limited herein.
Programmable command processing module: has a programmable function. It reads the command execution list from the memory based on the received command execution control information and writes the commands in the command execution list into the management register corresponding to the unit management module.
The programmable command processing module can repeatedly execute a segment of commands in the command execution list by means of conditional execution.
In one embodiment, the programmable command processing module may be a programmable command processor, and may be a Reduced Instruction Set Computing (RISC) processor.
In practical applications, the programmable command processing module may be set according to practical application scenarios, which is not limited herein.
Therefore, through the conditional execution mode of the programmable command processing module, a segment of commands in the command execution list can be executed repeatedly on the computing coprocessor. The host CPU does not need to frequently prepare command lists for the computing coprocessor, which reduces the communication bandwidth required between the front-end control engine and the host CPU. For complex computing tasks such as deep learning networks, whether a subsequent command is executed depends on the result of the current command. If the front-end control engine used only fixed-function hardware to execute commands, each command result would have to be returned to the host CPU, and the host CPU would decide whether the subsequent command should be executed. This creates gaps in command execution and reduces computing efficiency. In the embodiments of the application, the programmable command processing module can evaluate the result of the current data processing command directly, which avoids waiting for the host CPU and also reduces the host CPU load. In addition, using multiple programmable processors that execute in parallel avoids both the hardware complexity of designing a single high-performance processor and the frequent context switching caused by implementing multithreading in software.
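As a rough illustration of conditional execution, the sketch below interprets a small command execution list in which a conditional jump repeats a segment of commands until the result of the current command no longer satisfies a condition, with no round trip to the host CPU. The command encoding ("data" and "jump_if" tuples), the function names and the max_steps guard are all assumptions made for this example; the application does not specify an instruction format.

```python
def run_execution_list(commands, execute, max_steps=1000):
    """Interpret a command execution list that supports a conditional jump.

    Each entry is ("data", payload) or ("jump_if", predicate, target_index).
    This encoding is hypothetical and exists only for illustration.
    """
    pc = 0              # index of the next command to run
    last_result = None  # result of the most recent data processing command
    trace = []
    steps = 0
    while pc < len(commands) and steps < max_steps:
        steps += 1
        command = commands[pc]
        if command[0] == "data":
            last_result = execute(command[1], last_result)
            trace.append(last_result)
            pc += 1
        elif command[0] == "jump_if":
            _, predicate, target = command
            # The decision is made locally from the current result.
            pc = target if predicate(last_result) else pc + 1
        else:
            raise ValueError(f"unknown command kind: {command[0]}")
    return trace


def execute(payload, previous):
    """Stand-in for work done by a data processing unit."""
    if payload == "init":
        return 8.0
    if payload == "halve":
        return previous / 2
    return previous


commands = [
    ("data", "init"),
    ("data", "halve"),
    ("jump_if", lambda result: result > 1.0, 1),  # repeat "halve" while result > 1
    ("data", "finish"),
]
trace = run_execution_list(commands, execute)
```

Here the "halve" command is repeated three times before the list continues, and the host would only need to supply the list once.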
Furthermore, unlike the bootloader mode adopted by a traditional microprocessor, the programmable command processing module can be started directly by the hardware command queue module. The input of each start is a segment of commands to be executed in the command execution list. After the command execution list has been executed, the programmable command processing module can directly enter a sleep state, which saves power.
Unit management module: used to read the commands written into the management register by each programmable command processing module and to distribute the commands in the management register to the matched data processing units according to the state of each data processing unit.
Therefore, the programmable command processing module interacts with the unit management module in a mode of reading and writing the management register, and the expandability and the programmability are improved.
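The register-based interaction can be sketched as below. This is only a software model under assumptions: the management register is modeled as a FIFO, a unit "matches" a command when a hypothetical kind tag is equal, and unit state is reduced to a busy flag. None of these details are specified by the application.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DataProcessingUnit:
    unit_id: int
    kind: str            # hypothetical type tag, e.g. "matmul" or "copy"
    busy: bool = False   # simplified unit state
    received: list = field(default_factory=list)


class UnitManager:
    """Software model of the unit management module (illustrative only)."""

    def __init__(self, units):
        self.units = units
        self.management_register = deque()  # assumption: behaves like a FIFO

    def write_register(self, kind, payload):
        """Called by a programmable command processing module."""
        self.management_register.append((kind, payload))

    def dispatch(self):
        """Send each queued command to an idle unit of the matching kind.

        Commands with no idle matching unit remain in the register.
        """
        waiting = deque()
        while self.management_register:
            kind, payload = self.management_register.popleft()
            unit = next(
                (u for u in self.units if u.kind == kind and not u.busy), None
            )
            if unit is None:
                waiting.append((kind, payload))
            else:
                unit.busy = True
                unit.received.append(payload)
        self.management_register = waiting


units = [DataProcessingUnit(0, "matmul"), DataProcessingUnit(1, "copy")]
manager = UnitManager(units)
manager.write_register("matmul", "A*B")
manager.write_register("copy", "dma0")
manager.write_register("matmul", "C*D")
manager.dispatch()
```

Because the command processors only write to the register, new unit types can be added by extending the matching rule without changing the writers.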
Referring to fig. 3, a flowchart of an implementation of a task processing method 300 according to an embodiment of the present application is provided, where the method may be performed by a front-end control engine of a coprocessor, for example, the front-end control engine shown in fig. 1 or fig. 2. The task processing method will be specifically described below with reference to the computing coprocessor shown in fig. 1 and the front-end control engine shown in fig. 2. The specific implementation flow of the method can comprise the following steps:
Step 301: The hardware command queue module reads the command submission list from the memory based on the received command submission control information corresponding to the task to be processed.
Specifically, when the host CPU determines that there is a task to be processed, it updates the command list and issues a command list update instruction to the front-end control engine through the host interface module. Based on the command list update instruction, the front-end control engine selects, in sequence and according to an allocation rule, at least one piece of command submission control information corresponding to the task to be processed, and allocates the selected command submission control information to idle hardware command queue modules. Each hardware command queue module performs the following step: when it determines that command submission control information has been received, it reads a command submission list from the memory based on that command submission control information.
In one embodiment, software (e.g., an application) of the host CPU generates command lists for executing tasks to be processed, stores the command lists in the memory, and issues a command list update instruction to the front-end control engine.
The command list includes at least two levels, and optionally, the command list may include two levels, a level 2 list, i.e., a command submission list, and a level 1 list, i.e., a command execution list. The command list may also include three levels of lists, a level 3 list, i.e., a command management list, a level 2 list, i.e., a command submission list, and a level 1 list, i.e., a command execution list.
Fig. 4 is a diagram illustrating a structure of a two-level list. The command submission list and the command execution list are included in fig. 4. Fig. 5 is a diagram illustrating a structure of a three-level list. Included in fig. 5 are a command management list, a command submission list, and a command execution list.
The command execution list contains M1 commands, namely command 0, command 1, ..., command M1. These commands can be read and parsed by the programmable command processing module, and the parsed commands are sent to the data processing unit of the heterogeneous computing processor for execution.
The command submission list includes command execution control information for M2 command execution lists, namely command execution list 0, command execution list 1, ..., command execution list M2. The hardware command queue module may send the command execution control information in the command submission list to the programmable command processing module.
Optionally, the command submission list may be designed as a ring buffer structure or a linked list structure.
The ring buffer structure manages the command submission list through a write pointer and a read pointer. When software of the host processor (e.g., a software driver) adds command execution control information for a new command execution list to the command submission list, the host processor updates the write pointer; when the hardware command queue module reads the command execution control information of a command execution list, the hardware command queue module updates the read pointer.
If the ring buffer structure is adopted, software does not need to allocate memory frequently, which reduces the storage resources consumed and simplifies the operation steps. Optionally, the command submission list may also store commands of the hardware command type, which are executed by fixed-function hardware of the front-end control engine, for example synchronization operations of certain command queues or cache maintenance commands of the programmable command processing module.
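The write-pointer and read-pointer management described above can be illustrated with a minimal Python sketch. It is not the hardware implementation: the class name, the method names, and the convention of keeping one slot empty to distinguish a full buffer from an empty one are all assumptions made for illustration.

```python
class CommandSubmissionRing:
    """Illustrative ring buffer; one slot stays empty to tell full from empty."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.write_ptr = 0  # advanced by the host software (producer)
        self.read_ptr = 0   # advanced by the hardware command queue module (consumer)

    def is_empty(self):
        return self.read_ptr == self.write_ptr

    def is_full(self):
        return (self.write_ptr + 1) % len(self.slots) == self.read_ptr

    def host_push(self, entry):
        """Host side: add one entry and update the write pointer."""
        if self.is_full():
            return False
        self.slots[self.write_ptr] = entry
        self.write_ptr = (self.write_ptr + 1) % len(self.slots)
        return True

    def hw_pop(self):
        """Hardware side: read one entry and update the read pointer."""
        if self.is_empty():
            return None
        entry = self.slots[self.read_ptr]
        self.read_ptr = (self.read_ptr + 1) % len(self.slots)
        return entry


ring = CommandSubmissionRing(capacity=4)
pushed = [ring.host_push(f"exec_list_{i}") for i in range(4)]
first = ring.hw_pop()
```

Because the storage is allocated once and reused, no memory is allocated per submitted entry, which is the benefit the ring buffer structure is chosen for.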
The command management list contains command submission control information for M3 command submission lists, namely, command submission list 0, command submission list 1, command submission list 2 … …, and command submission list M3.
Wherein M1, M2 and M3 are all natural numbers.
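As a rough illustration of the three-level organization, the following Python sketch models each level as a small data class. It is a simplification: in the embodiment the upper levels hold control information (such as a storage address and a length) that refers to the lower-level lists in memory, whereas the sketch nests the lists directly. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CommandExecutionList:       # level 1: holds the commands themselves
    commands: List[str] = field(default_factory=list)


@dataclass
class CommandSubmissionList:      # level 2: refers to command execution lists
    execution_lists: List[CommandExecutionList] = field(default_factory=list)


@dataclass
class CommandManagementList:      # level 3: refers to command submission lists
    submission_lists: List[CommandSubmissionList] = field(default_factory=list)


def flatten(management_list):
    """Walk level 3 -> level 2 -> level 1 and return commands in list order."""
    return [
        command
        for submission in management_list.submission_lists
        for execution in submission.execution_lists
        for command in execution.commands
    ]


management = CommandManagementList([
    CommandSubmissionList([CommandExecutionList(["a", "b"]),
                           CommandExecutionList(["c"])]),
    CommandSubmissionList([CommandExecutionList(["d"])]),
])
all_commands = flatten(management)
```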
In the embodiment of the application, by introducing the third-level list, namely the command management list, the consumed resources of the on-chip register (namely the host interface register) can be reduced. When the number of command submission lists is large, if the information of each command submission list is stored, a large amount of hardware resources are consumed, and the burden of the host CPU for reading and writing the host interface register is increased. After storing this information in the form of a third-level list in the memory, the host CPU only needs to specify the memory address and length of the command management list through the host interface register. More command submission lists can be supported, and good expandability is achieved.
It should be noted that the length of the list can be adjusted according to different data processing tasks.
In practical application, the number, level and length of the command list may be set according to a practical application scenario, and are not limited herein.
In the embodiment of the present application, the command list is organized as a multi-level structure. The advantage of a multi-level command structure is that the hardware can execute commands asynchronously. If there were only one level of command list, the hardware would have to wait for the current list to release its storage resources, the command lists already prepared by software on the host CPU side would be blocked, and the hardware command queue module could not read the next command list in advance, which would reduce the performance of the processor.
Depending on the number of levels of the command list, the command submission control information may be acquired and distributed in either of the following modes:
Mode 1: The command queue management module receives command management control information issued by the host CPU through the host interface module, reads a command management list from the memory based on the command management control information, and sends each of the at least one piece of command submission control information contained in the command management list to an idle hardware command queue module among the hardware command queue modules.
Wherein the command management control information is used to indicate whether the command list is updated.
Mode 2: The command queue management module receives at least one piece of command submission control information issued by the host CPU through the host interface module, and sends each piece to an idle hardware command queue module among the hardware command queue modules.
Optionally, the host interface module may be a host interface register and a data communication interface, which are readable and writable by the host.
In one embodiment, when acquiring and distributing the command submission control information, the following steps may be adopted:
S3011: When the host CPU determines that the command list is updated, it performs a setting operation on the host interface register to indicate that the command list is updated.
In one embodiment, a status flag bit indicating whether the command list is updated is set in the host interface register. One bit of the host interface register indicates the update status of one command list. If the status flag bit is 1, the command list is updated; if the status flag bit is 0, the command list is not updated. When the host CPU determines that the command list is updated, it performs a write operation on the host interface register to set the corresponding status flag bit to 1.
In practical application, the status flag bit may be set according to a practical application scenario, which is not limited herein.
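The status flag bits can be modeled as follows. This is only a sketch under stated assumptions: the 32-bit register width, the class name, and the method names are hypothetical and are not specified by the embodiment.

```python
class HostInterfaceRegister:
    """Illustrative register in which bit i is the update flag of command list i."""

    def __init__(self, width=32):  # width is an assumed value
        self.width = width
        self.value = 0

    def set_updated(self, list_index):
        """Host CPU side: write 1 to the flag bit of an updated command list."""
        if not 0 <= list_index < self.width:
            raise ValueError("list index outside register width")
        self.value |= 1 << list_index

    def is_updated(self, list_index):
        """Front-end control engine side: test one flag bit."""
        return bool((self.value >> list_index) & 1)

    def clear(self):
        """Reset every flag bit to 0 (no command list updated)."""
        self.value = 0


reg = HostInterfaceRegister()
reg.set_updated(0)
reg.set_updated(3)
flags_after_set = reg.value
```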
Optionally, when the host CPU determines that the command list is updated, control information for acquiring the command list may be written, so that the front-end control engine may read the indicated corresponding command list from the memory based on the control information of the command list.
S3012: the front end control engine reads the register information in the host interface register.
Specifically, the front end control engine may periodically read register information in the host interface registers.
The register information is information written into the host interface register by the host processor. The register information includes a command list update status determined based on the status flag bit.
In the embodiment of the application, the front-end control engine and the host CPU interact through fixed-function hardware (for example, a host interface register), and the host CPU can submit commands directly to the front-end control engine by reading and writing registers. Compared with interacting with the host CPU directly through a programmable command processing module, this approach offers simple software programming, higher interaction efficiency and good compatibility with existing computing programming interfaces. It simplifies the programming interface that the heterogeneous computing processor presents to the host CPU, avoids repeated queries by the front-end control engine of the command list in the memory, and saves storage bandwidth and memory access power consumption.
Further, after determining that the register information in the host interface register is read, the host CPU performs a setting operation on the host interface register to indicate that the command list is not updated currently, e.g., a clear operation may be performed on the host interface register.
S3013: If the command list update status contained in the register information indicates that the command list is updated, the front-end control engine acquires at least one piece of command submission control information based on the register information.
Specifically, the front-end control engine may acquire the command submission control information in either of the following modes:
Mode 1: Acquire at least one piece of command submission control information based on the register information.
Optionally, the command submission control information may be one or more, and each command submission control information is used to read one command submission list. The command submission list may be a circular buffer structure or a linked list structure.
If the command submission list is of a ring buffer structure, the command submission control information includes a storage address and a read-write pointer of the command submission list. If the command submission list is a linked list structure, the command submission control information includes a storage address and a length of the command submission list.
In one embodiment, at least one command submission control information contained in the register information is obtained.
Specifically, the host CPU writes register information containing command submission control information and a command list update status in the host interface register. The front-end control engine acquires at least one command submission control information contained in the register information.
In one embodiment, one bit of the host interface register indicates an update status of a command submission list, a correspondence between a bit position and command submission control information of the command submission list is preset, and at least one piece of command submission control information is obtained according to the correspondence and the bit position indicating the update of the command list in the host interface register.
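The preset correspondence between bit positions and command submission control information can be sketched as a lookup table. The table contents below (addresses and lengths) are invented example values, and the function name is hypothetical.

```python
def updated_submission_info(register_value, info_by_bit):
    """Return the control information of every list whose flag bit is set.

    register_value: non-negative integer read from the host interface register.
    info_by_bit: preset mapping from bit position to control information.
    """
    selected = []
    bit = 0
    while register_value >> bit:          # stop once no higher bit is set
        if (register_value >> bit) & 1 and bit in info_by_bit:
            selected.append(info_by_bit[bit])
        bit += 1
    return selected


# Hypothetical preset correspondence; addresses and lengths are made up.
preset = {
    0: {"address": 0x1000, "length": 8},
    1: {"address": 0x2000, "length": 4},
    2: {"address": 0x3000, "length": 16},
}
selected = updated_submission_info(0b101, preset)
```

Bits 0 and 2 are set in the example, so the control information for the first and third command submission lists is returned, in bit order.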
Mode 2: Read a command management list from the memory based on the register information, and acquire at least one piece of command submission control information contained in the command management list.
In one embodiment, the command management control information contained in the register information is acquired, the command management list is read from the memory based on the command management control information, and one or more command submission control information is selected from the command management list according to an allocation rule to allocate the selected command submission control information to the corresponding hardware command queue module.
Wherein the command management list holds command submission control information for each command submission list.
In one embodiment, one bit of the host interface register indicates the update status of one command management list, and a correspondence between bit positions and the command management control information of the command management lists is preset. At least one piece of command management control information is obtained according to this correspondence and the bit positions in the host interface register that indicate a command list update. The command management list is then read from the memory based on the command management control information, and one or more pieces of command submission control information are selected from the command management list according to an allocation rule and allocated to the corresponding hardware command queue modules.
Optionally, the allocation rule may adopt at least one of the following modes:
Mode 1: Distribute the command submission control information to the corresponding hardware command queue modules according to a polling algorithm.
Mode 2: Distribute the command submission control information to the corresponding hardware command queue modules according to a greedy algorithm.
Mode 3: Distribute the command submission control information to the corresponding hardware command queue modules according to the order of the command submission control information.
Mode 4: Distribute the command submission control information to the corresponding hardware command queue modules according to the priority of the command submission control information.
The polling algorithm is a fair selection algorithm. The greedy algorithm is an algorithm that preferentially executes a particular list. The list priority may be set according to a user indication.
In practical applications, the allocation rule may be set according to practical application scenarios, which is not limited herein.
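Two of the allocation rules listed above can be sketched as follows. The embodiment does not specify the algorithms in detail, so this is one plausible reading: polling is modeled as cyclic (round-robin) assignment, and priority allocation as a stable sort with higher numbers first. The function names and the "priority" field are assumptions.

```python
def assign_round_robin(infos, num_queues):
    """Polling sketch: hand out control information to queues in cyclic order."""
    assignment = {queue: [] for queue in range(num_queues)}
    for index, info in enumerate(infos):
        assignment[index % num_queues].append(info)
    return assignment


def order_by_priority(infos):
    """Priority sketch: higher priority first, original order kept for ties."""
    return sorted(infos, key=lambda info: -info["priority"])


round_robin = assign_round_robin(["s0", "s1", "s2", "s3", "s4"], num_queues=2)
by_priority = order_by_priority([
    {"name": "low", "priority": 1},
    {"name": "high_a", "priority": 3},
    {"name": "mid", "priority": 2},
    {"name": "high_b", "priority": 3},
])
priority_names = [info["name"] for info in by_priority]
```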
S3014: the front-end control engine distributes at least one command submission control information to a hardware command queue module which is idle in the front-end control engine.
Specifically, the following step is executed in a loop until no command submission control information remains to be processed:
If it is determined that an idle hardware command queue module exists, one piece of command submission control information is selected from the command management list according to the allocation rule and sent to the idle hardware command queue module.
It should be noted that the execution subjects of S3012-S3014 may be command queue management modules in the front-end control engine.
It should be noted that the commands of a computing task are usually stored in the memory in the form of command lists. The commands of the same command submission list may be executed sequentially by one hardware command queue module in the front-end control engine. The commands of different command submission lists may be executed in parallel by multiple hardware command queue modules of the front-end control engine. To improve the parallelism of command execution, the commands of independent computing tasks are often placed in different command submission lists, so that the tasks can be processed in parallel by the plurality of hardware command queue modules of the front-end control engine. The number of command submission lists is a natural number and may be greater than the number of actual hardware command queue modules. Therefore, the front-end control engine is required to schedule the command submission lists, sending each command submission list to an idle hardware command queue module for execution according to the allocation rule.
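The scheduling of more command submission lists than there are hardware command queue modules can be simulated with a short Python sketch. The completion model is an assumption made purely so the simulation terminates: the queue that has been busy longest is treated as the next one to finish. The function name and the list labels are hypothetical.

```python
from collections import deque


def dispatch(pending_lists, num_queues):
    """Assign each pending submission list to an idle queue; log (queue, list)."""
    pending = deque(pending_lists)
    busy = {}   # queue index -> submission list; insertion order = start order
    log = []
    while pending or busy:
        idle = [q for q in range(num_queues) if q not in busy]
        while pending and idle:
            queue = idle.pop(0)
            busy[queue] = pending.popleft()
            log.append((queue, busy[queue]))
        # Assumed completion model: the longest-running queue finishes next.
        finished = next(iter(busy))
        del busy[finished]
    return log


schedule = dispatch(["L0", "L1", "L2", "L3"], num_queues=2)
```

With four lists and two queues, each queue executes two lists in turn, and a list is only dispatched once a queue has become idle.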
Step 302: The hardware command queue module sends the command execution control information contained in the command submission list to the programmable command processing module.
Specifically, each hardware command queue module performs the following step for each entry in the command submission list in sequence: if it determines that a to-be-processed entry in the command submission list belongs to the control information type, it sends the entry parsing information of that entry to the programmable command processing module as command execution control information.
In one embodiment, a hardware command queue module is taken as an example for explanation, and the hardware command queue module sequentially executes the following steps for each entry in a command submission list:
s3021: analyzing a to-be-processed item in the command submission list to obtain item analysis information of the to-be-processed item.
S3022: and if the entry to be processed is determined to belong to the control information type based on the entry analysis information, sending the entry analysis information of the entry to be processed to a programmable command processing module as command execution control information.
The information types may include: a control information type and a hardware command type.
Optionally, when determining the information type of the entry, any one of the following manners may be adopted:
Mode 1: Acquire the information type contained in the entry parsing information.
Mode 2: Acquire the information type set for the entry parsing information.
Specifically, a correspondence between entry parsing information and information types is preset, so that the information type of any entry can be acquired according to the correspondence.
In this way, the command execution control information in the command submission list is sent to the programmable command processing module, which then processes the received command execution control information.
It should be noted that the front-end control engine may include a plurality of hardware command queue modules, and each hardware command queue module may be connected to one or more programmable command processing modules, so that a hardware command queue module may send one piece of command execution control information at a time to the programmable command processing module connected to it.
S3023: If it is determined that the entry parsing information belongs to the hardware command type, send the entry parsing information to the corresponding fixed-function hardware, so that the fixed-function hardware executes it.
In one embodiment, the entry parsing information is assigned to the fixed-function hardware according to the function of each piece of fixed-function hardware.
Further, if the command submission list is of a ring buffer structure, after determining that the entry or the entry parsing information has been sent to another module, the hardware command queue module sends a read pointer update instruction to the memory, so that the memory updates the read pointer of the command submission list based on the read pointer update instruction.
If the command submission list is of a linked list structure, after determining that the entry or the entry parsing information has been sent to another module, the hardware command queue module judges whether the entry is the last entry in the command submission list and, if so, jumps to the next command submission list to be executed.
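Steps S3021 to S3023 amount to a per-entry dispatch on the information type. A minimal Python sketch follows, assuming that each entry carries its type explicitly (Mode 1 above). The type strings, the dictionary fields, and the callback parameters are hypothetical stand-ins for the programmable command processing module and the fixed-function hardware.

```python
def process_submission_list(entries, send_to_processor, send_to_fixed_hw):
    """Route each entry by type and return the final read pointer."""
    read_ptr = 0
    for entry in entries:
        # S3021: "parse" the entry; here parsing just copies the fields.
        parsed = {"type": entry["type"], "payload": entry["payload"]}
        if parsed["type"] == "control_info":
            send_to_processor(parsed)      # S3022
        elif parsed["type"] == "hardware_command":
            send_to_fixed_hw(parsed)       # S3023
        else:
            raise ValueError(f"unknown entry type: {parsed['type']}")
        read_ptr += 1                      # advance only after the entry is sent
    return read_ptr


to_processor, to_fixed_hw = [], []
final_read_ptr = process_submission_list(
    [
        {"type": "control_info", "payload": "exec_list_0"},
        {"type": "hardware_command", "payload": "queue_sync"},
        {"type": "control_info", "payload": "exec_list_1"},
    ],
    send_to_processor=to_processor.append,
    send_to_fixed_hw=to_fixed_hw.append,
)
```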
Step 303: The programmable command processing module reads the command execution list from the memory based on the received command execution control information.
Step 304: The programmable command processing module distributes the commands contained in the command execution list to the matched data processing units among the at least one data processing unit, so that each data processing unit executes the commands distributed to it.
Specifically, the following steps are executed in a loop until it is determined that execution of the command execution list is complete:
S3041: Decode the target entry currently to be executed in the command execution list to obtain a decoded command.
S3042: If it is determined that the decoded command belongs to the management command type, distribute the decoded command to the matched data processing unit, either directly or through the unit management module.
Specifically, a command of the management command type is a management command that needs to be processed by a data processing unit.
When the decoded command is distributed to the matched data processing unit through the unit management module, the following steps can be adopted:
Each programmable command processing module stores the decoded command in the management register. The unit management module reads the command in the management register and distributes the read command to a target unit among the at least one data processing unit according to the unit type corresponding to the read command. The target unit is the matched data processing unit among the at least one data processing unit. There may be one or more matching target units.
The target unit matching each command may also first be determined from the at least one data processing unit according to the unit type corresponding to each command in the command execution list, after which command allocation is performed so that each command read from the management register is allocated to its matched target unit.
Wherein the unit types are divided according to the functions of the data processing units. The unit types of the different data processing units may be the same or different.
Optionally, in the process of allocating the command to the target unit in the at least one data processing unit according to the unit type corresponding to the command, the following manner may be adopted:
If it is determined that a plurality of commands correspond to the unit type of the target unit, the command priority of each of these commands is determined, and the commands are sent in order of command priority to the idle data processing units in the target unit.
In practical application, the command priority may be set according to the practical application scenario, for example according to the order in which the unit management module reads the commands, and is not limited herein.
Thus, the unit management module of the front-end control engine arbitrates the acquired commands, determines the data processing units matched with the commands according to the arbitration result, and sends the commands to the matched and idle data processing units through the command network according to the command priority, the operation state and the unit type of the data processing units.
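The arbitration performed by the unit management module can be sketched as follows. This is an illustrative model only: the unit type names ("matrix", "vector"), the numeric priorities, and the dictionary fields are assumptions, and commands that find no idle unit of the matching type are simply returned as waiting.

```python
def arbitrate(commands, units):
    """Send commands, highest priority first, to idle units of the matching type.

    Returns (assignments, waiting); marks each chosen unit as busy.
    """
    assignments, waiting = [], []
    for command in sorted(commands, key=lambda c: -c["priority"]):
        for unit in units:
            if unit["unit_type"] == command["unit_type"] and unit["idle"]:
                unit["idle"] = False
                assignments.append((command["name"], unit["name"]))
                break
        else:
            waiting.append(command["name"])   # no matching idle unit yet
    return assignments, waiting


commands = [
    {"name": "c_low", "unit_type": "matrix", "priority": 1},
    {"name": "c_high", "unit_type": "matrix", "priority": 5},
    {"name": "c_vec", "unit_type": "vector", "priority": 3},
]
units = [
    {"name": "m0", "unit_type": "matrix", "idle": True},
    {"name": "m1", "unit_type": "matrix", "idle": False},
    {"name": "v0", "unit_type": "vector", "idle": True},
]
assignments, waiting = arbitrate(commands, units)
```

Here the higher-priority matrix command takes the only idle matrix unit, so the lower-priority one waits until a unit of that type becomes idle.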
In one embodiment, the data processing unit is started after the data processing unit matched with the command is determined.
Further, after the unit management module determines that the command is executed, the unit management module sends a command execution result to the programmable command processing module corresponding to the executed command. The command execution result may be sent after execution of a command or a group of commands is completed.
It should be noted that each programmable command processing module writes a command to the management register of the same unit management module. That is, the unit management module may receive the commands sent by the programmable command processing modules, and since different commands may need to be processed by the same data processing unit, the unit management module needs to arbitrate and distribute the read commands.
For example, if the programmable command processing module determines that the decoded command belongs to an operation command type, such as a command for operations of addition, subtraction, multiplication, logic, or the like, the decoded command may be sent to the operation module. If the decoded command is determined to belong to a memory access command type, such as a command for calculating a memory access address, the memory access interface may be invoked to execute the decoded command. If the decoded command is determined to belong to the control command type, the read pointer of the command execution list can be updated and the execution sequence of the command execution list can be changed after judgment is carried out based on the decoded command.
Therefore, the programmable command processing module can complete command management of a complex computing task for the combination of commands of various command types, and can support flexible processing of various tasks in complex scenes.
After the decoded command is distributed to the matched data processing unit, S3043 may further be executed: receive the command execution result returned by the data processing unit.
If it is determined, based on the command execution result, that commands are to be executed sequentially, the next entry in the entry order of the command execution list becomes the target entry currently to be executed.
If it is determined, based on the command execution result, that a jump is required, a target jump entry in the command execution list is determined based on the command execution result, and the target jump entry becomes the target entry currently to be executed.
In both cases, the entry to be processed next is selected according to the command execution result and becomes the object of the next decoding, that is, the target entry in step S3041 is updated.
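The loop of S3041 to S3043, including the choice between sequential execution and a jump, can be sketched as below. The convention that the execution result is either None (continue sequentially) or the index of the target jump entry is an assumption for illustration, as are the function names and the step limit that guards against endless jump loops.

```python
def run_execution_list(entries, execute, max_steps=1000):
    """Decode and execute entries, following jumps indicated by the result."""
    target = 0
    trace = []
    for _ in range(max_steps):            # assumed safety limit
        if target >= len(entries):
            break                         # command execution list finished
        command = entries[target]         # S3041: decoding is the identity here
        trace.append(command)
        result = execute(command)         # S3042 / S3043
        if result is None:
            target += 1                   # sequential: next entry in list order
        else:
            target = result               # jump: result names the target entry
    return trace


def example_execute(command):
    """Hypothetical executor: the 'branch' command jumps to entry 3."""
    return 3 if command == "branch" else None


trace = run_execution_list(["load", "branch", "skipped", "store"], example_execute)
```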
In the embodiment of the application, if the host CPU needs the computing coprocessor to execute a higher-priority data processing task, the currently processed computing task may be suspended by the front-end control engine in order to switch to the high-priority data processing task.
Optionally, the task processing method according to the embodiment of the present application may further include a task switching method. Referring to fig. 6, which is a flowchart of an implementation of a task switching method 600 according to an embodiment of the present application, the method may include:
Step 601: If a task suspension notification sent by the host processor is received, a task suspension command is sent to each hardware command queue module.
In one embodiment, when the host CPU determines that a higher-priority data processing task needs to be executed by the computing coprocessor, it issues a task suspension notification to the front-end control engine by writing a host interface register. After the command queue management module receives the task suspension notification issued by the host CPU by reading the host interface register, it sends a task suspension command to the hardware command queue modules.
Step 602: each hardware command queue module stops reading the command submission list based on the received task suspension command, stores a read pointer of the current command submission list, and sends a task suspension request to the programmable command processing module.
Specifically, after determining that a task suspension command is received, the hardware command queue module stops reading the command submission list, stores a read pointer of the current command submission list (i.e., the command submission list for stopping reading), sends a task suspension request to the programmable command processing module in an interrupt manner, and waits for the programmable command processing module to be idle.
Step 603: The programmable command processing module stops reading the command execution list based on the task suspension request.
Specifically, after receiving the task suspension request sent in an interrupt manner, the programmable command processing module starts to perform task switching, i.e., it stops reading the command execution list and stores the read pointer of the current command execution list and the values of the general registers in the memory.
Step 604: The programmable command processing module stores the read pointer of the command execution list whose reading has been stopped, and sends an early termination signal to each data processing unit.
Specifically, the programmable command processing module may also wait for the data processing unit to process the current command.
In one embodiment, the programmable command processing module may employ the steps of:
S6041: The programmable command processing module sends a task suspension command to the unit management module and waits for the data processing unit to complete.
In the embodiment of the application, the programmable command processing module simplifies the hardware task switching flow by being programmable: only a small amount of regular information, such as the read pointer of the command list and the general registers, needs to be stored. This avoids having to preserve a large amount of hardware execution state and irregular hardware information, makes the saved information easier to store in memory and to debug, and makes it easier to reload when the task is next resumed.
S6042: the unit management module waits for the data processing unit to process the current command or sends an early termination signal to the data processing unit.
Specifically, when S6042 is executed, any one or a combination of the following modes may be adopted:
Mode 1: The unit management module waits for the data processing unit to complete processing of the assigned command.
Mode 2: the unit management module sends an early termination signal to the data processing unit.
Step 605: After determining that the task has been suspended, the front-end control engine returns task suspension completion information to the host processor.
Specifically, after the front-end control engine determines that the data processing unit has stopped operating, it returns the task suspension completion information to the host processor.
When step 605 is executed, the following steps may be adopted:
S6051: After determining that each data processing unit has completed its stop operation, the unit management module sends a switching completion signal to the programmable command processing module.
S6052: After the front-end control engine determines that all the hardware command queue modules have stopped their tasks, it sends the task suspension completion information to the host CPU through the host interface module.
Optionally, the front-end control engine may send the task suspension completion information to the host CPU through the host interface register in an interrupt manner.
Step 606: When the host CPU determines that the task suspension completion information has been received, it issues a task switching instruction to the front-end control engine.
Specifically, the host CPU determines that all tasks have been suspended, updates each command list, and then issues the task switching instruction to the front-end control engine.
Step 607: the front-end control engine obtains command submission control information based on the task switching instruction, and executes the switched task based on the command submission control information.
With this implementation, task switching can be completed with little interaction between the host processor and the coprocessor, and computing tasks of various priorities can be executed efficiently and flexibly in complex scenarios; for example, task switching can be performed quickly when a high-priority task arrives unexpectedly.
Specifically, when step 607 is executed, the specific steps refer to step 301 to step 304 above, so as to execute the data processing task with high priority, which is not described herein again.
Further, after the front-end control engine determines that the high-priority data processing task has been completed, the suspended computing task can be resumed based on the stored task suspension information, which is not described in detail here.
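The suspend-and-resume flow can be illustrated with a small Python model in which the only saved context is the read pointer of the command execution list and the general registers, as described for step 603. The class, its methods, and the single "count" register are hypothetical; data processing units, interrupts, and the host handshake are not modeled.

```python
class ProgrammableCommandProcessor:
    """Illustrative model of saving and restoring a small task context."""

    def __init__(self):
        self.commands = []
        self.read_ptr = 0
        self.registers = {}
        self.executed = []

    def load(self, commands, context=None):
        """Start a command execution list, or resume it from a saved context."""
        self.commands = commands
        self.read_ptr = context["read_ptr"] if context else 0
        self.registers = dict(context["registers"]) if context else {"count": 0}

    def step(self):
        self.executed.append(self.commands[self.read_ptr])
        self.registers["count"] += 1
        self.read_ptr += 1

    def run(self):
        while self.read_ptr < len(self.commands):
            self.step()

    def suspend(self):
        """Stop reading and return the context that would be stored in memory."""
        return {"read_ptr": self.read_ptr, "registers": dict(self.registers)}


proc = ProgrammableCommandProcessor()
low_priority = ["a", "b", "c"]
proc.load(low_priority)
proc.step()                      # "a" runs before the suspension request
saved = proc.suspend()
proc.load(["urgent"])            # switch to the high-priority task
proc.run()
proc.load(low_priority, saved)   # resume the suspended task
proc.run()
```

The suspended task continues from entry "b" rather than restarting, because only the saved read pointer and register values are reloaded.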
In the embodiments of the present application, the front-end control engine of the coprocessor interacts with the host CPU through host interface registers, so that software programming is simple, interaction is more efficient, and compatibility with existing computing programming interfaces is good. The programming interface that the heterogeneous computing processor presents to the host CPU is simplified, the front-end control engine no longer has to query the command list in the memory repeatedly, and memory bandwidth and memory access power consumption are saved. In addition, a hardware command queue module on its own has low programmability, a single-threaded microprocessor has low parallel processing efficiency, and a multi-threaded or multi-core microprocessor is costly to design and complex to program. In the embodiments of the present application, a plurality of hardware command queue modules are therefore combined with programmable command processing modules, with each hardware command queue module connected to at least one programmable command processing module. This combines the high parallelism of the hardware command queue modules with the programmability of the programmable command processing modules, without requiring complex programming or a high microprocessor design cost. Without increasing hardware cost, the flexibility with which the front-end control engine processes commands is improved, the parallelism and efficiency of task processing are maintained, and the computing task switching process is simplified.
Furthermore, the command lists are organized in a multi-level structure, so that the hardware command queue module can execute commands asynchronously: it does not need to wait for the current command submission list to release storage resources, it does not block the multiple command lists prepared by the host CPU (central processing unit), and it can read the next command submission list in advance, which improves the performance of the computing coprocessor. In addition, a plurality of hardware command queue modules can execute different computing tasks in parallel, and can simultaneously read and parse commands and send them to the programmable command processing module.
Based on the same inventive concept, an embodiment of the present application further provides a front-end control engine. Because the front-end control engine solves the problem on a principle similar to that of the task processing method, its implementation may refer to the implementation of the method, and repeated details are omitted.
Fig. 7 is a schematic structural diagram of a front-end control engine 700 according to an embodiment of the present application. The front-end control engine 700 may be applied to a computing coprocessor that further includes at least one data processing unit. The front-end control engine 700 includes a hardware command queue module 701 and a programmable command processing module 702;
the hardware command queue module 701 is configured to read a command submission list from a memory based on command submission control information corresponding to a received to-be-processed task, and send command execution control information included in the command submission list to the programmable command processing module 702;
the programmable command processing module 702 is configured to read a command execution list from the memory based on the received command execution control information, and allocate the command included in the command execution list to a matching data processing unit of the at least one data processing unit, so that each data processing unit executes the allocated command.
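The two-stage flow described above can be illustrated with a minimal software sketch. It is not the hardware design of the application: the dictionary used as memory, the class names `ExecControl`, `Command`, `HardwareCommandQueue` and `ProgrammableCommandProcessor`, and the unit type names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ExecControl:
    """Command execution control information (assumed here to be an address and a length)."""
    address: int
    length: int


@dataclass(frozen=True)
class Command:
    unit_type: str  # hypothetical unit type name, e.g. "compute" or "dma"
    payload: str


class ProgrammableCommandProcessor:
    """Reads a command execution list and hands each command to a matching unit."""

    def __init__(self, memory: Dict[int, list], units: Dict[str, List[str]]) -> None:
        self.memory = memory
        self.units = units  # unit type -> payloads that the unit has received

    def handle(self, ctrl: ExecControl) -> None:
        execution_list = self.memory[ctrl.address][: ctrl.length]
        for command in execution_list:
            self.units[command.unit_type].append(command.payload)


class HardwareCommandQueue:
    """Reads a command submission list and forwards its control entries."""

    def __init__(self, memory: Dict[int, list], processor: ProgrammableCommandProcessor) -> None:
        self.memory = memory
        self.processor = processor

    def submit(self, submission_address: int) -> None:
        for entry in self.memory[submission_address]:
            # Only entries of the control information type are forwarded.
            if isinstance(entry, ExecControl):
                self.processor.handle(entry)
```

In this sketch the submission list holds only references to execution lists, which mirrors the multi-level list organization: the queue stage never inspects individual commands.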
In one embodiment, the front-end control engine 700 further includes a command queue management module 703, which is configured to: receive command management control information sent by a host processor, where the host processor and the computing coprocessor are located in the same computer device; if it is determined, based on the command management control information, that a command list update exists, read a command management list from the memory based on the command management control information, where the command management list includes command submission control information corresponding to at least one command submission list; and send the at least one piece of command submission control information included in the command management list to idle hardware command queue modules 701 among the hardware command queue modules 701.
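As a rough illustration of this behavior, the sketch below models the command queue management module in software. It assumes, purely for the example, that the command management control information is a write index into the command management list and that an update exists whenever that index is ahead of what has already been read; the source does not specify this encoding. The names `HardwareQueue` and `CommandQueueManager` are hypothetical.

```python
from collections import deque
from typing import Deque, List, Optional


class HardwareQueue:
    """Stand-in for one hardware command queue module."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.current: Optional[str] = None  # submission control info in progress

    def is_idle(self) -> bool:
        return self.current is None


class CommandQueueManager:
    """Reads new management-list entries and hands them to idle queue modules."""

    def __init__(self, queues: List[HardwareQueue], management_list: List[str]) -> None:
        self.queues = queues
        self.management_list = management_list  # stands in for the list in memory
        self.read_index = 0
        self.pending: Deque[str] = deque()

    def on_host_write(self, write_index: int) -> None:
        # Assumption: a write index ahead of the read index means the list was updated.
        if write_index > self.read_index:
            self.pending.extend(self.management_list[self.read_index:write_index])
            self.read_index = write_index
        self.dispatch()

    def dispatch(self) -> None:
        # Give each idle queue module one piece of submission control information.
        for queue in self.queues:
            if not self.pending:
                break
            if queue.is_idle():
                queue.current = self.pending.popleft()
```

Entries that find no idle queue module stay in `pending` and are handed out on a later call, once a queue module has been freed.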
In one embodiment, the hardware command queue module 701 is configured to perform the following step for each entry in the command submission list in turn:
if it is determined that a to-be-processed entry in the command submission list is of the control information type, send the entry parsing information of that entry to the programmable command processing module 702 as command execution control information.
In one embodiment, the programmable command processing module 702 is configured to repeat the following steps until it is determined that execution of the command execution list is complete:
decode the target entry currently to be executed in the command execution list to obtain a decoded command;
if it is determined that the decoded command is of the management command type, distribute the decoded command to the matching data processing unit;
receive a command execution result returned by the data processing unit;
if it is determined, based on the command execution result, that commands are to be executed in sequence, set the next entry, in the entry order of the command execution list, as the target entry currently to be executed;
and if it is determined, based on the command execution result, that a jump is required, determine a target jump entry in the command execution list based on the command execution result and set the target jump entry as the target entry currently to be executed.
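The loop above behaves like a small interpreter with a program counter. The following sketch shows one way to express it, under the assumption that a command execution result is either `None` (continue in sequence) or the index of the target jump entry; the source does not define the result format. The function name `run_execution_list`, the `dispatch` callback and the `max_steps` guard are hypothetical.

```python
from typing import Callable, List, Optional


def run_execution_list(
    entries: List[str],
    dispatch: Callable[[str], Optional[int]],
    max_steps: int = 1000,
) -> List[int]:
    """Execute entries in order, following jumps requested by execution results.

    Returns the sequence of entry indices that were executed.
    """
    trace: List[int] = []
    target = 0  # index of the target entry currently to be executed
    while target < len(entries):
        if len(trace) >= max_steps:
            raise RuntimeError("step limit reached; possible endless jump loop")
        command = entries[target]   # decoding is trivial in this sketch
        result = dispatch(command)  # the data processing unit returns a result
        trace.append(target)
        if result is None:
            target += 1             # sequential execution: move to the next entry
        else:
            target = result         # jump execution: result names the target entry
    return trace
```

For example, a list `["init", "work", "check", "done"]` whose `check` entry requests a jump back to index 1 once executes in the order 0, 1, 2, 1, 2, 3.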
In one embodiment, the front-end control engine 700 further includes at least one unit management module 704. Each programmable command processing module 702 is configured to store the decoded command in a management register. The unit management module 704 is configured to read the command in the management register and distribute the command to a target unit among the at least one data processing unit according to the unit type corresponding to the command, where unit types are divided according to the functions of the data processing units.
In one embodiment, there is at least one target unit, and the unit management module 704 is configured to: if it is determined that a plurality of commands correspond to the unit type of the target unit, determine the command priority of each of those commands; and send the commands to idle data processing units in the target unit in order of command priority.
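A minimal sketch of this priority-ordered dispatch is given below. It assumes that a larger number means a higher command priority and that commands of equal priority are issued in arrival order; neither rule is stated in the source. The class name `UnitManager` and the unit names are hypothetical.

```python
import heapq
from itertools import count
from typing import Dict, List, Tuple


class UnitManager:
    """Queues commands per unit type and issues them to idle units by priority."""

    def __init__(self, units_by_type: Dict[str, List[str]]) -> None:
        # All data processing units start idle.
        self.idle: Dict[str, List[str]] = {t: list(u) for t, u in units_by_type.items()}
        self.pending: Dict[str, list] = {t: [] for t in units_by_type}
        self._arrival = count()  # tie-breaker that preserves arrival order

    def submit(self, unit_type: str, priority: int, command: str) -> None:
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        heapq.heappush(self.pending[unit_type], (-priority, next(self._arrival), command))

    def issue(self, unit_type: str) -> List[Tuple[str, str]]:
        """Send pending commands to idle units, highest priority first."""
        assignments: List[Tuple[str, str]] = []
        while self.pending[unit_type] and self.idle[unit_type]:
            _, _, command = heapq.heappop(self.pending[unit_type])
            unit = self.idle[unit_type].pop(0)
            assignments.append((unit, command))
        return assignments

    def complete(self, unit_type: str, unit: str) -> None:
        # A unit that finished its command becomes idle again.
        self.idle[unit_type].append(unit)
```

Commands that cannot be issued because no unit of the matching type is idle remain queued until `complete` frees a unit and `issue` is called again.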
In one embodiment, there are a plurality of hardware command queue modules 701, and the front-end control engine 700 is configured to:
if a task suspension notification sent by the host processor is received, send a task suspension command to each hardware command queue module 701;
each hardware command queue module 701 is configured to: based on the received task suspension command, stop reading the command submission list, store the read pointer of the current command submission list, and send a task suspension request to the programmable command processing module 702;
the programmable command processing module 702 is configured to: based on the task suspension request, stop reading the command execution list, store the read pointer of the command execution list whose reading has stopped, and send an early suspension signal to each data processing unit;
the front-end control engine 700 is configured to: after determining that the task has been suspended, return task suspension completion information to the host processor, so that the host processor issues a task switching instruction based on the task suspension completion information.
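The suspension sequence can be sketched as follows. The example only models the bookkeeping (stopping reads, saving read pointers, signalling the units and reporting completion); the names `ListReader`, `QueuePipeline` and `suspend_all`, the string states and the returned status value are hypothetical and not taken from the source.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ListReader:
    """Reads a list entry by entry and can save its read pointer on suspension."""
    entries: List[str]
    read_ptr: int = 0
    saved_ptr: Optional[int] = None
    stopped: bool = False

    def read_next(self) -> Optional[str]:
        if self.stopped or self.read_ptr >= len(self.entries):
            return None
        entry = self.entries[self.read_ptr]
        self.read_ptr += 1
        return entry

    def suspend(self) -> None:
        self.stopped = True
        self.saved_ptr = self.read_ptr  # remember where reading stopped

    def resume(self) -> None:
        if self.saved_ptr is not None:
            self.read_ptr = self.saved_ptr
        self.saved_ptr = None
        self.stopped = False


@dataclass
class QueuePipeline:
    submission: ListReader  # hardware command queue module side
    execution: ListReader   # programmable command processing module side


def suspend_all(pipelines: List[QueuePipeline], units: Dict[str, str]) -> str:
    """Suspend every pipeline, signal every unit, and report completion."""
    for pipeline in pipelines:
        pipeline.submission.suspend()  # task suspension command
        pipeline.execution.suspend()   # task suspension request
    for name in units:
        units[name] = "suspended"      # early suspension signal
    return "suspension_complete"       # reported back to the host processor
```

Because each reader keeps its saved read pointer, calling `resume` later continues from the entry at which reading stopped, which corresponds to re-executing the suspended task from the stored suspension information.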
Based on the same inventive concept, an embodiment of the present application further provides a chip, including a front-end control engine, where the chip has at least one data processing unit, and the front-end control engine includes: a hardware command queue module and a programmable command processing module;
the hardware command queue module is used for reading a command submission list from the memory based on the received command submission control information corresponding to the task to be processed and sending command execution control information contained in the command submission list to the programmable command processing module;
and the programmable command processing module is used for reading the command execution list from the memory based on the received command execution control information and distributing the commands contained in the command execution list to the matched data processing units in the at least one data processing unit so that each data processing unit can execute the distributed commands.
Because the chip works on a principle similar to that of the task processing method, its implementation may refer to the implementation of the method, and repeated details are omitted. For other details of the front-end control engine in the chip, refer to the descriptions elsewhere in this document; they are not repeated here.
The embodiment of the present application further provides a computing coprocessor, which includes the foregoing front-end control engine and at least one data processing unit, where the front-end control engine is configured to adopt the steps in the foregoing embodiments to assign a command to the at least one data processing unit, and the at least one data processing unit is configured to execute the assigned command.
An embodiment of the present application further provides a computer device, which includes the foregoing computing coprocessor and a memory, where the memory stores computer-readable instructions, and the computer-readable instructions are executed by the computing coprocessor to perform the steps in the foregoing embodiments. Optionally, the computer device may further comprise a host processor for interacting with the computing co-processor. The application does not limit the specific type of host processor.
For further details regarding the front-end control engine in the embodiments of the computing coprocessor, the computer device and other products, please refer to the related description in the foregoing embodiments, which will not be described herein.
For convenience of description, the above parts are separately described as modules (or units) according to functional division. Of course, the functionality of the various modules (or units) may be implemented in the same one or more pieces of software or hardware when implementing the present application.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, hardware product, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A task processing method, applied to a front-end control engine in a computing coprocessor, wherein the computing coprocessor further comprises at least one data processing unit, the front-end control engine comprises a hardware command queue module and a programmable command processing module, the hardware command queue module executes commands based on fixed-function hardware, and the programmable command processing module is programmable, the method comprising:
the hardware command queue module reads a command submission list from a memory based on command submission control information corresponding to the received task to be processed, and sends command execution control information contained in the command submission list to the programmable command processing module;
and the programmable command processing module reads a command execution list from a memory based on the received command execution control information, and distributes the commands contained in the command execution list to the matched data processing units in the at least one data processing unit so that each data processing unit can execute the distributed commands.
2. The method of claim 1, wherein the front end control engine further comprises a command queue management module, the method further comprising:
the command queue management module performs the steps of:
receiving command management control information sent by a host processor, wherein the host processor and the computing coprocessor are positioned in the same computer device;
if it is determined that command list updating exists based on the command management control information, reading a command management list from a memory based on the command management control information, wherein the command management list comprises command submission control information corresponding to at least one command submission list;
and respectively sending at least one command submission control information contained in the command management list to the idle hardware command queue module in each hardware command queue module.
3. The method of claim 1, wherein sending command execution control information contained in the command submission list to the programmable command processing module comprises:
performing the following step in turn for each entry in the command submission list:
if it is determined that a to-be-processed entry in the command submission list is of the control information type, sending entry parsing information of the to-be-processed entry to the programmable command processing module as the command execution control information.
4. The method of any of claims 1-3, wherein said assigning the commands contained in the command execution list to matching ones of the at least one data processing unit comprises:
repeating the following steps until it is determined that execution of the command execution list is complete:
decoding a target entry currently to be executed in the command execution list to obtain a decoded command;
if it is determined that the decoded command is of the management command type, distributing the decoded command to a matching data processing unit;
receiving a command execution result returned by the data processing unit;
if it is determined, based on the command execution result, that commands are to be executed in sequence, setting the next entry, in the entry order of the command execution list, as the target entry currently to be executed;
and if it is determined, based on the command execution result, that a jump is required, determining a target jump entry in the command execution list based on the command execution result, and setting the target jump entry as the target entry currently to be executed.
5. The method of claim 4, wherein the front-end control engine further comprises a unit management module, there is at least one programmable command processing module, and the distributing the decoded command to a matching data processing unit comprises:
each programmable command processing module stores the decoded command to a management register;
and reading the command in the management register through the unit management module, and distributing the command to a target unit in the at least one data processing unit according to a unit type corresponding to the command, wherein the unit type is divided according to the functions of the data processing units.
6. The method of claim 5, wherein there is at least one target unit, and the distributing the command to a target unit in the at least one data processing unit comprises:
if a plurality of commands corresponding to the unit type of the target unit are determined, determining the command priority of each command corresponding to the unit type of the target unit;
and sequentially sending each command corresponding to the unit type of the target unit to the idle data processing unit in the target unit according to the command priority.
7. The method of any of claims 1-3, wherein there are a plurality of hardware command queue modules, the method further comprising:
if a task suspension notification sent by a host processor is received, sending a task suspension command to each hardware command queue module;
each hardware command queue module stops reading the command submission list based on the received task suspension command, stores a read pointer of the current command submission list, and sends a task suspension request to the programmable command processing module;
the programmable command processing module stops reading the command execution list based on the task suspension request, stores a read pointer of the command execution list whose reading has stopped, and sends an early suspension signal to each data processing unit;
and after it is determined that the task has been suspended, returning task suspension completion information to the host processor, so that the host processor issues a task switching instruction based on the task suspension completion information.
8. A chip comprising a front-end control engine, said chip having at least one data processing unit, said front-end control engine comprising: a hardware command queue module and a programmable command processing module;
the hardware command queue module is used for reading a command submission list from a memory based on command submission control information corresponding to the received task to be processed and sending command execution control information contained in the command submission list to the programmable command processing module;
the programmable command processing module is configured to read a command execution list from a memory based on the received command execution control information, and allocate a command included in the command execution list to a matched data processing unit in the at least one data processing unit, so that each data processing unit executes the allocated command.
9. A computing coprocessor comprising a front-end control engine for distributing commands to at least one data processing unit using the method of any of claims 1-7, and at least one data processing unit for executing the distributed commands.
10. A computer device comprising a computing coprocessor and a memory, said memory storing computer-readable instructions that, when executed by said computing coprocessor, perform the method of any of claims 1-7.
CN202210304928.3A 2022-03-25 2022-03-25 Task processing method, computing coprocessor, chip and computer equipment Pending CN114637536A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210304928.3A CN114637536A (en) 2022-03-25 2022-03-25 Task processing method, computing coprocessor, chip and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210304928.3A CN114637536A (en) 2022-03-25 2022-03-25 Task processing method, computing coprocessor, chip and computer equipment

Publications (1)

Publication Number Publication Date
CN114637536A true CN114637536A (en) 2022-06-17

Family

ID=81949055

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210304928.3A Pending CN114637536A (en) 2022-03-25 2022-03-25 Task processing method, computing coprocessor, chip and computer equipment

Country Status (1)

Country Link
CN (1) CN114637536A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115994115A (en) * 2023-03-22 2023-04-21 成都登临科技有限公司 Chip control method, chip set and electronic equipment
CN116339944A (en) * 2023-03-14 2023-06-27 海光信息技术股份有限公司 Task processing method, chip, multi-chip module, electronic device and storage medium
CN116957908A (en) * 2023-09-20 2023-10-27 上海登临科技有限公司 Hardware processing architecture, processor and electronic equipment

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116339944A (en) * 2023-03-14 2023-06-27 海光信息技术股份有限公司 Task processing method, chip, multi-chip module, electronic device and storage medium
CN116339944B (en) * 2023-03-14 2024-05-17 海光信息技术股份有限公司 Task processing method, chip, multi-chip module, electronic device and storage medium
CN115994115A (en) * 2023-03-22 2023-04-21 成都登临科技有限公司 Chip control method, chip set and electronic equipment
CN115994115B (en) * 2023-03-22 2023-10-20 成都登临科技有限公司 Chip control method, chip set and electronic equipment
CN116957908A (en) * 2023-09-20 2023-10-27 上海登临科技有限公司 Hardware processing architecture, processor and electronic equipment
CN116957908B (en) * 2023-09-20 2023-12-15 上海登临科技有限公司 Hardware processing architecture, processor and electronic equipment

Similar Documents

Publication Publication Date Title
CN113535367B (en) Task scheduling method and related device
CN114637536A (en) Task processing method, computing coprocessor, chip and computer equipment
US8914805B2 (en) Rescheduling workload in a hybrid computing environment
US8972699B2 (en) Multicore interface with dynamic task management capability and task loading and offloading method thereof
US8739171B2 (en) High-throughput-computing in a hybrid computing environment
US20160350245A1 (en) Workload batch submission mechanism for graphics processing unit
US20150143382A1 (en) Scheduling workloads and making provision decisions of computer resources in a computing environment
US20110219373A1 (en) Virtual machine management apparatus and virtualization method for virtualization-supporting terminal platform
US11347546B2 (en) Task scheduling method and device, and computer storage medium
CN111209046A (en) Multitask-oriented embedded SPARC processor operating system design method
US11175919B1 (en) Synchronization of concurrent computation engines
CN113495780A (en) Task scheduling method and device, storage medium and electronic equipment
CN111597044A (en) Task scheduling method and device, storage medium and electronic equipment
CN112925616A (en) Task allocation method and device, storage medium and electronic equipment
EP4386554A1 (en) Instruction distribution method and device for multithreaded processor, and storage medium
CN114816777A (en) Command processing device, method, electronic device and computer readable storage medium
US10922146B1 (en) Synchronization of concurrent computation engines
US9760969B2 (en) Graphic processing system and method thereof
CN111813541A (en) Task scheduling method, device, medium and equipment
EP3591518B1 (en) Processor and instruction scheduling method
CN116795503A (en) Task scheduling method, task scheduling device, graphic processor and electronic equipment
CN112114967B (en) GPU resource reservation method based on service priority
CN113032154B (en) Scheduling method and device for virtual CPU, electronic equipment and storage medium
CN114911538A (en) Starting method of running system and computing equipment
CN114610485A (en) Resource processing system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination