WO2024124737A1 - Process switching management method and computing device in heterogeneous computing (异构计算中进程切换管理方法及计算装置) - Google Patents

Process switching management method and computing device in heterogeneous computing (异构计算中进程切换管理方法及计算装置)

Info

Publication number
WO2024124737A1
WO2024124737A1 · PCT/CN2023/084003 · CN2023084003W
Authority
WO
WIPO (PCT)
Prior art keywords
command
fence
task
command list
list
Prior art date
Application number
PCT/CN2023/084003
Other languages
English (en)
French (fr)
Inventor
严宗宗
马亮
Original Assignee
上海登临科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海登临科技有限公司
Publication of WO2024124737A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/52 Program synchronisation; Mutual exclusion, e.g. by means of semaphores
    • G06F9/524 Deadlock detection or avoidance
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present application relates to high performance computing and parallel computing, and in particular to a method and device for managing switching between multiple processes.
  • the coprocessor can simultaneously support multiple parallel computing tasks, which can belong to multiple processes of one user or multiple processes of multiple users.
  • the coprocessor can usually only execute one or a limited number of processes at a time point, which means that multiple processes need to switch execution on the coprocessor so that the tasks of each process can be executed effectively, balanced, and timely. It is usually not possible to wait for a process to complete all tasks before switching to the next process, which will cause the subsequent process to fail to respond in time.
  • in existing heterogeneous computing architectures, process switching management is usually the responsibility of the main processor (also called the host side).
  • for example, the host side submits, for each process on the device side, a portion of tasks that can be switched normally.
  • the order of the tasks is arranged in advance by the host side, and the device side executes them in sequence.
  • when the device side finds that tasks belong to different processes, it performs a process switch.
  • however, because the execution order of tasks is preset and the host side can hardly predict how long the device side will take to complete each task, the task execution time of the processes is often imbalanced, and it is difficult to switch to high-priority tasks in time.
  • in another scheme, to solve this imbalance, the device side is configured to promptly notify the host side whenever a task of a process finishes; the host side then either specifies a switch to the next process or continues to submit tasks of the same process.
  • this approach balances the execution time of the processes better and allows more timely switching to the highest-priority task.
  • however, it increases the processing load on the host side, and the latency introduced by frequent host-device communication also causes a loss of processing performance on the device side.
  • the purpose of this application is to provide a process switching management method in heterogeneous computing, which improves the efficiency of process switching, prevents switching errors and deadlocks, and reduces the complexity of software and hardware programming.
  • a method for managing process switching in heterogeneous computing comprising: in response to receiving a process switching request, determining the maximum fence command number executed by each command list in the current process; executing each command list of the current process until it satisfies a switchable state, wherein the switchable state includes: the number of the fence command executed to the command list is greater than the maximum fence command number or the command list is empty; and switching from the current process to the process specified in the process switching request.
  • the fence command is used to identify the starting or ending position of commands belonging to the same computing task in the command list, and the number of the fence command increases with the computing task submitted to the process, and the same computing task uses the same fence command number in different command lists.
  • the method may further include: in response to receiving a computing task submitted to a process, assigning a corresponding fence command to the computing task, wherein the fence command includes a fence command number for identifying the computing task.
  • for each command list of the process, the fence command corresponding to the computing task may be distributed first, to mark the starting position of the task's commands in that command list; or the commands contained in the computing task may first be distributed to each command list, followed by the corresponding fence command, to mark the ending position of the task's commands in that command list.
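As a rough illustration of the two distribution orders described above, the following Python sketch appends a task's fence either before or after its commands in each command list. All names here, including `submit_task` and the `FENCE` tag, are illustrative assumptions, not the patent's actual implementation:

```python
# Hypothetical sketch of how a driver might tag a computing task's commands
# with a fence command in each command list it touches.

FENCE = "FENCE"  # illustrative tag for a fence entry: ("FENCE", number)

def submit_task(command_lists, task_commands, fence_no, fence_first=True):
    """Distribute a task's commands; the fence marks the start (fence_first=True)
    or the end (fence_first=False) of the task in each touched list."""
    for list_id, cmds in task_commands.items():
        target = command_lists[list_id]
        if fence_first:
            target.append((FENCE, fence_no))  # marks the task's starting position
            target.extend(cmds)
        else:
            target.extend(cmds)
            target.append((FENCE, fence_no))  # marks the task's ending position

# Example: task 1 touches lists 0 and 2 only; list 1 receives nothing.
command_lists = {0: [], 1: [], 2: []}
submit_task(command_lists, {0: ["c1"], 2: ["c2", "c3"]}, fence_no=1)
```

Note that the same fence number (here 1) appears in every list the task touches, which is what later allows the device to line up task boundaries across lists.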
  • in the process switching of the above embodiments, the host side does not need to participate.
  • based on the fence commands added to each command list of the process, the device side can accurately identify the reliable switching boundary of each command list, which not only avoids switching errors and deadlocks and improves the efficiency of process switching, but also reduces the host side's processing load and avoids the device-side performance loss caused by the latency of frequent host-device communication.
  • moreover, this solution is simple and efficient, reduces the complexity of software and hardware programming, and lends itself well to a pure hardware-logic implementation on the device side.
  • the method may further include: recording the fence number executed by each command list of the current process respectively; and determining the maximum fence command number executed by each command list in the current process by comparing the recorded fence numbers executed by each command list when a process switching request is received. In this way, when a process switching request is received, the maximum fence command number executed by each command list in the current process can be determined by simply comparing the current fence command numbers executed by each recorded command list, thereby saving the amount of calculation and improving the switching speed.
  • the fence command may further include a configurable lock mark to indicate that the computing task corresponding to the fence command cannot be interrupted.
  • the method may further include: adding a lock mark to the fence command corresponding to each task in the task lock range according to the received setting of the task lock range; or adding a lock mark to the fence command corresponding to the starting task of the task lock range, and adding an unlock mark to the fence command corresponding to the ending task of the task lock range.
  • in the embodiments that include a lock mark, executing each command list of the current process until it satisfies the switchable state includes, for each command list: executing until the number of the current fence command reached by the command list is greater than the maximum fence command number or the command list is empty, and the current fence command does not contain a lock mark, and then marking the command list as in a switchable state.
  • a high-performance processor which includes a controller, wherein the controller is configured to execute the method described in the first aspect of the embodiment of the present application.
  • a computing device including a main processor and a coprocessor, wherein the coprocessor is a high-performance processor according to the second aspect of the embodiment of the present application.
  • FIG. 1 shows a schematic diagram of a data structure of a command set of multiple processes on a device.
  • FIG. 2 is a schematic diagram showing three computing tasks belonging to the same process with dependencies between commands.
  • FIG. 3 shows a schematic flow chart of a method for managing multiple processes in heterogeneous computing according to an embodiment of the present application.
  • FIG. 4 shows a schematic diagram of a command list structure applicable to an embodiment of the present application.
  • FIG. 5 shows a schematic diagram of a command list structure applicable to yet another embodiment of the present application.
  • FIG. 6 shows a schematic diagram of a command list structure applicable to yet another embodiment of the present application.
  • in such architectures, coprocessors handle the bulk of the computing tasks, while the CPU, as the main processor, is responsible for other non-computing tasks such as process switching management. An upper-layer application corresponds to a process and can submit multiple computing tasks to that process.
  • Each computing task includes one or more commands.
  • the software driver on the device side will add all the commands contained in the computing task to the command list of the process responsible for processing the computing task.
  • Each process may include one or more command lists, each of which is a collection of multiple commands. Generally, commands that must be executed sequentially are assigned to the same command list, while commands that can be executed in parallel are often assigned to different command lists.
  • the process information related to the multiple processes executed by the device is recorded in the process list.
  • the process list contains several process information.
  • Each process information contains process header information and control information of several command lists.
  • the process header information contains the information necessary for controlling the process, so it can also be called process control information; for example, process identifier, process start time, process status, the number of command lists contained in the process, etc.
  • the control information of each command list contains the storage address of the command list and all control information related to the command list.
  • the storage of the command list can be implemented in many forms.
  • the command list can be designed as a ring buffer structure. The advantage of using a ring buffer structure is that the software does not need to allocate memory frequently.
  • Such a ring buffer structure completes the management of the command list through a write pointer and a read pointer: when a new command is added to the command list, the write pointer can be updated; when a command is taken out of the command list, the read pointer can be updated.
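The write/read-pointer management described above can be sketched as a minimal ring buffer. Capacity handling and names here are illustrative assumptions, not the patent's concrete layout:

```python
# A minimal ring-buffer command list managed by a write pointer and a read
# pointer, as one possible realization of the storage scheme described above.

class RingCommandList:
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.wr = 0  # write pointer: advanced when a new command is added
        self.rd = 0  # read pointer: advanced when a command is taken out

    def push(self, cmd):
        # One slot is kept free so that wr == rd unambiguously means "empty".
        if (self.wr + 1) % self.capacity == self.rd:
            raise BufferError("command list full")
        self.buf[self.wr] = cmd
        self.wr = (self.wr + 1) % self.capacity

    def pop(self):
        if self.rd == self.wr:
            return None  # command list is empty
        cmd = self.buf[self.rd]
        self.rd = (self.rd + 1) % self.capacity
        return cmd

    def empty(self):
        return self.rd == self.wr
```

The advantage noted in the text, that software need not allocate memory frequently, follows from the fixed backing buffer: producers and consumers only move the two pointers.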
  • Each command list has its own storage address in the memory, and this information is stored in the control information of the command list.
  • the above-mentioned process list and command list are generated by the software driver and stored in the memory, which can be a host-side memory or a device-side memory.
  • FIG. 1 shows an example of a data structure of a command set of multiple processes on the device side.
  • the device side process list includes several process information (e.g., process information 0, process information 1, process information 2...) and several command lists (e.g., command list 0, command list 1, command list 2...) belonging to each process.
  • when the device-side software driver receives a computing task submitted from an upper-layer application, the one or more commands contained in the computing task are added to one or more command lists of the process corresponding to that application.
  • after commands in a command list are completed, the software driver may be notified to reclaim the memory space corresponding to these completed commands in the command list.
  • Multiple commands contained in a computing task may be distributed in multiple command lists.
  • the number of commands included in each computing task is indefinite, and the number of associated command lists is also indefinite, and a computing task does not have to add commands to all command lists.
  • the device usually sets a time slice for each process.
  • when the time slice of the current process expires, a hardware process switch request is generated to switch from the current process to another process.
  • there are also priority-related process switches, such as switching from a low-priority process to a high-priority process.
  • when the device determines that there is a process switching request, the currently executing command should be completed before the process switch can be performed.
  • in FIG. 2, each task is separated and represented individually, and a solid arrow represents a dependency between two commands: the command at the tail of the arrow depends on the command at the tip of the arrow, i.e., the command at the tip must be executed first. Since the coprocessor executes the command lists in parallel and the execution time of each command is uncertain, suppose the coprocessor is executing command 1 of command list 0 when a process switch is requested (due to the time-slice rotation or priority requirements mentioned above), while it has not yet executed command 2 in command list 2, on which command 1 depends. The coprocessor's state then becomes abnormal: it can neither complete command 1 of command list 0 nor perform the process switch, causing a switching error and a deadlock.
  • a new method for managing process switching is provided, which accurately identifies the reliable switching boundaries of each command list by adding a fence command to each command list of the process, thereby improving the efficiency of process switching and avoiding switching errors and deadlocks.
  • the fence command here is a command specially customized in the embodiment of the present application for identifying the starting or ending position of commands belonging to the same computing task in the command list. It also exists in the form of a command and can be submitted to the command list together with a normal command.
  • the fence command includes a number, which increases with the computing task submitted to the process. A corresponding fence command is set for each computing task that arrives at the process.
  • the fence command corresponding to the computing task can be distributed to the command list first, and then the command contained in the computing task can be distributed to the command list, so that the starting position of the command of the computing task can be marked in the command list.
  • the command contained in the computing task can also be distributed to the command list first, and then the fence command corresponding to the computing task can be distributed to the command list, so that the end position of the command of the computing task can be marked in the command list.
  • the fence commands used by the same computing task in different command lists have the same number.
  • when the device executes a fence command in a command list, it can obtain the fence command's number and thereby identify the starting or ending position of the corresponding computing task in that command list.
  • a fence command itself does not affect the execution of normal commands, but by submitting such a fence command to the command list, the device can accurately identify the boundary or position of each computing task in each command list.
  • FIG. 3 shows a schematic diagram of a method for managing process switching according to an embodiment of the present application.
  • the method includes: S1) in response to receiving a process switching request, determining the maximum fence command number executed by each command list in the current process; S2) executing each command list of the current process until it satisfies a switchable state, wherein the switchable state includes: the number of the fence command reached by the command list is greater than the maximum fence command number or the command list is empty; and S3) switching from the current process to the process specified in the process switching request.
  • the execution subject of the method is a coprocessor, that is, a device end.
  • in step S1), when the coprocessor receives a process switching request during operation, it first determines the maximum fence command number executed by each command list in the currently executing process.
  • in some embodiments, the coprocessor can record the number of the fence command currently executed by each command list of the current process. Since fence command numbers increase in the order in which computing tasks arrive at the process, and each command list is executed sequentially, whenever a fence command in a command list is executed, its number becomes the current fence command number of that list, which is in fact the list's current maximum fence command number. In this way, when a process switching request is received, the maximum fence command number executed by each command list in the current process can be determined simply by comparing the recorded current fence command numbers of the lists, saving computation and improving switching speed.
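A minimal sketch of this bookkeeping, assuming an illustrative per-list table of executed fence numbers (0 standing for "no fence executed yet"; the table values and function names are assumptions for illustration):

```python
# Per-list record of the fence number most recently executed, e.g. the FIG. 4
# scenario where lists 0, 1 and 3 have executed fence 1 and list 2 has not
# executed any fence yet.
executed_fence = {0: 1, 1: 1, 2: 0, 3: 1}

def on_fence_executed(list_id, fence_no):
    # Fence numbers increase with task arrival order and each list executes
    # sequentially, so this recorded value is monotonically increasing per list.
    executed_fence[list_id] = fence_no

def max_fence_on_switch_request():
    # On a switch request, the maximum over the recorded values is exactly the
    # maximum fence command number executed by any list of the current process.
    return max(executed_fence.values())
```

Determining the switch boundary is thus a single comparison over a small table, which matches the text's point about saving computation at switch time.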
  • step S2 when the coprocessor receives a process switching request, in order to prevent switching errors and deadlocks, the coprocessor will continue to execute the various command lists of the current process until each command list meets the switchable state.
  • a command list can be marked as in a switchable state when it meets the following condition: the number of the fence command currently executed by the command list is greater than the maximum fence command number determined in step S1), or the command list is empty.
  • step S3 when the coprocessor determines that all command lists in the current process are in a switchable state, it starts to execute process switching, switching from the current process to the process specified in the process switching request received in step S1).
  • Figure 4 shows a schematic diagram of the command list structure after adding the fence command according to the embodiment of the present application to the command list of Figure 2.
  • the command of each task uses the fence command corresponding to the task as a start identifier.
  • the commands corresponding to task 1 all start with a fence command numbered 1
  • the commands corresponding to task 2 all start with a fence command numbered 2.
  • the fence command corresponding to each task thus serves as an identifier or boundary of the starting position of that task's commands in the command list. It should be pointed out that the above is only an example and imposes no limitation.
  • although the fence command numbers in FIG. 4 happen to equal the task numbers, it should be understood that in this application the fence command number is incremented for each task, but this does not mean the fence command number always corresponds exactly to the task's label.
  • when the coprocessor is executing command 1 of list 0 and receives a process switch request, it first determines that the largest fence number executed in command lists 0, 1, 2, and 3 of the current process is 1. The coprocessor then needs to execute each command list until it reaches the fence command numbered 2 (i.e., the number of the fence command currently executed by the list is greater than the determined maximum fence command number) or the command list is empty, before the process switch can be performed. The switching error and deadlock described above in conjunction with FIG. 2 are thereby avoided.
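The switchable-state test used in this example can be sketched as a single predicate. This is a hypothetical helper, with `None` standing for "no fence executed yet":

```python
# A list is switchable once the fence number it has reached exceeds the maximum
# determined at the moment of the switch request, or the list has run empty.

def switchable(current_fence_no, list_empty, max_fence_no):
    if list_empty:
        return True
    return current_fence_no is not None and current_fence_no > max_fence_no

# FIG. 4 scenario: the maximum executed fence number at the request is 1, so a
# list that has just executed fence 2 is switchable, while a list still at
# fence 1 must keep executing.
max_fence = 1
```

Running the predicate over all lists of the current process and waiting until every one returns true is exactly the stopping rule of step S2).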
  • the arrangement of the fence command in the command list is not limited to the structure shown in FIG. 4 .
  • the commands included in the computing task may be first distributed to the command list, and then the fence command corresponding to the computing task may be distributed to the command list, so that the end position of the command of the computing task may be marked in the command list by the fence command.
  • the command list structure shown in FIG. 5 may also be adopted.
  • the same fence command (for example, fence 1) is first distributed to each command list of the process as a unified start mark, and then the commands included in the computing task are distributed to one or more command lists of the process, and then the fence command corresponding to the computing task is added to each command list, that is, the end position of the command of the computing task is marked in the command list by the fence command, for example, fence 2 corresponds to task 1, fence 3 corresponds to task 2, and fence 4 corresponds to task 3.
  • each command list regardless of whether it is distributed with commands included in computing tasks, will be distributed with fence commands corresponding to each computing task, for example, fences 3 and 4 in command list 0, and fence 2 in command list 3 in FIG5 .
  • Such a structure is more conducive to more quickly determining the maximum fence command number executed by each command list of the current process when a process switching request is received.
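The FIG. 5 style of distribution, in which every task's fence is appended to every command list as an end marker even when the task contributed no commands to that list, might be sketched as follows (names are illustrative assumptions):

```python
# Every task's fence goes into *every* command list as an end marker, so the
# last fence seen in any list immediately gives the largest fence number that
# list has reached, with no cross-list comparison of partial information.

def submit_task_all_lists(command_lists, task_commands, fence_no):
    for list_id, target in command_lists.items():
        target.extend(task_commands.get(list_id, []))  # task's commands, if any
        target.append(("FENCE", fence_no))             # end marker in every list
```

The cost is one extra fence entry per list per task; the benefit, as the text notes, is a faster determination of the maximum executed fence number on a switch request.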
  • in the process switching of the above embodiments, the host side does not need to participate.
  • based on the fence commands added to each command list of the process, the device side can accurately identify the reliable switching boundary of each command list, which not only avoids switching errors and deadlocks and improves the efficiency of process switching, but also reduces the host side's processing load and avoids the device-side performance loss caused by the latency of frequent host-device communication.
  • moreover, this solution is simple and efficient, reduces the complexity of software and hardware programming, and lends itself well to a pure hardware-logic implementation on the device side.
  • in some embodiments, the method may further include setting a lock mark in a fence command so that a single computing task, or multiple consecutive computing tasks, will not be interrupted by a process switch, allowing more flexible control over process switching.
  • a lock mark may be set in the fence command corresponding to a task to indicate that the computing task corresponding to that fence command cannot be interrupted.
  • one way is to add a lock mark to the fence command corresponding to each task in the task lock range.
  • Another way is to add a lock mark to the fence command corresponding to the starting task of the task lock range according to the setting of the task lock range, and add an unlock mark to the fence command corresponding to the ending task of the task lock range.
  • for example, suppose task 1 and task 2 are configured within the switch lock range. Even if a process switch request occurs during task 1 (when the maximum fence command number is 1), the coprocessor will not stop at the fence command numbered 2, because that fence command is still in the locked state; when the fence command numbered 3 is executed, the lock is released according to its unlock mark, and the process switch can then be performed.
  • in these embodiments, when determining in step S2) whether each command list of the process satisfies the switchable state, in addition to determining whether the number of the fence command currently executed by the command list is greater than the maximum fence command number determined in step S1), or whether the command list is empty, it is also necessary to determine whether the fence command currently executed by the command list contains a lock mark. If it does, the command list must continue to be executed until the fence command reached no longer contains a lock mark. That is, upon receiving a switch request, the device side continues to execute each command list of the current process until the following two conditions are both met, at which point it stops and marks the command list as in a switchable state:
  • Condition 1: a fence command appears in the command list whose fence number is greater than the current maximum fence number; or, there are no more commands in the command list. Condition 2: that fence command does not contain a lock mark.
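Putting the two conditions together, a hedged sketch of the lock-aware drain loop might look as follows (the command encoding, with fences as `("FENCE", number, locked)` tuples, and the function name are illustrative assumptions):

```python
# Keep draining a command list until a fence with number > max_fence appears
# AND that fence carries no lock mark; an empty list is also switchable.

def drain_until_switchable(cmds, max_fence):
    """Return the commands remaining once the list reaches a switchable state."""
    while cmds:
        cmd = cmds[0]
        if cmd[0] == "FENCE":
            fence_no, locked = cmd[1], cmd[2]
            if fence_no > max_fence and not locked:
                return cmds  # switchable boundary reached; stop here
        cmds.pop(0)  # otherwise "execute" and retire the command
    return cmds  # list ran empty: also switchable
```

A locked fence past the maximum number does not stop the drain, which is exactly how the lock range keeps consecutive tasks from being interrupted.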
  • the method may also include the step of assigning a corresponding fence command to the computing task in response to receiving a computing task to be submitted to the process, and distributing the fence command together with the commands contained in the computing task to the command list of the process.
  • the fence command corresponding to the computing task may be distributed first, to identify the starting position of the task's commands in the command list, with the commands contained in the computing task distributed afterwards; or, conversely, the commands may be distributed first and the fence command afterwards, so that the fence command identifies the ending position of the computing task in the command list.
  • it should be noted that the construction of the command list itself is not tightly coupled with the process-switching steps S1)-S3) described above.
  • the main processor provides the relevant control commands of the process to the coprocessor by directly writing to the process control register of the coprocessor.
  • for example, the main processor can update the command list of the process corresponding to an upper-layer application according to that application's computing tasks and write the update command to the process control register of the coprocessor, and the coprocessor executes the command list of the corresponding process according to its process control register.
  • the process control register can be implemented in any hardware form by any suitable existing register.
  • a high-performance processor which includes a controller and a memory, wherein the controller is configured to execute the method described above in conjunction with FIG. 3 , and the memory is used to store the command list described above in conjunction with FIG. 4-6 .
  • a computing device for heterogeneous computing including a main processor and a coprocessor.
  • the main processor is configured to, in response to receiving a computing task submitted to a process, assign a corresponding fence command to the computing task, and distribute the fence command together with the commands contained in the computing task to one or more command lists of the process, so as to identify the starting position or the ending position of the commands of the computing task in each command list through the fence command.
  • the coprocessor is configured to, in response to receiving a process switching request, determine the maximum fence command number executed by each command list in the current process; execute each command list of the current process until the number of the fence command executed to the command list is greater than the maximum fence command number or the command list is empty; and switch from the current process to the process specified in the process switching request.
  • the coprocessor can also be configured to: during process execution, record the fence number executed by each command list of the current process; when receiving a process switching request, determine the maximum fence command number executed by each command list in the current process by comparing the recorded fence numbers executed by each command list.
  • the main processor may be further configured to add a lock mark to the fence command corresponding to each task in the task lock range according to the received setting of the task lock range.
  • the main processor may be further configured to add a lock mark to the fence command corresponding to the starting task of the task lock range according to the setting of the task lock range, and add an unlock mark to the fence command corresponding to the ending task of the task lock range.
  • the coprocessor may be configured to execute each command list of the current process until the number of the current fence command reached by the command list is greater than the maximum fence command number or the command list is empty, and the current fence command does not contain a lock mark.
  • references in this specification to "various embodiments," "some embodiments," "one embodiment," or "an embodiment," etc. refer to a particular feature, structure, or property described in conjunction with the embodiment being included in at least one embodiment.
  • the appearances of the phrases "in various embodiments," "in some embodiments," "in one embodiment," or "in an embodiment," etc., in various places throughout the specification do not necessarily refer to the same embodiment.
  • particular features, structures, or properties may be combined in any suitable manner in one or more embodiments.
  • particular features, structures, or properties shown or described in conjunction with one embodiment may be combined in whole or in part with features, structures, or properties of one or more other embodiments without restriction, as long as the combination is not illogical or inoperable.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present application provides a process switching management method and device in heterogeneous computing. In response to receiving a process switching request, the maximum fence command number executed by each command list in the current process is determined; each command list of the current process is executed until the number of the fence command reached by the list is greater than the maximum fence command number or the list is empty; and the current process is switched to the process specified in the process switching request. By means of the fence commands added to the command lists, the solution can accurately identify the reliable switching boundary of each command list in a process, improving the efficiency of process switching, avoiding switching errors and deadlocks, and reducing the complexity of software programming.

Description

Process Switching Management Method and Computing Device in Heterogeneous Computing

Technical Field
The present application relates to high-performance computing and parallel computing, and in particular to a method and device for managing switching between multiple processes.
Background
The statements in this section merely provide background information related to the technical solution of the present application to aid understanding; they do not necessarily constitute prior art with respect to the technical solution of the present application.
Artificial intelligence (AI) technology has developed rapidly in recent years. Many AI algorithms require strong parallel computing capability to process massive amounts of data, while existing CPUs with serial instruction execution are inefficient at running parallel algorithms. This gave rise to the "CPU + accelerator chip" heterogeneous computing architecture, in which the accelerator chip, acting as a coprocessor, is dedicated to handling large numbers of computing tasks, while other non-computing tasks are handled by the CPU as the main processor. Such accelerator chips are also commonly called coprocessors, AI processors, AI accelerators, GPU processors, and so on (collectively referred to herein as the coprocessor or device side). A coprocessor can simultaneously support multiple parallel computing tasks, which may belong to multiple processes of one user or to multiple processes of multiple users. However, a coprocessor can usually execute only one or a limited number of processes at a time, which means that multiple processes need to take turns executing on the coprocessor so that each process's tasks are executed effectively, in a balanced manner, and in time. It is usually not acceptable to wait for one process to finish all its tasks before switching to the next, as subsequent processes would then not be served in time.
In existing heterogeneous computing architectures, process switching management is usually the responsibility of the main processor (also called the host side). For example, the host side submits, for each process on the device side, a portion of tasks that can be switched normally; the order of the tasks is arranged in advance by the host side, and the device side executes them in sequence. When the device side finds that tasks belong to different processes, it performs a process switch. In this case, however, because the execution order of tasks is preset and the host side can hardly predict how long the device side will take to complete each task, the task execution time of the processes is often imbalanced, and it is difficult to switch to high-priority tasks in time. In another scheme, to solve this imbalance, the device side is configured to promptly notify the host side when a task of a process finishes; the host side then specifies a switch to the next process or continues to submit tasks of the same process. This approach balances the execution time of the processes better and allows more timely switching to high-priority tasks, but it increases the processing load on the host side, and the latency introduced by frequent host-device communication also causes a loss of processing performance on the device side.
It should be noted that the above content is only intended to help understand the technical solution of the present application and does not serve as a basis for assessing the prior art of the present application.
Summary
The purpose of the present application is to provide a process switching management method for heterogeneous computing that improves the efficiency of process switching, prevents switching errors and deadlock, and reduces the complexity of software and hardware programming.
The above purpose is achieved by the following technical solutions:
According to a first aspect of the embodiments of the present application, a process switching management method for heterogeneous computing is provided, comprising: in response to receiving a process switch request, determining the maximum fence command number reached by each command list of the current process; executing each command list of the current process until it satisfies a switchable state, where the switchable state comprises: the number of the fence command reached in that command list being greater than the maximum fence command number, or the command list being empty; and switching from the current process to the process specified in the process switch request. A fence command is used to mark, within a command list, the starting or ending position of the commands belonging to the same computing task; fence command numbers increase with the computing tasks submitted to the process, and the same computing task uses the same fence command number in different command lists.
In some embodiments, the method may further comprise: in response to receiving a computing task submitted to a process, allocating a corresponding fence command to the computing task, the fence command containing a fence command number that identifies the computing task. For each command list of the process, when dispatching the commands contained in the computing task to that command list, the fence command corresponding to the task may be dispatched first, to mark the starting position of the task's commands in that command list; alternatively, the commands contained in the computing task may be dispatched to each command list first, followed by the fence command corresponding to the task, to mark the ending position of the task's commands in that command list.
It can be seen that the process switching of the above embodiments requires no host-side involvement: the device can accurately identify the reliable switching boundary of each command list from the fence commands added to the command lists of the process. This not only avoids switching errors and deadlock and improves the efficiency of process switching, but also reduces the processing load on the host and avoids the device-side performance loss caused by the latency of frequent host-device communication. Moreover, the scheme is simple and efficient, reduces the complexity of software and hardware programming, and lends itself well to a pure-hardware implementation on the device side.
In some embodiments, the method may further comprise: separately recording the fence number reached by each command list of the current process; and, upon receiving a process switch request, determining the maximum fence command number reached by the command lists of the current process by comparing the recorded fence numbers. In this way, on receiving a process switch request, the maximum fence command number reached by the command lists of the current process can be determined simply by comparing the recorded current fence command numbers, saving computation and speeding up switching.
In some embodiments, the fence command may further include a configurable lock mark indicating that the computing task corresponding to the fence command must not be interrupted. The method may further comprise: according to a received setting of a task lock range, adding a lock mark to the fence command corresponding to each task within the task lock range; or adding a lock mark to the fence command corresponding to the starting task of the task lock range and an unlock mark to the fence command corresponding to the ending task of the task lock range. In this way a single computing task, or several consecutive ones, will not be interrupted by a process switch, allowing more flexible control of process switching.
In the above embodiments that include a lock mark, executing each command list of the current process until it satisfies the switchable state comprises, for each command list: executing until the number of the current fence command in that command list is greater than the maximum fence command number or the command list is empty, and the current fence command does not contain a lock mark; and marking that command list as switchable.
According to a second aspect of the embodiments of the present application, a high-performance processor is provided, comprising a controller configured to perform the method according to the first aspect of the embodiments of the present application.
According to a third aspect of the embodiments of the present application, a computing device is provided, comprising a main processor and a coprocessor, wherein the coprocessor is the high-performance processor according to the second aspect of the embodiments of the present application.
It should be understood that the general description above and the detailed description below are merely exemplary and explanatory and do not limit the present application.
Brief description of the drawings
The accompanying drawings are incorporated into and constitute a part of this specification; they illustrate embodiments consistent with the present application and, together with the description, serve to explain its principles. Obviously, the drawings described below relate to only some embodiments of the present application, and a person of ordinary skill in the art could derive other drawings from them without inventive effort. In the drawings:
Fig. 1 is a schematic diagram of the data structure of the command collections of multiple device-side processes.
Fig. 2 is a schematic diagram of three computing tasks belonging to the same process with inter-command dependencies.
Fig. 3 is a flow diagram of a multi-process management method in heterogeneous computing according to an embodiment of the present application.
Fig. 4 is a schematic diagram of a command list structure according to an embodiment of the present application.
Fig. 5 is a schematic diagram of a command list structure according to another embodiment of the present application.
Fig. 6 is a schematic diagram of a command list structure according to yet another embodiment of the present application.
Detailed description
To make the purpose, technical solutions, and advantages of the present application clearer, the application is described in further detail below through specific embodiments with reference to the drawings. It should be understood that the described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by a person of ordinary skill in the art from the embodiments herein without inventive effort fall within the scope of protection of the present application.
Furthermore, the described features, structures, or properties may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a full understanding of the embodiments of the application. However, those skilled in the art will recognize that the technical solutions of the application may be practiced without one or more of these specific details, or with other methods, components, devices, steps, etc. In other cases, well-known methods, devices, implementations, or operations are not shown or described in detail so as not to obscure aspects of the application.
The block diagrams in the drawings are merely functional entities and do not necessarily correspond to physically separate entities; these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flowcharts in the drawings are merely exemplary; they need not include all contents and operations/steps, nor need the operations/steps be performed in the order described. For example, some operations/steps may be decomposed and others merged or partially merged, so the actual execution order may vary with the actual situation.
In heterogeneous computing, the coprocessor specializes in handling large amounts of computation, while other, non-computational tasks such as process switching management can be handled by the CPU acting as main processor. Typically, each application corresponds to one process and can submit multiple computing tasks to its process. Each computing task comprises one or more commands. When a computing task is submitted from the host to the device for execution, the device-side software driver adds all the commands contained in the task to the command lists of the process responsible for it. Each process may comprise one or more command lists, a command list being a collection of commands. Generally, commands to be executed sequentially are assigned to the same command list, while commands that can be executed in parallel are often assigned to different command lists.
Process information about the processes executed on the device is recorded in a process list containing a number of process-information entries. Each entry contains process header information and control information for a number of command lists. The process header information contains the information necessary to control the process and may therefore also be called process control information, e.g. the process identifier, process start time, process state, the number of command lists the process contains, and so on. The control information of each command list contains the storage address of the command list and all control information related to it. The storage of a command list can be implemented in various forms. For example, a command list can be designed as a ring buffer; the advantage of a ring buffer is that the software does not need to allocate memory frequently. Such a ring buffer manages the command list through a write pointer and a read pointer: when a new command is added to the command list the write pointer is updated, and when a command is taken out of the command list the read pointer is updated. Each command list has its own storage address in memory, recorded in the command list's control information. The process list and the command lists are generated by the software driver and kept in memory, which may be either host-side or device-side memory.
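A minimal sketch of the ring-buffer command list described above, with a write pointer advanced on enqueue and a read pointer advanced on consumption. This is an illustration only; the class name, capacity handling (one slot left unused to distinguish full from empty), and method names are assumptions, not the patent's implementation:

```python
class RingCommandList:
    """Fixed-capacity ring buffer holding a process's pending commands.

    The driver advances the write pointer when it enqueues a command;
    the device advances the read pointer when it consumes one.
    """

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.write = 0   # index of the next free slot
        self.read = 0    # index of the next command to execute

    def is_empty(self):
        return self.read == self.write

    def push(self, cmd):
        # One slot is kept unused so that full and empty are distinguishable.
        if (self.write + 1) % self.capacity == self.read:
            raise RuntimeError("command list full")
        self.buf[self.write] = cmd
        self.write = (self.write + 1) % self.capacity

    def pop(self):
        if self.is_empty():
            return None
        cmd = self.buf[self.read]
        self.read = (self.read + 1) % self.capacity
        return cmd
```

Because only the two pointers move, memory for the list is allocated once and reused, which matches the stated advantage that the software need not allocate memory frequently.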
Fig. 1 shows an example of the data structure of the command collections of multiple device-side processes. As shown in Fig. 1, the device-side process list contains a number of process-information entries (e.g. process information 0, process information 1, process information 2, ...) and a number of command lists belonging to each process (e.g. command list 0, command list 1, command list 2, ...). When the device-side software driver receives a computing task submitted by an upper-layer application, it adds the one or more commands contained in the task to one or more command lists of the process corresponding to that application. When the device has finished executing some commands, it can notify the software driver to reclaim the memory occupied by those completed commands in the command lists. The commands of one computing task may be distributed over multiple command lists; the number of commands in each task is variable, as is the number of associated command lists, and a task need not add commands to every command list.
On the device, each process is usually given a time slice; when execution of a process reaches its time slice, a hardware process switch request is generated to switch from the current process to another. There are also priority-related process switches, e.g. from a low-priority process to a high-priority one. When the device determines that there is a process switch request, it must first finish the command currently being executed before performing the switch. However, dependencies may exist among the commands of multiple tasks of the same process, i.e. one command may depend on the output of another. As shown in Fig. 2, tasks 1, 2, and 3 belong to the same process, and their commands are distributed over command lists 0, 1, 2, and 3 of that process. The tasks are drawn separately, and a solid arrow indicates a dependency between two commands: the command at the tail of the arrow depends on the command at its head, i.e. the command at the arrowhead must execute first. Because the coprocessor executes the command lists in parallel and the execution time of each command is indeterminate, suppose a process switch is requested (due to the time-slice rotation or priority requirements above) while the coprocessor is executing command 1 of command list 0, but the coprocessor has not yet reached command 2 in command list 2. The coprocessor's state then becomes abnormal: it can neither complete command 1 of command list 0 nor perform the process switch, producing a switching error and deadlock.
Embodiments of the present application provide a new method of managing process switching, which accurately identifies a reliable switching boundary for each command list through fence commands added to the command lists of a process, improving the efficiency of process switching and avoiding switching errors and deadlock. A fence command here is a command defined specifically in the embodiments of the present application to mark, within a command list, the starting or ending position of the commands belonging to the same computing task; it also takes the form of a command and can be submitted to a command list together with normal commands. A fence command includes a number, which increases with the computing tasks submitted to the process, and a corresponding fence command is set for every computing task that arrives at the process. When dispatching the one or more commands of a computing task to one or more command lists of a process, the fence command corresponding to the task can be dispatched to the command lists first, followed by the task's commands, thereby marking in each command list the starting position of the task's commands. Likewise, the task's commands can be dispatched first and the corresponding fence command afterwards, thereby marking in each command list the ending position of the task's commands. The same computing task uses the same fence command number in the different command lists. When the device reaches a fence command in a command list, it can read the fence command's number and thereby identify the starting or ending position of the corresponding computing task in that command list. The fence command itself does not affect the execution of normal commands, but through the fence commands submitted to the command lists the device can accurately identify the boundary or position of each computing task in every command list.
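The dispatch just described (start-marker variant: fence first, then the task's commands, with the same fence number on every list the task touches) can be sketched as follows. This is an illustrative software model; the `Process` class, entry tuples, and counter are assumptions, not the patent's implementation:

```python
import itertools

# Fence numbers increase with each computing task submitted to the process.
_fence_counter = itertools.count(1)

class Process:
    """A process holding several command lists (plain Python lists here)."""
    def __init__(self, n_lists):
        self.command_lists = [[] for _ in range(n_lists)]

def submit_task(process, task_commands_per_list):
    """Dispatch one computing task to a process's command lists.

    task_commands_per_list maps a command-list index to the commands the
    task places on that list. The same fence number is used on every list
    the task touches, and the fence precedes the task's commands, marking
    their starting position in each list.
    """
    fence_no = next(_fence_counter)
    for list_idx, commands in task_commands_per_list.items():
        clist = process.command_lists[list_idx]
        clist.append(("FENCE", fence_no))   # start marker for this task
        for cmd in commands:
            clist.append(("CMD", cmd))
    return fence_no
```

When the device later walks a list and encounters a `("FENCE", n)` entry, it knows the commands of task boundary `n` begin there, without any host involvement.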
Fig. 3 is a flow diagram of a method of managing process switching according to an embodiment of the present application. As shown in Fig. 3, the method comprises: S1) in response to receiving a process switch request, determining the maximum fence command number reached by each command list of the current process; S2) executing each command list of the current process until it satisfies a switchable state, where the switchable state comprises: the number of the fence command reached in that command list being greater than the maximum fence command number, or the command list being empty; and S3) switching from the current process to the process specified in the process switch request. The method is performed by the coprocessor, i.e. the device side.
More specifically, in step S1), when the coprocessor receives a process switch request during operation, it first determines the maximum fence command number reached by the command lists of the currently executing process. In one embodiment, during execution the coprocessor separately records the fence command number currently reached by each command list of the current process. Since fence command numbers increase in the order in which computing tasks arrive at the process, and each command list is executed in order, whenever a fence command in a command list is reached its number becomes the list's current fence command number, which is in fact that list's current maximum. Thus, when a process switch request is received, the maximum fence command number reached by the command lists of the current process can be determined simply by comparing the recorded current fence command numbers, saving computation and speeding up switching.
In step S2), having received the process switch request, the coprocessor continues executing the command lists of the current process, in order to prevent switching errors and deadlock, until every command list satisfies the switchable state. In one example, a command list can be marked switchable when the number of the fence command it has currently reached is greater than the maximum fence command number determined in step S1), or the command list is empty.
In step S3), when the coprocessor determines that all command lists of the current process are switchable, it performs the process switch from the current process to the process specified in the switch request received in step S1).
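Steps S1) and S2) can be sketched as a small software model (the real scheme targets device hardware; the classes, entry tuples, and function names below are assumptions made purely for illustration):

```python
FENCE, CMD = "fence", "cmd"

class CommandList:
    def __init__(self):
        self.items = []          # pending (kind, value) entries, in order
        self.current_fence = 0   # number of the fence this list has reached

    def empty(self):
        return not self.items

class Proc:
    def __init__(self, n_lists):
        self.lists = [CommandList() for _ in range(n_lists)]

def drain_until_switchable(proc):
    """S1: take the maximum fence number any list has reached.
    S2: run each list until it is empty or its next fence number exceeds
    that maximum, i.e. the reliable switching boundary. After this
    returns, the caller may perform the actual switch (S3)."""
    limit = max(cl.current_fence for cl in proc.lists)   # step S1
    for cl in proc.lists:                                # step S2
        while not cl.empty():
            kind, val = cl.items[0]
            if kind == FENCE:
                if val > limit:
                    break            # boundary reached: fence stays unexecuted
                cl.current_fence = val
            cl.items.pop(0)          # consume ("execute") the entry
    return limit
```

Note that a fence whose number exceeds the maximum is left at the head of the list rather than consumed, matching the roll-back behaviour described later in the text.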
The effect of the above process switching is illustrated below using Fig. 4 as an example. Fig. 4 shows the structure of the command lists of Fig. 2 after fence commands according to an embodiment of the present application have been added. In every command list, the commands of each task begin with the fence command corresponding to that task: in command lists 0, 1, 2, and 3, the commands of task 1 all begin with the fence command numbered 1, and the commands of task 2 all begin with the fence command numbered 2; the fence command of each task in effect serves as the marker or boundary of the starting position of that task's commands in the command list. It should be pointed out that this is merely an example and not a limitation: although in Fig. 4 the fence command numbers happen to equal the task numbers, it should be understood that in the present application the fence command number increases per task but does not necessarily correspond exactly to the task's label. Suppose the coprocessor receives a process switch request while executing command 1 of command list 0. The coprocessor first determines that the maximum fence number reached in command lists 0, 1, 2, and 3 of the current process is 1; it must then execute each command list up to the fence command numbered 2 (i.e. until the number of the fence command currently reached by the list is greater than the determined maximum fence command number) or until the command list is empty before the process switch may proceed. The switching error and deadlock described above in connection with Fig. 2 are thus avoided.
It should be understood that the placement of fence commands in the command lists is not limited to the structure shown in Fig. 4. As mentioned above, in some embodiments the commands of a computing task may be dispatched to the command lists first and the corresponding fence command afterwards, so that the fence command marks the ending position of the task's commands in the command list. As another example, the command list structure of Fig. 5 may be used: a preset common fence command (e.g. fence 1) is first dispatched to every command list of the process as a uniform start marker; next, the commands of a computing task are dispatched to one or more command lists of the process; and then the fence command corresponding to the task is dispatched to every command list, i.e. the fence command marks the ending position of the task's commands in the command list, e.g. fence 2 corresponds to task 1, fence 3 to task 2, and fence 4 to task 3. In the structure of Fig. 5, every command list, whether or not it has been given any commands of a computing task, receives the fence command corresponding to each task; see, for example, fences 3 and 4 in command list 0 and fence 2 in command list 3 of Fig. 5. This structure makes it even faster to determine, upon receiving a process switch request, the maximum fence command number reached by the command lists of the current process.
It can be seen that the process switching of the above embodiments requires no host-side involvement: the device can accurately identify the reliable switching boundary of each command list from the fence commands added to the command lists of the process. This not only avoids switching errors and deadlock and improves the efficiency of process switching, but also reduces the processing load on the host and avoids the device-side performance loss caused by the latency of frequent host-device communication. Moreover, the scheme is simple and efficient, reduces the complexity of software and hardware programming, and lends itself well to a pure-hardware implementation on the device side.
In yet other embodiments, the method may further include a step of setting a lock mark in a fence command so that a single computing task, or several consecutive ones, will not be interrupted by a process switch, allowing more flexible control of process switching. For example, for a task that should not be interrupted by a switch, a lock mark can be set in the fence command corresponding to that task, indicating that the corresponding computing task must not be interrupted. On receiving a setting of a task lock range, one approach is to add a lock mark to the fence command corresponding to each task within the task lock range; another is to add, according to the setting of the task lock range, a lock mark to the fence command corresponding to the starting task of the range and an unlock mark to the fence command corresponding to the ending task of the range. As shown in Fig. 6, by setting lock and unlock marks in the fence commands corresponding to the tasks, tasks 1 and 2 are placed within a switch lock range: even if a process switch request arrives during task 1 (maximum fence command number 1), the coprocessor will not stop at the fence command numbered 2, because that fence is still in the locked state; only when it reaches the fence command numbered 3 is the lock released according to the unlock mark, and only then can the process switch be performed.
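The first lock-range variant above (a lock mark on the fence of every task in the range) can be sketched as follows. Fence numbers are assumed equal to task numbers here, as in the figures, purely for illustration; the data layout is an assumption, not the patent's implementation:

```python
def mark_lock_range(fences, first_task, last_task):
    """Add a lock mark to the fence record of every task whose fence
    number lies inside [first_task, last_task].

    `fences` maps a fence number to a mutable record such as
    {"lock": False}; tasks inside the range become non-interruptible.
    """
    for no, fence in fences.items():
        if first_task <= no <= last_task:
            fence["lock"] = True
    return fences
```

The second variant would instead set a lock mark only on the starting fence and an unlock mark on the ending fence, letting the device track locked state while walking the list.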
In such embodiments, when determining in step S2 whether each command list of the process satisfies the switchable state, in addition to checking whether the number of the fence command the list has currently reached is greater than the maximum fence command number determined in step S1), or whether the list is empty, it is also necessary to check whether the fence command currently reached contains a lock mark. If it does, the command list must continue executing until it reaches a fence command that no longer contains a lock mark. That is, on receiving a switch request, the device continues executing the command lists of the current process, stopping and marking a command list as switchable only when both of the following conditions are satisfied:
Condition 1: a fence command appears on the command list whose fence number is greater than the current maximum fence number; or the command list has no more commands.
Condition 2: the fence command does not contain a lock mark.
If, under condition 1, a fence command appears whose fence number is greater than the current maximum fence number, the device must roll back that fence command and must not treat it as executed; it will actually be executed only the next time this process is switched in.
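Conditions 1 and 2 together amount to a single check on the head of a command list, which can be sketched as follows (the entry layout, with a fence represented as `("fence", number, locked)`, is an assumption for illustration):

```python
def is_switchable(items, limit):
    """Return True when a command list satisfies both conditions.

    Condition 1: the list is empty, or its head is a fence whose number
                 exceeds `limit` (the current maximum fence number).
    Condition 2: that fence carries no lock mark.
    A fence satisfying both is rolled back, i.e. left unexecuted at the
    head of the list for the next time the process is switched in.
    """
    if not items:                    # empty list: switchable
        return True
    entry = items[0]
    if entry[0] != "fence":          # ordinary command still pending
        return False
    _, number, locked = entry
    return number > limit and not locked
```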
The command lists illustrated above in connection with Figs. 4, 5, and 6 may be stored in host-side memory, but are preferably kept in device-side memory. In some embodiments, the method may further comprise, in response to receiving a computing task to be submitted to a process, allocating a corresponding fence command to the task and dispatching the fence command together with the task's commands to the command lists of the process. For example, the fence command corresponding to the task may be dispatched first, to mark the starting position of the task's commands in a command list, followed by the task's commands; or the reverse, in which case the fence command marks the ending position of the task in the command list. It should be understood that these steps essentially build the command lists shown in Figs. 4, 5, and 6, and this construction process is not tightly coupled to the process switching steps S1)-S3); for steps S1)-S3) it suffices that the command lists of a process can be obtained from memory. The construction of the command lists may therefore be performed by the main processor or by the coprocessor, and this document places no restriction on it. In one example, the main processor provides process-related control commands to the coprocessor by writing directly to the coprocessor's process control register. For example, the main processor may update the command lists of the process corresponding to an upper-layer application according to the application's computing tasks and write an update command to the coprocessor's process control register, and the coprocessor executes the command lists of the corresponding process according to its process control register. Such a process control register can typically be implemented by any existing hardware register.
In yet another embodiment of the present application, a high-performance processor is provided, comprising a controller and a memory, the controller being configured to perform the method described above in connection with Fig. 3, and the memory being used to store the command lists described above in connection with Figs. 4-6.
In yet another embodiment of the present application, a computing device for heterogeneous computing is provided, comprising a main processor and a coprocessor. The main processor is configured, in response to receiving a computing task submitted to a process, to allocate a corresponding fence command to the task and to dispatch the fence command together with the task's commands to one or more command lists of the process, so that the fence command identifies the starting or ending position of the task's commands in each command list. The coprocessor is configured, in response to receiving a process switch request, to determine the maximum fence command number reached by each command list of the current process; to execute each command list of the current process until the number of the fence command reached in that list is greater than the maximum fence command number or the list is empty; and to switch from the current process to the process specified in the process switch request.
In some embodiments, the coprocessor may further be configured to separately record, during process execution, the fence number reached by each command list of the current process, and, upon receiving a process switch request, to determine the maximum fence command number reached by the command lists of the current process by comparing the recorded fence numbers.
In some embodiments, the main processor may further be configured, according to a received setting of a task lock range, to add a lock mark to the fence command corresponding to each task within the task lock range. In yet other embodiments, the main processor may further be configured, according to the setting of the task lock range, to add a lock mark to the fence command corresponding to the starting task of the range and an unlock mark to the fence command corresponding to the ending task of the range. In such embodiments, the coprocessor may be configured to execute each command list of the current process until the number of the current fence command in that list is greater than the maximum fence command number or the list is empty, and the current fence command does not contain a lock mark.
References in this specification to "various embodiments," "some embodiments," "one embodiment," or "an embodiment," etc. mean that a particular feature, structure, or property described in conjunction with the embodiment is included in at least one embodiment. Thus the appearances of the phrases "in various embodiments," "in some embodiments," "in one embodiment," or "in an embodiment," etc. in various places throughout the specification do not necessarily refer to the same embodiment. Furthermore, particular features, structures, or properties may be combined in any suitable manner in one or more embodiments. Thus, a particular feature, structure, or property shown or described in conjunction with one embodiment may be combined, in whole or in part, with the features, structures, or properties of one or more other embodiments without restriction, as long as the combination is not illogical or inoperable.
Terms such as "comprise" and "have" and expressions of similar meaning in this specification are intended to cover non-exclusive inclusion: a process, method, system, product, or device comprising a series of steps or units is not limited to the listed steps or units but may optionally include steps or units not listed, or other steps or units inherent to such a process, method, product, or device. "A" or "an" does not exclude a plurality. In addition, the elements in the drawings of the present application are for illustration only and are not drawn to scale.
Although the present application has been described through the above embodiments, it is not limited to the embodiments described herein and includes various changes and variations made without departing from the scope of the present application.

Claims (10)

  1. A process switching management method in heterogeneous computing, comprising:
     in response to receiving a process switch request, determining the maximum fence command number reached by each command list of the current process;
     executing each command list of the current process until it satisfies a switchable state, wherein the switchable state comprises: the number of the fence command reached in that command list being greater than the maximum fence command number, or the command list being empty;
     switching from the current process to the process specified in the process switch request;
     wherein the fence command is used to mark, within a command list, the starting or ending position of commands belonging to the same computing task, the fence command numbers increase with the computing tasks submitted to the process, and the same computing task uses the same fence command number in different command lists.
  2. The method of claim 1, wherein the fence command further comprises a configurable lock mark indicating that the computing task corresponding to the fence command must not be interrupted; and the switchable state further comprises: the fence command reached in the command list no longer containing a lock mark.
  3. The method of claim 1 or 2, further comprising:
     in response to receiving a computing task submitted to a process, allocating a corresponding fence command to the computing task, the fence command containing a fence command number identifying the computing task;
     for each command list of the process, when dispatching the commands contained in the computing task to that command list, dispatching the fence command corresponding to the computing task first, to identify the starting position of the task's commands in that command list.
  4. The method of claim 1 or 2, further comprising:
     in response to receiving a computing task submitted to a process, allocating a corresponding fence command to the computing task, the fence command containing a fence command number identifying the computing task;
     when dispatching the commands contained in the computing task to each command list, also dispatching the fence command corresponding to the computing task to that command list, to identify the ending position of the task's commands in that command list.
  5. The method of claim 1 or 2, further comprising:
     separately recording the fence number reached by each command list of the current process;
     upon receiving a process switch request, determining the maximum fence command number reached by the command lists of the current process by comparing the recorded fence numbers.
  6. The method of claim 2, further comprising:
     according to a received setting of a task lock range, adding a lock mark to the fence command corresponding to each task within the task lock range.
  7. The method of claim 2, further comprising:
     according to a setting of a task lock range, adding a lock mark to the fence command corresponding to the starting task of the task lock range, and adding an unlock mark to the fence command corresponding to the ending task of the task lock range.
  8. The method of claim 6, wherein executing each command list of the current process until it satisfies the switchable state comprises, for each command list:
     executing until the number of the current fence command in that command list is greater than the maximum fence command number or the command list is empty, and the current fence command does not contain a lock mark;
     marking that command list as switchable.
  9. A high-performance processor comprising a controller, wherein the controller is configured to perform the method of any one of claims 1-8.
  10. A computing device comprising a main processor and a coprocessor, wherein the main processor is configured to perform the method of any one of claims 3-4 and 6-7, and the coprocessor is configured to perform the method of any one of claims 1-2, 5, or 8.
PCT/CN2023/084003 2022-12-14 2023-03-27 Process switching management method and computing device in heterogeneous computing WO2024124737A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211599077.6A CN115617533B (zh) 2022-12-14 2022-12-14 Process switching management method and computing device in heterogeneous computing
CN202211599077.6 2022-12-14

Publications (1)

Publication Number Publication Date
WO2024124737A1 true WO2024124737A1 (zh) 2024-06-20

Family

ID=84880659

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/084003 WO2024124737A1 (zh) 2022-12-14 2023-03-27 Process switching management method and computing device in heterogeneous computing

Country Status (2)

Country Link
CN (1) CN115617533B (zh)
WO (1) WO2024124737A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115617533B (zh) * 2022-12-14 2023-03-10 Shanghai Denglin Technology Co., Ltd. Process switching management method and computing device in heterogeneous computing

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040187122A1 (en) * 2003-02-18 2004-09-23 Microsoft Corporation Systems and methods for enhancing performance of a coprocessor
US20110078434A1 (en) * 2009-07-29 2011-03-31 Echostar Technologies L.L.C. Process management providing operating mode switching within an electronic device
CN110032453A (zh) * 2019-04-19 2019-07-19 Shanghai Zhaoxin Semiconductor Co., Ltd. Processing system for task scheduling and allocation and acceleration method thereof
CN114610472A (zh) * 2022-05-09 2022-06-10 Shanghai Denglin Technology Co., Ltd. Multi-process management method and computing device in heterogeneous computing
CN115617533A (zh) * 2022-12-14 2023-01-17 Shanghai Denglin Technology Co., Ltd. Process switching management method and computing device in heterogeneous computing

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112292664B (zh) * 2018-06-06 2024-05-24 普莱赛恩技术有限公司 Method and system for designing a distributed heterogeneous computing and control system
US10761893B1 (en) * 2018-11-23 2020-09-01 Amazon Technologies, Inc. Automatically scaling compute resources for heterogeneous workloads
CN110287016A (zh) * 2019-07-01 2019-09-27 Wuhan Zhaoge Information Technology Co., Ltd. A distributed flow-graph heterogeneous computing scheduling method
CN110554925B (zh) * 2019-09-05 2022-02-08 National University of Defense Technology Deadlock-check-oriented symbolic execution method, system and medium for non-blocking MPI programs
CN111443999A (zh) * 2020-02-17 2020-07-24 Shenzhen OneConnect Smart Technology Co., Ltd. Data parallel processing method, executor, computer device and storage medium
CN114548389A (zh) * 2022-01-27 2022-05-27 Shanghai Denglin Technology Co., Ltd. Management method for computing units in heterogeneous computing and corresponding processor
CN115454576B (zh) * 2022-09-29 2024-02-02 Anchao Cloud Software Co., Ltd. Virtual machine process management method and system, and electronic device


Also Published As

Publication number Publication date
CN115617533B (zh) 2023-03-10
CN115617533A (zh) 2023-01-17


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23901955

Country of ref document: EP

Kind code of ref document: A1