CN117593172A - Process management method, device, medium and equipment - Google Patents

Publication number
CN117593172A
Authority
CN
China
Prior art keywords
target process
process group
group
target
triggering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410064434.1A
Other languages
Chinese (zh)
Other versions
CN117593172B (en)
Inventor
黄增士
王鲲
陈飞
邹懋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vita Technology Beijing Co ltd
Original Assignee
Vita Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vita Technology Beijing Co ltd filed Critical Vita Technology Beijing Co ltd
Priority to CN202410064434.1A priority Critical patent/CN117593172B/en
Publication of CN117593172A publication Critical patent/CN117593172A/en
Application granted granted Critical
Publication of CN117593172B publication Critical patent/CN117593172B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5044 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering hardware capabilities
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Debugging And Monitoring (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a process management method, device, medium and equipment, and relates to the technical field of data processing. The method comprises: acquiring state information of a target process group, wherein the target process group is the process group to which a process in a running state belongs; when the state information indicates that none of the processes in the target process group has accessed the graphics processor within a preset duration, triggering a save operation on the device context of the target process group; and when the save operation is completed, releasing the GPU resources occupied by the target process group on the graphics processor. Thus, where a process has not used GPU resources for a long time, the GPU resources occupied by the target process group can be released without the running process exiting, and the released resources can be used by other users, which improves resource utilization.

Description

Process management method, device, medium and equipment
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a process management method, device, medium, and apparatus.
Background
With the widespread adoption of AI (Artificial Intelligence), heterogeneous programs that use GPUs (Graphics Processing Units) have also come into widespread use. However, in a development environment, when a program is not using its GPU resources, those resources cannot be scheduled for other use because the program has not exited, which reduces resource utilization.
Disclosure of Invention
The disclosure aims to provide a process management method, a device, a medium and equipment so as to improve the resource utilization rate.
In a first aspect, the present disclosure provides a process management method, the method comprising:
acquiring state information of a target process group, wherein the target process group is the process group to which a process in a running state belongs;
triggering a saving operation of the device context of the target process group when the state information indicates that none of the processes in the target process group access the graphics processor within a preset time period;
and when the save operation is completed, releasing GPU resources occupied by the target process group in the graphics processor.
Optionally, the device context includes an operation record of the target process group on the graphics processor and video memory data of the target process group, and triggering the save operation on the device context of the target process group comprises:
acquiring the operation record and the video memory data;
and storing the operation record and the video memory data to a preset storage location.
Optionally, after triggering the save operation of the device context of the target process group, the method further comprises:
determining that a process in the target process group accesses the graphics processor again;
and triggering a preset operation on the target process group according to the progress of the save operation.
Optionally, triggering a preset operation on the target process group according to the progress of the save operation comprises:
triggering a recovery operation on the device context of the target process group when the progress indicates that the save operation is completed;
and triggering an interrupt operation on the save operation when the progress indicates that the save operation is not completed.
Optionally, the device context includes an operation record of the target process group on the graphics processor and video memory data of the target process group, and triggering the recovery operation on the device context of the target process group comprises:
allocating resources for the target process group from a free resource pool so that the target process group re-executes the operations in the operation record, and copying the video memory data to the graphics processor to restore the device context of the target process group.
Optionally, the method further comprises:
when the state information indicates that a first target process in the target process group has exited, marking the first target process;
and executing the save operation or the recovery operation on the remaining unmarked processes in the target process group.
Optionally, the method further comprises:
triggering a clearing operation on the locally stored metadata information of a second target process when the state information indicates that the second target process in the target process group has exited;
and releasing the resources occupied by the target process group when the state information indicates that all processes in the target process group have exited.
In a second aspect, the present disclosure provides a process management apparatus, including:
an acquisition module, configured to acquire state information of a target process group, wherein the target process group is the process group to which a process in a running state belongs;
a save module, configured to trigger a save operation on the device context of the target process group when the state information indicates that none of the processes in the target process group has accessed the graphics processor within a preset duration;
and a resource release module, configured to release the GPU resources occupied by the target process group on the graphics processor when the save operation is completed.
In a third aspect, the present disclosure provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the process management method of the first aspect.
In a fourth aspect, the present disclosure provides an electronic device comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to implement the process management method according to the first aspect.
According to the above technical solution, when the state information of the target process group indicates that none of the processes in the group has accessed the graphics processor within the preset duration, the save operation on the device context of the target process group is triggered; when the save operation is completed, the GPU resources occupied by the target process group on the graphics processor are released. Thus, where a process has not used GPU resources for a long time, the GPU resources occupied by the target process group can be released without the running process exiting, and the released resources can be used by other users, which improves resource utilization.
Additional features and advantages of the present disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings provide a further understanding of the disclosure and constitute a part of this specification; together with the detailed description below they serve to explain the disclosure, but do not limit it. In the drawings:
Fig. 1 is a flowchart of a process management method according to an exemplary embodiment.
Fig. 2 is a block diagram of a system for managing a process group according to an exemplary embodiment.
Fig. 3 is a flowchart of another process management method according to an exemplary embodiment.
Fig. 4 is a block diagram of a process management apparatus according to an exemplary embodiment.
Fig. 5 is a block diagram of an electronic device according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
GPU resources are mainly divided into computing resources and video memory resources. In existing schemes, idle GPU resources are released by making the process exit: a timer tracks the process's operations on the graphics processor, and if the process performs no operation for a long time, it is terminated. Although this releases the GPU resources, the process stops and the program's data is lost, which limits the scenarios in which the scheme can be used. In addition, when the process needs the resources again, all data must be reloaded and the original computation, including resource requests and calculations, must be performed again to return to the state before exit; the computation in particular consumes considerable computing power.
The present disclosure proposes a process management method for the above usage scenario. Under this method, a process that has not executed GPU operations for a long time does not exit; instead, its GPU-related data is saved and its GPU resources are released.
Fig. 1 is a flowchart of a process management method according to an exemplary embodiment. As shown in fig. 1, the method includes the following steps.
S101, acquiring state information of a target process group, wherein the target process group is the process group to which a process in a running state belongs.
By way of example, a process is one running activity of a program in a computer over a set of data, and is the basic execution entity of the program. A process group is a collection of one or more processes, typically associated with the same job, that can receive signals from the same terminal device. The target process group is the process group to which a process running on the current electronic device belongs. The target process group may contain a plurality of processes, for example a set of parent and child processes, or a user-defined logical set of processes, such as a multi-process program performing a single task or a plurality of processes performing a plurality of associated tasks. Each process group has a unique process group identifier to facilitate management of the processes in the group.
For example, the state information may represent the running condition of the processes in the target process group, their access to the graphics processor, and their usage of CPU (Central Processing Unit) resources, GPU resources, memory resources and the like; based on the state information of the target process group, corresponding management operations can be performed on its processes. When a process has been allocated all necessary resources other than the CPU, it can execute as soon as it obtains the CPU; the process is then said to be in the ready state. A system may have multiple processes in the ready state, which are typically placed in a queue called the ready queue. A process that has obtained the CPU and whose program is executing is in the running state. In a single-processor system, only one process can be in the running state at a time; in a multiprocessor system, multiple processes can be in the running state simultaneously.
S102, triggering a save operation on the device context of the target process group when the state information indicates that none of the processes in the target process group has accessed the graphics processor within a preset duration.
The preset duration may be set according to the user's actual requirements, for example 5 to 60 minutes. When none of the processes in the target process group accesses the graphics processor within the preset duration, that is, the GPU resources go unused for that duration, the GPU resources occupied by the target process group cannot be used by other processes, and the resources are not effectively utilized. Long-term non-use of GPU resources covers several scenarios. In one, the user performs no operation for a long time: CPU and GPU resources are both occupied, but neither GPU operations nor CPU operations are executed, for example while the user is typing code or commands. In another, the user performs only CPU operations for a long time and no GPU operations, for example when a running program computes on the CPU without using the GPU.
By way of example, all processes in the target process group to which the running process belongs are monitored. When none of the processes in the target process group accesses the graphics processor within the preset duration, the GPU resources occupied by the target process group are not being used. The save operation on the device context of the target process group is triggered only when all processes in the group have not accessed the graphics processor; it is not triggered when only a single process or some of the processes have not. The reason is that when GPU data is shared among processes in the target process group, triggering a save operation and releasing GPU resources merely because one process has been idle for a long time may cause errors in the other processes.
Illustratively, while the target process group is monitored, the operations of the target process group on the graphics processor are recorded, together with the time at which a process in the group last accessed the graphics processor. When the interval between that last access time and the current time exceeds the preset duration, the save operation is triggered, and all processes in the target process group save their device context, which can later be used to resume the task corresponding to the target process group.
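The idle check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `GroupAccessTracker`, its fields, and the choice to treat a group with no recorded access as "not yet eligible" are all assumptions introduced here.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class GroupAccessTracker:
    """Hypothetical tracker of GPU access times for one process group."""
    idle_threshold_s: float                                       # the "preset duration"
    last_access: Dict[int, float] = field(default_factory=dict)   # pid -> timestamp

    def record_access(self, pid: int, now: Optional[float] = None) -> None:
        # Remember when this process last touched the graphics processor.
        self.last_access[pid] = time.monotonic() if now is None else now

    def should_save(self, now: Optional[float] = None) -> bool:
        # The save is triggered only when EVERY process has been idle, so the
        # most recent access across the whole group is what matters.
        if not self.last_access:
            return False  # assumption: no recorded access means nothing to save
        now = time.monotonic() if now is None else now
        newest = max(self.last_access.values())
        return now - newest > self.idle_threshold_s
```

Comparing against the newest timestamp in the group, rather than each process separately, reflects the rule that a single idle process must not trigger the save while another process in the group is still using the GPU.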
S103, releasing the GPU resources occupied by the target process group on the graphics processor when the save operation is completed.
For example, after the target process group has finished saving the device context, the GPU resources it occupies on the graphics processor can be released without the processes exiting. The GPU resources are released into a resource pool and can be scheduled for other users. After releasing the unused GPU resources, the processes in the target process group can still continue to run on CPU resources and the like without exiting.
With the above technical solution, when the state information of the target process group indicates that none of the processes in the group has accessed the graphics processor within the preset duration, the save operation on the device context of the target process group is triggered; when the save operation is completed, the GPU resources occupied by the target process group on the graphics processor are released. Thus, where a process has not used GPU resources for a long time, the GPU resources occupied by the target process group can be released without the running process exiting, and the released resources can be used by other users, which improves resource utilization. Moreover, once the device context of the target process group has been saved, it can be used to resume the task corresponding to the target process group without re-executing the computation, which saves computing power on the graphics processor.
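A minimal sketch of the "save first, release second" ordering in S103 is given below. The pool is modeled as a simple counter of free video memory; `GpuResourcePool`, `save_then_release`, and the MiB accounting are hypothetical names and simplifications, since the disclosure does not specify how the pool is represented.

```python
from typing import Callable, Dict


class GpuResourcePool:
    """Hypothetical pool that tracks free GPU memory in MiB."""

    def __init__(self, total_mib: int) -> None:
        self.free_mib = total_mib
        self.held: Dict[str, int] = {}   # group id -> MiB currently held

    def allocate(self, group_id: str, mib: int) -> bool:
        if mib > self.free_mib:
            return False                 # not enough free resources
        self.free_mib -= mib
        self.held[group_id] = self.held.get(group_id, 0) + mib
        return True

    def release(self, group_id: str) -> int:
        mib = self.held.pop(group_id, 0)
        self.free_mib += mib             # resources become schedulable again
        return mib


def save_then_release(pool: GpuResourcePool, group_id: str,
                      save_fn: Callable[[str], bool]) -> bool:
    """Release a group's GPU resources only after its save reports success."""
    if not save_fn(group_id):
        return False                     # save failed or was interrupted: keep resources
    pool.release(group_id)
    return True
```

Here the processes themselves are never touched: only the pool accounting changes, which mirrors the point that the group keeps running on CPU resources after the release.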
As an optional embodiment, the device context includes an operation record of the target process group on the graphics processor and video memory data of the target process group, and triggering a save operation of the device context of the target process group includes:
acquiring the operation record and the video memory data;
and storing the operation record and the video memory data to a preset storage location.
For example, so that the device context of the target process group can be restored when a user process later accesses the graphics processor again, saving part of the graphics processor's computing power, the device context may be saved based on a replay operation. That is, while the processes of the target process group run, the operations performed on the graphics processor are recorded to form an operation record, and the video memory data can also be saved. The operation record and the video memory data are stored in a preset storage location, which may be any permitted location, for example local storage or the cloud.
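One way to picture the saved device context is as two artifacts written to the preset location: a serialized operation record and a raw snapshot of video memory. The sketch below assumes a local directory as the storage location, JSON for the operation record, and the file names `ops.json` and `vram.bin`; none of these choices come from the disclosure.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Tuple


def save_device_context(op_record: List[dict], vram: bytes, location: str) -> None:
    """Write the operation record and the video memory snapshot to `location`."""
    root = Path(location)
    root.mkdir(parents=True, exist_ok=True)
    (root / "ops.json").write_text(json.dumps(op_record))   # ordered list of GPU operations
    (root / "vram.bin").write_bytes(vram)                   # raw video memory contents


def load_device_context(location: str) -> Tuple[List[dict], bytes]:
    """Read back what save_device_context stored."""
    root = Path(location)
    ops = json.loads((root / "ops.json").read_text())
    vram = (root / "vram.bin").read_bytes()
    return ops, vram


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        save_device_context([{"op": "malloc", "size": 1024}], b"\x00" * 4, d)
        print(load_device_context(d))
```

Keeping the operation record ordered matters because recovery re-executes the recorded operations in sequence before the memory contents are copied back.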
As an alternative embodiment, after triggering a save operation of the device context of the target process group, the method further comprises:
determining that a process in the target process group accesses the graphics processor again;
and triggering a preset operation on the target process group according to the progress of the save operation.
The preset operation may be, for example, a recovery operation or an interrupt operation. A process in the target process group may access the graphics processor again and request GPU resources. Saving the device context may take a long time, so if a process in the target process group accesses the graphics processor again after the save operation on the device context has been triggered, two situations can arise. If the device context has already been saved and the GPU resources occupied by the target process group have been released, the progress of the group's processes can be restored by rescheduling GPU resources from the resource pool. If the device context has not yet been fully saved, the processes in the target process group still occupy their GPU resources; the save operation can then be interrupted so that the processes continue to run, and no GPU resources need to be rescheduled. The recovery operation or the interrupt operation can therefore be triggered according to the progress of the save operation.
As an optional embodiment, triggering a preset operation on the target process group according to the progress of the save operation includes:
triggering a recovery operation of the device context of the target process group when the progress characterizes the completion of the save operation;
and triggering an interrupt operation for the save operation when the progress characterizes that the save operation is not completed.
For example, once the save operation on the device context of the target process group has been triggered, access to the graphics processor by processes in the group can be monitored. If a process in the target process group accesses the graphics processor again before the save operation has completed, the current save can be interrupted; because the original device context has not yet been released, the original progress can be resumed quickly without any additional operation. If a process accesses the graphics processor again after the save operation has completed, the GPU resources occupied by the target process group have already been released, and the progress and data of the group's processes can be restored from the saved device context, which reduces the graphics processor computing power consumed during recovery.
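The two-way decision on re-access can be written as a small dispatch function. The enum and function names below are hypothetical; the mapping itself (completed save leads to recovery, unfinished save leads to interruption) follows the text.

```python
from enum import Enum


class SaveProgress(Enum):
    IN_PROGRESS = "in_progress"   # device context not yet fully saved
    COMPLETED = "completed"       # device context saved, GPU resources released


class PresetOperation(Enum):
    INTERRUPT_SAVE = "interrupt_save"      # abort the save; context is still on the GPU
    RESTORE_CONTEXT = "restore_context"    # reallocate resources and rebuild the context


def on_gpu_reaccess(progress: SaveProgress) -> PresetOperation:
    """Choose the preset operation when a process touches the GPU again."""
    if progress is SaveProgress.COMPLETED:
        return PresetOperation.RESTORE_CONTEXT
    return PresetOperation.INTERRUPT_SAVE
```

Interrupting is the cheap path: nothing has been released yet, so no rescheduling or replay is required.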
As an optional embodiment, the device context includes an operation record of the target process group to the graphics processor and video memory data of the target process group, and triggering a recovery operation of the device context of the target process group includes:
allocating resources for the target process group from a free resource pool so that the target process group re-executes the operations in the operation record, and copying the video memory data to the graphics processor to restore the device context of the target process group.
Illustratively, when a resource is no longer in use, the resource pool clears the busy identifier of that resource to indicate that it can serve the next request. When the recovery operation on the device context of the target process group is triggered, the processes in the group need GPU resources to execute the operations in the operation record; a scheduling module responsible for resource scheduling can then allocate idle resources from the free resource pool to the target process group so that it can re-execute the operations in the operation record.
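The recovery step can be sketched as "replay, then copy". The example below uses a stand-in device class rather than a real GPU API; `FakeGpu`, the `malloc` operation format, and the per-buffer layout of the saved video memory data are all assumptions made for illustration, and the device object is taken to represent resources the scheduler has already allocated from the free pool.

```python
from typing import Dict, List


class FakeGpu:
    """Stand-in for an allocated device; it only tracks named buffers."""

    def __init__(self) -> None:
        self.buffers: Dict[str, bytearray] = {}

    def malloc(self, name: str, size: int) -> None:
        self.buffers[name] = bytearray(size)

    def copy_in(self, name: str, data: bytes) -> None:
        if len(data) > len(self.buffers[name]):
            raise ValueError("saved data is larger than the replayed allocation")
        self.buffers[name][:len(data)] = data


def restore_device_context(gpu: FakeGpu, op_record: List[dict],
                           vram_data: Dict[str, bytes]) -> None:
    # Step 1: re-execute the recorded operations to rebuild allocations.
    for op in op_record:
        if op["op"] == "malloc":
            gpu.malloc(op["name"], op["size"])
    # Step 2: copy the saved video memory data back onto the device.
    for name, data in vram_data.items():
        gpu.copy_in(name, data)
```

Because the buffer contents are copied back rather than recomputed, the replay only needs to recreate the allocations, which is where the saving in computing power comes from.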
As an alternative embodiment, the method further comprises:
when the state information indicates that a first target process in the target process group has exited, marking the first target process;
and executing the save operation or the recovery operation on the remaining unmarked processes in the target process group.
For example, a process in the target process group may exit during execution because of an exception, because it has finished executing, or because the user cleaned it up. The save and recovery operations are performed on all processes of the target process group, and the recovery operation relies on the device context stored by the save operation. The first target process may be some of the processes in the target process group. If a process that has exited were still included in the save and recovery operations, the recovery of the target process group could be affected. The exited process is therefore marked, so that subsequent operations are not repeated on it.
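The marking scheme amounts to filtering the group before each save or recovery pass. The `ProcessGroup` class below is a hypothetical illustration; the disclosure does not say how the mark is stored.

```python
from typing import Callable, Dict, Iterable, List, Set


class ProcessGroup:
    """Hypothetical process group that remembers which members have exited."""

    def __init__(self, pids: Iterable[int]) -> None:
        self.pids: List[int] = list(pids)
        self.exited: Set[int] = set()        # the "marks"

    def mark_exited(self, pid: int) -> None:
        if pid in self.pids:
            self.exited.add(pid)             # marking twice is harmless

    def active(self) -> List[int]:
        return [p for p in self.pids if p not in self.exited]

    def run_on_active(self, operation: Callable[[int], str]) -> Dict[int, str]:
        # Save or recovery is applied only to the unmarked processes.
        return {pid: operation(pid) for pid in self.active()}
```

For example, `group.run_on_active(save_one)` would skip every marked process, where `save_one` is whatever per-process save routine the system provides.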
As an alternative embodiment, the method further comprises:
triggering a clearing operation on the locally stored metadata information of a second target process when the state information indicates that the second target process in the target process group has exited;
and releasing the resources occupied by the target process group when the state information indicates that all processes in the target process group have exited.
For example, the second target process may be some of the processes in the target process group. When the second target process exits, its locally stored process metadata is cleaned up so that the other processes in the same process group are not affected afterwards. The metadata of a process includes the process number, the node to which the process belongs, the process group to which the process belongs, the current running state of the process, and the like. When all processes in the target process group have exited, the group no longer uses any resources, which here include CPU resources, GPU resources and the like. The resources used by the target process group can therefore be released for use by other process groups, which improves resource utilization.
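A minimal sketch of the exit handling follows. The metadata fields mirror those listed in the text (process number, node, process group, state); the registry class, its method names, and the `(node, pid)` key are assumptions. Keying by node and process number is one way to let processes of the same group live on different nodes.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ProcessMeta:
    pid: int          # process number
    node: str         # node to which the process belongs
    group_id: str     # process group to which the process belongs
    state: str        # current running state


class ProcessRegistry:
    """Hypothetical store of per-process metadata."""

    def __init__(self) -> None:
        self.meta: Dict[Tuple[str, int], ProcessMeta] = {}
        self.released_groups: List[str] = []

    def register(self, m: ProcessMeta) -> None:
        self.meta[(m.node, m.pid)] = m

    def on_exit(self, node: str, pid: int) -> bool:
        """Clear the exited process's metadata; return True if its group was released."""
        m = self.meta.pop((node, pid), None)
        if m is None:
            return False                      # unknown or already cleaned up
        group_alive = any(x.group_id == m.group_id for x in self.meta.values())
        if not group_alive:
            self.released_groups.append(m.group_id)   # all members gone: free resources
            return True
        return False
```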
Through the flow, unified management of the processes in the target process group can be realized. Here, the nodes to which the processes in the target process group belong may be the same node or different nodes, so that cross-node management of the processes in the same process group can be further realized.
Fig. 2 shows a system for managing a process group according to the above process management method. The system includes a resource management service, processes, and a resident service. The resident service maintains information about all processes on a node; the resource management service manages global resource information and the processes in the same process group, implementing global resource scheduling and allocation. Each process includes a management module, a first communication module, an interrupt module, a save module, a recovery module and a first heartbeat module; the resident service includes a trigger module, a second communication module, a second heartbeat module and an exception handling module; and the resource management service includes a third communication module, a resource management module and a process management module.
The management module receives messages from the other modules inside the process and decides how to handle them according to their content, thereby managing the metadata in the process and the save and recovery procedures. The first communication module communicates with the second communication module of the resident service on the node and with the third communication module of the resource management service, reports the state of the process, and receives information sent by the resource management service. The interrupt module monitors the process's access to the graphics processor while the device context is being saved; if the process accesses the graphics processor, the interrupt module interrupts the current save procedure and, through the first communication module, sends an interrupt message to the resource management service, which relays it to the other processes in the same process group so that the save is interrupted for the whole group. The save module saves the device context of the process. The recovery module restores the executing program from the saved device context. The first heartbeat module communicates with the second heartbeat module of the resident service to confirm that the process is running. When the process exits for any reason, the heartbeat of the first heartbeat module stops; the resident service detects that the heartbeat has stopped and sends a message to the resource management service so that the metadata of the process is cleared.
The trigger module records the access of the processes in a process group to the graphics processor. If the time since a process in the group last accessed the graphics processor exceeds the preset duration, the save operation on the device context of the processes in the group is triggered: the trigger module sends a trigger message to the resource management service, which then manages the processes in the group in a unified manner. The second communication module communicates with the processes and the resource management service to pass messages. The second heartbeat module communicates with the first heartbeat module to monitor the state of each process; when the heartbeat of a process stops, that process may be abnormal, and the second heartbeat module passes this information to the exception handling module. The exception handling module handles abnormal processes: on receiving process exception information, it monitors the state of the abnormal process and, once it confirms that the process has exited, sends a message to the resource management service so that the process management module in the resource management service cleans up the information related to that process.
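The heartbeat check performed by the resident service can be sketched as a timeout over the last beat seen per process. The class name, the timeout parameter, and the explicit `now` argument are assumptions for illustration; the disclosure does not specify a heartbeat interval or transport.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Hypothetical second heartbeat module: flags processes whose beats stopped."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_beat: Dict[int, float] = {}   # pid -> time of the last heartbeat

    def beat(self, pid: int, now: float) -> None:
        self.last_beat[pid] = now

    def stalled(self, now: float) -> List[int]:
        # Candidates to hand to the exception handling module for confirmation.
        return sorted(pid for pid, t in self.last_beat.items()
                      if now - t > self.timeout_s)
```

A stalled heartbeat only suggests that a process may be abnormal; as in the text, a separate confirmation that the process has actually exited should precede any metadata cleanup.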
The third communication module may be used to communicate with the resident services and the processes. The resource management module may be used to manage global resource information and to schedule resources among multiple process groups. When a process group saves its device context, its resources are released and the resource management module can schedule those resources again; when a saved process group is restored, the resource management module allocates resources to that process group again so that the processes can be restored. The process management module may be configured to record metadata information of each process, including the process number, the node to which the process belongs, the process group to which the process belongs, the current state, and the like. When a process exits abnormally, the process management module cleans up the locally stored metadata information of that process, so that other processes in the same process group are not affected; when all processes in the same process group have exited, the process management module notifies the resource management module to release the resources of that process group so that they can be rescheduled; when the state of a process changes, the process management module marks the new state so that later operations on the process are not repeated. Because the process management module is a global process management component, it avoids the inconsistencies that arise when each process or node manages processes independently, and in particular the inconsistencies that a cross-node process group may produce.
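The bookkeeping done by the process management module can be sketched with a small in-memory registry. The field names follow the metadata listed above; the class names `ProcessMeta` and `ProcessRegistry` and the `released_groups` list (standing in for the notification to the resource management module) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ProcessMeta:
    pid: int             # process number
    node: str            # node to which the process belongs
    group: str           # process group to which the process belongs
    state: str = "running"


class ProcessRegistry:
    """Global record of process metadata, keyed by process number."""

    def __init__(self) -> None:
        self.by_pid: Dict[int, ProcessMeta] = {}
        self.released_groups: List[str] = []

    def register(self, meta: ProcessMeta) -> None:
        self.by_pid[meta.pid] = meta

    def on_exit(self, pid: int) -> None:
        """Clean up one exited process; release the group once it is empty."""
        meta = self.by_pid.pop(pid, None)
        if meta is None:
            return  # already cleaned up, so a repeated exit report is ignored
        if not any(m.group == meta.group for m in self.by_pid.values()):
            self.released_groups.append(meta.group)


registry = ProcessRegistry()
registry.register(ProcessMeta(pid=1, node="node-0", group="g1"))
registry.register(ProcessMeta(pid=2, node="node-1", group="g1"))  # cross-node group
registry.on_exit(1)
registry.on_exit(1)  # duplicate report
after_first_exit = list(registry.released_groups)
registry.on_exit(2)
after_last_exit = list(registry.released_groups)
```

Keeping the registry in one place is what lets the group-empty check span nodes: the two processes above live on different nodes, yet the group is released exactly once.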
It can be understood that communication among the resource management service, the processes, and the resident service is carried out through the corresponding communication modules: a message that another module needs to pass on is sent to that module's own communication module, which forwards it to the communication module of the receiving side. Messages are thus managed in a unified manner, and individual modules do not need to transfer them separately. For example, when the triggering module of the resident service sends the trigger message for the save operation to the resource management service, the triggering module sends the trigger message to the second communication module, the second communication module sends it to the third communication module, and the third communication module forwards it to the process management module. The process management module may then send the processed trigger message to the first communication modules of all processes in the process group, and the first communication module of each process passes the message to the save module of that process, so that every process in the process group executes the save operation.
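The forwarding path in this example can be traced with a toy model in which each communication module records its name and hands the message on. Everything here (`CommModule`, `make_save_module`, the message fields) is a hypothetical in-process stand-in; the disclosure does not specify a transport or message format.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class CommModule:
    """A communication module: logs its name, then hands the message to `deliver`."""

    def __init__(self, name: str, deliver: Callable[[Message], None], hops: List[str]) -> None:
        self.name = name
        self.deliver = deliver
        self.hops = hops

    def send(self, message: Message) -> None:
        self.hops.append(self.name)
        self.deliver(message)


def make_save_module(pid: str, saved: List[str]) -> Callable[[Message], None]:
    """Build a stand-in save module that records which process performed a save."""
    def save(message: Message) -> None:
        if message["type"] == "save":
            saved.append(pid)
    return save


hops: List[str] = []
saved: List[str] = []

# First communication modules, one per process in the group.
first = {
    pid: CommModule(f"first[{pid}]", make_save_module(pid, saved), hops)
    for pid in ("p1", "p2")
}


def process_management(message: Message) -> None:
    """Fan the trigger message out to every process in the group."""
    for comm in first.values():
        comm.send(message)


third = CommModule("third", process_management, hops)  # resource management service
second = CommModule("second", third.send, hops)        # resident service

# The triggering module hands its message to the second communication module.
second.send({"type": "save", "group": "g1"})
```

After the call, `hops` lists the modules in the order the message passed through them, and `saved` lists the processes whose save module ran.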
As an alternative embodiment, referring to Fig. 3, Fig. 3 is a flowchart of another process management method according to an exemplary embodiment, which includes the following steps.
S301, the resident service records the accesses to the graphics processor by the process group to which the process belongs;
S302, determining whether the duration for which no process in the process group has accessed the graphics processor exceeds a preset duration; if so, executing step S303, otherwise returning to step S301;
S303, the resident service notifies the resource management service to trigger the save operation of all processes in the process group;
S304, the resource management service notifies all processes in the process group to save the device context;
S305, the processes in the process group execute the operation of saving the device context;
S306, determining whether any process receives interrupt information; if so, executing step S307, otherwise executing step S308;
S307, the process interrupts the save operation, resumes executing the user program, and notifies the resource management service so that it notifies the other processes in the process group to interrupt the save operation;
S308, after the save operation is completed, releasing the GPU resources occupied by the process group;
S309, determining whether any process receives restore information; if so, executing step S310, otherwise repeating step S309;
S310, the process executes the restore operation to resume executing the user program, and notifies the resource management service so that it notifies the other processes in the process group to execute the restore operation.
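Steps S301 to S310 can be read as a small state machine over the process group. The sketch below is one possible reading under simplifying assumptions: the state and event names are hypothetical, the restore in S310 is treated as instantaneous, and interrupt and restore information are both modelled as a `gpu_access` event.

```python
from enum import Enum
from typing import Dict, List, Tuple


class GroupState(Enum):
    RUNNING = "running"
    SAVING = "saving"
    SAVED = "saved"


# (current state, event) -> next state
TRANSITIONS: Dict[Tuple[GroupState, str], GroupState] = {
    (GroupState.RUNNING, "idle_timeout"): GroupState.SAVING,  # S302 yes, then S303 to S305
    (GroupState.SAVING, "gpu_access"): GroupState.RUNNING,    # S306 yes, then S307 (interrupt)
    (GroupState.SAVING, "save_done"): GroupState.SAVED,       # S308 (GPU resources released)
    (GroupState.SAVED, "gpu_access"): GroupState.RUNNING,     # S309 yes, then S310 (restore)
}


def run(events: List[str]) -> List[GroupState]:
    """Apply events in order, starting from RUNNING; unknown events leave the state unchanged."""
    state = GroupState.RUNNING
    trace = [state]
    for event in events:
        state = TRANSITIONS.get((state, event), state)
        trace.append(state)
    return trace


interrupted = run(["idle_timeout", "gpu_access"])
completed = run(["idle_timeout", "save_done", "gpu_access"])
```

The two traces correspond to the two branches at S306: in `interrupted` the group returns to running without ever releasing its GPU resources, while in `completed` it passes through the saved state first.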
Referring to Fig. 4, Fig. 4 is a block diagram of a process management apparatus provided according to an exemplary embodiment. As shown in Fig. 4, the process management apparatus includes an acquisition module 401, a saving module 402, and a resource release module 403.
An acquisition module 401, configured to acquire state information of a target process group, where the target process group is a process group to which a process in a running state belongs;
a saving module 402, configured to trigger a save operation on the device context of the target process group when the state information indicates that none of the processes in the target process group accesses the graphics processor within a preset duration;
and a resource release module 403, configured to release the GPU resources occupied by the target process group in the graphics processor when the save operation is completed.
As an alternative embodiment, the device context includes an operation record of the target process group to the graphics processor and video memory data of the target process group, and the saving module 402 is further configured to:
acquiring the operation record and the video memory data;
and storing the operation record and the video memory data to a preset storage position.
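As a rough illustration of this save step, the sketch below writes an operation record and a video memory snapshot to a directory that plays the role of the preset storage position. The file names, the JSON encoding of the record, and the use of a plain `bytes` object for video memory are all assumptions; a real implementation would copy the data out of the device through the GPU driver interface.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def save_device_context(op_record: List[Dict], vram: bytes, location: Path) -> None:
    """Persist the operation record and the video memory data under `location`."""
    location.mkdir(parents=True, exist_ok=True)
    (location / "operations.json").write_text(json.dumps(op_record))
    (location / "vram.bin").write_bytes(vram)


# Hypothetical device context of one process group.
op_record = [
    {"op": "alloc", "name": "weights", "size": 4},
    {"op": "create_stream", "name": "s0"},
]
vram = bytes([1, 2, 3, 4])

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "group-g1"
    save_device_context(op_record, vram, target)
    # Read the files back to confirm that both parts of the context were stored.
    stored_ops = json.loads((target / "operations.json").read_text())
    stored_vram = (target / "vram.bin").read_bytes()
```

Storing the operation record separately from the raw memory contents matters for the later restore, which re-executes the recorded operations before copying the data back.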
As an alternative embodiment, the process management apparatus further includes:
a determining module, configured to determine that a process in the target process group accesses the graphics processor again;
and a triggering module, configured to trigger a preset operation on the target process group according to the progress of the save operation.
As an optional embodiment, the triggering module includes:
a first triggering sub-module, configured to trigger a restore operation on the device context of the target process group when the progress indicates that the save operation has been completed;
and a second triggering sub-module, configured to trigger an interrupt operation on the save operation when the progress indicates that the save operation has not been completed.
As an optional embodiment, the device context includes an operation record of the target process group to the graphics processor and video memory data of the target process group, and the first triggering sub-module is further configured to:
and allocating resources for the target process group from a free resource pool so that the target process group re-executes the operations in the operation record, and copying the video memory data to the graphics processor to restore the device context of the target process group.
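The restore sequence (allocate from the free pool, replay the recorded operations, copy the video memory data back) can be sketched against a stand-in device. `FakeGpu`, the pool of device names, and the shape of the operation record are hypothetical and only serve to show the order of the three steps.

```python
from typing import Dict, List, Tuple


class FakeGpu:
    """Stand-in for a graphics processor: named buffers held in host memory."""

    def __init__(self) -> None:
        self.buffers: Dict[str, bytearray] = {}

    def alloc(self, name: str, size: int) -> None:
        self.buffers[name] = bytearray(size)

    def copy_in(self, name: str, data: bytes) -> None:
        self.buffers[name][: len(data)] = data


def restore_device_context(
    free_pool: List[str],
    needed: int,
    op_record: List[Dict],
    vram_data: Dict[str, bytes],
) -> Tuple[List[str], FakeGpu]:
    """Allocate resources, replay the operation record, then copy memory contents back."""
    if len(free_pool) < needed:
        raise RuntimeError("not enough free resources to restore the process group")
    granted = [free_pool.pop() for _ in range(needed)]
    gpu = FakeGpu()
    for op in op_record:  # re-execute the recorded operations in their original order
        if op["op"] == "alloc":
            gpu.alloc(op["name"], op["size"])
    for name, data in vram_data.items():  # then refill the recreated buffers
        gpu.copy_in(name, data)
    return granted, gpu


free_pool = ["gpu0", "gpu1", "gpu2"]
granted, gpu = restore_device_context(
    free_pool,
    needed=2,
    op_record=[{"op": "alloc", "name": "weights", "size": 4}],
    vram_data={"weights": b"\x01\x02\x03\x04"},
)
```

The replay has to come before the copy, since the buffers that receive the saved data exist only after the recorded allocations have been re-executed.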
As an alternative embodiment, the process management apparatus further includes:
a marking module, configured to mark a first target process in the target process group when the state information indicates that the first target process has exited;
and an execution module, configured to execute the save operation or the restore operation based on the remaining unmarked processes in the target process group.
As an alternative embodiment, the process management apparatus further includes:
a clearing module, configured to trigger a clearing operation on the locally stored metadata information of a second target process in the target process group when the state information indicates that the second target process has exited;
and a releasing module, configured to release the resources occupied by the target process group when the state information indicates that all processes in the target process group have exited.
With respect to the process management apparatus in the above-described embodiment, the specific manner in which the respective modules perform operations has been described in detail in the embodiment regarding the process management method, and will not be described in detail here.
The present disclosure also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the process management method provided by the present disclosure.
The present disclosure provides an electronic device, comprising:
a memory having a computer program stored thereon;
and a processor for executing the computer program in the memory to implement the process management method provided by the present disclosure.
Fig. 5 is a block diagram of an electronic device 700, according to an exemplary embodiment. As shown in Fig. 5, the electronic device 700 may include a processor 701 and a memory 702. The electronic device 700 may also include one or more of a multimedia component 703, an input/output (I/O) interface 704, and a communication component 705.
The processor 701 is configured to control the overall operation of the electronic device 700 to perform all or part of the steps in the process management method described above. The memory 702 is used to store various types of data to support operation of the electronic device 700; such data may include, for example, instructions for any application or method operating on the electronic device 700, as well as application-related data such as contact data, messages sent and received, pictures, audio, video, and so forth. The memory 702 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk. The multimedia component 703 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signals may be further stored in the memory 702 or transmitted through the communication component 705. The audio component further includes at least one speaker for outputting audio signals. The I/O interface 704 provides an interface between the processor 701 and other interface modules, which may be a keyboard, a mouse, buttons, and the like. These buttons may be virtual buttons or physical buttons. The communication component 705 is used for wired or wireless communication between the electronic device 700 and other devices.
The wireless communication may be, for example, Wi-Fi, Bluetooth, near field communication (NFC), 2G, 3G, 4G, NB-IoT, eMTC, 5G, or a combination of one or more of these, which is not limited herein. Accordingly, the communication component 705 may include a Wi-Fi module, a Bluetooth module, an NFC module, and so on.
In an exemplary embodiment, the electronic device 700 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the process management method described above.
In another exemplary embodiment, a computer readable storage medium is also provided that includes program instructions that, when executed by a processor, implement the steps of the process management method described above. For example, the computer readable storage medium may be the memory 702 including program instructions described above, which are executable by the processor 701 of the electronic device 700 to perform the process management method described above.
In another exemplary embodiment, a computer program product is also provided, comprising a computer program executable by a programmable apparatus, the computer program having code portions for performing the above-mentioned process management method when executed by the programmable apparatus.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow the general principles of the disclosure and include such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A process management method, the method comprising:
acquiring state information of a target process group, wherein the target process group is a process group to which a process in an operating state belongs;
triggering a saving operation of the device context of the target process group when the state information indicates that none of the processes in the target process group access the graphics processor within a preset time period;
and when the save operation is completed, releasing GPU resources occupied by the target process group in the graphics processor.
2. The method of claim 1, wherein the device context includes an operation record of the target process group on the graphics processor and video memory data of the target process group, and triggering a save operation of the device context of the target process group comprises:
acquiring the operation record and the video memory data;
and storing the operation record and the video memory data to a preset storage position.
3. The method of claim 1, wherein after triggering the save operation of the device context of the target process group, the method further comprises:
determining that a process in the target process group accesses the graphics processor again;
and triggering the preset operation of the target process group according to the progress of the save operation.
4. The method of claim 3, wherein triggering a preset operation on the target process group according to the progress of the save operation comprises:
triggering a restore operation of the device context of the target process group when the progress indicates that the save operation has been completed;
and triggering an interrupt operation on the save operation when the progress indicates that the save operation has not been completed.
5. The method of claim 4, wherein the device context includes an operation record of the target process group on the graphics processor and video memory data of the target process group, and triggering a restore operation of the device context of the target process group comprises:
allocating resources for the target process group from a free resource pool so that the target process group re-executes the operations in the operation record, and copying the video memory data to the graphics processor to restore the device context of the target process group.
6. The method according to claim 4, wherein the method further comprises:
when the state information indicates that a first target process in the target process group has exited, marking the first target process;
and executing the save operation or the restore operation based on the remaining unmarked processes in the target process group.
7. The method according to any one of claims 1-5, further comprising:
triggering a clearing operation on the locally stored metadata information of a second target process in the target process group when the state information indicates that the second target process has exited;
and releasing the resources occupied by the target process group when the state information indicates that all processes in the target process group have exited.
8. A process management apparatus, comprising:
an acquisition module, configured to acquire state information of a target process group, wherein the target process group is a process group to which a process in a running state belongs;
a saving module, configured to trigger a save operation on the device context of the target process group when the state information indicates that none of the processes in the target process group accesses the graphics processor within a preset duration;
and a resource release module, configured to release the GPU resources occupied by the target process group in the graphics processor when the save operation is completed.
9. A non-transitory computer readable storage medium having stored thereon a computer program, characterized in that the program, when executed by a processor, implements the process management method according to any of claims 1-7.
10. An electronic device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to implement the process management method of any one of claims 1-7.
CN202410064434.1A 2024-01-16 2024-01-16 Process management method, device, medium and equipment Active CN117593172B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410064434.1A CN117593172B (en) 2024-01-16 2024-01-16 Process management method, device, medium and equipment

Publications (2)

Publication Number Publication Date
CN117593172A CN117593172A (en) 2024-02-23
CN117593172B CN117593172B (en) 2024-04-23

Family

ID=89911874

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410064434.1A Active CN117593172B (en) 2024-01-16 2024-01-16 Process management method, device, medium and equipment

Country Status (1)

Country Link
CN (1) CN117593172B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190235924A1 (en) * 2018-01-31 2019-08-01 Nvidia Corporation Dynamic partitioning of execution resources
CN114567784A (en) * 2022-04-24 2022-05-31 银河麒麟软件(长沙)有限公司 VPU video decoding output method and system for Feiteng display card
CN115269341A (en) * 2022-09-26 2022-11-01 浩鲸云计算科技股份有限公司 Multi-dimensional monitoring method and system for GPU (graphics processing Unit) virtual resource utilization rate
CN115357389A (en) * 2022-08-22 2022-11-18 维沃移动通信有限公司 Memory management method and device and electronic equipment
CN115391000A (en) * 2021-05-25 2022-11-25 腾讯科技(深圳)有限公司 Business resource monitoring method and device, electronic equipment and storage medium
CN115543674A (en) * 2022-10-19 2022-12-30 深圳市正浩创新科技股份有限公司 Process management method and device, electronic equipment and storage medium
CN116149818A (en) * 2023-02-10 2023-05-23 阿里云计算有限公司 Migration method, equipment, system and storage medium of GPU (graphics processing Unit) application
CN116893899A (en) * 2023-07-07 2023-10-17 中国电信股份有限公司技术创新中心 Resource allocation method, device, computer equipment and storage medium
CN117078495A (en) * 2023-08-18 2023-11-17 苏州浪潮智能科技有限公司 Memory allocation method, device, equipment and storage medium of graphic processor
CN117215721A (en) * 2023-09-06 2023-12-12 山石网科通信技术股份有限公司 Virtual system management method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN117593172B (en) 2024-04-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant