WO2022121866A1 - Accelerator card-based service running method and apparatus, electronic device, and computer-readable storage medium - Google Patents

Accelerator card-based service running method and apparatus, electronic device, and computer-readable storage medium

Info

Publication number
WO2022121866A1
Authority
WO
WIPO (PCT)
Prior art keywords
service
memory
accelerator card
card
virtual address
Prior art date
Application number
PCT/CN2021/135879
Other languages
English (en)
French (fr)
Inventor
李孟轩
刘一鸣
Original Assignee
第四范式(北京)技术有限公司
Priority date
Filing date
Publication date
Application filed by 第四范式(北京)技术有限公司
Publication of WO2022121866A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 - General purpose image data processing
    • G06T 1/20 - Processor architectures; processor configuration, e.g. pipelining
    • G06T 1/60 - Memory management
    • G06T 2200/00 - Indexing scheme for image data processing or generation, in general
    • G06T 2200/28 - Indexing scheme for image data processing or generation, in general, involving image processing hardware

Definitions

  • the present disclosure relates to the field of computer technologies, and in particular, to a method, apparatus, electronic device, and computer-readable storage medium for running a service based on an accelerator card.
  • Accelerator cards may include, for example, GPUs (Graphics Processing Units), TPUs (Tensor Processing Units), NPUs (Neural-network Processing Units), FPGAs (Field Programmable Gate Arrays), and so on.
  • The accelerator card has powerful computing functions, but the programs it executes take a long time to initialize. Therefore, to ensure that programs can respond to service requests in a timely manner, a program running on the accelerator card often maintains an online service listening state for a long time once initialized, even if service demand is infrequent.
  • For example, each service for reviewing video streams needs to determine, based on the incoming pictures, whether a video stream contains illegal content; some streams then need to be handed over for manual processing.
  • In order to ensure timely response to review requests, each service must have an exclusive accelerator card. However, the number of requests for such a service is very low, perhaps only a few dozen per day, resulting in a large amount of accelerator card resources sitting idle.
  • the embodiments of the present disclosure provide an accelerator card-based service operation method, apparatus, electronic device, and computer-readable storage medium, which can effectively improve the utilization rate of the accelerator card while ensuring the service response speed of the accelerator card.
  • An embodiment of the present disclosure provides a method for running a service based on an accelerator card, wherein at least one service is deployed on the accelerator card. The method includes: for each service in the at least one service, in response to an event that the service enters a frozen state, swapping the process corresponding to the service out from the onboard memory of the accelerator card to the memory of the central processing unit (CPU); and, for each service in the at least one service, in response to an event that the service enters an active state, swapping the process corresponding to the service from the memory of the CPU into the onboard memory of the accelerator card.
  • An embodiment of the present disclosure further provides an apparatus for running a service based on an accelerator card, wherein at least one service is deployed on the accelerator card. The apparatus includes: a swap-out unit configured, for each service in the at least one service, in response to an event that the service enters the frozen state, to swap the process corresponding to the service out from the onboard memory of the accelerator card to the memory of the central processing unit (CPU); and a swap-in unit configured, for each service in the at least one service, in response to an event that the service enters an active state, to swap the process corresponding to the service from the memory of the CPU into the onboard memory of the accelerator card.
  • Embodiments of the present disclosure further provide an electronic device. The electronic device includes: a housing, a processor, a memory, a circuit board, and a power supply circuit, wherein the circuit board is arranged inside the space enclosed by the housing, and the processor and the memory are arranged on the circuit board; the power supply circuit is used to supply power to each circuit or device of the electronic device; the memory is used to store executable program code; and the processor, by reading the executable program code stored in the memory, runs a program corresponding to the executable program code in order to execute any method provided by the embodiments of the present disclosure.
  • Embodiments of the present disclosure further provide a computer-readable storage medium, where the computer-readable storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to implement any method provided by the embodiments of the present disclosure.
  • With the accelerator card-based service running method, apparatus, electronic device, and computer-readable storage medium provided by the embodiments of the present disclosure, for each service in at least one service deployed on the accelerator card, the process corresponding to the service can be swapped out from the onboard memory of the accelerator card to the memory of the central processing unit (CPU) in response to an event that the service enters a frozen state, and can also be swapped from the memory of the CPU into the onboard memory of the accelerator card in response to an event that the service enters the active state.
  • Multiple services can be deployed on the accelerator card. According to events occurring in the accelerator card, a service can be put into a frozen state and the process corresponding to the service swapped out from the onboard memory to the memory of the processor, thereby releasing the corresponding resources in the accelerator card; or the service can be activated from the frozen state and the process corresponding to the service swapped in from the processor's memory to the onboard memory, thereby continuing to run the service.
  • the suspension or operation of each service can be flexibly controlled and scheduled according to various events, which effectively improves the utilization rate of the accelerator card while ensuring the response speed of the accelerator card service.
  • FIG. 1 is a flowchart of a method for running a service based on an accelerator card provided by an embodiment of the present disclosure
  • FIG. 2 is a schematic diagram of the onboard memory of the accelerator card occupied by a process running in the accelerator card according to an embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of service operation in the case of not using the on-board memory swap-in and swap-out technology in the related art
  • FIG. 4 is a schematic diagram of the usage of an accelerator card for idle services in an embodiment of the present disclosure
  • FIG. 5 is a schematic diagram of the usage of an accelerator card when a service is activated in an embodiment of the present disclosure
  • FIG. 7 is a detailed flowchart of a method for running a service based on an accelerator card provided by an embodiment of the present disclosure
  • FIG. 8 is a schematic structural diagram of an accelerator card-based service running device provided by an embodiment of the present disclosure.
  • FIG. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
  • an acceleration card is used to accelerate computing.
  • Although the accelerator card has powerful computing functions, it takes a long time to initialize the programs it executes. Therefore, to ensure that programs can respond to service requests in a timely manner, the programs running on the accelerator card often remain in an online service listening state for a long time once initialized, even if service demand is infrequent, resulting in lower utilization of the accelerator card.
  • The inventor found in research that a service deployed on the accelerator card can be switched between the frozen state and the activated state by using the memory of the central processing unit together with the onboard memory of the accelerator card, which can effectively improve the utilization rate of the accelerator card while ensuring its service response speed.
  • an embodiment of the present disclosure provides a service running method based on an accelerator card, which can effectively improve the utilization rate of the accelerator card while ensuring the service response speed of the accelerator card.
  • a method for running a service based on an accelerator card provided by an embodiment of the present disclosure, wherein at least one service is deployed on the accelerator card, and the method may include:
  • the accelerator card may be various chips with powerful computing functions, such as GPU, TPU, NPU, FPGA, and the like.
  • the accelerator card can be used in conjunction with the central processing unit to achieve timely response to services.
  • the accelerator card is provided with a processor with computing functions, and is also provided with an on-board memory for storing computing data and programs.
  • the storage space of the onboard memory is generally relatively small, and it is difficult to store a large amount of data and programs.
  • one or more services may be deployed on the accelerator card.
  • For example, a model training service for image recognition, a model training service for speech recognition, a commodity query service, and so on may be deployed on the accelerator card.
  • each service deployed on the accelerator card can run on the accelerator card at the same time, or run at different times according to various events that occur in the accelerator card.
  • When a service enters the frozen state, the process corresponding to the service can be swapped out from the onboard memory of the accelerator card to the memory of the central processing unit (CPU), and the accelerator card resources corresponding to the service are released so that other services can utilize them, thereby effectively improving the resource utilization rate of the accelerator card.
  • When a service enters the active state, the process corresponding to the service can be swapped from the CPU's memory into the onboard memory of the accelerator card, so as to continue running the service's process and respond to service requests in a timely manner.
  • That is, for each service in at least one service deployed on the accelerator card, the process corresponding to the service can be swapped out from the onboard memory of the accelerator card to the memory of the central processing unit (CPU) in response to the event that the service enters a frozen state, and can also be swapped from the memory of the CPU into the onboard memory of the accelerator card in response to the event that the service enters the active state.
  • Multiple services can be deployed on the accelerator card. According to events occurring in the accelerator card, a service can be put into a frozen state and the process corresponding to the service swapped out from the onboard memory to the memory of the processor, thereby releasing the corresponding resources in the accelerator card; or the service can be activated from the frozen state and the process corresponding to the service swapped in from the processor's memory to the onboard memory, thereby continuing to run the service.
  • the suspension or operation of each service can be flexibly controlled and scheduled according to various events, which effectively improves the utilization rate of the accelerator card while ensuring the response speed of the accelerator card service.
  • The above technology can be applied to the video website scenario described in the background art: many services can share one accelerator card, with most of the services in the swapped-out, frozen state. When a service receives a request, it is swapped in to execute and then swapped out when done. This solution greatly reduces the number of required accelerator cards and saves considerable cost.
  • whether each service enters a frozen state or whether it enters an activated state can be triggered by various events.
  • The event of entering a frozen state may include one or more of the following: the idle duration of the service exceeds a preset duration threshold; the running rate of the service is lower than a preset rate threshold; the running priority of the service is lower than a preset priority threshold.
  • If the idle time of a service on the accelerator card is too long, it means there has been no request for the service for a long time, so the service can be put into a frozen state.
  • If the running rate of a service is too low, it means that the resources of the accelerator card are too strained to support the effective operation of the current service.
  • If the running priority of a service is lower than the preset priority threshold, it means that the task does not need to run immediately, and the service can be temporarily frozen.
  • the running priority of the service can also be adjusted in time according to the nature of the service itself and the running status of each task in the accelerator card, so that the task scheduling in the accelerator card is more flexible.
  • The event of entering the active state may include at least one of the following: the service is invoked; new resources of the accelerator card are released.
  • When an event of a service being called occurs, if the service is in a frozen state, the service can be activated in time so as to respond to the service request in time.
  • When new resources of the accelerator card are released, a service in the frozen state can also be activated, so as to use the newly released resources to run the service in time.
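The freeze and activation triggers above can be sketched as a simple scheduler check. This is a minimal illustration only; the threshold values, field names, and the `should_freeze`/`should_activate` helpers are assumptions of this sketch, not part of the disclosure:

```python
from dataclasses import dataclass

@dataclass
class Service:
    name: str
    priority: int
    last_request: float  # timestamp of the most recent request
    rate: float          # current running rate (hypothetical metric)
    frozen: bool = False

# Illustrative thresholds (assumptions for this sketch).
IDLE_THRESHOLD_S = 300.0
RATE_THRESHOLD = 0.5
PRIORITY_THRESHOLD = 2

def should_freeze(svc: Service, now: float) -> bool:
    """Freeze if the service has idled too long, runs too slowly,
    or its priority is below the preset threshold."""
    return (now - svc.last_request > IDLE_THRESHOLD_S
            or svc.rate < RATE_THRESHOLD
            or svc.priority < PRIORITY_THRESHOLD)

def should_activate(svc: Service, invoked: bool, freed_memory: int) -> bool:
    """Activate a frozen service when it is invoked or when the
    accelerator card has newly released resources."""
    return svc.frozen and (invoked or freed_memory > 0)
```

Any one of the freeze conditions suffices, matching the "one or more of the following" wording above.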
  • In some embodiments, swapping the process corresponding to the service out from the onboard memory of the accelerator card to the memory of the central processing unit (CPU) may include: retaining, in the onboard memory, the context information of the first process corresponding to the service and the first virtual address space where the first process is located, and swapping the first process out to the memory of the CPU to release the correspondingly occupied resources in the onboard memory of the accelerator card.
  • each service deployed in the acceleration card may correspond to one process, that is, the first process.
  • The context information of the first process and the virtual address space corresponding to the first process, that is, the first virtual address space, are also recorded.
  • The context information includes information such as the current running environment configuration of the process, which can be used by the accelerator card driver to identify the process, maintain the process state, and so on. Since only this configuration information is recorded, the context information file is usually small, typically 200 MB to 1 GB, so it will not occupy too much onboard memory.
  • the onboard memory of the acceleration card occupied by the process running in the acceleration card may be as shown in FIG. 2 .
  • the virtual address space is an addressing method in which the operating system uses memory addresses for upper-layer applications.
  • A virtual address space refers to an address range of virtual addresses. Through mapping by the operating system, virtual addresses can be mapped to corresponding physical addresses. It should be noted that the virtual address space and the real physical space are not in one-to-one correspondence; the operating system updates the correspondence between virtual addresses and physical addresses according to the current applications and current memory usage.
  • Reserving the first virtual address space corresponding to the first process means reserving the segment of virtual addresses corresponding to the first virtual address space, for example the virtual address range X0100~XF000, so that this virtual address space will not be occupied by other application services.
  • However, the physical addresses corresponding to this segment of virtual address space are released as the first process is swapped out of the onboard memory of the accelerator card.
  • Before retaining the context information of the first process corresponding to the service and the first virtual address space where the first process is located, the accelerator card-based service running method provided by the embodiments of the present disclosure may further include: in the onboard memory of the accelerator card, applying for a segment of first virtual address space as a reserved space; mapping the reserved space to the first process to obtain the first virtual address space of the first process; and running the first process in the first virtual address space.
  • The embodiment of the present disclosure can use the new cuMemAddressReserve + cuMemMap interface to apply for the accelerator card's onboard memory by mapping. In this way, the virtual address space can be kept unoccupied, ensuring that the first process can be successfully swapped back from the memory of the central processing unit into the onboard memory of the accelerator card and run.
  • On the CPU side, the selected memory segment is not the segment a process usually obtains with malloc, but a segment allocated with cuMemHostAlloc, which shares the page table with the GPU. This can greatly reduce swap-in and swap-out time: the whole process, whether swapping in or swapping out, takes on the order of seconds.
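The key property of the CUDA virtual memory management calls named above is that reserving a virtual address range is separate from attaching physical backing to it. The toy allocator below imitates that separation in plain Python; it does not call CUDA, and all class and method names here are invented for illustration:

```python
class ToyBoardMemory:
    """Imitates splitting virtual-address reservation (cuMemAddressReserve)
    from physical backing (cuMemMap): releasing the physical pages on
    swap-out leaves the reserved address range intact for swap-in."""

    def __init__(self, physical_size: int):
        self.physical_free = physical_size
        self.reservations = {}   # va_base -> size; survives swap-out
        self.mapped = {}         # va_base -> physical bytes backing it
        self.next_va = 0x1000

    def reserve(self, size: int) -> int:
        """Reserve a virtual address range; consumes no physical memory."""
        va = self.next_va
        self.next_va += size
        self.reservations[va] = size
        return va

    def map(self, va: int) -> None:
        """Attach physical pages to a reserved range (swap-in)."""
        size = self.reservations[va]
        if self.physical_free < size:
            raise MemoryError("not enough onboard memory")
        self.physical_free -= size
        self.mapped[va] = size

    def unmap(self, va: int) -> None:
        """Release the physical pages but keep the reservation (swap-out)."""
        self.physical_free += self.mapped.pop(va)

mem = ToyBoardMemory(physical_size=10)
va = mem.reserve(8)
mem.map(va)
mem.unmap(va)   # swap-out: the VA reservation survives
mem.map(va)     # swap-in lands at the same virtual address
```

Because the reservation never leaves `reservations`, no other "service" can claim the range while the process is swapped out, which is exactly the guarantee the passage above relies on.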
  • After the first virtual address space corresponding to the first process is obtained through the above method, when an event that causes the service to enter a frozen state occurs, the first virtual address space and context information of the first process corresponding to the service can be saved, and the first process can be swapped out to the CPU's memory.
  • the process corresponding to the service may be swapped into the onboard memory of the accelerator card from the memory of the CPU in step S12.
  • Swapping the process corresponding to the service from the memory of the CPU into the onboard memory of the accelerator card may include: swapping the first process corresponding to the service from the memory of the central processing unit (CPU) into the onboard memory of the accelerator card and saving it in the first virtual address space of the onboard memory; and continuing to run the first process according to the context information of the first process. That is to say, after the first process corresponding to the service is swapped back into the onboard memory of the accelerator card, the virtual address space it uses is still the virtual address space it used before being swapped out, and the first process can be identified and continued according to the saved context information of the first process.
  • Continuing to run the first process according to the context information of the first process may include: identifying the first process according to the context information; and, according to the process state maintained by the context information, continuing to run the first process from the program breakpoint at which it was swapped out of the onboard memory of the accelerator card.
  • the first process can quickly enter the running state according to the context information without re-initialization, thereby greatly speeding up the running efficiency of the process and the response speed of the service.
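Resuming from a saved breakpoint instead of re-initializing can be illustrated with a Python generator, whose suspended frame plays the role of the saved context information. This is a loose analogy only, not the mechanism disclosed here:

```python
def service_process(batches):
    """A toy 'first process': it pauses (yields) after each batch, and its
    suspended generator frame acts like the saved context information,
    including the program breakpoint."""
    results = []
    for b in batches:
        results.append(b * 2)   # placeholder computation
        yield list(results)     # snapshot of progress so far

proc = service_process([1, 2, 3])
partial = next(proc)   # run until the first 'breakpoint'
# ... the process could be swapped out here; the frame keeps its state ...
resumed = next(proc)   # continue from the breakpoint without re-initializing
```

The second `next` continues mid-loop with `results` intact, just as the first process resumes from its breakpoint without paying the long initialization cost again.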
  • the first process may be swapped into the onboard memory of the accelerator card according to the event of entering the active state.
  • the event that enters the active state can include various types of events, such as the event that the service is called, the event that the accelerator card has new resources and is released, and so on. When these events occur, the service in the frozen state can be activated, and the first process corresponding to the service can be swapped from the memory of the central processing unit to the onboard memory of the accelerator card.
  • an event that causes the service to enter an active state occurs in the accelerator card, and the event that enters the active state is: the accelerator card has new resources released.
  • Swapping the first process corresponding to the service from the memory of the central processing unit (CPU) into the onboard memory of the accelerator card may include: determining whether the remaining space of the onboard memory is greater than or equal to the first virtual address space of the first process; and, if the remaining space of the onboard memory is greater than or equal to the first virtual address space of the first process, swapping the first process from the memory of the CPU into the onboard memory of the accelerator card.
  • For example, if the size of the remaining space of the onboard memory is 3G after the new resources in the accelerator card are released, and the size of the first virtual address space of the first process is 2.8G, it means that the current accelerator card can support the smooth running of the service, and the first process corresponding to the service can be swapped from the memory of the central processing unit into the onboard memory of the accelerator card and run.
  • When the remaining space of the onboard memory is less than the first virtual address space of the first process, a second process of another service, occupying a virtual address space less than or equal to the remaining space of the onboard memory, can be selected from the other accelerator card processes pre-saved in the memory of the central processing unit (CPU); the second process is then swapped from the memory of the CPU into the onboard memory of the accelerator card and run.
  • For example, suppose the size of the remaining space of the onboard memory is 3G, while the size of the first virtual address space of the first process is 3.7G, so the first process cannot be swapped in directly. Processes P1 and P2 corresponding to other services are stored in the memory of the CPU, where the virtual address space occupied by P1 is 1G and the virtual address space occupied by P2 is 3.3G. Then P1 (1G is less than 3G) can be selected from the processes P1 and P2 pre-saved in the memory of the CPU as the second process, and the second process can be swapped from the memory of the CPU into the onboard memory of the accelerator card and run.
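The selection in this example can be expressed as a small chooser over the frozen processes saved in CPU memory. Sizes are in GB; preferring the largest process that fits is an assumption of this sketch (the text only requires that the chosen process fit in the remaining space):

```python
def pick_second_process(frozen_procs: dict, remaining: float):
    """Return the name of a frozen process whose virtual address space
    fits in the remaining onboard memory, or None if none fits.
    Among fitting candidates the largest is preferred (assumption)."""
    fitting = [(size, name) for name, size in frozen_procs.items()
               if size <= remaining]
    return max(fitting)[1] if fitting else None

# The example from the text: 3G remaining, P1 needs 1G, P2 needs 3.3G.
choice = pick_second_process({"P1": 1.0, "P2": 3.3}, remaining=3.0)
```

With the figures above, P2 at 3.3G is excluded and P1 is chosen, matching the worked example.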
  • the accelerator card in the above-mentioned embodiment may not only include a physical accelerator card, but may also include any one of a plurality of virtual accelerator cards virtualized by the physical accelerator card.
  • The accelerator card may have the capability of virtualization: by dividing the onboard memory, a physical accelerator card can be virtualized into several virtual accelerator cards for use by different containers/processes, where each virtual accelerator card can multiplex multiple services, achieving the goal of one card serving multiple uses.
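Dividing the onboard memory to virtualize one physical card into several virtual cards, each multiplexing several mostly-frozen services, can be sketched as follows. The equal-share partitioning, the first-fit placement rule, and all names here are assumptions of this sketch:

```python
def partition_card(total_memory: int, parts: int) -> list:
    """Split one physical card's onboard memory into equal shares,
    one share per virtual card."""
    share = total_memory // parts
    return [{"vcard": i, "memory": share, "services": []}
            for i in range(parts)]

def deploy(vcards: list, service: str, needed: int):
    """Place a service on the first virtual card whose share can hold it.
    Frozen services may share a card because only the active one
    occupies onboard memory at a time (assumption of this sketch)."""
    for vc in vcards:
        if needed <= vc["memory"]:
            vc["services"].append(service)
            return vc["vcard"]
    return None

vcards = partition_card(16, 4)          # four 4G virtual cards
a = deploy(vcards, "image-review", 3)   # fits on vcard 0
b = deploy(vcards, "speech-model", 3)   # multiplexed onto vcard 0 as well
```

Note that `deploy` does not subtract from the card's share: several frozen services can be registered against the same virtual card precisely because swap-out keeps only the active process resident.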
  • For example, in the related art, a node with 4 GPUs (Node1) can only run 4 services.
  • With virtualization, each GPU can deploy multiple services. These services are usually dormant, and data structures such as models are placed in the memory of the central processor. This state is called the frozen state.
  • When a request arrives, the corresponding service can swap its process from the memory of the central processing unit into the onboard memory of the accelerator card, resume execution of the process, and start processing the request. This state is called the active state. After the service has processed the request, the process can be swapped out of the onboard memory and re-enter the frozen state.
  • the specific process can be shown in Figure 4, Figure 5, and Figure 6.
  • For example, a node has one NVIDIA 2080 Ti graphics card with 10G of video memory, and two trainings each requiring 8G of video memory are run on the GPU at the same time. Without the onboard memory swap-in/swap-out technique, at least one training would fail because it cannot obtain enough video memory. With the technique, when a training's application for video memory fails, it can swap itself out to host memory and suspend execution; after the other training ends and releases its video memory, it can swap back from host memory to video memory and resume execution. In this case, both trainings can run successfully.
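The 10G-card, two-8G-trainings scenario can be simulated with the toy rule "on allocation failure, swap yourself out and retry after a running job finishes". The numbers mirror the example; the scheduling loop itself is an illustrative assumption:

```python
def run_trainings(capacity: int, demands: list) -> list:
    """Simulate jobs on one card: a job whose allocation fails swaps
    itself out (waits in host memory) and retries after a running job
    finishes and releases its video memory. Returns completion order."""
    waiting = list(range(len(demands)))
    running, finished, used = [], [], 0
    while waiting or running:
        still_waiting = []
        for j in waiting:                      # try to admit each waiting job
            if used + demands[j] <= capacity:
                used += demands[j]
                running.append(j)
            else:
                still_waiting.append(j)        # swapped out, suspended
        waiting = still_waiting
        if not running:                        # no job can ever fit
            raise MemoryError("a job exceeds card capacity")
        j = running.pop(0)                     # oldest running job completes
        used -= demands[j]
        finished.append(j)
    return finished

# 10G card, two trainings needing 8G each: both succeed, one after the other.
order = run_trainings(10, [8, 8])
```

The second training is admitted only after the first releases its 8G, so both complete instead of one failing outright.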
  • acceleration card-based service running method provided by the embodiments of the present disclosure will be described in detail below through specific embodiments.
  • the acceleration card-based service running method may include:
  • the running rate of the first process is lower than the preset rate threshold, triggering an event that the service enters a frozen state.
  • S207: Determine that the remaining space of the onboard memory is greater than or equal to the first virtual address space of the first process, and swap the first process from the memory of the central processing unit (CPU) into the onboard memory of the accelerator card.
  • the embodiments of the present disclosure further provide a service running device based on an accelerator card, which can effectively improve the utilization rate of the accelerator card while ensuring the service response speed of the accelerator card.
  • an apparatus for running a service based on an accelerator card provided by an embodiment of the present disclosure, where at least one service is deployed on the accelerator card, the apparatus may include:
  • the swap-out unit 31 is configured to, for each service in the at least one service, in response to an event that the service enters a frozen state, swap out a process corresponding to the service from the onboard memory of the accelerator card to the central processing unit CPU memory;
  • The swap-in unit 32 is configured to, for each service in the at least one service, in response to an event that the service enters an active state, swap the process corresponding to the service from the memory of the CPU into the onboard memory of the accelerator card.
  • The accelerator card-based service running device can, for each service in at least one service deployed on the accelerator card, in response to an event that the service enters a frozen state, swap the process corresponding to the service out from the onboard memory of the accelerator card to the memory of the central processing unit (CPU), and can also, in response to an event that the service enters the active state, swap the process corresponding to the service from the CPU's memory back into the onboard memory of the accelerator card.
  • Multiple services can be deployed on the accelerator card. According to events occurring in the accelerator card, a service can be put into a frozen state and the process corresponding to the service swapped out from the onboard memory to the memory of the processor, thereby releasing the corresponding resources in the accelerator card; or the service can be activated from the frozen state and the process corresponding to the service swapped in from the processor's memory to the onboard memory, thereby continuing to run the service.
  • the suspension or operation of each service can be flexibly controlled and scheduled according to various events, which effectively improves the utilization rate of the accelerator card while ensuring the response speed of the accelerator card service.
  • the swap-out unit 31 may be configured to:
  • Retain, in the onboard memory of the accelerator card, the context information of the first process corresponding to the service and the first virtual address space where the first process is located, and swap the first process out to the memory of the central processing unit (CPU) to release the correspondingly occupied resources in the onboard memory of the accelerator card.
  • the swap-in unit 32 may include:
  • The swap-in module is configured to, for each service in the at least one service, in response to an event that the service enters an active state, swap the first process corresponding to the service from the memory of the central processing unit (CPU) into the onboard memory of the accelerator card, and save it in the first virtual address space of the onboard memory;
  • a continuing operation module is configured to continue to run the first process according to the context information of the first process.
  • the event of entering the active state is: the accelerator card has new resources to be released;
  • the swap-in module may include:
  • a determination submodule configured to determine whether the remaining space of the onboard memory is greater than or equal to the first virtual address space of the first process
  • a swap-in submodule configured to swap the first process from the memory of the central processing unit (CPU) into the onboard memory of the accelerator card when the remaining space of the onboard memory is greater than or equal to the first virtual address space of the first process.
  • the swap-in module further includes:
  • The selection submodule is configured to, when the remaining space of the onboard memory is less than the first virtual address space of the first process, select, from the other accelerator card processes pre-saved in the memory of the central processing unit (CPU), a second process of another service that occupies a virtual address space less than or equal to the remaining space of the onboard memory;
  • the swap-in submodule is further configured to swap the second process from the memory of the central processing unit CPU into the onboard memory of the acceleration card and run it.
  • the continuing operation module includes:
  • an identification submodule configured to identify the first process according to the context information
  • the continuing operation submodule is configured to continue to run the first process from a program breakpoint when the first process is swapped out of the onboard memory of the accelerator card according to the process state maintained by the context information.
  • the apparatus may further include:
  • The application unit is configured to apply, in the onboard memory of the accelerator card, for a segment of first virtual address space as a reserved space, before the context information of the first process corresponding to the service and the first virtual address space where the first process is located are retained;
  • mapping unit configured to map the reserved space to the first process to obtain a first virtual address space of the first process
  • a running unit configured to run the first process in the first virtual address space.
  • the event of entering the freezing state includes at least one of the following:
  • the idle duration of the service exceeds a preset duration threshold
  • the running rate of the service is lower than a preset rate threshold
  • the operating priority of the service is below a preset priority threshold.
  • the event of entering the active state includes at least one of the following:
  • the service is called;
  • new resources are released on the accelerator card.
  • the accelerator card is a physical accelerator card, or any one of multiple virtual accelerator cards virtualized from a physical accelerator card.
  • an embodiment of the present disclosure provides an electronic device, which can effectively improve the utilization rate of the accelerator card while ensuring the service response speed of the accelerator card.
  • an electronic device may include: a casing 41 , a processor 42 , a memory 43 , a circuit board 44 and a power supply circuit 45 , wherein the circuit board 44 is arranged inside the space enclosed by the casing 41 , and the processor 42 and the memory 43 are arranged on the circuit board 44 ; the power supply circuit 45 is configured to supply power to each circuit or device of the electronic device; the memory 43 is configured to store executable program code; and the processor 42 runs a program corresponding to the executable program code by reading the executable program code stored in the memory 43 , so as to execute the accelerator-card-based service running method described in any of the foregoing embodiments.
  • the electronic device exists in various forms, and may have a stand-alone or distributed computing structure, which is not limited in the present disclosure.
  • embodiments of the present disclosure further provide a computer-readable storage medium storing one or more programs that can be executed by one or more processors to implement any of the accelerator-card-based service running methods provided in the foregoing embodiments, thereby also achieving the corresponding technical effects, which have been described in detail above and are not repeated here.
  • each unit/module may be implemented in one or more pieces of software, one or more pieces of hardware, or a combination of software and hardware.
  • the computer-readable storage medium may be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM), and the like.

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

An accelerator-card-based service running method and apparatus, an electronic device, and a computer-readable storage medium, relating to the field of computer technologies and capable of effectively improving the utilization rate of an accelerator card while ensuring the service response speed of the accelerator card. At least one service is deployed on the accelerator card, and the method includes: for each of the at least one service, in response to an event of the service entering a frozen state, swapping the process corresponding to the service out of the onboard memory of the accelerator card into the memory of a central processing unit (CPU) (S11); and, for each of the at least one service, in response to an event of the service entering an active state, swapping the process corresponding to the service from the CPU memory into the onboard memory of the accelerator card (S12). The method can be used in running services on accelerator cards.

Description

Accelerator-card-based service running method and apparatus, electronic device, and computer-readable storage medium
The present disclosure claims priority to Chinese Patent Application No. 202011431859.X, filed with the China National Intellectual Property Administration on December 9, 2020 and entitled "Accelerator-card-based service running method and apparatus, electronic device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of computer technologies, and in particular to an accelerator-card-based service running method and apparatus, an electronic device, and a computer-readable storage medium.
Background
With the development of big data and artificial intelligence, many services are built on large volumes of complex data computation, while the computing power of CPUs increasingly struggles to meet the demand. Therefore, in many scenarios, accelerator cards are used to speed up computation. Common accelerator cards include, for example, the GPU (Graphics Processing Unit), TPU (Tensor Processing Unit), NPU (Neural-network Processing Unit), and FPGA (Field Programmable Gate Array).
Accelerator cards offer powerful computing capabilities but need a relatively long time to initialize the programs they execute. Therefore, to ensure that a program can respond to service requests in time, a program running on an accelerator card typically stays online listening for requests from the moment it is initialized, even when demand for the service is infrequent.
For example, many video websites have the following scenario: there are multiple video-stream review services, each of which must determine, from incoming images, whether a video stream contains prohibited content and, if so, hand it over for manual review. To guarantee timely responses to review requests, each service occupies an entire accelerator card. However, the daily request volume of such a service is very low, perhaps only a few dozen requests a day, leaving a large amount of accelerator-card resources idle.
For the above problem of effectively improving accelerator-card utilization while guaranteeing the service response speed of the accelerator card, no effective solution exists in the related art.
Summary
In view of this, embodiments of the present disclosure provide an accelerator-card-based service running method and apparatus, an electronic device, and a computer-readable storage medium, which can effectively improve the utilization rate of the accelerator card while ensuring its service response speed.
In a first aspect, an embodiment of the present disclosure provides an accelerator-card-based service running method, wherein at least one service is deployed on the accelerator card, and the method includes: for each of the at least one service, in response to an event of the service entering a frozen state, swapping the process corresponding to the service out of the onboard memory of the accelerator card into the memory of a central processing unit (CPU); and, for each of the at least one service, in response to an event of the service entering an active state, swapping the process corresponding to the service from the CPU memory into the onboard memory of the accelerator card.
In a second aspect, the embodiments of the present disclosure further provide an accelerator-card-based service running apparatus, wherein at least one service is deployed on the accelerator card, and the apparatus includes: a swap-out unit, configured to, for each of the at least one service, in response to an event of the service entering a frozen state, swap the process corresponding to the service out of the onboard memory of the accelerator card into the memory of a central processing unit (CPU); and a swap-in unit, configured to, for each of the at least one service, in response to an event of the service entering an active state, swap the process corresponding to the service from the CPU memory into the onboard memory of the accelerator card.
In a third aspect, the embodiments of the present disclosure further provide an electronic device, including: a casing, a processor, a memory, a circuit board, and a power supply circuit, wherein the circuit board is arranged inside the space enclosed by the casing, and the processor and the memory are arranged on the circuit board; the power supply circuit is configured to supply power to each circuit or device of the electronic device; the memory is configured to store executable program code; and the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to execute any of the methods provided by the embodiments of the present disclosure.
In a fourth aspect, the embodiments of the present disclosure further provide a computer-readable storage medium storing one or more programs that can be executed by one or more processors to implement any of the methods provided by the embodiments of the present disclosure.
With the accelerator-card-based service running method and apparatus, electronic device, and computer-readable storage medium provided by the embodiments of the present disclosure, for each of the at least one service deployed on the accelerator card, the process corresponding to the service can be swapped out of the onboard memory of the accelerator card into the CPU memory in response to an event of the service entering a frozen state, and swapped from the CPU memory into the onboard memory of the accelerator card in response to an event of the service entering an active state. In this way, multiple services can be deployed on one accelerator card and, according to the events occurring on the card, a service can be frozen, with its process swapped out of the onboard memory into the processor's memory to release the corresponding accelerator-card resources, or brought from the frozen state into the active state, with its process swapped from the processor's memory back into the onboard memory to continue running. The suspension and running of each service can thus be flexibly controlled and scheduled according to various events, effectively improving the utilization rate of the accelerator card while ensuring its service response speed.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present disclosure or the related art more clearly, the drawings needed in the description of the embodiments or the related art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present disclosure; for a person of ordinary skill in the art, other drawings can be obtained from them without creative effort.
FIG. 1 is a flowchart of an accelerator-card-based service running method provided by an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of the accelerator-card onboard memory occupied by a process running on the accelerator card in an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of service running in the related art without onboard-memory swap-in/swap-out;
FIG. 4 is a schematic diagram of accelerator-card usage for idle services in an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of accelerator-card usage when a service is activated in an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of accelerator-card usage after the service activated in FIG. 5 has finished executing;
FIG. 7 is a detailed flowchart of an accelerator-card-based service running method provided by an embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of an accelerator-card-based service running apparatus provided by an embodiment of the present disclosure;
FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
The embodiments of the present disclosure are described in detail below with reference to the drawings.
It should be clear that the described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
As stated in the Background, to compensate for the limited computing power of CPUs, accelerator cards are used to speed up computation in more and more scenarios. Although accelerator cards have powerful computing capabilities, they need a relatively long time to initialize the programs they execute; therefore, to ensure timely responses to service requests, a program running on an accelerator card typically stays online listening for requests from the moment it is initialized, even when demand for the service is infrequent, which leaves the accelerator card poorly utilized.
To solve the above problem, the inventors found in their research that the CPU memory and the accelerator card's onboard memory can be used to switch the services deployed on the accelerator card between a frozen state and an active state, thereby effectively improving the utilization rate of the accelerator card while ensuring its service response speed.
To help those skilled in the art better understand the technical concepts, implementations, and beneficial technical effects of the embodiments of the present disclosure, detailed descriptions are given below through specific embodiments.
In a first aspect, an embodiment of the present disclosure provides an accelerator-card-based service running method that can effectively improve the utilization rate of the accelerator card while ensuring its service response speed.
As shown in FIG. 1, in the accelerator-card-based service running method provided by an embodiment of the present disclosure, at least one service is deployed on the accelerator card, and the method may include:
S11: for each of the at least one service, in response to an event of the service entering a frozen state, swapping the process corresponding to the service out of the onboard memory of the accelerator card into the memory of a central processing unit (CPU).
The accelerator card may be any of various chips with powerful computing capabilities, such as a GPU, TPU, NPU, or FPGA. The accelerator card can work together with the CPU to respond to services in time. The accelerator card contains a processor with computing capabilities as well as onboard memory for storing computation data and programs. However, the onboard memory is generally small and cannot hold large amounts of data and programs.
In the embodiments of the present disclosure, one or more services may be deployed on the accelerator card. For example, in one embodiment, the accelerator card may host a model training service for image recognition, a model training service for speech recognition, a product query service, and so on. Depending on how abundant the accelerator card's resources are, the services deployed on the card may either run on it simultaneously or run at different times according to the various events occurring on the card.
Specifically, in this step, when an event of the service entering the frozen state occurs, in response to that event, the process corresponding to the service can be swapped out of the onboard memory of the accelerator card into the CPU memory, thereby releasing the accelerator-card resources occupied by the service so that other services can use them, which effectively improves the resource utilization of the accelerator card.
S12: for each of the at least one service, in response to an event of the service entering an active state, swapping the process corresponding to the service from the CPU memory into the onboard memory of the accelerator card.
After a service has entered the frozen state, when an event of the service entering the active state occurs, in response to that event, the process corresponding to the service can be swapped from the CPU memory into the onboard memory of the accelerator card so that the process continues to run and the service is responded to in time.
With the accelerator-card-based service running method provided by the embodiments of the present disclosure, for each of the at least one service deployed on the accelerator card, the process corresponding to the service can be swapped out of the onboard memory of the accelerator card into the CPU memory in response to an event of the service entering the frozen state, and swapped from the CPU memory into the onboard memory of the accelerator card in response to an event of the service entering the active state. In this way, multiple services can be deployed on one accelerator card and, according to the events occurring on the card, a service can be frozen, with its process swapped out of the onboard memory into the processor's memory to release the corresponding accelerator-card resources, or brought from the frozen state into the active state, with its process swapped from the processor's memory back into the onboard memory to continue running. The suspension and running of each service can thus be flexibly controlled and scheduled according to various events, effectively improving the utilization rate of the accelerator card while ensuring its service response speed.
The above technique can be applied to the video-website scenario described in the Background: many services can share a single accelerator card, with most of them in the swapped-out frozen state; when a service receives a request, it is swapped in and executed, and swapped out again after completion. This solution greatly reduces the number of accelerator cards required and greatly reduces costs.
Specifically, in the embodiments of the present disclosure, whether a service enters the frozen state or the active state can be triggered by various events. In one embodiment, for each service, the events that trigger entry into the frozen state may include one or more of the following: the idle duration of the service exceeds a preset duration threshold; the running rate of the service is lower than a preset rate threshold; the running priority of the service is lower than a preset priority threshold.
In other words, if a service has been idle on the accelerator card for too long, no requests for that service have arrived for a long time; to improve the utilization of the card, the service can be frozen based on this excessive-idle-time event. Alternatively, if a service runs too slowly, the resources on the accelerator card are too scarce to support its effective execution; to improve execution efficiency, the service can be frozen based on this low-running-rate event. Or, if a service's running priority is below the preset priority, the task does not need to run immediately and the service can be temporarily frozen. In one embodiment, the running priority of a service can also be adjusted on the fly according to the nature of the service itself and the running conditions of the tasks on the accelerator card, making task scheduling on the card more flexible.
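The three freeze triggers described above can be expressed as a single predicate. The following Python sketch is illustrative only; the threshold constants and the `should_freeze` helper are assumptions of this example, since the disclosure leaves the concrete threshold values to the implementation.

```python
# Illustrative freeze-event check; all threshold values are hypothetical.
IDLE_THRESHOLD_S = 600.0    # preset idle-duration threshold (seconds)
RATE_THRESHOLD = 100.0      # preset running-rate threshold (requests/s)
PRIORITY_THRESHOLD = 5      # preset priority threshold (higher = more urgent)

def should_freeze(idle_seconds: float, running_rate: float, priority: int) -> bool:
    """Return True if any of the three freeze conditions from the text holds."""
    return (idle_seconds > IDLE_THRESHOLD_S      # idle for too long
            or running_rate < RATE_THRESHOLD     # running too slowly
            or priority < PRIORITY_THRESHOLD)    # priority too low
```

Any one condition suffices, matching the "one or more of the following" wording in the text.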
Opposite to the events for entering the frozen state, in one embodiment, for each service, the events for entering the active state may include at least one of the following: the service is called, or new resources are released on the accelerator card. Thus, when a service-call event occurs and the service is in the frozen state, the service can be activated in time to respond to the request promptly. When new resources are released on the accelerator card, a frozen service can also be activated so that the newly released resources are used to run it.
Of course, other embodiments of the present disclosure may include other events that cause a service to enter the frozen state or the active state, which is not limited by the embodiments of the present disclosure.
Based on the above events, in one embodiment, swapping the process corresponding to the service out of the onboard memory of the accelerator card into the CPU memory in step S11 may include: retaining, in the onboard memory of the accelerator card, the context information of a first process corresponding to the service and the first virtual address space in which the first process resides, and swapping the first process out into the CPU memory, so as to release the corresponding occupied resources in the onboard memory of the accelerator card.
In the embodiments of the present disclosure, each service deployed on the accelerator card may correspond to one process, namely the first process. To ensure the smooth running of this process, its context information and its corresponding virtual address space, namely the first virtual address space, are also recorded.
The context information contains data such as the process's current execution-environment configuration; it can be used by the accelerator-card driver to identify the process and maintain its state. Since only this configuration information is recorded, the context file is usually small, typically between 200 MB and 1 GB, and therefore does not occupy much onboard memory. Taking a model training service as an example, in one embodiment the accelerator-card onboard memory occupied by a process running on the card may be as shown in FIG. 2.
A virtual address space is an addressing scheme the operating system provides to upper-layer applications for using memory addresses. A virtual address space refers to a range of virtual addresses. Through the operating system's mapping, virtual addresses can be translated to the corresponding physical addresses. Note that the virtual address space and the real physical space do not correspond one-to-one; the operating system updates the mapping between virtual and physical addresses according to the current applications and the current memory usage.
Specifically, in the embodiments of the present disclosure, retaining the first virtual address space corresponding to the first process means retaining the virtual address space corresponding to the range of virtual addresses of the first virtual address space, for example retaining the range X0100 to XF000, so that this range of virtual addresses will not be occupied by other applications or services. The physical addresses corresponding to this range, however, are released as the first process is swapped out of the accelerator card's onboard memory.
To reliably retain the first virtual address space corresponding to the first process, so that freezing and activating the first process is more stable, in one embodiment, before retaining, for each of the at least one service and in response to an event of the service entering the frozen state, the context information of the first process corresponding to the service and the first virtual address space in which the first process resides, the accelerator-card-based service running method provided by the embodiments of the present disclosure may further include: applying, in the onboard memory of the accelerator card, for a segment of first virtual address space as reserved space; mapping the reserved space to the first process to obtain the first virtual address space of the first process; and running the first process in the first virtual address space.
Illustratively, for GPU-class accelerator cards, unlike the conventional allocation through the cuMemAlloc interface, the embodiments of the present disclosure may adopt a new approach using the cuMemAddressReserve + cuMemMap interfaces to apply for the accelerator card's onboard memory by mapping, so that the reserved virtual address space is not occupied, ensuring that the first process can be smoothly swapped back from the CPU memory into the accelerator card's onboard memory and run. Moreover, when swapping memory in and out, the present disclosure does not use memory segments allocated in the usual way with malloc, but memory segments allocated with cuMemHostAlloc that share a page table with the GPU, which greatly reduces the swap time. The entire process, whether swapping in or swapping out, completes within seconds.
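The reserve-then-map idea can be illustrated with a toy bookkeeping model. The actual mechanism described above uses the CUDA driver API (cuMemAddressReserve / cuMemMap / cuMemHostAlloc); the Python class below is only a sketch under that analogy, showing why a reserved virtual address range survives a swap-out while its physical backing is released. All names here are invented for illustration.

```python
class OnboardMemory:
    """Toy model of an accelerator card's onboard memory: a fixed pool of
    physical bytes plus a set of reserved virtual address ranges."""
    def __init__(self, phys_size: int):
        self.phys_free = phys_size
        self.reservations = {}   # va_start -> [size, backed?]
        self._next_va = 0x1000

    def reserve(self, size: int) -> int:
        """Analogue of cuMemAddressReserve: claim a VA range, no physical cost."""
        va = self._next_va
        self._next_va += size
        self.reservations[va] = [size, False]
        return va

    def map(self, va: int) -> None:
        """Analogue of cuMemMap: back the reserved range with physical memory."""
        size, backed = self.reservations[va]
        if not backed:
            if size > self.phys_free:
                raise MemoryError("onboard memory exhausted")
            self.phys_free -= size
            self.reservations[va][1] = True

    def unmap(self, va: int) -> None:
        """Swap-out: release physical backing but keep the VA reservation."""
        size, backed = self.reservations[va]
        if backed:
            self.phys_free += size
            self.reservations[va][1] = False
```

Because `unmap` keeps the entry in `reservations`, a later `map` restores the process at exactly the same addresses, which is what lets the swapped-in process resume without re-initialization.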
After the first virtual address space corresponding to the first process has been obtained by the above method, when an event occurs that causes the service to enter the frozen state, the first virtual address space and the context information of the first process corresponding to the service can be saved, and the first process swapped out into the CPU memory. When an event occurs that causes the service to enter the active state, then in response to that event, the process corresponding to the service can be swapped from the CPU memory into the onboard memory of the accelerator card in step S12.
In one embodiment, swapping the process corresponding to the service from the CPU memory into the onboard memory of the accelerator card may include: swapping the first process corresponding to the service from the CPU memory into the onboard memory of the accelerator card and storing it in the first virtual address space of the onboard memory; and continuing to run the first process according to its context information. That is to say, after the first process is swapped back into the accelerator card's onboard memory, the virtual address space it uses is still the one it used before being swapped out. Once swapped back in, the first process can be identified from the saved context information and its execution continued.
Specifically, in one embodiment, continuing to run the first process according to its context information may include: identifying the first process according to the context information; and, according to the process state maintained by the context information, continuing to run the first process from the program breakpoint at which it was swapped out of the accelerator card's onboard memory. In this way the first process can quickly resume running based on its context information, without re-initialization, which greatly improves process execution efficiency and service response speed.
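As a loose analogy only (the disclosure's mechanism operates at the driver level, not in Python), a generator shows what "resume from the program breakpoint recorded in the context" means: the suspended frame plays the role of the saved context, holding both the program counter and the local state.

```python
def first_process(batches):
    """Toy 'first process': pauses after every batch, so its suspended frame
    acts as the saved context (program counter plus local state)."""
    results = []
    for b in batches:
        results.append(b * 2)    # stand-in for real accelerator work
        yield list(results)      # suspension point = the program breakpoint

proc = first_process([1, 2, 3])
partial = next(proc)    # run until the first breakpoint
# "swap out": only the generator object (the context) needs to be kept around
resumed = next(proc)    # "swap in": execution continues after the breakpoint
```

No work before the breakpoint is repeated on resumption, which mirrors why the service needs no re-initialization.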
The first process can be swapped into the accelerator card's onboard memory in response to an event of entering the active state. Such events can include multiple types, for example the service being called, or new resources being released on the accelerator card. When such an event occurs, the frozen service can be activated and its corresponding first process swapped from the CPU memory into the accelerator card's onboard memory.
For example, in one embodiment, an event occurs on the accelerator card that causes a service to enter the active state, and the event is: new resources are released on the accelerator card. Then, in response to this event, swapping the first process corresponding to the service from the CPU memory into the accelerator card's onboard memory may include: determining whether the remaining space of the onboard memory is greater than or equal to the first virtual address space of the first process; and, when the remaining space of the onboard memory is greater than or equal to the first virtual address space of the first process, swapping the first process from the CPU memory into the accelerator card's onboard memory.
For example, if, after new resources are released on the accelerator card, the remaining onboard memory is 3 GB and the first virtual address space of the first process is 2.8 GB, the accelerator card can currently support fairly smooth running of the service, so the first process corresponding to the service can be swapped from the CPU memory into the accelerator card's onboard memory and run.
In another embodiment, after determining whether the remaining space of the onboard memory is greater than or equal to the first virtual address space of the first process, when the remaining space is less than the first virtual address space of the first process, a second process of another service whose occupied virtual address space is less than or equal to the remaining onboard memory can be selected from the other accelerator-card processes previously saved in the CPU memory, and the second process can then be swapped from the CPU memory into the accelerator card's onboard memory and run.
For example, if, after new resources are released, the remaining onboard memory is 3 GB and the first virtual address space of the first process is 3.7 GB, while the CPU memory also holds processes P1 and P2 of other services, where P1 occupies 1 GB of virtual address space and P2 occupies 3.3 GB, then, since 3.7 GB is greater than 3 GB, the accelerator card cannot currently support smooth running of the first process. Therefore P1 (1 GB, which is less than 3 GB) can be selected from the processes P1 and P2 saved in the CPU memory as the second process and swapped from the CPU memory into the accelerator card's onboard memory and run.
In an implementation, when multiple such second processes exist in the CPU memory, other policies, such as process running priority or the order in which the processes were saved into the CPU memory, can be combined to select one second process to swap into the accelerator card's onboard memory and run.
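The decision described in the preceding paragraphs (swap in the first process if it fits, otherwise pick a second process that does) can be sketched as follows. The largest-fit tie-break among candidates is an illustrative policy of this example only; as noted above, the disclosure leaves the choice among multiple candidates to policies such as priority or save order.

```python
def pick_process_to_swap_in(first_proc, saved_procs, remaining):
    """Sketch of the swap-in decision when new onboard memory is freed.

    first_proc and the entries of saved_procs are (name, va_space_size)
    tuples; 'remaining' is the free onboard memory after the release.
    Returns the process to swap in, or None if nothing fits.
    """
    name, size = first_proc
    if size <= remaining:
        return first_proc
    # First process does not fit: fall back to another saved service whose
    # virtual address space fits, preferring the largest that still fits
    # (our own tie-breaking policy; the disclosure leaves it open).
    candidates = [p for p in saved_procs if p[1] <= remaining]
    return max(candidates, key=lambda p: p[1]) if candidates else None
```

With the numbers from the example above (3 GB remaining, first process at 3.7 GB, P1 at 1 GB, P2 at 3.3 GB), this function selects P1.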
In an implementation, the accelerator card in the above embodiments may be not only a physical accelerator card but also any one of multiple virtual accelerator cards virtualized from a physical accelerator card.
Specifically, in the embodiments of the present disclosure, the accelerator card may support virtualization: by partitioning the onboard memory, one physical accelerator card can be virtualized into several virtual accelerator cards for use by different containers/processes, and each virtual accelerator card can in turn be shared by multiple services, achieving multiple levels of multiplexing on a single card.
Taking video-website services as an example, as shown in FIG. 3, in the related art, without onboard-memory swap-in/swap-out, each service must occupy a complete accelerator card, and Node1, a node with four GPUs, can run only four services.
With onboard-memory swap-in/swap-out, each GPU can host multiple services. The processes of these services normally sleep, with the model and other data structures placed in the CPU memory; this state is called the frozen state. When a request arrives, the corresponding service can swap its process from the CPU memory into the accelerator card's onboard memory, resume execution, and start handling the request; this state is called the active state. After the service has handled the request, its process can be swapped out of the onboard memory and return to the frozen state; the detailed process may be as shown in FIG. 4, FIG. 5, and FIG. 6. As can be seen, with onboard-memory swap-in/swap-out, a single GPU suffices, which greatly reduces costs. As shown in FIG. 4, with this technique, idle services do not occupy onboard memory. In FIG. 5, when a query arrives, the corresponding service (say, Service1) can be temporarily activated, and its process (including the model, data, and so on) swapped from the CPU memory into the accelerator card for processing. If the accelerator card's onboard memory has spare capacity, multiple services can be activated simultaneously; if it is insufficient, services can queue and wait. As shown in FIG. 6, when the service has finished handling the query, it is swapped back out into the CPU memory.
Taking GPUs as an example, suppose a node has one NVIDIA 2080 Ti graphics card with 10 GB of video memory, and two trainings each requiring 8 GB of video memory run on this GPU at the same time. Without onboard-memory swap-in/swap-out, at least one of the trainings would fail because it could not obtain enough video memory. With swap-in/swap-out, when one training fails to obtain video memory, it can swap itself out into the CPU memory and pause execution; once the other training finishes and releases its video memory, it can swap back from the CPU memory into video memory and resume execution. In this way, both trainings can run successfully.
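The 2080 Ti example can be mimicked with a toy scheduler in which a training that cannot obtain onboard memory waits swapped out instead of failing. Everything below (the function name, the sequential completion order, one job finishing per loop turn) is an illustrative simplification, not the disclosure's scheduler.

```python
def run_trainings(card_gb, jobs):
    """Toy sequential model: jobs swap out and wait instead of failing when
    the card cannot hold them; each loop turn finishes one running job."""
    free, finished, waiting, running = card_gb, [], list(jobs), []
    while waiting or running:
        still_waiting = []
        for name, need in waiting:       # admit whatever fits right now
            if need <= free:
                free -= need
                running.append((name, need))
            else:                        # stays swapped out in CPU memory
                still_waiting.append((name, need))
        waiting = still_waiting
        if not running:                  # a job larger than the whole card
            raise MemoryError("job can never fit on this card")
        name, need = running.pop(0)      # job completes, memory is released
        free += need
        finished.append(name)
    return finished
```

With a 10 GB card and two 8 GB trainings, the second training waits while the first runs and is admitted as soon as the first releases its memory, so both complete.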
The accelerator-card-based service running method provided by the embodiments of the present disclosure is described in detail below through a specific embodiment.
As shown in FIG. 7, the accelerator-card-based service running method provided by the embodiments of the present disclosure may include:
S201: in the onboard memory of the accelerator card, apply for a segment of first virtual address space as reserved space.
S202: map the reserved space to the first process corresponding to the service to obtain the first virtual address space of the first process.
S203: run the first process in the first virtual address space.
S204: the running rate of the first process falls below the preset rate threshold, triggering an event of the service entering the frozen state.
S205: according to the frozen-state event, retain, in the onboard memory of the accelerator card, the context information of the first process corresponding to the service and the first virtual address space in which the first process resides, and swap the first process out into the CPU memory, so as to release the corresponding occupied resources in the accelerator card's onboard memory.
S206: new resources are released on the accelerator card, triggering an event of the service entering the active state.
S207: determine that the remaining space of the onboard memory is greater than or equal to the first virtual address space of the first process, and swap the first process from the CPU memory into the accelerator card's onboard memory.
S208: identify the first process according to the context information.
S209: according to the process state maintained by the context information, continue to run the first process from the program breakpoint at which it was swapped out of the accelerator card's onboard memory.
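Steps S201 to S209 amount to a small per-service state machine. The sketch below is a hypothetical summary of that lifecycle; the class, field names, and the context payload are invented for illustration and track only the onboard-memory accounting.

```python
class Service:
    """Minimal state machine for the S201-S209 lifecycle of one service."""
    def __init__(self, name, va_size):
        self.name, self.va_size = name, va_size
        self.state = "active"        # S201-S203: runs in its reserved VA space
        self.context = None

    def freeze(self, card_free):
        """S204-S205: save context, swap out, release onboard memory."""
        self.context = {"breakpoint": "after-batch-3"}   # illustrative payload
        self.state = "frozen"
        return card_free + self.va_size

    def activate(self, card_free):
        """S206-S209: swap back in if the freed space suffices, then resume
        from the breakpoint recorded in the context."""
        if self.state == "frozen" and card_free >= self.va_size:
            self.state = "active"
            return card_free - self.va_size
        return card_free
```

Freezing returns the service's onboard memory to the free pool; activating succeeds only when the remaining space covers the first virtual address space, matching the check in S207.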
In a second aspect, the embodiments of the present disclosure further provide an accelerator-card-based service running apparatus that can effectively improve the utilization rate of the accelerator card while ensuring its service response speed.
As shown in FIG. 8, in the accelerator-card-based service running apparatus provided by the embodiments of the present disclosure, at least one service is deployed on the accelerator card, and the apparatus may include:
a swap-out unit 31, configured to, for each of the at least one service, in response to an event of the service entering a frozen state, swap the process corresponding to the service out of the onboard memory of the accelerator card into the memory of a central processing unit (CPU);
and,
a swap-in unit 32, configured to, for each of the at least one service, in response to an event of the service entering an active state, swap the process corresponding to the service from the CPU memory into the onboard memory of the accelerator card.
With the accelerator-card-based service running apparatus provided by the embodiments of the present disclosure, for each of the at least one service deployed on the accelerator card, the process corresponding to the service can be swapped out of the onboard memory of the accelerator card into the CPU memory in response to an event of the service entering the frozen state, and swapped from the CPU memory into the onboard memory of the accelerator card in response to an event of the service entering the active state. In this way, multiple services can be deployed on one accelerator card and, according to the events occurring on the card, a service can be frozen, with its process swapped out of the onboard memory into the processor's memory to release the corresponding accelerator-card resources, or brought from the frozen state into the active state, with its process swapped from the processor's memory back into the onboard memory to continue running. The suspension and running of each service can thus be flexibly controlled and scheduled according to various events, effectively improving the utilization rate of the accelerator card while ensuring its service response speed.
In an implementation, the swap-out unit 31 may be configured to:
for each of the at least one service, in response to an event of the service entering a frozen state, retain, in the onboard memory of the accelerator card, the context information of a first process corresponding to the service and the first virtual address space in which the first process resides, and swap the first process out into the CPU memory, so as to release the corresponding occupied resources in the onboard memory of the accelerator card.
In an implementation, the swap-in unit 32 may include:
a swap-in module, configured to, for each of the at least one service, in response to an event of the service entering an active state, swap the first process corresponding to the service from the CPU memory into the onboard memory of the accelerator card and store it in the first virtual address space of the onboard memory;
a continuation module, configured to continue to run the first process according to the context information of the first process.
In an implementation, the event of entering the active state is: new resources are released on the accelerator card;
the swap-in module may include:
a determination submodule, configured to determine whether the remaining space of the onboard memory is greater than or equal to the first virtual address space of the first process;
a swap-in submodule, configured to swap the first process from the CPU memory into the onboard memory of the accelerator card when the remaining space of the onboard memory is greater than or equal to the first virtual address space of the first process.
In an implementation, the swap-in module further includes:
a selection submodule, configured to, when the remaining space of the onboard memory is less than the first virtual address space of the first process, select, from other accelerator-card processes previously saved in the CPU memory, a second process of another service whose occupied virtual address space is less than or equal to the remaining space of the onboard memory;
the swap-in submodule is further configured to swap the second process from the CPU memory into the onboard memory of the accelerator card and run it.
In an implementation, the continuation module includes:
an identification submodule, configured to identify the first process according to the context information;
a continuation submodule, configured to, according to the process state maintained by the context information, continue to run the first process from the program breakpoint at which it was swapped out of the accelerator card's onboard memory.
In an implementation, the apparatus may further include:
an application unit, configured to, before the context information of the first process corresponding to the service and the first virtual address space in which the first process resides are retained, apply, in the onboard memory of the accelerator card, for a segment of first virtual address space as reserved space;
a mapping unit, configured to map the reserved space to the first process to obtain the first virtual address space of the first process;
a running unit, configured to run the first process in the first virtual address space.
In an implementation, the event of entering the frozen state includes at least one of the following:
the idle duration of the service exceeds a preset duration threshold;
the running rate of the service is lower than a preset rate threshold;
the running priority of the service is lower than a preset priority threshold.
In an implementation, the event of entering the active state includes at least one of the following:
the service is called;
new resources are released on the accelerator card.
In an implementation, the accelerator card is a physical accelerator card, or any one of multiple virtual accelerator cards virtualized from a physical accelerator card.
The specific operations shown in FIG. 1 above can be performed by the respective units of the accelerator-card-based service running apparatus of FIG. 8; the specific operational details are not repeated here.
In a third aspect, correspondingly, an embodiment of the present disclosure provides an electronic device that can effectively improve the utilization rate of the accelerator card while ensuring its service response speed.
As shown in FIG. 9, an electronic device provided by an embodiment of the present disclosure may include: a casing 41, a processor 42, a memory 43, a circuit board 44, and a power supply circuit 45, wherein the circuit board 44 is arranged inside the space enclosed by the casing 41, and the processor 42 and the memory 43 are arranged on the circuit board 44; the power supply circuit 45 is configured to supply power to each circuit or device of the electronic device; the memory 43 is configured to store executable program code; and the processor 42 runs a program corresponding to the executable program code by reading the executable program code stored in the memory 43, so as to execute the accelerator-card-based service running method described in any of the foregoing embodiments.
For the specific execution of the above steps by the processor 42 and the further steps the processor 42 performs by running the executable program code, reference may be made to the description of the foregoing embodiments, which is not repeated here.
The electronic device exists in various forms and may have a stand-alone or distributed computing structure, which is not limited by the present disclosure.
In a fourth aspect, the embodiments of the present disclosure further provide a computer-readable storage medium storing one or more programs that can be executed by one or more processors to implement any of the accelerator-card-based service running methods provided by the foregoing embodiments, thereby also achieving the corresponding technical effects, which have been described in detail above and are not repeated here.
It should be noted that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the existence of additional identical elements in the process, method, article, or device that includes that element.
The embodiments in this specification are described in a related manner; for identical or similar parts between the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the others.
In particular, since the apparatus embodiments are substantially similar to the method embodiments, their description is relatively brief; for relevant parts, reference may be made to the description of the method embodiments.
For convenience of description, the above apparatus is described with its functions divided into various units/modules. Of course, when implementing the present disclosure, the functions of the units/modules may be implemented in one or more pieces of software, one or more pieces of hardware, or a combination of software and hardware.
A person of ordinary skill in the art can understand that all or part of the flows of the above embodiment methods can be completed by a computer program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, may include the flows of the embodiments of the above methods. The computer-readable storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The above are only specific implementations of the present disclosure, but the protection scope of the present disclosure is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed by the present disclosure shall fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (22)

  1. An accelerator-card-based service running method, wherein at least one service is deployed on the accelerator card, the method comprising:
    for each of the at least one service, in response to an event of the service entering a frozen state, swapping the process corresponding to the service out of the onboard memory of the accelerator card into the memory of a central processing unit (CPU);
    and, for each of the at least one service, in response to an event of the service entering an active state, swapping the process corresponding to the service from the CPU memory into the onboard memory of the accelerator card.
  2. The method according to claim 1, wherein swapping the process corresponding to the service out of the onboard memory of the accelerator card into the CPU memory comprises:
    retaining, in the onboard memory of the accelerator card, the context information of a first process corresponding to the service and the first virtual address space in which the first process resides, and swapping the first process out into the CPU memory, so as to release the corresponding occupied resources in the onboard memory of the accelerator card.
  3. The method according to claim 2, wherein swapping the process corresponding to the service from the CPU memory into the onboard memory of the accelerator card comprises:
    swapping the first process corresponding to the service from the CPU memory into the onboard memory of the accelerator card, and storing it in the first virtual address space of the onboard memory;
    continuing to run the first process according to the context information of the first process.
  4. The method according to claim 3, wherein the event of entering the active state is: new resources are released on the accelerator card;
    swapping the first process corresponding to the service from the CPU memory into the onboard memory of the accelerator card in response to the event of the service entering the active state comprises: determining whether the remaining space of the onboard memory is greater than or equal to the first virtual address space of the first process; and, when the remaining space of the onboard memory is greater than or equal to the first virtual address space of the first process, swapping the first process from the CPU memory into the onboard memory of the accelerator card.
  5. The method according to claim 4, wherein, after determining whether the remaining space of the onboard memory is greater than or equal to the first virtual address space of the first process, the method further comprises:
    when the remaining space of the onboard memory is less than the first virtual address space of the first process, selecting, from other accelerator-card processes previously saved in the CPU memory, a second process of another service whose occupied virtual address space is less than or equal to the remaining space of the onboard memory;
    swapping the second process from the CPU memory into the onboard memory of the accelerator card and running it.
  6. The method according to claim 3, wherein continuing to run the first process according to the context information of the first process comprises:
    identifying the first process according to the context information;
    continuing to run the first process, according to the process state maintained by the context information, from the program breakpoint at which the first process was swapped out of the onboard memory of the accelerator card.
  7. The method according to claim 2, wherein, before retaining, for each of the at least one service and in response to an event of the service entering the frozen state, the context information of the first process corresponding to the service and the first virtual address space in which the first process resides, the method further comprises:
    applying, in the onboard memory of the accelerator card, for a segment of first virtual address space as reserved space;
    mapping the reserved space to the first process to obtain the first virtual address space of the first process;
    running the first process in the first virtual address space.
  8. The method according to claim 1, wherein the event of entering the frozen state comprises at least one of the following:
    the idle duration of the service exceeds a preset duration threshold;
    the running rate of the service is lower than a preset rate threshold;
    the running priority of the service is lower than a preset priority threshold.
  9. The method according to claim 1, wherein the event of entering the active state comprises at least one of the following:
    the service is called;
    new resources are released on the accelerator card.
  10. The method according to any one of claims 1 to 9, wherein the accelerator card is a physical accelerator card, or any one of multiple virtual accelerator cards virtualized from a physical accelerator card.
  11. An accelerator-card-based service running apparatus, wherein at least one service is deployed on the accelerator card, the apparatus comprising:
    a swap-out unit, configured to, for each of the at least one service, in response to an event of the service entering a frozen state, swap the process corresponding to the service out of the onboard memory of the accelerator card into the memory of a central processing unit (CPU);
    and,
    a swap-in unit, configured to, for each of the at least one service, in response to an event of the service entering an active state, swap the process corresponding to the service from the CPU memory into the onboard memory of the accelerator card.
  12. The apparatus according to claim 11, wherein the swap-out unit is configured to:
    for each of the at least one service, in response to an event of the service entering a frozen state, retain, in the onboard memory of the accelerator card, the context information of a first process corresponding to the service and the first virtual address space in which the first process resides, and swap the first process out into the CPU memory, so as to release the corresponding occupied resources in the onboard memory of the accelerator card.
  13. The apparatus according to claim 12, wherein the swap-in unit comprises:
    a swap-in module, configured to, for each of the at least one service, in response to an event of the service entering an active state, swap the first process corresponding to the service from the CPU memory into the onboard memory of the accelerator card and store it in the first virtual address space of the onboard memory;
    a continuation module, configured to continue to run the first process according to the context information of the first process.
  14. The apparatus according to claim 13, wherein the event of entering the active state is: new resources are released on the accelerator card;
    the swap-in module comprises:
    a determination submodule, configured to determine whether the remaining space of the onboard memory is greater than or equal to the first virtual address space of the first process;
    a swap-in submodule, configured to swap the first process from the CPU memory into the onboard memory of the accelerator card when the remaining space of the onboard memory is greater than or equal to the first virtual address space of the first process.
  15. The apparatus according to claim 14, wherein the swap-in module further comprises:
    a selection submodule, configured to, when the remaining space of the onboard memory is less than the first virtual address space of the first process, select, from other accelerator-card processes previously saved in the CPU memory, a second process of another service whose occupied virtual address space is less than or equal to the remaining space of the onboard memory;
    the swap-in submodule is further configured to swap the second process from the CPU memory into the onboard memory of the accelerator card and run it.
  16. The apparatus according to claim 13, wherein the continuation module comprises:
    an identification submodule, configured to identify the first process according to the context information;
    a continuation submodule, configured to, according to the process state maintained by the context information, continue to run the first process from the program breakpoint at which the first process was swapped out of the onboard memory of the accelerator card.
  17. The apparatus according to claim 12, further comprising:
    an application unit, configured to, before the context information of the first process corresponding to the service and the first virtual address space in which the first process resides are retained, apply, in the onboard memory of the accelerator card, for a segment of first virtual address space as reserved space;
    a mapping unit, configured to map the reserved space to the first process to obtain the first virtual address space of the first process;
    a running unit, configured to run the first process in the first virtual address space.
  18. The apparatus according to claim 11, wherein the event of entering the frozen state comprises at least one of the following:
    the idle duration of the service exceeds a preset duration threshold;
    the running rate of the service is lower than a preset rate threshold;
    the running priority of the service is lower than a preset priority threshold.
  19. The apparatus according to claim 11, wherein the event of entering the active state comprises at least one of the following:
    the service is called;
    new resources are released on the accelerator card.
  20. The apparatus according to any one of claims 11 to 19, wherein the accelerator card is a physical accelerator card, or any one of multiple virtual accelerator cards virtualized from a physical accelerator card.
  21. An electronic device, comprising: a casing, a processor, a memory, a circuit board, and a power supply circuit, wherein the circuit board is arranged inside the space enclosed by the casing, and the processor and the memory are arranged on the circuit board; the power supply circuit is configured to supply power to each circuit or device of the electronic device; the memory is configured to store executable program code; and the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, and is configured to execute the method according to any one of claims 1 to 10.
  22. A computer-readable storage medium, wherein the computer-readable storage medium stores one or more programs, and the one or more programs can be executed by one or more processors to implement the method according to any one of claims 1 to 10.
PCT/CN2021/135879 2020-12-09 2021-12-06 Accelerator-card-based service running method and apparatus, electronic device, and computer-readable storage medium WO2022121866A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011431859.X 2020-12-09
CN202011431859.XA CN112598565B (zh) 2020-12-09 2020-12-09 Accelerator-card-based service running method and apparatus, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2022121866A1 true WO2022121866A1 (zh) 2022-06-16

Family

ID=75191371

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/135879 WO2022121866A1 (zh) 2021-12-06 2022-06-16 Accelerator-card-based service running method and apparatus, electronic device, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN112598565B (zh)
WO (1) WO2022121866A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
  • WO2024060710A1 (zh) * 2022-09-20 2024-03-28 华为技术有限公司 Page swap-in method and apparatus
  • CN117807118A (zh) * 2023-12-05 2024-04-02 中科驭数(北京)科技有限公司 Data aggregation processing method, apparatus, device and storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
  • CN112598565B (zh) * 2020-12-09 2024-05-28 第四范式(北京)技术有限公司 Accelerator-card-based service running method and apparatus, electronic device and storage medium
  • CN113721990A (zh) * 2021-07-20 2021-11-30 北京比特大陆科技有限公司 Data processing method, data processing device, accelerator card and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140257736A1 (en) * 2013-03-07 2014-09-11 International Business Machines Corporation Implementing automated memory address recording in constrained random test generation for verification of processor hardware designs
  • CN104732164A (zh) * 2013-12-18 2015-06-24 国家计算机网络与信息安全管理中心 Device and method for improving SSL data processing speed
US20170083434A1 (en) * 2015-09-23 2017-03-23 Hanan Potash Computer processor with operand/variable-mapped namespace
  • CN110764799A (zh) * 2019-09-27 2020-02-07 苏州浪潮智能科技有限公司 Method, device and medium for optimizing remote update of an FPGA accelerator card
  • CN110781129A (zh) * 2019-09-12 2020-02-11 苏州浪潮智能科技有限公司 Resource scheduling method, device and medium in an FPGA heterogeneous accelerator card cluster
  • CN111813713A (zh) * 2020-09-08 2020-10-23 苏州浪潮智能科技有限公司 Data-accelerated computation processing method and apparatus, and computer-readable storage medium
  • CN111833232A (zh) * 2019-04-18 2020-10-27 杭州海康威视数字技术股份有限公司 Image processing apparatus
  • CN112598565A (zh) * 2020-12-09 2021-04-02 第四范式(北京)技术有限公司 Accelerator-card-based service running method and apparatus, electronic device and storage medium

Also Published As

Publication number Publication date
CN112598565B (zh) 2024-05-28
CN112598565A (zh) 2021-04-02

Similar Documents

Publication Publication Date Title
WO2022121866A1 (zh) 2022-06-16 Accelerator-card-based service running method and apparatus, electronic device, and computer-readable storage medium
US10104008B1 (en) Allocating processor resources based on a task identifier
US8151275B2 (en) Accessing copy information of MMIO register by guest OS in both active and inactive state of a designated logical processor corresponding to the guest OS
US9158362B2 (en) System and method for power reduction by sequestering at least one device or partition in a platform from operating system access
CN105579961B (zh) 数据处理系统及操作方法、用于数据处理系统的硬件单元
US7421533B2 (en) Method to manage memory in a platform with virtual machines
US8166288B2 (en) Managing requests of operating systems executing in virtual machines
US20150293709A1 (en) Fine-grained bandwidth provisioning in a memory controller
US11281388B2 (en) Method for managing a multi-system shared memory, electronic device and non-volatile computer-readable storage medium
US11360884B2 (en) Reserved memory in memory management system
KR20070057692A (ko) 정보 처리 장치, 프로세스 제어 방법, 및 컴퓨터 프로그램
US20110202918A1 (en) Virtualization apparatus for providing a transactional input/output interface
US11467870B2 (en) VMID as a GPU task container for virtualization
CN112306669A (zh) 一种基于多核系统的任务处理方法及装置
US8751724B2 (en) Dynamic memory reconfiguration to delay performance overhead
CN114168271A (zh) 一种任务调度方法、电子设备及存储介质
US20200201691A1 (en) Enhanced message control banks
CN116578416B (zh) 一种基于gpu虚拟化的信号级仿真加速方法
US20210373975A1 (en) Workgroup synchronization and processing
CN117270987A (zh) 应用启动方法、装置、电子设备及计算机可读存储介质
US8424013B1 (en) Methods and systems for handling interrupts across software instances and context switching between instances having interrupt service routine registered to handle the interrupt
US8533696B1 (en) Methods and systems for allocating hardware resources to instances of software images
US10051087B2 (en) Dynamic cache-efficient event suppression for network function virtualization
US20090241111A1 (en) Recording medium having instruction log acquiring program recorded therein and virtual computer system
JPH09319653A (ja) 情報処理装置、情報処理システム及びその制御方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21902571

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21902571

Country of ref document: EP

Kind code of ref document: A1