CN115292051B - Hot migration method, device and application of a GPU (graphics processing unit) resource POD - Google Patents

Info

Publication number: CN115292051B (application CN202211169473.5A)
Authority: CN (China)
Prior art keywords: migration, pod, gpu, receiving, memory block
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN115292051A
Inventors: 毛云青, 来佳飞, 彭大蒙, 王勇
Current Assignee: CCI China Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: CCI China Co Ltd
Priority date / Filing date: 2022-09-26 (the priority date is an assumption and is not a legal conclusion)
Application filed by CCI China Co Ltd
Priority to CN202211169473.5A
Publication of CN115292051A: 2022-11-04
Application granted; publication of CN115292051B: 2023-01-03

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources to service a request
    • G06F9/5011 Allocation of resources to service a request, the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016 Allocation of resources to service a request, the resource being the memory
    • G06F9/5027 Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals

Abstract

The application provides a hot migration method, device and application for a GPU resource POD. The method comprises the following steps: the GPU client sends a migration request; after receiving the migration request, the migration component in the migration POD sends the metadata cache index to the migration receiving component of the receiving POD; the migration component then transmits the high-frequency and low-frequency memory blocks to the migration receiving component; after receiving the migration request, the GPU server adjusts task priorities; the receiving POD performs a hash check on the received files and confirms GPU stability with the GPU server. This scheme solves the problem of transmitting GPU container images, copies the CPU state quickly, and maintains GPU task scheduling, thereby achieving fast hot migration of GPU resource PODs on an algorithm container cloud.

Description

Hot migration method, device and application of a GPU (graphics processing unit) resource POD
Technical Field
The application relates to the technical field of algorithm container cloud platforms, and in particular to a hot migration method, device and application for a GPU resource container.
Background
With the development of cloud computing, Docker and Kubernetes have become the standard for application delivery in many enterprises, thanks to advantages such as a standardized operating environment, rapid deployment and operation, and flexible on-demand allocation. Meanwhile, with the rise of general-purpose GPU computing and the growing use of GPUs in deep learning, algorithm container clouds combining the two have emerged; they provide GPU resources quickly and deliver standardized algorithm applications, playing an important role in graphics and image rendering, parallel computing and artificial intelligence.
Algorithm jobs in deep learning are typically complex and long-running. When a POD (container group) must be rescheduled because of resource adjustment, work-priority changes or faults, resetting its running state interrupts the algorithm job and loses in-progress computation; moreover, because the container state cannot be preserved, algorithm containers cannot be flexibly scheduled and combined, which lowers the utilization of the container cloud platform.
In an algorithm container cloud cluster, a Pod is the basis of all service types: a combination of one or more containers acting as the logical host for a specific application, containing the algorithm containers related to that service. Unlike a cloud host, a container is process-based and cannot be detached and migrated as easily as a KVM cloud host. In a native Kubernetes container cloud platform, migration is usually done by starting a new Pod from a yaml configuration file and destroying the old one, so the newly created Pod loses the original running state.
Some open-source projects migrate the CPU state through CRIU (Checkpoint/Restore In Userspace). Related prior work focuses on how to preserve and restore the memory and CPU state, with various optimizations for compressing memory pages, optimizing network transmission and reducing repeated copying, but these schemes only apply to CPU resource containers.
Image optimization schemes also exist for CPU container migration. One is preloading: images are loaded onto every server in the cluster in advance, shortening image transmission time. The other exploits the overlay layering principle of container images, comparing the image repository with the local copy layer by layer to avoid retransmitting duplicate overlay layers.
The CPU, i.e., the central processing unit, is the computing and control core of a computer system, roughly 25% ALU (arithmetic logic unit), 25% control unit and 50% cache; the GPU, i.e., the graphics processor, is roughly 90% ALU, 5% control unit and 5% cache. Because CPU and GPU resource containers differ so greatly, the hot migration methods currently available for CPU resource containers cannot be applied directly to GPU resource containers. In other words, neither CPU scheme above suits GPU containers: GPU container images are large, so preloading wastes considerable resources, cannot take effect when the container's imagePullPolicy is configured so that images are still pulled, and cannot synchronize the UpperDir (the top, writable layer); the overlay-layer scheme, given a large number of layers, gains little by eliminating a few duplicate layers and suffers the same UpperDir synchronization problem.
Specifically, GPU container images are much larger than ordinary images. Taking a deep-learning tensorflow image as an example, the official GPU image on Docker Hub is 3.2 GB while the common CPU image centos is only 234 MB, a factor of about 14, and that is just the native image of one container in a POD. Considering the CPU and GPU resources, state records and other data of an algorithm container group, the amount to transmit for a GPU container far exceeds that of a pure CPU container. With a general image-transmission scheme, the container group's downtime would be on the order of minutes or even hours, which is unacceptable in a production environment.
Meanwhile, a GPU container must consider both the CPU state and the GPU state. In an algorithm cloud platform, one Pod often contains multiple algorithms deployed in separate containers, forming an algorithm-cluster Pod, and because GPU resources are expensive, one GPU is often shared by different algorithm groups. This complexity means the current live migration schemes for CPU resource Pods cannot be carried over to GPU resource Pods.
In summary, existing CPU live migration schemes do not fit GPU resource containers; applying them raises three problems: transmitting the GPU container image file, quickly copying the CPU state, and maintaining GPU task scheduling.
Disclosure of Invention
The embodiments of the application provide a hot migration method, device and application for a GPU resource POD that solve the problem of GPU container image transmission, copy the CPU state quickly, and maintain GPU task scheduling, thereby achieving fast hot migration of GPU resource PODs on an algorithm container cloud.
In a first aspect, an embodiment of the present application provides a hot migration method for a GPU resource POD, the method comprising:
a GPU client sends a migration request to the migration POD that needs to be hot-migrated and to the GPU server, and a control server of the algorithm container cloud platform confirms the receiving POD based on the migration request;
after receiving the migration request, the migration component in the migration POD checks the metadata tree graph index of the image file in the migration POD to obtain a metadata cache index and sends it to the migration receiving component of the receiving POD; the migration receiving component establishes a file system in the receiving POD according to the metadata cache index and places the data corresponding to the metadata cache index into the file system;
process tree information is collected in the migration POD and an addressing code is injected into each process of the process tree; when a process calls memory, the addressing code is copied into the address space of the corresponding memory, the change frequency of each memory block is recorded in real time, the memory is divided into high-frequency and low-frequency memory blocks according to change frequency, and the high-frequency and low-frequency memory blocks are transmitted to the migration receiving component as memory state files.
In a second aspect, an embodiment of the present application provides a hot migration device for a GPU resource POD, including:
a request module: a GPU client sends a migration request to the migration POD that needs to be hot-migrated and to the GPU server, and a control server of the algorithm container cloud platform confirms the receiving POD based on the migration request;
a file migration module: after receiving the migration request, the migration component in the migration POD checks the metadata tree graph index of the image file in the migration POD to obtain a metadata cache index and sends it to the migration receiving component of the receiving POD; the migration receiving component establishes a file system in the receiving POD according to the metadata cache index and places the data corresponding to the metadata cache index into the file system;
a memory migration module: process tree information is collected in the migration POD and an addressing code is injected into each process of the process tree; when a process calls memory, the addressing code is copied into the address space of the corresponding memory, the change frequency of each memory block is recorded in real time, the memory is divided into high-frequency and low-frequency memory blocks according to change frequency, and the high-frequency and low-frequency memory blocks are transmitted to the migration receiving component as memory state files.
In a third aspect, an embodiment of the present application provides an electronic device comprising a memory and a processor, where the memory stores a computer program and the processor is configured to run the computer program to perform the hot migration method for a GPU resource POD.
In a fourth aspect, embodiments of the present application provide a readable storage medium storing a computer program that comprises program code for controlling a process to perform the hot migration method for a GPU resource POD.
The main contributions and innovation points of the invention are as follows:
the embodiment of the application introduces concepts of metadata and data, cuts a storage layer into blocks with fixed size according to the metadata, generates a flattened metadata tree diagram index, and a receiving container can call and migrate an image file system of POD (POD) through a network according to the metadata tree diagram index without being started after all image files are downloaded like a traditional container. And downloading the complete image file of the migrated POD in the background, and replacing the metadata tree graph index after the complete image file is downloaded, so that the time for transmitting the large-volume image network is saved, and the problem of image transmission of the GPU container is solved.
In the embodiments, memory is subdivided into blocks, a hash value is computed for each block, and at a set frequency the blocks are classified as high-frequency or low-frequency. Low-frequency blocks are merged, compressed and transmitted first; a high-frequency block is transmitted as a low-frequency block once its change frequency drops, or transmitted directly after the time step elapses. This reduces transmission overhead, strengthens the stability of the memory transfer, and achieves fast copying of the memory state.
according to the method and the device, the migration request of the migration POD for the GPU resource POD is sent to the GPU server through the GPU client, the GPU server returns to the migration structure, the loaded GPU state is changed into the state that only the GPU state needs to be kept on the GPU server and is linked to the receiving POD again, coupling of the CPU and the GPU is reduced, and therefore the GPU state is kept.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a flowchart of a method for hot migration of GPU resources POD according to an embodiment of the present application;
fig. 2 is a block diagram illustrating a hot migration apparatus of a GPU resource POD according to an embodiment of the present application;
fig. 3 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of one or more embodiments of the specification, as detailed in the claims which follow.
It should be noted that: in other embodiments, the steps of the corresponding methods are not necessarily performed in the order shown and described herein. In some other embodiments, the method may include more or fewer steps than those described herein. Moreover, a single step described in this specification may be broken down into multiple steps in other embodiments; multiple steps described in this specification may be combined into a single step in other embodiments.
For a better understanding of the technical points of this solution, the terms appearing in it are explained here:
Sidecar container: a sidecar container runs alongside the main container in a pod and extends or enhances the main container's functionality without modifying it. Container technology packages an application with all its dependencies so that it can run anywhere; a sidecar adds functionality without touching or altering the main container.
Image file: the main resource content of a POD is stored in its image file, and the image file is run through the POD.
Time step: in this scheme, a set time difference; if the change frequency of a high-frequency memory block has not dropped sufficiently within this time, the block is transmitted directly.
Example one
The embodiment of the present application provides a hot migration method for a GPU resource POD, which solves the problem of GPU container image transmission, copies the CPU state quickly, maintains GPU task scheduling, and achieves fast hot migration of GPU resource PODs on an algorithm container cloud. Referring specifically to fig. 1, the method includes:
a GPU client sends a migration request to the migration POD that needs to be hot-migrated and to the GPU server, and a control server of the algorithm container cloud platform confirms the receiving POD based on the migration request;
after receiving the migration request, the migration component in the migration POD checks the metadata tree graph index of the image file in the migration POD to obtain a metadata cache index and sends it to the migration receiving component of the receiving POD; the migration receiving component establishes a file system in the receiving POD according to the metadata cache index and places the data corresponding to the metadata cache index into the file system;
process tree information is collected in the migration POD and an addressing code is injected into each process of the process tree; when a process calls memory, the addressing code is copied into the address space of the corresponding memory, the change frequency of each memory block is recorded in real time, the memory is divided into high-frequency and low-frequency memory blocks according to change frequency, and the high-frequency and low-frequency memory blocks are transmitted to the migration receiving component as memory state files.
In some embodiments, in the step of "the control server of the algorithm container cloud platform confirms the receiving POD based on the migration request", the control server confirms the receiving POD according to the pointing parameter of the migration request; if the pointing parameter is empty, it automatically allocates the most suitable server node as the receiving POD according to resources and network topology.
Specifically, the control server parses the parameters of the migration request and checks whether they include a pointing parameter for the receiving POD. If so, it confirms the resource margin of the receiving POD through the container cloud platform where it resides; if the margin meets the migration condition the receiving POD is confirmed, otherwise the migration request fails and a failure result is returned.
Specifically, when the pointing parameter of the migration request is empty, the user is considered to want the migration POD migrated without specifying a receiving POD, and the most suitable receiving POD is found through an extended scheduler of the container cloud platform according to POD resource amounts and the network topology.
Further, the resource amounts and resource margins of PODs in the container cloud platform may be confirmed through etcd (the distributed key-value store).
In some embodiments, since the actual resource amount of a receiving POD is not synchronized in real time with the resource margin recorded in the container cloud platform, after the receiving POD is confirmed a confirmation request is sent to its kubelet (control component) to verify again that the resource margin satisfies the migration condition.
That is, after the step of "the control server of the algorithm container cloud platform confirms the receiving POD based on the migration request", the method comprises: sending a confirmation request to the control component of the receiving POD, the confirmation request asking whether the resource margin of the receiving POD meets the migration condition.
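To make the selection logic concrete, the Go sketch below illustrates one way the control server might pick a receiving node: honor the pointing parameter when present and its margin suffices, otherwise score candidates by resource margin and network distance. The types, field names and scoring rule are illustrative assumptions, not the patent's actual scheduler.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a hypothetical view of one server node's resource margin,
// as the control server might read it from etcd.
type Node struct {
	Name      string
	GPUFreeMB int // free GPU memory margin
	NetHops   int // rough network-topology distance to the migration POD
}

// pickReceiver returns the receiving node. A non-empty pointing parameter
// is honored if the node's margin meets the demand; otherwise the node
// with the best margin/topology score is chosen.
func pickReceiver(nodes []Node, pointing string, demandMB int) (Node, error) {
	if pointing != "" {
		for _, n := range nodes {
			if n.Name == pointing {
				if n.GPUFreeMB >= demandMB {
					return n, nil
				}
				return Node{}, errors.New("pointed node lacks resource margin: migration request fails")
			}
		}
		return Node{}, errors.New("pointed node not found")
	}
	best, found := Node{}, false
	for _, n := range nodes {
		if n.GPUFreeMB < demandMB {
			continue
		}
		// Assumed scoring rule: prefer larger margin, penalize network distance.
		if !found || n.GPUFreeMB-1024*n.NetHops > best.GPUFreeMB-1024*best.NetHops {
			best, found = n, true
		}
	}
	if !found {
		return Node{}, errors.New("no node satisfies the migration condition")
	}
	return best, nil
}

func main() {
	nodes := []Node{{"node-a", 8192, 2}, {"node-b", 16384, 1}}
	n, err := pickReceiver(nodes, "", 4096)
	fmt.Println(n.Name, err) // node-b <nil>
}
```

A second confirmation against the node's kubelet, as described above, would follow this selection because the recorded margin may be stale.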
In some embodiments, in the step "the migration component in the migration POD receives the migration request", the migration component of the migration POD is registered with the algorithm container cloud platform to obtain the migration request.
In some embodiments, in the step of "checking the metadata tree graph index of the image file in the migration POD to obtain the metadata cache index", a sidecar container in the migration POD retrieves the metadata tree graph index of the image file to obtain the metadata cache index.
Specifically, the sidecar container retrieves the used data blocks in the metadata tree graph and generates the metadata cache index from the locations of those data blocks.
The metadata tree graph index is generated from the data in the image file, and the metadata cache index is the tree graph index corresponding to the used data slices within the metadata tree graph index.
Specifically, the data in the migration POD's image file is divided into fixed-size slices stored in a data layer, and a corresponding metadata tree graph index is generated from the slices in that layer. The metadata tree graph index is then searched for the data slices in use, a metadata cache index is generated from them, and the data slices corresponding to the metadata cache index are sent to the file system of the receiving POD.
Specifically, the metadata tree graph index is a self-checking hash tree. Using a self-checking hash tree as the metadata tree graph index allows hash verification to prevent the file system content from diverging from the image file during image transmission. In addition, transmitting the image file by index makes the transfer fast, avoids waiting minutes or even hours for a large image to transfer whole, and, because the index persists, prevents inconsistency between the image file and the file system content caused by transient network interruptions, thus completing the fast transfer of the GPU image file.
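As a rough illustration of such a self-checking hash tree, the Go sketch below cuts data into fixed-size slices, hashes each slice, and folds the hashes pairwise into a root; comparing roots (or subtree hashes) between the migration and receiving sides detects any slice corrupted in transit. The slice size and the choice of SHA-256 are assumptions for the example.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

const sliceSize = 4 * 1024 * 1024 // assumed fixed slice size: 4 MiB

// buildSliceHashes cuts the image data into fixed-size slices and hashes each one.
func buildSliceHashes(data []byte) [][32]byte {
	var hs [][32]byte
	for off := 0; off < len(data); off += sliceSize {
		end := off + sliceSize
		if end > len(data) {
			end = len(data)
		}
		hs = append(hs, sha256.Sum256(data[off:end]))
	}
	return hs
}

// merkleRoot folds slice hashes pairwise until one root remains; a mismatch
// between sender and receiver roots localizes to a corrupted subtree.
func merkleRoot(level [][32]byte) [32]byte {
	if len(level) == 0 {
		return sha256.Sum256(nil)
	}
	for len(level) > 1 {
		var next [][32]byte
		for i := 0; i < len(level); i += 2 {
			if i+1 == len(level) { // odd node is carried up unchanged
				next = append(next, level[i])
				continue
			}
			pair := append(level[i][:], level[i+1][:]...)
			next = append(next, sha256.Sum256(pair))
		}
		level = next
	}
	return level[0]
}

func main() {
	img := make([]byte, 10*1024*1024) // stand-in for an image layer
	root := merkleRoot(buildSliceHashes(img))
	fmt.Printf("root: %x\n", root[:8])
}
```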
In some embodiments, in the step of "the migration receiving component of the receiving POD creates a file system in the receiving POD according to the metadata tree graph index and the metadata cache index, and puts the data corresponding to the metadata cache index into the file system", when the receiving POD uses the file system it first uses the data slices covered by the metadata cache index; when a data slice outside that set is needed, the corresponding slice in the migration POD's image file is called over the network through the metadata tree graph index.
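A minimal sketch of that read path, with hypothetical names, might look as follows in Go: reads are served from the locally synced slices first, falling back to an on-demand network fetch through the full tree index.

```go
package main

import "fmt"

// SliceFetcher abstracts the network call back to the migration POD's image;
// the interface and its use here are illustrative assumptions.
type SliceFetcher interface {
	Fetch(sliceID string) ([]byte, error)
}

// LazyImageFS resolves reads through the metadata cache index first and
// falls back to a network call through the full metadata tree graph index.
type LazyImageFS struct {
	cache   map[string][]byte // slices already synced via the metadata cache index
	fetcher SliceFetcher      // network path into the migration POD's image
}

func (fs *LazyImageFS) ReadSlice(sliceID string) ([]byte, error) {
	if b, ok := fs.cache[sliceID]; ok {
		return b, nil // hot path: data shipped ahead of time
	}
	b, err := fs.fetcher.Fetch(sliceID) // cold path: on-demand network call
	if err != nil {
		return nil, err
	}
	fs.cache[sliceID] = b // keep for later reads; the background download completes the set
	return b, nil
}

type fakeFetcher struct{}

func (fakeFetcher) Fetch(id string) ([]byte, error) { return []byte("slice:" + id), nil }

func main() {
	fs := &LazyImageFS{cache: map[string][]byte{"a": []byte("cached")}, fetcher: fakeFetcher{}}
	b1, _ := fs.ReadSlice("a") // served locally
	b2, _ := fs.ReadSlice("b") // fetched lazily
	fmt.Println(string(b1), string(b2))
}
```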
In some embodiments, in the step of "transmitting the high-frequency and low-frequency memory blocks to the migration receiving component as memory state files", the low-frequency memory blocks are compressed, merged and transmitted first. When the change frequency of a high-frequency memory block falls to that of a low-frequency block, it is treated as a low-frequency block, compressed, merged and transmitted as a memory state file; otherwise it is transmitted directly to the migration receiving component after the set time step.
Specifically, to expose changes in the frequency of the high-frequency and low-frequency memory blocks, each block's frequency change is represented by its hash value: a block whose hash differs between samples has been modified.
Transmitting a high-frequency memory block directly after the set time step means: if the block's change frequency still has not fallen to the low-frequency level after the set duration, the block must nevertheless be transmitted directly to the migration receiving component to guarantee the completeness of the memory-state transfer.
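The following Go sketch illustrates one possible realization of this classification and ordering, under assumed values for the frequency threshold and time step: each block's hash is re-sampled to count changes, low-frequency blocks are compressed and sent first, and high-frequency blocks are sent once they calm down or the time step expires.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/sha256"
	"fmt"
	"time"
)

// Block is a tracked memory block; the threshold and time step are assumed values.
type Block struct {
	Addr     uint64
	Data     []byte
	lastHash [32]byte
	Changes  int
}

const (
	highFreqThreshold = 3               // assumed: more changes than this => high-frequency
	timeStep          = 2 * time.Second // assumed time step before forcing transmission
)

// sample recomputes the block hash; a differing hash counts as one change.
func (b *Block) sample() {
	h := sha256.Sum256(b.Data)
	if h != b.lastHash {
		b.Changes++
		b.lastHash = h
	}
}

// compress merges a low-frequency block into gzip form for transmission.
func compress(b *Block) []byte {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	w.Write(b.Data)
	w.Close()
	return buf.Bytes()
}

// transmit orders the transfer: low-frequency blocks go first (compressed);
// high-frequency blocks wait until they calm down or the time step expires.
func transmit(blocks []*Block, send func(addr uint64, payload []byte)) {
	deadline := time.Now().Add(timeStep)
	var high []*Block
	for _, b := range blocks {
		if b.Changes <= highFreqThreshold {
			send(b.Addr, compress(b))
		} else {
			high = append(high, b)
		}
	}
	for _, b := range high {
		for time.Now().Before(deadline) && b.Changes > highFreqThreshold {
			b.Changes = 0 // re-observe over a fresh window
			b.sample()
			time.Sleep(100 * time.Millisecond)
		}
		send(b.Addr, compress(b)) // calmed down, or forced after the time step
	}
}

func main() {
	b := &Block{Addr: 0x1000, Data: []byte("page")}
	b.sample()
	transmit([]*Block{b}, func(a uint64, p []byte) { fmt.Printf("sent %#x (%d bytes)\n", a, len(p)) })
}
```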
Further, the migration receiving component merges the high-frequency and low-frequency memory blocks, parses the resources they contain, recursively generates the process tree in the receiving POD from those resources, prepares the namespace, fills the namespace with the resources of the memory blocks, and creates a socket.
Specifically, the socket is a network programming interface; in this embodiment the receiving POD connects to the network through the socket, and the network and the receiving POD interact through it.
Further, after resource filling in the receiving POD is complete, the memory mapping used during the migration stage is cancelled and the addressing codes in the high-frequency and low-frequency memory blocks are cleared.
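The receiving side's merge step can be sketched as follows in Go (the struct and field names are assumptions): compressed low-frequency payloads are decompressed, every block is verified against the hash sent with it, and the blocks are merged into the address-space image from which the process tree is filled.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/sha256"
	"fmt"
	"io"
)

// StateFile is one received memory state file; Compressed marks merged
// low-frequency payloads. Field names are illustrative assumptions.
type StateFile struct {
	Addr       uint64
	Payload    []byte
	Compressed bool
	WantHash   [32]byte // hash of the original block, sent by the migration side
}

// mergeMemory rebuilds the address-space image the process tree will be
// filled from, verifying every block against its transmitted hash.
func mergeMemory(files []StateFile) (map[uint64][]byte, error) {
	mem := make(map[uint64][]byte, len(files))
	for _, f := range files {
		data := f.Payload
		if f.Compressed {
			r, err := gzip.NewReader(bytes.NewReader(f.Payload))
			if err != nil {
				return nil, err
			}
			if data, err = io.ReadAll(r); err != nil {
				return nil, err
			}
		}
		if sha256.Sum256(data) != f.WantHash {
			return nil, fmt.Errorf("block %#x failed hash check", f.Addr)
		}
		mem[f.Addr] = data // later: fill the namespace, create the socket, clear addressing codes
	}
	return mem, nil
}

func main() {
	raw := []byte("page contents")
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	w.Write(raw)
	w.Close()
	mem, err := mergeMemory([]StateFile{{0x2000, buf.Bytes(), true, sha256.Sum256(raw)}})
	fmt.Println(len(mem[0x2000]), err)
}
```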
Specifically, by judging the change frequency of the memory blocks, the blocks are classified as high-frequency or low-frequency; the low-frequency blocks are compressed and merged before transmission, while the high-frequency blocks are held back and transmitted once their change frequency drops, or transmitted outright after the time step. This reasonably reduces transmission overhead, strengthens the stability of memory transmission, and completes the fast copy of the memory state.
In some embodiments, after the GPU server receives the migration request, the task queue of the migration POD is traversed. With the iteration count as the minimum scheduling unit, each queued task undergoes input/output analysis and context-switch analysis; GPU kernel calls are examined forward, and the update dimensions of the task gradient are computed backward, to determine the priority of the tasks in the queue.
Specifically, the GPU server receives the migration request, separating the CPU from the GPU and reducing their coupling: the client forwards the migration POD's request to the server, and the server returns the GPU's computation result. Passing the complex GPU state thus becomes keeping the state only at the server and re-linking the POD.
Furthermore, tasks with many input/output operations, frequent context switching and long run times are treated as low-priority tasks and moved to the front of the queue to execute first, avoiding blockage during the migration process; when migration completes and tasks resume, high-priority tasks execute first, improving GPU utilization in normal operation.
Specifically, to keep GPU operation in a relatively steady state and ease the hand-off to the new container node, artificially constructed tasks may be inserted into the queue to freeze GPU process scheduling.
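A hedged Go sketch of this reordering, with assumed measurement fields and weights: tasks are scored by I/O operations, context switches and iteration time; the high scorers (the patent's low-priority tasks) are moved to the front for the migration window, and an artificial filler task can be appended to freeze scheduling.

```go
package main

import (
	"fmt"
	"sort"
)

// Task is one queued GPU task measured per iteration (the minimum scheduling
// unit). The fields and weights below are illustrative assumptions.
type Task struct {
	Name        string
	IOOps       int   // input/output operations observed per iteration
	CtxSwitches int   // context switches per iteration
	IterMillis  int64 // time per iteration
}

// score is higher for tasks with more I/O, more context switching and longer
// iterations, i.e. the tasks the scheme treats as low priority.
func score(t Task) int64 {
	return int64(t.IOOps)*10 + int64(t.CtxSwitches)*5 + t.IterMillis
}

// reorderForMigration moves low-priority tasks to the front so they run
// during migration, keeping the GPU in a relatively steady state.
func reorderForMigration(q []Task) []Task {
	out := append([]Task(nil), q...)
	sort.SliceStable(out, func(i, j int) bool { return score(out[i]) > score(out[j]) })
	return out
}

// freezeTask is an artificial filler inserted to hold GPU process scheduling
// steady while the receiving POD is linked.
func freezeTask() Task { return Task{Name: "freeze", IterMillis: 1} }

func main() {
	q := []Task{
		{"train-high", 1, 2, 40},
		{"etl-low", 30, 12, 900},
	}
	migr := append(reorderForMigration(q), freezeTask())
	for _, t := range migr {
		fmt.Println(t.Name)
	}
	// after migration completes, high-priority tasks resume first
}
```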
In some embodiments, after the data in the file system and the memory state files have been migrated, they are hash-checked. Once the check passes, a pass request is sent to the control server of the algorithm container cloud platform; the control server interfaces with the GPU server through an API and, together with the network proxy component and DNS service of the algorithm cloud platform, switches the network state and notifies the platform to destroy the migration POD.
Further, before the hash check is performed, the data in the file system should already match the complete image file of the migration POD.
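The final check can be sketched in Go as a straightforward manifest comparison (names are illustrative): every received file's hash must match the hash recorded by the migration side before the pass request is sent.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// verifyMigration compares every received file against the manifest of
// expected hashes; only a full match lets the receiving POD report "pass"
// to the control server. Names are illustrative assumptions.
func verifyMigration(files map[string][]byte, manifest map[string][32]byte) error {
	if len(files) != len(manifest) {
		return fmt.Errorf("file count mismatch: got %d, want %d", len(files), len(manifest))
	}
	for name, want := range manifest {
		data, ok := files[name]
		if !ok {
			return fmt.Errorf("missing file %q", name)
		}
		if sha256.Sum256(data) != want {
			return fmt.Errorf("hash check failed for %q", name)
		}
	}
	return nil // safe to switch the network state and destroy the migration POD
}

func main() {
	data := []byte("memory state file")
	err := verifyMigration(
		map[string][]byte{"mem-0": data},
		map[string][32]byte{"mem-0": sha256.Sum256(data)},
	)
	fmt.Println("verification:", err)
}
```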
As described above, compared with the prior art, the hot migration method for GPU resource PODs provided by the present disclosure achieves the following:
1. Solving the problem of GPU container image transmission:
The concepts of metadata (a description of the data) and data are introduced. Specifically, the metadata layer is a self-checking hash tree, while the data layer is divided into fixed-size slices: all data is segmented into slices stored in the data layer, and the corresponding metadata tree graph index is generated at the same time. When hot migration of the GPU container group starts, the metadata tree graph index of the image file is searched, the data slices in use are found, and a metadata cache index is generated from them.
The metadata cache index is sent to the receiving POD, which creates a file system from the flattened metadata tree graph index and the metadata cache index and synchronizes the corresponding data into it. Reads in the file system go through the metadata cache index first; if data outside the cached set is needed, the migration POD's image file is called over the network through the metadata tree graph index. The metadata cache index avoids retransmission after a network interruption, and the hash-check function prevents the data in the migration POD from diverging from the data in the receiving POD's file system. Thus the image file is transmitted quickly by index, without waiting minutes or even hours for a large image to transfer whole, and without the inconsistency that network flashes or interruptions would otherwise cause.
2. Fast copy of the CPU state:
Copying the memory state: process tree information is formed by recursively collecting the container process and subprocesses of the migration POD, and an addressing code is injected into the migration POD's processes. When a container process calls a memory map, the addressing code is copied to the corresponding memory address space and the subsequent memory change frequency is recorded. High-frequency and low-frequency memory blocks are distinguished by change frequency and represented by hashes; low-frequency blocks are compressed, merged and transmitted to the migration receiving component of the receiving POD, while a high-frequency block is converted to a low-frequency block for transmission once its call frequency drops, or transmitted to the migration receiving component after the time step.
Restoring the memory state: the transmitted high-frequency and low-frequency memory blocks are merged, the resources in the processes are parsed, the process tree to be restored is generated recursively, the namespace is prepared and filled with memory data, a socket is created, the memory mapping of the copy stage is cancelled, and the injected addressing codes are cleared. The rapid merging of the high-frequency and low-frequency blocks reduces transmission overhead and strengthens the stability of memory transmission.
3. State retention for the GPU:
The CPU and GPU are separated by having the client send the migration request and the server receive it, reducing CPU-GPU coupling and decoupling the POD from the GPU. Specifically, the migration POD's request for the GPU is forwarded to the server side through the client, and the server returns the result computed by the GPU. This changes complex GPU state passing into merely keeping the GPU state on the server and re-linking the receiving POD.
In particular, the GPU's running state on the server side is optimized to better suit live migration. With the iteration count as the minimum scheduling unit, high-priority and low-priority GPU tasks are distinguished (low-priority tasks typically have more input/output and context switching), and during migration the low-priority tasks are moved to the front of the queue. Optionally, artificially constructed tasks are inserted into the queue to freeze GPU process scheduling. Executing the low-priority tasks first keeps the GPU in a relatively stable state for hand-off to the receiving POD; when tasks resume after migration, high-priority tasks execute first, improving GPU utilization in normal operation.
By solving GPU container image transmission, copying the CPU state quickly and retaining the GPU state, fast hot migration of GPU resource PODs is achieved. Second-level hot migration is essentially imperceptible to users, and preserving the running state matters greatly for algorithms with long task chains and long computation times, enabling flexible scheduling of algorithm containers and strengthening the robustness and resource-utilization efficiency of the algorithm container platform.
Example two
Based on the same concept, and referring to fig. 2, the present application further provides a hot migration apparatus for a GPU resource POD, including:
a request module: a GPU client sends a migration request to the migration POD that needs to be hot-migrated and to the GPU server, and a control server of the algorithm container cloud platform confirms the receiving POD based on the migration request;
a file migration module: after receiving the migration request, the migration component in the migration POD checks the metadata tree graph index of the image file in the migration POD to obtain a metadata cache index and sends it to the migration receiving component of the receiving POD; the migration receiving component establishes a file system in the receiving POD according to the metadata cache index and places the data corresponding to the metadata cache index into the file system;
a memory migration module: process tree information is collected in the migration POD and an addressing code is injected into each process of the process tree; when a process calls memory, the addressing code is copied into the address space of the corresponding memory, the change frequency of each memory block is recorded in real time, the memory is divided into high-frequency and low-frequency memory blocks according to change frequency, and the high-frequency and low-frequency memory blocks are transmitted to the migration receiving component as memory state files.
Example three
The present embodiment also provides an electronic device, referring to fig. 3, comprising a memory 404 and a processor 402, wherein the memory 404 stores a computer program, and the processor 402 is configured to execute the computer program to perform the steps of any of the above method embodiments.
Specifically, the processor 402 may include a Central Processing Unit (CPU), or an Application Specific Integrated Circuit (ASIC), or may be configured to implement one or more integrated circuits of the embodiments of the present application.
Memory 404 may include mass storage for data or instructions. By way of example and not limitation, memory 404 may include a hard disk drive (HDD), a floppy disk drive, a solid state drive (SSD), flash memory, an optical disk, a magneto-optical disk, tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 404 may include removable or non-removable (or fixed) media, where appropriate, and may be internal or external to the data processing apparatus. In a particular embodiment, memory 404 is non-volatile memory. In particular embodiments, memory 404 includes read-only memory (ROM) and random access memory (RAM). The ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically rewritable ROM (EAROM), or flash memory, or a combination of two or more of these, where appropriate. The RAM may be static random access memory (SRAM) or dynamic random access memory (DRAM); the DRAM may be fast page mode DRAM (FPM DRAM), extended data output DRAM (EDO DRAM), synchronous DRAM (SDRAM), or the like.
Memory 404 may be used to store or cache various data files for processing and/or communication use, as well as possibly computer program instructions for execution by processor 402.
The processor 402 may be configured to implement any of the above-described embodiments of the method for hot migration of GPU resources POD by reading and executing computer program instructions stored in the memory 404.
Optionally, the electronic apparatus may further include a transmission device 406 and an input/output device 408, where the transmission device 406 is connected to the processor 402, and the input/output device 408 is connected to the processor 402.
The transmitting device 406 may be used to receive or transmit data via a network. Specific examples of the network described above may include a wired or wireless network provided by a communication provider of the electronic device. In one example, the transmission device includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmitting device 406 may be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
The input and output devices 408 are used to input or output information. In this embodiment, the input information may be a migration request for migrating a POD, a high frequency memory block and a low frequency memory block, a metadata tree index, and the like, and the output information may be a migration result, a hash check result, and the like.
Alternatively, in this embodiment, the processor 402 may be configured to execute the following steps through a computer program:
S101, a GPU client sends a migration request to the migration POD that needs to be hot-migrated and to the GPU server, and a control server of the algorithm container cloud platform confirms the receiving POD based on the migration request;
S102, after receiving the migration request, the migration component in the migration POD checks the metadata tree graph index of the image file in the migration POD to obtain a metadata cache index and sends it to the migration receiving component of the receiving POD; the migration receiving component establishes a file system in the receiving POD according to the metadata cache index and places the data corresponding to the metadata cache index into the file system;
S103, process tree information is collected in the migration POD and an addressing code is injected into each process of the process tree; when a process calls memory, the addressing code is copied into the address space of the corresponding memory, the change frequency of each memory block is recorded in real time, the memory is divided into high-frequency and low-frequency memory blocks according to change frequency, and the high-frequency and low-frequency memory blocks are transmitted to the migration receiving component as memory state files.
It should be noted that, for specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiment and optional implementation manners, and details of this embodiment are not described herein again.
In general, the various embodiments may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. Some aspects of the invention may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
Embodiments of the invention may be implemented by computer software executable by a data processor of the mobile device, such as in a processor entity, or by hardware, or by a combination of software and hardware. Computer software or programs (also referred to as program products) including software routines, applets and/or macros can be stored in any device-readable data storage medium and they include program instructions for performing particular tasks. The computer program product may comprise one or more computer-executable components configured to perform embodiments when the program is run. The one or more computer-executable components may be at least one software code or a portion thereof. Further in this regard it should be noted that any block of the logic flow as in figure 3 may represent a program step, or an interconnected logic circuit, block and function, or a combination of a program step and a logic circuit, block and function. The software may be stored on physical media such as memory chips or memory blocks implemented within the processor, magnetic media such as hard or floppy disks, and optical media such as, for example, DVDs and data variants thereof, CDs. The physical medium is a non-transitory medium.
Those skilled in the art will understand that the technical features of the above embodiments can be combined arbitrarily; for brevity, not all possible combinations are described, but any combination that contains no contradiction should be considered within the scope of this specification.
The above examples merely illustrate several embodiments of the present application; their description is specific and detailed but should not be construed as limiting the scope of the application. For a person skilled in the art, several variations and improvements can be made without departing from the concept of the present application, all of which fall within its protection scope. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims (9)

1. A hot migration method for a GPU resource POD, comprising the following steps:
a GPU client sends a migration request to the migration POD that needs to be hot-migrated and to the GPU server, and a control server of an algorithm container cloud platform confirms the receiving POD based on the migration request;
after receiving the migration request, a migration component in the migration POD checks the metadata tree graph index of the image file in the migration POD to obtain a metadata cache index: a sidecar container in the migration POD retrieves the metadata tree graph index of the image file to obtain the metadata cache index, wherein the metadata tree graph index is generated from the data slices in the image file and the metadata cache index is the tree graph index corresponding to the used data slices within the metadata tree graph index; the metadata cache index is sent to a migration receiving component of the receiving POD, which establishes a file system in the receiving POD according to the metadata cache index and places the data corresponding to the metadata cache index into the file system;
process tree information is collected in the migration POD and an addressing code is injected into each process of the process tree; when a process calls memory, the addressing code is copied into the address space of the corresponding memory, the change frequency of each memory block is recorded in real time, the memory is divided into high-frequency and low-frequency memory blocks according to change frequency, and the high-frequency and low-frequency memory blocks are transmitted to the migration receiving component as memory state files.
2. The method according to claim 1, wherein in the step of "the control server of the algorithm container cloud platform confirms the receiving POD based on the migration request", the control server confirms the receiving POD according to the pointing parameter of the migration request, and if the pointing parameter is empty, automatically allocates the most suitable server node as the receiving POD according to resources and network topology.
3. The method according to claim 1, wherein in the step "after the migration request is received by the migration component in the migration POD", the migration component of the migration POD is registered with the algorithm container cloud platform to obtain the migration request.
4. The method according to claim 1, wherein in the step of "the migration receiving component of the receiving POD creates a file system in the receiving POD according to the metadata tree graph index and the metadata cache index, and places the data corresponding to the metadata cache index into the file system", the receiving POD, when using the file system, first uses the data slices covered by the metadata cache index, and when a data slice outside that set is needed, calls the corresponding data slice in the migration POD's image file over the network through the metadata tree graph index.
5. The method according to claim 1, wherein in the step of "transmitting the high-frequency memory block and the low-frequency memory block as memory state files to the migration receiving component", the low-frequency memory block is compressed and merged and then transmitted as a memory state file to the migration receiving component, and when the change frequency of the high-frequency memory block decreases to that of the low-frequency memory block, the high-frequency memory block is regarded as a low-frequency memory block, compressed and merged and then transmitted as a memory state file to the migration receiving component, or else the high-frequency memory block is transmitted directly to the migration receiving component after a time step.
6. The method according to claim 1, wherein after the GPU server receives the migration request, the task queue of the migration POD is traversed; with the iteration count as the minimum scheduling unit, the queued tasks undergo input/output analysis and context-switch analysis, GPU kernel calls are examined forward, and the update dimensions of the task gradients are computed backward to determine the priorities of the tasks in the task queue.
7. A hot migration device for a GPU resource POD, comprising:
a request module: a GPU client sends a migration request to the migration POD that needs to be hot-migrated and to the GPU server, and a control server of an algorithm container cloud platform confirms the receiving POD based on the migration request;
a file migration module: after receiving the migration request, a migration component in the migration POD checks the metadata tree graph index of the image file in the migration POD to obtain a metadata cache index: a sidecar container in the migration POD retrieves the metadata tree graph index of the image file to obtain the metadata cache index, wherein the metadata tree graph index is generated from the data slices in the image file and the metadata cache index is the tree graph index corresponding to the used data slices within the metadata tree graph index; the metadata cache index is sent to a migration receiving component of the receiving POD, which establishes a file system in the receiving POD according to the metadata cache index and places the data corresponding to the metadata cache index into the file system;
a memory migration module: process tree information is collected in the migration POD and an addressing code is injected into each process of the process tree; when a process calls memory, the addressing code is copied into the address space of the corresponding memory, the change frequency of each memory block is recorded in real time, the memory is divided into high-frequency and low-frequency memory blocks according to change frequency, and the high-frequency and low-frequency memory blocks are transmitted to the migration receiving component as memory state files.
8. An electronic device comprising a memory and a processor, wherein the memory stores a computer program and the processor is configured to run the computer program to perform the hot migration method for a GPU resource POD according to any of claims 1-6.
9. A readable storage medium storing a computer program comprising program code for controlling a process to perform the hot migration method for a GPU resource POD according to any of claims 1-6.
CN202211169473.5A (priority date 2022-09-26, filing date 2022-09-26) Hot migration method, device and application of a GPU resource POD. Active. CN115292051B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211169473.5A CN115292051B (en) 2022-09-26 2022-09-26 Hot migration method, device and application of a GPU resource POD


Publications (2)

Publication Number Publication Date
CN115292051A CN115292051A (en) 2022-11-04
CN115292051B (en) 2023-01-03

Family

ID=83833755

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211169473.5A Active CN115292051B (en) 2022-09-26 2022-09-26 Hot migration method, device and application of a GPU resource POD

Country Status (1)

Country Link
CN (1) CN115292051B (en)


Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8639665B2 (en) * 2012-04-04 2014-01-28 International Business Machines Corporation Hybrid backup and restore of very large file system using metadata image backup and traditional backup
US10476809B1 (en) * 2014-03-12 2019-11-12 Amazon Technologies, Inc. Moving virtual machines using migration profiles
CN105893542B (en) * 2016-03-31 2019-04-12 华中科技大学 A kind of cold data file redistribution method and system in cloud storage system
US10162559B2 (en) * 2016-09-09 2018-12-25 Veritas Technologies Llc Systems and methods for performing live migrations of software containers
US10545913B1 (en) * 2017-04-30 2020-01-28 EMC IP Holding Company LLC Data storage system with on-demand recovery of file import metadata during file system migration
CN109697016B (en) * 2017-10-20 2022-02-15 伊姆西Ip控股有限责任公司 Method and apparatus for improving storage performance of containers
US10824466B2 (en) * 2018-09-26 2020-11-03 International Business Machines Corporation Container migration
US20220004428A1 (en) * 2020-07-02 2022-01-06 International Business Machines Corporation Artificial intelligence optimized cloud migration
US11314687B2 (en) * 2020-09-24 2022-04-26 Commvault Systems, Inc. Container data mover for migrating data between distributed data storage systems integrated with application orchestrators

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8285758B1 (en) * 2007-06-30 2012-10-09 Emc Corporation Tiering storage between multiple classes of storage on the same container file system
WO2015027513A1 (en) * 2013-09-02 2015-03-05 运软网络科技(上海)有限公司 System for migrating point of delivery across domains
CN107704618A (en) * 2017-10-27 2018-02-16 北京航空航天大学 A kind of heat based on aufs file system migrates method and system
CN108279969A (en) * 2018-02-26 2018-07-13 中科边缘智慧信息科技(苏州)有限公司 Stateful service container thermomigration process based on memory compression transmission
CN110119377A (en) * 2019-04-24 2019-08-13 华中科技大学 Online migratory system towards Docker container is realized and optimization method
CN115039089A (en) * 2019-11-29 2022-09-09 亚马逊技术有限公司 Warm tier storage for search services
CN112631715A (en) * 2020-12-04 2021-04-09 苏州浪潮智能科技有限公司 Mirror migration optimization method, system and medium based on cache matching
CN113918096A (en) * 2021-10-21 2022-01-11 城云科技(中国)有限公司 Method and device for uploading algorithm mirror image packet and application
CN114172729A (en) * 2021-12-08 2022-03-11 中国电信股份有限公司 Trusted container migration method and device and storage medium
CN114528086A (en) * 2022-02-26 2022-05-24 苏州浪潮智能科技有限公司 Method, device, equipment and medium for thermal migration of pooled heterogeneous cloud computing application
CN114816656A (en) * 2022-03-11 2022-07-29 新华三大数据技术有限公司 Container group migration method, electronic device and storage medium
CN114968477A (en) * 2022-04-21 2022-08-30 京东科技信息技术有限公司 Container heat transfer method and container heat transfer device
CN114860378A (en) * 2022-04-29 2022-08-05 苏州浪潮智能科技有限公司 File system migration method, device, system and medium thereof

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Good Shepherds Care For Their Cattle: Seamless Pod Migration in Geo-Distributed Kubernetes;Paulo Souza Junior 等;《2022 IEEE 6th International Conference on Fog and Edge Computing (ICFEC)》;20220620;第26-34页 *
An energy-saving-oriented online virtual machine migration solution; Zhao Dan et al.; Computer Technology and Development (《计算机技术与发展》); October 2017 (No. 02); full text *
Fast memory synchronization technology for container hot migration; You Qiangzhi et al.; Computer and Modernization (《计算机与现代化》); January 2022; pp. 17-34 *

Also Published As

Publication number Publication date
CN115292051A (en) 2022-11-04

Similar Documents

Publication Publication Date Title
US11928029B2 (en) Backup of partitioned database tables
US11036591B2 (en) Restoring partitioned database tables from backup
US10296494B2 (en) Managing a global namespace for a distributed filesystem
US11327949B2 (en) Verification of database table partitions during backup
US9906598B1 (en) Distributed data storage controller
AU2013266122B2 (en) Backup image duplication
US8548953B2 (en) File deduplication using storage tiers
US20180011874A1 (en) Peer-to-peer redundant file server system and methods
US9804928B2 (en) Restoring an archived file in a distributed filesystem
US8930364B1 (en) Intelligent data integration
CN112565325B (en) Mirror image file management method, device and system, computer equipment and storage medium
CN110347651A (en) Method of data synchronization, device, equipment and storage medium based on cloud storage
CN114090179A (en) Migration method and device of stateful service and server
US11080909B2 (en) Image layer processing method and computing device
CN115292051B (en) Hot migration method, device and application of a GPU resource POD
CN116860527A (en) Migration method for container using local storage in Kubernetes environment
JP7193515B2 (en) Synchronous object placement for information lifecycle management
CN114860378A (en) File system migration method, device, system and medium thereof
JP2007293433A (en) Document management system
CN114625474A (en) Container migration method and device, electronic equipment and storage medium
CN117009310B (en) File synchronization method and device, distributed global content library system and electronic equipment
WO2024040902A1 (en) Data access method, distributed database system and computing device cluster
CN117596257A (en) Method, device, computer equipment and storage medium for fusing storage uploading objects
CN116719604A (en) Container migration method and device, storage medium and electronic equipment
CN116088755A (en) Data storage method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant