WO2024066904A1 - Container creation method, system and node

Publication number: WO2024066904A1
Application number: PCT/CN2023/116187
Authority: WIPO (PCT)
Other languages: English (en), French (fr)
Prior art keywords: node, file, file system, image, container
Inventors: 罗先强, 黄克骥, 王锋, 张长建
Applicant: Huawei Technologies Co., Ltd. (华为技术有限公司)


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44: Arrangements for executing specific programs
    • G06F 9/455: Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines

Definitions

  • the present application relates to the field of storage technology, and in particular to a container creation method, system and node.
  • Containers are created based on image files, which provide the programs, libraries, resources, configuration parameters and other files required for the container to run.
  • each node needs to pull the image file from the image repository and write the image file to the directory of the image file in the node.
  • the directory of the image file in the node describes the storage location of the image file in the node.
  • the storage location indicated by the directory of the image file in the node is located in the local storage device of the node.
  • the local storage device of the node refers to the storage device such as the hard disk in the node.
  • Since each node needs to perform an operation to pull the image file from the image repository, the image repository comes under pressure.
  • In addition, the image file is stored directly in the local storage device of the node, which occupies and squeezes the node's own storage space.
  • the present application provides a container creation method, system and node, which are used to reduce the occupation of node local storage space during container creation.
  • an embodiment of the present application provides a container creation method, in which the first node can mount the directory of the image file of the container to the first file system on the remote storage device before creating the container, and establish an association between the directory of the image file and the first file system, so that the image file written by the first node to the directory of the image file will be stored in the first file system.
  • the remote storage device is a storage device independent of the first node.
  • the remote storage device is a storage device outside the first node, for example, the remote storage device can be connected to the first node through a network.
  • When the first node needs to create a container, it can first obtain the image file from the first file system, and create the container on the first node based on the image file.
  • the first node corresponds to the node 110N in the embodiment of the present application.
  • the directory of the image file of the first node can be mounted to the first file system on the remote storage device, so that the image file can be stored on the remote storage device, which can reduce the occupation of the local storage space of the first node by the image file.
  • the second node can also mount the directory of the container's image file to the first file system.
  • here, the second node is taken as an example of the node that stores the image file in the first file system.
  • the second node obtains the image file from the image repository and stores the image file in the first file system.
  • the second node corresponds to the node 110M in the embodiment of the present application.
  • the second node pulls the image file from the image repository and writes the image file to the first file system.
  • When the first node needs to create a container, it no longer needs to pull the image file from the image repository, but only needs to obtain the image file from the first file system. This reduces the number of times that nodes needing to deploy the container pull the image file from the image repository, and realizes the sharing of image files among multiple nodes.
  • When the second node needs to create a container, the second node obtains the image file from the first file system, and creates the container on the second node based on the image file.
  • the second node can also obtain the image file from the first file system when creating a container, which further ensures that the image file is shared between the first node and the second node.
  • When the second node stores the image file in the first file system, the second node may store only the incremental data in the image file, where the incremental data is the data in the image file that differs from the other image files stored in the first file system.
  • the second node does not need to save the complete image file in the first file system, but only needs to save some unsaved data in the first file system, thereby reducing the storage space occupied in the remote storage device and reducing the amount of data that needs to be exchanged between the second node and the first file system.
  • Optionally, the image file obtained from the image repository may itself be only the incremental data in the image file, that is, the data in the image file that differs from the other image files stored in the first file system.
  • In this way, the second node and the image repository only need to exchange the incremental data of the image file, which can effectively reduce the amount of data transmitted between the second node and the image repository and improve the pulling speed of the image file.
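  • As an illustrative sketch of this layer-level deduplication (not taken from the patent; the helper names layerDigest and StoreIncremental and the mount path are hypothetical), a node could hash each layer of a pulled image and copy into the shared file system only the layers whose digests are not already present there:

```go
package imagestore

import (
	"crypto/sha256"
	"encoding/hex"
	"os"
	"path/filepath"
)

// layerDigest returns the SHA-256 digest of one image layer file.
// (Hypothetical helper; real image tooling uses content-addressed layer IDs.)
func layerDigest(path string) (string, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:]), nil
}

// StoreIncremental copies into mountDir (the directory mounted on the shared
// remote file system, e.g. "file system B") only the layers whose digests are
// not already stored there; layers shared with other images are skipped.
func StoreIncremental(layers []string, mountDir string) error {
	for _, layer := range layers {
		digest, err := layerDigest(layer)
		if err != nil {
			return err
		}
		dst := filepath.Join(mountDir, digest)
		if _, err := os.Stat(dst); err == nil {
			continue // layer already present in the shared file system
		}
		data, err := os.ReadFile(layer)
		if err != nil {
			return err
		}
		if err := os.WriteFile(dst, data, 0o644); err != nil {
			return err
		}
	}
	return nil
}
```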
  • The first node and the second node may be located in different data centers, or the first node and the second node may be located in the same data center.
  • the deployment mode of the first node and the second node is more flexible and suitable for different scenarios.
  • the remote storage device and the first node may be located in the same data center.
  • the remote storage device and the first node may also be located in different data centers.
  • the remote storage device and the second node may be located in the same data center.
  • the remote storage device and the second node may also be located in different data centers.
  • the deployment mode of the remote storage device, the first node, and the second node is relatively flexible and applicable to different scenarios.
  • the interaction operation with the first file system can be offloaded to the DPU of the first node.
  • the DPU of the first node can access the first file system.
  • the DPU in the first node obtains the image file from the first file system.
  • the DPU in the first node can access the first file system, thereby reducing the occupation of the processor in the first node by access to the first file system.
  • In addition to mounting the directory of the image file on the first file system of the remote storage device, the first node may also mount the directories of other files of the container on the file systems of other remote storage devices.
  • Alternatively, the first node may also mount the directories of other files of the container on other file systems of the same remote storage device. That is to say, the directories of different files of the container can be mounted on different file systems, and these different file systems can be on the same remote storage device or on different remote storage devices.
  • the first node mounts the directory of the root file of the container on a second file system on a storage device independent of the first node.
  • the storage device here may be the same device as the remote storage device where the first file system is located, or it may be a different device.
  • the first node mounts the directory of the persistent volume PVC of the container on a third file system on a storage device independent of the first node.
  • the storage device here may be the same device as the remote storage device where the first file system is located, or it may be a different device.
  • the first node mounts the directory of other files of the container on the file system of the remote storage device, so that other files of the container can also be stored on the file system, further reducing the occupation of the local storage space of the first node.
  • When the first node mounts the directories of other files of the container on the file systems of the remote storage device, the DPU of the first node can be used to access the file systems on which those directories are mounted; for example, the DPU of the first node can access the second file system and the third file system.
  • the DPU in the first node implements access to the remote storage device to implement reading and writing of other files in the container, further reducing the occupation of the processor in the first node by the reading and writing operations of other files in the container.
  • When the second node obtains the image file from the image repository and stores the image file in the first file system, the second node may obtain the compressed image file from the image repository.
  • The second node decompresses the compressed image file, and stores the decompressed image file in the first file system.
  • the second node can decompress the image file, so that when creating a container later, the decompressed image file can be directly read from the first file system, thereby improving the efficiency of container creation.
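  • As a rough sketch of this decompress-then-store step (assuming the image is distributed as a gzip-compressed tar archive, which the patent does not specify; the function name ExtractImage is invented), the decompressed files can be written directly under the directory mounted on the first file system:

```go
package imagestore

import (
	"archive/tar"
	"compress/gzip"
	"io"
	"os"
	"path/filepath"
)

// ExtractImage decompresses a gzip-compressed tar image archive and writes
// the decompressed files under destDir (the directory mounted on the first
// file system), so that later container creation can read them directly.
func ExtractImage(compressed io.Reader, destDir string) error {
	gz, err := gzip.NewReader(compressed)
	if err != nil {
		return err
	}
	defer gz.Close()

	tr := tar.NewReader(gz)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return nil // archive fully extracted
		}
		if err != nil {
			return err
		}
		target := filepath.Join(destDir, filepath.Clean(hdr.Name))
		switch hdr.Typeflag {
		case tar.TypeDir:
			if err := os.MkdirAll(target, 0o755); err != nil {
				return err
			}
		case tar.TypeReg:
			if err := os.MkdirAll(filepath.Dir(target), 0o755); err != nil {
				return err
			}
			out, err := os.Create(target)
			if err != nil {
				return err
			}
			if _, err := io.Copy(out, tr); err != nil {
				out.Close()
				return err
			}
			out.Close()
		}
	}
}
```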
  • an embodiment of the present application provides a container creation system, and the beneficial effects can be found in the relevant description of the first aspect, which will not be repeated here.
  • the container creation system includes a first remote storage device and a first node.
  • the first remote storage device is a storage device independent of the first node.
  • a first file system is deployed on the first remote storage device.
  • When the first node needs to create a container, it can mount the directory of the container's image file to the first file system, obtain the image file from the first file system, and create a container on the first node based on the image file.
  • The system further includes a second node and an image repository.
  • The image repository stores image files.
  • The second node can mount the directory of the image file of the container to the first file system, and can also obtain the image file from the image repository and store the image file in the first file system.
  • the second node may also obtain the image file from the first file system, and create the container based on the image file.
  • When the second node stores the image file in the first file system, it may store only the incremental data in the image file, where the incremental data is the data in the image file that differs from the other image files stored in the first file system.
  • The image repository may send the incremental data in the image file to the second node, so that the second node obtains only the incremental data in the image file, where the incremental data is the data in the image file that differs from the other image files stored in the first file system.
  • the first node and the second node may be located in different data centers or in the same data center.
  • The first remote storage device and the first node or the second node may be located in the same data center.
  • the first remote storage device and the first node or the second node may also be located in different data centers.
  • the first node may offload access operations of the first file system to the DPU. For example, when the first node obtains an image file from the first file system, the DPU in the first node may obtain the image file from the first file system.
  • the system further includes a second remote storage device and a third remote storage device; the second remote storage device and the third remote storage device are storage devices independent of the first node.
  • the second remote storage device is deployed with a second file system.
  • the third remote storage device is deployed with a third file system.
  • the first node can also mount the directory of the root file of the container to the second file system, and mount the directory of the persistent volume PVC of the container to the third file system.
  • the remote storage device here can be the same device as the remote storage device where the first file system is located, or it can be a different device.
  • the first file system, the second file system, and the third file system are located in different remote storage devices (that is, located in the first remote storage device, the second remote storage device, and the third remote storage device, respectively). In actual applications, part or all of the first file system, the second file system, and the third file system may also be located in the same remote storage device.
  • the DPU of the first node accesses the second file system and the third file system.
  • the second node obtains the image file from the image repository, and when storing the image file in the first file system, the second node obtains the compressed image file from the image repository; after decompressing the compressed image file, the decompressed image file is stored in the first file system.
  • the present application provides a container creation device, which has the function of implementing the first node behavior in the method example of the first aspect above.
  • the beneficial effects can be found in the description of the first aspect and will not be repeated here.
  • the function can be implemented by hardware, or by hardware executing the corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • the structure of the container creation device includes a first mounting module, a first acquisition module, and a first creation module. These modules can execute the corresponding functions in the method example of the first aspect above. Please refer to the detailed description in the method example for details, which will not be repeated here.
  • the present application provides another container creation device, which has the function of implementing the second node behavior in the method example of the first aspect above.
  • the beneficial effects can be found in the description of the first aspect and will not be repeated here.
  • the function can be implemented by hardware or by executing the corresponding software through hardware.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • the structure of the container creation device includes a second mounting module, a second acquisition module, and a second creation module, which can execute the corresponding functions of the method in the first aspect and each possible implementation method of the first aspect above. Please refer to the detailed description in the method example for details, which will not be repeated here.
  • the present application also provides a container creation node, which can be the first node or the second node in the method instance described in the first aspect and each possible implementation method of the first aspect.
  • the memory is used to store computer program instructions.
  • the processor has the function of implementing the behavior in the method instance described in the first aspect or any possible implementation method of the first aspect. The beneficial effects can be found in the description of the first aspect and will not be repeated here.
  • the container creation node may also include a DPU, which can be used to access the first file system, the second file system, or the third file system.
  • the present application further provides a computer-readable storage medium, wherein the computer-readable storage medium stores instructions, which, when executed on a computer, enables the computer to execute the method described in the first aspect and each possible implementation manner of the first aspect.
  • the present application also provides a computer chip, which is connected to a memory, and the chip is used to read and execute a software program stored in the memory, and to execute the methods in the above-mentioned first aspect and various possible implementation methods of the first aspect.
  • the present application further provides a computer program product comprising instructions, which, when executed on a computer, enables the computer to execute the method described in the first aspect and various possible implementations of the first aspect.
  • FIG. 1 is a schematic diagram of the architecture of a system provided by the present application.
  • FIG. 2 is a schematic diagram of the structure of a node provided by the present application.
  • FIG. 3 is a schematic diagram of a container creation method provided by the present application.
  • FIG. 4 is a schematic diagram of a mount information configuration interface provided by the present application.
  • FIG. 5 is a schematic diagram of an overlay system structure provided by the present application.
  • FIGS. 6 and 7 are schematic diagrams of the structure of a container creation device provided in the present application.
  • Virtualization is a resource management technology that abstracts and transforms various physical resources of a node, such as servers, networks, memory, and storage, and presents them in an abstracted, unified form.
  • A container is a type of virtualization technology.
  • A container is an independent operating environment simulated by virtualization technology.
  • A container is similar to a lightweight sandbox, which shields the software and hardware outside the container.
  • A container runs on a node and can essentially be regarded as a special process.
  • k8s is an open source container cluster management system. k8s builds a container scheduling service. k8s is user-oriented and enables users to manage container clusters through k8s. k8s can run in the node used to deploy containers, or it can be deployed in devices outside the node.
  • k8s users do not need to perform complex configuration work. Users only need to set some necessary parameters in the k8s client, such as the identification of the image file that the container needs to be based on and the number of container pods. Among them, a pod consists of a group of containers working on the same working node. In the embodiment of the present application, the necessary parameters also include the mount information of the container file, which refers to the remote file system mounted by the directory of different files of the container.
  • K8s will automatically select the appropriate node to perform specific container cluster scheduling processing based on the necessary parameters set by the user.
  • In the embodiments of the present application, k8s is used as an example of the system that manages containers.
  • Other systems that can implement container management can also be used.
  • A container has three files with different attributes, namely the root file (root file system, rootfs), the image file, and the persistent volume (persistent volume claim, PVC).
  • An image is a read-only template for container runtime.
  • Each image consists of a series of layers, which can provide the programs, libraries, resources, configuration parameters and other files required for container runtime.
  • Images have a hierarchical structure: when an image exists in the form of a file, different layers can correspond to sub-files, and each sub-file can in turn include multiple sub-files.
  • the image file is also called an image file system.
  • An image is a special file system.
  • the image repository is used to store a large number of images and is responsible for the storage, management, and distribution of images.
  • When a node needs to create a container, it pulls the required image from the image repository, creates a container using the pulled image, and starts the application in the container.
  • an image file can be pulled from an image repository by a node, stored in a remote file system, and a container is created based on the image file in the remote file system.
  • the remote file system can be a shared file system, that is, after the image file is stored in the shared file system, other nodes in the cluster to which the node belongs can obtain the image file from the shared file system and create a container based on the image file.
  • The image repository can be deployed in a cluster consisting of multiple devices, that is, the cluster provides one or more image files to the node.
  • The image repository can also be deployed on a single device, that is, the device provides one or more image files to the node.
  • The embodiment of the present application does not limit the specific deployment method of the image repository. In the following, for convenience of description, "image repository" is used to refer to the cluster or the single device on which the image repository is deployed.
  • Rootfs is also called root file system.
  • rootfs is the working directory of the container, which is mainly used to store temporary data, intermediate data, etc.
  • The temporary data and intermediate data include data that needs to be temporarily stored when users operate on the container, as well as data that needs to be temporarily stored while the application in the container is running.
  • the life cycle of the data in the root file is consistent with the life cycle of the container, that is, when the container is deregistered, the root file will also be deleted.
  • the rootfs of the container is stored in the remote file system, and the rootfs of the container will not be stored in the local storage device of the node where the container is located, that is, it will not occupy the storage space of the node's own local storage device, so as to reduce the container's occupation of the node's own storage space.
  • PVC is a container's data volume.
  • PVC is used to store data that needs to be persistent.
  • the life cycle of data in PVC is longer than that of the container. That is, after the container instance disappears, the data in PVC still exists and will not be lost.
  • The data in the PVC includes the data written to the PVC by the user when operating the container, and also includes some data generated by the container application during operation that needs to be persistently stored.
  • the existence of PVC also ensures that some data will not be lost when a container fails. After the failed container is migrated, the newly created container used to replace the failed container can continue to use the PVC of the failed container.
  • the PVC can be stored in a remote file system.
  • the remote file system can be a shared file system, that is, after the PVC is stored in the shared file system, other nodes in the cluster to which the node belongs can obtain the PVC from the shared file system.
  • When a container fails, a new container can be pulled up on a node in the cluster to replace the failed container, realizing the failure migration of the container.
  • the new container can obtain the data in the PVC from the shared file system and can continue to write data to the PVC.
  • the remote storage device is a device with storage function. It is particularly emphasized here that the so-called “remote” storage device refers to a storage device independent of the node. The remote storage device is deployed outside the node and is connected to the node through a network. Correspondingly, the local storage device refers to the node's own storage device, such as a hard disk connected to the node through a system bus.
  • The data on the remote storage device is organized in the concept of files. Each file has a unique file name. Files are grouped, the files in the same group are placed in a directory, and other files or directories (also called subdirectories) can be placed under a directory, forming a "file system" with a tree structure. Any file in the tree structure can be located by going down step by step from the root node of the tree.
  • the file system means that the data access to the remote storage device is file-level access.
  • the file system deployed on the remote storage device is referred to as a remote file system.
  • In the embodiments of the present application, two types of remote file systems are involved: one is a shared remote file system, and the other is an exclusive remote file system.
  • a shared remote file system can be shared by multiple nodes, that is, each node has established a connection with the shared remote file system, and each node can communicate with the remote storage device on which the shared remote file system is deployed based on a network protocol.
  • Any node is allowed to write data to the shared remote file system deployed on the remote storage device, such as writing an image file or writing a PVC.
  • Any node is allowed to read data from the shared remote file system deployed by the remote storage device, such as reading PVCs written by other nodes, or reading data previously written by the node in the PVC. For another example, reading data previously written by other nodes in the PVC.
  • An exclusive remote file system is dedicated to a node (or a container), so that the node (or a container) can write some data that only belongs to the node (or a container) in the exclusive remote file system.
  • The node is allowed to write data to the exclusive remote file system deployed on the remote storage device, such as writing to the rootfs.
  • the node is allowed to read data from the exclusive remote file system deployed by the remote storage device, such as reading data previously written by the node in rootfs.
  • the files in the exclusive remote file system can be stored based on the key-value (KV) structure.
  • the key in the key-value pair is the file name.
  • the value in the key-value pair is the file.
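  • As an illustration of this key-value layout (a sketch, not the patent's implementation; the FileStore type and its methods are invented), each file in the exclusive remote file system can be stored as a key-value pair whose key is the file name and whose value is the file contents:

```go
package kvfs

import "errors"

// FileStore models an exclusive remote file system that stores files as
// key-value pairs: the key is the file name, the value is the file contents.
type FileStore struct {
	kv map[string][]byte
}

func NewFileStore() *FileStore {
	return &FileStore{kv: make(map[string][]byte)}
}

// WriteFile stores (or overwrites) a file under its name, e.g. a rootfs file.
func (s *FileStore) WriteFile(name string, contents []byte) {
	s.kv[name] = contents
}

// ReadFile looks a file up by name, e.g. data previously written to rootfs.
func (s *FileStore) ReadFile(name string) ([]byte, error) {
	contents, ok := s.kv[name]
	if !ok {
		return nil, errors.New("file not found: " + name)
	}
	return contents, nil
}
```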
  • The storage device on which the remote file system is deployed is called a remote storage device, but this does not mean that the remote file system is deployed on a single storage device.
  • a remote file system can be deployed on multiple storage nodes to form a distributed file system.
  • the multiple storage nodes are taken as a whole, and the whole including the multiple storage nodes can be understood as a remote storage device. That is to say, in the embodiments of the present application, the remote storage device can be understood as a storage device, and can also be understood as a system including multiple storage nodes.
  • the directory of the container's files describes the storage location of the file on the node where the container is located.
  • the directories of the files are the rootfs directory, the image file directory, and the PVC directory.
  • the rootfs directory describes the storage location of the rootfs on the node
  • the image file directory describes the storage location of the image file on the node
  • the PVC directory describes the storage location of the PVC on the node.
  • the directory of rootfs can be understood as a folder or the name of a folder in the node, in which rootfs needs to be stored.
  • the directory of image files can be understood as a folder or the name of a folder in the node.
  • the directory of PVC can be understood as a folder or the name of a folder in the node.
  • For a node, as long as the rootfs directory is known, the node can know in which folder the rootfs is recorded.
  • the rootfs directory can be configured by the user, that is, the user can configure the name of the folder used to store the rootfs in the node.
  • the rootfs directory can also be recorded by the container's configuration file. The user only needs to check the configuration file to determine the name of the folder storing the rootfs, and then determine the folder storing the rootfs.
  • Taking the rootfs as an example, in the embodiment of the present application, mounting (mount) allows the directory of the rootfs to be associated with a remote file system, which can also be called mounting the directory of the rootfs on a remote file system. With the help of mounting, the rootfs under the directory of the rootfs is actually stored in the remote file system.
  • When the node side needs to write data to the directory of the rootfs, the node stores the data in the remote file system associated with the directory of the rootfs. When the data of the rootfs needs to be displayed, the node can read the data from the remote file system to the node locally.
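  • As a tiny sketch of this redirection (the folder name "rootfs-1" is a hypothetical mounted directory; the helper names are invented), once the rootfs directory is mounted, ordinary file I/O under it is served by the remote file system rather than the node's local disk:

```go
package demo

import "os"

// WriteToRootfs writes data under the mounted rootfs directory. Because the
// directory is mounted on a remote file system, this ordinary file write is
// actually stored on the remote storage device, not the node's local disk.
func WriteToRootfs(data []byte) error {
	return os.WriteFile("rootfs-1/intermediate.dat", data, 0o644)
}

// ReadFromRootfs reads the data back; the node fetches it from the remote
// file system associated with the rootfs directory.
func ReadFromRootfs() ([]byte, error) {
	return os.ReadFile("rootfs-1/intermediate.dat")
}
```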
  • FIG. 1 is a schematic diagram of the architecture of a container creation system provided in an embodiment of the present application.
  • the container creation system 100 includes at least one node 110 and a remote storage device 120.
  • the system 100 also includes an image repository 130.
  • the number of the remote storage devices 120 is not limited in the embodiments of the present application, and may be one or more.
  • One or more file systems are deployed on each remote storage device 120.
  • FIG. 1 three remote storage devices 120 are exemplarily shown, and the three remote storage devices 120 are remote storage device 120A, remote storage device 120B, and remote storage device 120C.
  • the remote storage device 120A is deployed with a file system A, which is an exclusive remote file system.
  • the remote storage device 120B is deployed with a file system B, which is a shared remote file system.
  • the remote storage device 120C is deployed with a file system C, which is a shared remote file system.
  • the node 110 may be a computing device, including but not limited to a personal computer, a server, a mobile phone, a tablet computer, or a smart car, etc.
  • the node 110 may also be a virtual machine.
  • the present application does not limit the deployment locations of the multiple nodes 110.
  • the multiple nodes 110 can be deployed in the same data center or in different data centers.
  • the present application does not limit the deployment locations of the nodes 110 and the remote storage device 120.
  • the nodes 110 and the remote storage device 120 can be located in the same data center or in different data centers.
  • The node 110 can mount the directories of the container's files, mounting any such directory to a remote file system.
  • For example, the node 110 can mount the directory of the container's rootfs to file system A, mount the directory of the container's image files to file system B, and mount the directory of the container's PVC to file system C.
  • Node 110 obtains the image file of the container from the image repository 130, stores the image file in the directory of the image file (actually in the remote file system mounted by the directory of the image file), and creates a container based on the image file.
  • When multiple nodes 110 need to deploy the same container, the directories of the image files of the containers in these nodes 110 can be mounted to the same shared remote file system. In this way, only one of the nodes 110 needs to write the image file to the directory of the image file, and the other nodes 110 can obtain the image file from the shared remote file system and create a container based on the image file. That is, the other nodes 110 do not need to repeatedly pull the image file from the image repository 130.
  • temporary data or intermediate data generated during the operation of the container can be written into the rootfs directory (that is, written into the rootfs under the rootfs directory), that is, the temporary data or intermediate data generated during the operation of the container will be transferred to the remote file system mounted by the rootfs directory.
  • Data generated during the operation of the container that needs to be persistently stored can be written into the PVC directory (that is, written into the PVC under the PVC directory), and the node 110 can transfer the data written into the PVC directory to the file system mounted by the PVC directory.
  • a remote file system is deployed on the remote storage device 120 to provide storage space for the node 110.
  • the embodiment of the present application does not limit the specific form of the remote storage device 120.
  • The remote storage device 120 can be expressed as a system including multiple storage nodes, or as a single storage device.
  • The node 110 side can provide the function of mounting the directories of the container's files on remote file systems, so that the container's files can be stored in the remote file systems, occupying the storage space of the remote storage device 120 and avoiding occupying the local storage device of the node 110 where the container is located.
  • When node 110 obtains an image file from the image repository 130, it may obtain only the incremental data.
  • The so-called incremental data refers to the difference data between the image file that currently needs to be obtained and the image files that node 110 has already obtained (the image files that node 110 has already obtained are the image files that node 110 has stored in the remote file system).
  • This can reduce the amount of data interacting between node 110 and the image repository 130 and improve the transmission rate of the image file.
  • When node 110 stores the image file in the remote file system, it can also store only the incremental data.
  • the remote file system does not need to store a large amount of duplicate data, which can improve the storage space utilization of the remote file system.
  • the amount of data interacting between node 110 and the remote file system will also be reduced, accelerating the interaction efficiency between node 110 and the remote file system.
  • The node 110 includes an I/O interface 113, a processor 111, a memory 112, and an acceleration device 114.
  • The I/O interface 113, the processor 111, the memory 112, and the acceleration device 114 may be connected via a system bus, which may be a peripheral component interconnect express (PCIe) bus, a compute express link (CXL) bus, a universal serial bus (USB) protocol bus, or another protocol bus.
  • FIG. 2 exemplarily shows a connection method.
  • the acceleration device 114 can be directly inserted into a card slot on the mainboard of the node 110 , and exchange data with the processor 111 through the PCIe bus 115 .
  • The I/O interface 113 is used to communicate with devices outside the node 110. For example, a container creation instruction sent by an external device is received through the I/O interface 113, an image file is obtained from the image repository 130 through the I/O interface 113, and the image file, rootfs, or PVC is sent to the remote storage device 120 through the I/O interface 113.
  • the processor 111 is the computing core and control core of the node 110, which can be a central processing unit (CPU) or other specific integrated circuits.
  • the processor can also be other general-purpose processors, digital signal processors (DSP), application specific integrated circuits (ASIC), field programmable gate arrays (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • the memory 112 is usually used to store various computer program instructions and data running in the operating system of the node 110. In order to improve the access speed of the processor 111, the memory 112 needs to have the advantage of fast access speed.
  • The memory 112 is usually a dynamic random access memory (DRAM).
  • The memory 112 can also be another random access memory, such as a static random access memory (SRAM).
  • The memory 112 can also be a read-only memory (ROM), for example, a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), etc.
  • the memory 112 may also be a flash memory medium (FLASH), a hard disk drive (HDD) or a solid state drive (SSD), etc.
  • the processor 111 is connected to the memory 112 via a double data rate (DDR) bus or other types of buses.
  • the memory 112 is understood as the internal memory of the node 110, which is also called the main memory.
  • the processor 111 can execute all the methods that the node 110 needs to execute in the embodiment shown in Figure 3 below by calling the computer program instructions in the memory 112.
  • the processor 111 can also execute part of the methods that the node 110 needs to execute in the embodiment shown in Figure 3 below by calling the computer program instructions in the memory 112.
  • the processor 111 can complete the directory of the container's file and the mount operation of the remote file system, as well as the creation operation of the container.
  • the operation of the node 110 accessing the remote file system is performed by the acceleration device 114.
  • the operation of accessing the remote file system refers to reading data from the remote file system or writing data to the remote file system.
  • the acceleration device 114 includes a data processing unit (DPU) 1141.
  • the acceleration device 114 also includes a memory 1142, a power supply circuit, etc.
  • the DPU 1141 is connected to the memory 1142 via a system bus.
  • the system bus can be a PCIe-based line or a bus of CXL, USB protocol or other protocols.
  • The DPU 1141 is the main computing unit of the acceleration device 114 and the core unit of the acceleration device 114; it assumes the main functions of the acceleration device 114. For example, some functions of the node 110 can be offloaded to the DPU 1141, and the DPU 1141 processes data and executes tasks assigned to the acceleration device 114 by the node 110. The DPU 1141 executes some of the methods that the node 110 needs to execute in the embodiment shown in Figure 3 below by calling the computer program instructions in the memory 1142. The DPU 1141 can access the remote file system, reading data from the remote file system or writing data to the remote file system. The data here can be data in the rootfs, image files, or data in the PVC.
  • The container creation method provided by the embodiments of the present application is described below in conjunction with Figure 3.
  • Creating a container on node 110M and node 110N is taken as an example for illustration.
  • Figure 3 only exemplarily describes the process of creating a container on node 110M and node 110N respectively.
  • the method of creating multiple containers on node 110 is similar to the method of creating a container on node 110, and will not be repeated here.
  • the container creation method shown in Figure 3 includes three parts. The first part is the process of mounting the directory of the container's file and the remote file system, see steps 300 to 302, the second part is the container creation process, see steps 303 to 305, and the third part is the process of writing and reading the container's files during the container's operation, see steps 306 to 307.
  • Step 300: The user configures the mounting information of the container file to trigger the container file mounting process, wherein the mounting information of the container file describes the association between the three files of the container and the remote file systems.
  • Method 1: Use the container cluster management system to configure the mount information of the container file.
  • users can manage containers through k8s.
  • users can configure some necessary parameters on the client side of k8s deployed on the user side.
  • users can configure the image file that the container needs to be based on.
  • the user can enter an identifier that can uniquely identify the image file.
  • the identifier of the image file can be configured by the image warehouse 130 for the image file, or it can be set by the image file designer when the image file is stored in the image warehouse 130.
  • The user can configure the number of container pods, where the number of containers in each pod is pre-configured.
  • users can configure the mounting information of container files in k8s.
  • users can configure the file directory, the type of remote file system mounted on the file directory, and the name of the remote file system mounted on the file directory.
  • FIG. 4 is a schematic diagram of an interface, provided by k8s for users, for configuring the mount information of a container file.
  • an interface for the mount information of the rootfs of the container is provided, wherein the mount information for the rootfs of the container includes the directory of the rootfs, the type of the remote file system mounted by the directory of the rootfs, and the name of the remote file system mounted by the directory of the rootfs.
  • the mount information may also include the entry address of the remote file system.
  • the entry address of the remote file system is the address of the remote storage device 120, and the entry address is used for data transmission between the node 110 and the remote file system through the network.
  • The entry address may be an Internet protocol (IP) address or a media access control (MAC) address.
  • The embodiment of the present application does not limit the specific type of the entry address. Any address that enables communication with the remote storage device where the remote file system is located can be used in the embodiment of the present application.
  • An interface for mounting information of the image file of the container is provided, including the directory of the image file, the type of the remote file system mounted by the directory of the image file, and the name of the remote file system mounted by the directory of the image file.
  • the mounting information may also include the entry address of the remote file system.
  • the entry address of the remote file system is used for data transmission between the node 110 and the remote file system.
  • An interface for mounting information of the container's PVC is provided, including the directory of the PVC, the type of the remote file system mounted by the directory of the PVC, and the name of the remote file system mounted by the directory of the PVC.
  • the mounting information may also include an entry address of the remote file system. The entry address of the remote file system is used for data transmission between the node 110 and the remote file system.
  • two different types of remote file systems are involved.
  • corresponding type names can be pre-designed for the two different types of remote file systems.
  • The type name of the shared remote file system is designed to be SFS.
  • The type name of the exclusive remote file system is designed to be EFS.
  • the type names of the two different types of remote file systems are recognizable by node 110.
  • The container storage interface (CSI) plug-in in node 110 can be updated so that node 110 has the function of recognizing the type names of the two different types of remote file systems, and can automatically execute a mount instruction carrying the type name of either type of remote file system.
  • the directory of the rootfs is rootfs-1, indicating that the name of the folder storing the rootfs is rootfs-1;
  • the type of the remote file system mounted by the directory of the rootfs is EFS, indicating that the directory of the rootfs needs to be mounted in an exclusive remote file system, and
  • the name of the remote file system mounted by the directory of the rootfs is file system A, indicating that the file name of the exclusive remote file system is file system A.
  • the entry address of the remote file system is 10.10.0.1, indicating that the entry address of the remote file system is 10.10.0.1, and the node 110 can read data in the rootfs from the remote file system through the network or write data to the remote file system based on the entry address.
  • the directory of the image file is image-1, indicating that the name of the folder where the image file is stored is image-1;
  • the type of the remote file system mounted on the directory of the image file is SFS, indicating that the directory of the image file needs to be mounted on a shared remote file system;
  • the remote file system name mounted on the directory of the image file is file system B, indicating that the file name of the shared remote file system is file system B.
  • the entry address of the remote file system is 10.10.0.2, indicating that the entry address of the remote file system is 10.10.0.2, and node 110 can read the image file from the remote file system through the network or write the image file to the remote file system based on the entry address.
  • the directory of the PVC is PVC-1, indicating that the name of the folder storing the PVC is PVC-1;
  • the type of the remote file system mounted by the directory of the PVC is SFS, indicating that the directory of the PVC needs to be mounted on a shared remote file system;
  • the name of the remote file system mounted by the directory of the PVC is file system C, indicating that the file name of the shared remote file system is file system C.
  • the entry address of the remote file system is 10.10.0.3, indicating that the entry address of the remote file system is 10.10.0.3, and the node 110 can read the data in the PVC from the remote file system through the network or write the data in the PVC to the remote file system based on the entry address.
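  • To make the structure of this mount information concrete, the following Go sketch models the three configuration entries above as plain data (the MountInfo type and its field names are invented for illustration; the patent only describes the configuration interface of FIG. 4):

```go
package mountcfg

// MountInfo models one entry of the container-file mount information that a
// user configures in k8s: which directory is mounted on which remote file
// system. (The type and field names are illustrative, not from the patent.)
type MountInfo struct {
	Directory    string // folder name in the node, e.g. "rootfs-1"
	FSType       string // "EFS" (exclusive) or "SFS" (shared)
	FSName       string // name of the remote file system
	EntryAddress string // network entry address of the remote storage device
}

// ContainerMounts holds the three entries configured in the example of FIG. 4.
var ContainerMounts = []MountInfo{
	{Directory: "rootfs-1", FSType: "EFS", FSName: "file system A", EntryAddress: "10.10.0.1"},
	{Directory: "image-1", FSType: "SFS", FSName: "file system B", EntryAddress: "10.10.0.2"},
	{Directory: "PVC-1", FSType: "SFS", FSName: "file system C", EntryAddress: "10.10.0.3"},
}
```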
  • k8s can arrange multiple containers according to the user's configuration, determine which nodes 110 to deploy pods on, and how many pods to deploy on each node 110.
  • k8s sends a container mount request to the determined node 110, and the container mount request carries the mount information of the container file to request the node 110 to complete the mounting of the directory of the container file and the remote file system, triggering the container file mount process (that is, steps 301 to 302).
  • k8s can send a container mount request to node 110M and node 110N, and the container mount request can carry the configuration information of the container file configured in Figure 4.
  • k8s can also send an image pull request to some of the determined nodes 110; the image pull request is used to request the node 110 to pull the image file from the image repository 130, and carries the identifier of the image file. Taking k8s determining to deploy pods on nodes 110M and 110N as an example, k8s does not need to send image pull requests to nodes 110M and 110N separately, but only needs to send an image pull request to node 110M or node 110N.
  • the k8s side can provide the user with the configuration function of mounting information of three types of files of the container as shown in Figure 4, or it can only provide the user with the configuration function of mounting information of one or two files.
  • Method 2: The user updates the configuration file of the container to mount the container file.
  • the node 110 stores a configuration file of the container, which records some parameters for creating the container.
  • the configuration file of the container may include mounting information of one or more files in the container.
  • The mounting information for any file in the configuration file of the container is preset information, and this preset information is allowed to be changed.
  • For example, the configuration file of the container includes preset mounting information for the rootfs of the container and preset mounting information for the image file of the container.
  • the preset mount information of the rootfs for the container includes the preset directory of the rootfs, the type of the remote file system on which the preset directory of the rootfs is mounted, and the name of the remote file system on which the preset directory of the rootfs is mounted.
  • the mount information may also include an entry address of the preset remote file system.
  • the user can modify the mounting information of the rootfs for the container, such as modifying the directory of the rootfs, the type of the remote file system mounted by the directory of the rootfs, the name of the remote file system mounted by the directory of the rootfs, and other information.
  • the user can modify the directory of the rootfs to rootfs-A, modify the type of the remote file system mounted by the rootfs directory to EFS, and modify the name of the remote file system mounted by the rootfs directory to file system A.
  • the preset mounting information of the image file for the container includes the preset directory of the image file, the type of the remote file system on which the preset directory of the image file is mounted, and the name of the remote file system on which the preset directory of the image file is mounted.
  • the mounting information may also include the entry address of the remote file system.
  • the user can modify the directory of the image file to image-B, modify the type of the remote file system mounted on the directory of the image file to SFS, and modify the name of the remote file system mounted on the directory of the image file to file system B.
  • When the node 110 subsequently needs to create a container, it will retrieve the modified mounting information and mount the directories of the container's files on the specified file systems.
  • Method 3: Directly send a mount instruction to the node 110, wherein the mount instruction carries the mount information of the container file.
  • the user directly sends a mount instruction to the node 110M and the node 110N.
  • the user can directly operate the node 110 and input the mount instruction through an input and output device external to the node 110M and the node 110N.
  • The format of the mount command is as follows: mount -t <remote file system type> <remote file system entry address> <remote file system name> <file directory>.
  • the user enters the following three mount commands:
  • Mount command 1 indicates that the folder named rootfs-A is mounted to the exclusive remote file system named file system A with the entry address 10.10.0.1.
  • Mount command 2 indicates that the folder named image-B is mounted to the shared remote file system named file system B with the entry address 10.10.0.2.
  • Mount command 3 indicates that the folder named PVC-B is mounted to the shared remote file system named file system C with the entry address 10.10.0.3.
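  • A minimal sketch of how a node-side agent might issue these three mount commands programmatically (assuming the command format given above; the exact syntax accepted by the EFS/SFS client is not specified in the patent, so treat the argument layout as an assumption):

```go
package main

import (
	"fmt"
	"os/exec"
)

// mountEntry mirrors one of the mount commands 1-3 above.
type mountEntry struct {
	fsType, entryAddr, fsName, dir string
}

func main() {
	// Values taken from mount commands 1-3 in the example above.
	entries := []mountEntry{
		{"EFS", "10.10.0.1", "file system A", "rootfs-A"},
		{"SFS", "10.10.0.2", "file system B", "image-B"},
		{"SFS", "10.10.0.3", "file system C", "PVC-B"},
	}
	for _, e := range entries {
		// Assumed layout: mount -t <type> <entry address> <name> <directory>.
		cmd := exec.Command("mount", "-t", e.fsType, e.entryAddr, e.fsName, e.dir)
		if out, err := cmd.CombinedOutput(); err != nil {
			fmt.Printf("mount of %s failed: %v (%s)\n", e.dir, err, out)
		}
	}
}
```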
  • Then, the file mount process of the container is triggered (i.e., steps 301 to 302).
  • the above methods are only examples.
  • users when configuring the mount information of container files, users can use any of the above methods; they can also use multiple of the above methods.
  • For example, users can configure the mount information of the container's PVC in the container cluster management system, and configure the mount information of the container's rootfs and the mount information of the image file by modifying the configuration information of the container on the node 110.
  • the embodiments of the present application can also use methods other than the above three methods to configure the mount information of the container.
  • the following description takes the creation of containers by the node 110M and the node 110N as an example.
  • Step 301: Node 110M mounts the directory of the container's rootfs to file system A, mounts the directory of the container's image file to file system B, and mounts the directory of the container's PVC to file system C.
  • In the case of Method 1, node 110M (such as the processor 111 in node 110M) can automatically execute the mount instructions.
  • The mount instructions automatically executed by node 110M are mount instructions similar to the aforementioned mount instruction 1, mount instruction 2, and mount instruction 3.
  • In the case of Method 2, node 110M obtains the mount information of the container's files as modified by the user, and node 110M (such as the processor 111 in node 110M) can automatically execute the mount instructions.
  • The mount instructions automatically executed by node 110M are likewise mount instructions similar to the aforementioned mount instruction 1, mount instruction 2, and mount instruction 3.
  • In the case of Method 3, node 110M (such as the processor 111 in node 110M) can automatically execute the mount instructions.
  • The mount instructions automatically executed by node 110M are the aforementioned mount instruction 1, mount instruction 2, and mount instruction 3.
  • By executing these mount instructions, node 110M establishes a connection with the remote storage device 120 on which the remote file system is deployed (that is, node 110M communicates with the remote storage device 120 to inform the remote storage device 120 that some data will be written to it later), and establishes an association relationship between the directories of the container's files and the remote file systems, so that the files under the directories of the container's files can be written to the remote file systems associated with them.
  • By executing mount instruction 1, node 110M establishes a connection with the remote storage device 120A where file system A is deployed, mounts the directory of the container's rootfs to file system A, and establishes an association relationship between the directory of the container's rootfs and file system A.
  • By executing mount instruction 2, node 110M establishes a connection with the remote storage device 120B where file system B is deployed, mounts the directory of the container's image file to file system B, and establishes an association relationship between the directory of the container's image file and file system B.
  • By executing mount instruction 3, node 110M establishes a connection with the remote storage device 120C where file system C is deployed, mounts the directory of the container's PVC to file system C, and establishes an association relationship between the directory of the container's PVC and file system C.
  • Step 302: Node 110N mounts the directory of the container's rootfs to file system A, mounts the directory of the container's image file to file system B, and mounts the directory of the container's PVC to file system C.
  • the way in which node 110N performs step 302 is similar to the way in which node 110M performs step 301. For details, please refer to the above content and will not be repeated here.
  • Step 303: Node 110M obtains the image file from the image repository 130, and writes the image file into the directory of the image file.
  • node 110M (such as the processor 111 in node 110M) can receive the image pull request sent by k8s. After receiving the image pull request, node 110M (such as the processor 111 in node 110M) pulls the image file from the image repository 130 according to the identifier of the image file carried in the image pull request; node 110M (such as the processor 111 in node 110M or the acceleration device 114 in node 110M) writes the image file to the directory of the image file.
  • the process in which the node 110M writes the image file into the directory of the image file is essentially a process in which the node 110M writes the image file into the file system B mounted with the directory of the image file.
  • the node 110M (such as the processor 111 in the node 110M) may be deployed with a client of the file system B.
  • the client can run on the processor 111 in the node 110M.
  • the file system B client can communicate with the remote storage device 120B, transfer the image file to the remote storage device 120B, and store it in the file system B.
  • the node 110M may offload the function of accessing the remote file system to the acceleration device 114, that is, the acceleration device 114 communicates with each remote storage device 120 to access the remote file system.
  • step 303 after the processor 111 of the node 110M obtains the image file from the image repository 130, the acceleration device 114 may write the image file into the directory of the image file.
  • the acceleration device 114 may write the image file into the file system B mounted with the directory of the image file.
  • a client of file system B is deployed in the acceleration device 114, and the client of file system B can run on the DPU 1141 of the acceleration device 114.
  • the client of file system B can communicate with the remote storage device 120B, transfer the image file to the remote storage device 120B, and store it in the file system B.
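The write path just described can be condensed into a short sketch. The client class and helper functions below are hypothetical stand-ins; the point is only that the same client logic can run either on the processor 111 or, in the offloading scenario, on the DPU 1141 of the acceleration device 114.

```python
def send_over_network(remote_address: str, directory: str, data: bytes) -> None:
    pass  # placeholder for the network transfer to remote storage device 120B

def run_on_acceleration_device(fn, *args) -> None:
    fn(*args)  # placeholder: hand the call off to the DPU 1141

class FileSystemBClient:
    """Hypothetical client of file system B; it may run on the node's
    processor 111 or, when remote-file-system access is offloaded, on the
    DPU 1141 of the acceleration device 114."""

    def __init__(self, remote_address: str):
        self.remote_address = remote_address  # address of remote storage device 120B

    def write(self, image_directory: str, image_file: bytes) -> None:
        # Writing into the mounted image directory is, in essence, a network
        # transfer to remote storage device 120B, where the data is stored
        # in file system B.
        send_over_network(self.remote_address, image_directory, image_file)

def store_pulled_image(image_file: bytes, offload_to_dpu: bool) -> None:
    # The processor 111 pulls the image from the repository; the write to
    # the mounted directory is done by the processor or the acceleration device.
    client = FileSystemBClient("10.10.0.2")  # assumed entry address
    if offload_to_dpu:
        run_on_acceleration_device(client.write, "image-1", image_file)
    else:
        client.write("image-1", image_file)
```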
When node 110M pulls the image file from the image repository 130, it can obtain only the incremental data of the image file, which refers to the difference data between the image file currently to be pulled and the image file already saved by node 110M. In this embodiment, since the directory of the image file in node 110M is mounted to file system B, the image file saved by node 110M refers to the image file saved in file system B; accordingly, the incremental data refers to the difference data between the image file currently to be pulled and the image files already saved in file system B.

The embodiment of the present application does not limit the granularity of the incremental data. An image file has a hierarchical structure, that is, the image file includes multiple layers, and each layer of data can be divided into multiple blocks of data. The incremental data can therefore be one or more layers in the image file, or one or more data blocks in the image file; multiple data blocks may come from one layer of data in the image file or from different layers of data in the image file.

The embodiment of the present application also does not limit the manner in which node 110M obtains only the incremental data of the image file from the image repository 130. For example, node 110M may record the identifiers of the image files saved in file system B, or node 110M may interact with file system B to obtain those identifiers. From the device perspective, the image files saved in file system B refer to the image files saved in the remote storage device 120B. When node 110M pulls the image file from the image repository 130, it can send the identifiers of the saved image files and the identifier of the image file currently to be pulled to the image repository 130; the image repository 130 determines the incremental data of the image file that needs to be pulled according to these identifiers and sends the incremental data to node 110M.
Similarly, when node 110M writes the image file to the directory of the image file, that is, writes the image file to file system B, node 110M may also save only the incremental data of the image file in file system B. In this scenario, there are two possible situations:

Situation 1: node 110M has pulled only the incremental data of the image file from the image repository 130. In this situation, node 110M directly writes the incremental data of the image file into the directory of the image file, that is, directly saves the incremental data of the image file in file system B.

Situation 2: node 110M has pulled the entire image file from the image repository 130. In this situation, node 110M can check file system B (for example, node 110M can retrieve the image files in the directory of the image file associated with file system B) to determine the difference data between the image files saved in file system B and the currently pulled image file, that is, to determine the incremental data of the image file, and then write only the incremental data of the image file into the directory of the image file. Node 110M can also directly write the whole image file into the directory of the image file, that is, send the image file to file system B; after receiving the image file, the remote storage device 120B can determine the difference data between the image files saved in file system B and the currently received image file, that is, determine the incremental data of the image file, and the remote storage device 120B saves only that incremental data.

The above manners in which node 110M saves the incremental data of the image file in file system B are examples only; the embodiment of the present application does not limit the manner in which node 110M saves the incremental data of the image file in file system B.
Taking block data as the granularity of the incremental data as an example, a method by which node 110M pulls the incremental data of an image file from the image repository 130 and saves that incremental data in the file system is introduced below. The method includes the following steps:

Step 1. Node 110M sends an image request to the image repository 130, and the image request carries the identifier of the image file to be pulled.

Step 2. After receiving the image request, the image repository 130 determines the image file to be pulled according to the identifier of the image file and sends the summary information of the image file to node 110M. The summary information is used to indicate the content of the image file, which includes but is not limited to: each layer included in the image file, and the fingerprint information of each layer.

The fingerprint information of each layer can be understood as an identification of data: the data included in the layer can be determined based on the fingerprint information. In this embodiment, the fingerprint information of each layer can be at the granularity of block data, that is, each data block in the layer corresponds to the fingerprint information of one data block, and the fingerprint information of each layer includes the fingerprint information of each data block in the layer.

For example, if a layer of data in an image file is 1 megabyte (MB), it is first divided into 1024 blocks of 1 kilobyte (kB) each, and a fingerprint is calculated for each data block based on a hash algorithm. The division manner of each layer of data here is only an example, and the embodiment of the present application does not limit the division manner of the block data; the calculation manner of the fingerprint information here is also only an example, and the embodiment of the present application does not limit the calculation manner of the fingerprint information.
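As a concrete illustration of this division and fingerprinting, a minimal sketch follows; SHA-256 is an assumed hash choice, since the embodiment does not fix the fingerprint algorithm.

```python
import hashlib

BLOCK_SIZE = 1024  # 1 kB blocks, as in the 1 MB layer example above

def layer_fingerprints(layer_data: bytes) -> list[str]:
    """Split one layer of an image file into fixed-size data blocks and
    compute one fingerprint per block."""
    fingerprints = []
    for offset in range(0, len(layer_data), BLOCK_SIZE):
        block = layer_data[offset:offset + BLOCK_SIZE]
        fingerprints.append(hashlib.sha256(block).hexdigest())
    return fingerprints

# A 1 MB layer yields 1024 block fingerprints.
assert len(layer_fingerprints(bytes(1024 * 1024))) == 1024
```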
Step 3. After receiving the summary information of the image file, node 110M sends the summary information to the remote storage device 120B.

Step 4. After receiving the summary information, the remote storage device 120B can determine, based on the fingerprint information in the summary information, for which fingerprints the corresponding block data has already been stored in file system B and for which fingerprints it has not. The data blocks not stored in file system B are the incremental data of the image file.

Step 5. The remote storage device 120B generates indication information of the incremental data, where the indication information is used to indicate the incremental data of the image file. The embodiment of the present application does not limit the manner in which the indication information indicates the incremental data of the image file. For example, the indication information may include the fingerprint information of the data blocks not stored in file system B. For another example, the indication information may indicate, for each data block in the image file, whether that data block has already been saved in the remote storage device 120B.

Step 6. The remote storage device 120B sends the indication information of the incremental data to node 110M, and then node 110M sends the indication information of the incremental data to the image repository 130.

Step 7. After receiving the indication information of the incremental data, the image repository 130 sends the image file to node 110M. In the image file sent by the image repository 130, each data block already stored in file system B can be replaced by the fingerprint information of that data block, while each data block not stored in file system B is the data block itself.

Step 8. Node 110M writes the image file into the directory of the image file, that is, node 110M sends the image file to the remote storage device 120B.

Step 9. After receiving the image file, the remote storage device 120B saves the image file.
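The following is a condensed, illustrative sketch of steps 4, 7 and 9: the remote storage device derives the indication information as a set difference over fingerprints, the image repository substitutes fingerprints for the blocks already stored, and the remote storage device reconstructs the full image file on receipt. All names are assumptions for illustration.

```python
# Step 4: on remote storage device 120B, decide which fingerprints in the
# summary information lack block data in file system B.
def split_summary(summary_fps: list[str], stored_fps: set[str]) -> list[str]:
    # The returned fingerprints are the indication information of the
    # incremental data (first example form in step 5).
    return [fp for fp in summary_fps if fp not in stored_fps]

# Step 7: the image repository replaces blocks already stored in file
# system B with their fingerprints; only missing blocks carry real bytes.
def build_transfer(blocks: dict[str, bytes], missing: list[str]) -> list[dict]:
    return [
        {"fp": fp, "data": data if fp in missing else None}
        for fp, data in blocks.items()
    ]

# Step 9: remote storage device 120B reconstructs the image file, reading
# already-stored blocks from file system B by fingerprint.
def reconstruct(transfer: list[dict], stored_blocks: dict[str, bytes]) -> bytes:
    return b"".join(
        item["data"] if item["data"] is not None else stored_blocks[item["fp"]]
        for item in transfer
    )

# Tiny end-to-end check with hypothetical fingerprints.
stored = {"fp1": b"A"}
blocks = {"fp1": b"A", "fp2": b"B"}
missing = split_summary(["fp1", "fp2"], set(stored))
assert reconstruct(build_transfer(blocks, missing), stored) == b"AB"
```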
The foregoing description takes, as an example, k8s sending an image pull request to node 110M to make node 110M execute step 303. In practical applications, the user can also directly type into node 110M an instruction for instructing it to pull the image file, so that node 110M executes step 303 under the user's trigger. The embodiment of the present application does not limit the manner of triggering node 110M to execute step 303; any manner that can make node 110M execute step 303 is applicable to the embodiment of the present application.
Step 304: Node 110M (for example, the processor 111 in node 110M) creates a container on node 110M based on the image file.

Case 1: After node 110M writes the image file to the directory of the image file, k8s can send a container creation request to node 110M to request node 110M to create a container; the container creation request carries the number of pods to be deployed in node 110M. After receiving the container creation request, node 110M can automatically execute a container creation instruction (such as the docker run instruction) to load the required data in the image file to the local node, and create a container by running the program in the image file, calling the libraries and resources in the image file, and completing the configuration of the configuration parameters in the image file.

Case 2: After node 110M writes the image file to the directory of the image file, the user can type a container creation instruction into node 110M to directly instruct node 110M to create a container. After detecting the container creation instruction, node 110M can execute it, load the required data in the image file to the local node, and create a container by running the program in the image file, calling the libraries and resources in the image file, and completing the configuration of the configuration parameters in the image file.
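A minimal sketch of case 1 follows, assuming docker as the container engine (the docker run instruction is the example named above) and a hypothetical image reference; the actual instruction executed is not limited by the embodiment.

```python
import subprocess

def create_containers(image_ref: str, pod_count: int) -> None:
    # Case 1 in sketch form: the container creation request carries the
    # number of pods to deploy; the node executes one container creation
    # instruction per container to start.
    for _ in range(pod_count):
        subprocess.run(["docker", "run", "-d", image_ref], check=True)

# e.g. create_containers("image-B", pod_count=2)  # "image-B" is hypothetical
```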
It should be noted that step 301 only mounts the directory of the files of a generic container in node 110 to the remote file system; the directory of the container's files in step 301 is not for a specific container. In the process of creating the container in step 304, the directory of the container's files configured in step 301 needs to be associated with the created container, so that the directory of the container's files in step 301 becomes the directory of the files of the created container. Node 110M configures the directory of the container's files for the created container during creation. For example, node 110M associates the directory of the rootfs of the container named rootfs-A in mount instruction 1 with the rootfs directory rootfs-1, associates the directory of the image file of the container named image-B in mount instruction 2 with the image file directory image-1, and associates the directory of the PVC of the container named PVC-B in mount instruction 3 with the PVC directory PVC-1.
In the above two cases, the process of loading the image file locally can be executed by the processor 111 of node 110M, or by the acceleration device 114 in node 110M so as to reduce the occupation of the processor 111. When loading the image file, the processor 111 or the acceleration device 114 (such as the DPU 1141 in the acceleration device 114) communicates with the remote storage device 120B to obtain the image file.

Usually, to save storage space, the image file in the image repository 130 is a compressed image file; that is, the image file obtained by node 110M is a compressed image file. Before writing the image file into the directory of the image file, node 110M (such as the processor 111 or the acceleration device 114 of node 110M) decompresses the compressed image file and then writes the decompressed image file into the directory of the image file, that is, stores the decompressed image file in the remote file system to which the directory of the image file is mounted.

To further reduce the consumption of the processor 111 or the acceleration device 114, the decompression operation can also be performed by the remote storage device 120B; that is, when the processor 111 or the acceleration device 114 in node 110M writes the compressed image file to the file system B to which the directory of the image file is mounted, the remote storage device 120B can decompress the compressed image file. In this way, when node 110M needs to create a container, the processor 111 or the acceleration device 114 (such as the DPU 1141 in the acceleration device 114) of node 110M loads the required data in the decompressed image file locally and creates the container based on the loaded data.
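As an illustration of the node-side variant, the sketch below decompresses a pulled image before writing it to the mounted directory; gzip is an assumed compression format, and the write callback is a hypothetical stand-in for the transfer to file system B.

```python
import gzip
from typing import Callable

def write_decompressed_image(compressed: bytes,
                             write_to_image_dir: Callable[[bytes], None]) -> None:
    # Decompress first, then write the decompressed image file into the
    # directory of the image file, which stores it in the remote file
    # system mounted on that directory. Alternatively, the compressed file
    # could be written as-is and decompressed by remote storage device 120B.
    image_file = gzip.decompress(compressed)
    write_to_image_dir(image_file)

# Usage sketch: the callback stands in for the processor 111 or the
# acceleration device 114 transferring the data to file system B.
write_decompressed_image(gzip.compress(b"layer data"), lambda img: None)
```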
When creating a container, node 110M may create the container that needs to be deployed on node 110M based on an overlay system. The overlay system is described below.

The overlay system is a special file system; it is a multi-layer file system, and its layered structure can be seen in Figure 5. The right part of Figure 5 shows the three hierarchical directories in the overlay system: the mount point (merged), the read-write layer (upperdir), and the read-only layer (lowerdir). The upperdir and lowerdir can be mounted to the same or different file systems. The data in the merged directory is a combination of the data in the upperdir and lowerdir directories, where the files or directories in upperdir overwrite the files or directories with the same name in the lowerdir layer. For example, the file2 seen in the merged directory is the file2 in the upperdir directory, not the file2 in lowerdir.

The three directories are presented externally as a single overlay file system. When the overlay file system is mounted by means of mount, the visible mount point is the merged directory; that is, upperdir and lowerdir are not visible. When data is operated on at the mount point, the operations actually take place in upperdir and lowerdir. The following are common operations:

1. Read operation, that is, reading data. For example, when reading the data of file1, file1 is read from lowerdir. For another example, when reading file2, file2 is read from upperdir.

2. Write operation, that is, writing data. For example, to write data into file1, file1 is first read from lowerdir, the data in file1 is modified, and then the modified file1 is saved by creating file1 in upperdir.

For a container, the files required by the container can be stored in the overlay system. For example, the image file can be stored in lowerdir to ensure that the image file is not modified, while modifications to the image and the creation of temporary files (that is, the rootfs) in the container are placed in the upperdir layer. In this embodiment, lowerdir corresponds to the directory of the container's image file; in other words, lowerdir is associated with the directory of the image file, and because the directory of the image file is mounted to the remote file system, lowerdir is in essence associated with the remote file system. Likewise, upperdir corresponds to the directory of the container's rootfs; in other words, upperdir is associated with the directory of the rootfs, and because the directory of the rootfs is mounted to the remote file system, upperdir is in essence associated with the remote file system.
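The read and write semantics above can be illustrated with a small in-memory model of the two layers; this is a toy sketch of the overwrite and copy-up behavior only, not an implementation of the overlay file system.

```python
# upperdir entries shadow same-named lowerdir entries in the merged view,
# and writes land in upperdir so the read-only image layer stays intact.
lowerdir = {"file1": b"from image layer", "file2": b"old"}
upperdir = {"file2": b"new"}  # shadows lowerdir's file2 in the merged view

def merged_read(name: str) -> bytes:
    # Read: upperdir wins; otherwise fall through to lowerdir.
    return upperdir[name] if name in upperdir else lowerdir[name]

def merged_write(name: str, data: bytes) -> None:
    # Write: the (modified) file is saved by creating it in upperdir;
    # lowerdir (the image) is never modified.
    upperdir[name] = data

assert merged_read("file2") == b"new"        # upperdir's file2 is seen
merged_write("file1", b"modified")
assert merged_read("file1") == b"modified"   # the write landed in upperdir
assert lowerdir["file1"] == b"from image layer"
```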
Step 305: Node 110N obtains the image file and creates a container on node 110N. The manner in which node 110N performs step 305 is similar to the manner in which node 110M performs step 304.

It is worth noting that since node 110M writes the image file into the file system B to which the directory of the image file is mounted, and since the directory of the container's image file in node 110N is also mounted to file system B, node 110M putting the image file into file system B is equivalent to writing the image file into the directory of the container's image file in node 110N. Because file system B is a shared remote file system, node 110N can directly load the image file from file system B to the local node, and create a container by running the program in the image file, calling the libraries and resources in the image file, and completing the configuration of the configuration parameters in the image file.

The embodiment of the present application allows multiple nodes 110 to mount the directory of a certain file of the container to the same shared file system. In this way, when the container on one of the multiple nodes 110 writes the data of that file to the directory of the file, the data is written to the shared file system, and the remaining nodes 110 among the multiple nodes 110 can obtain the data of that file from the shared file system.
Taking as an example that the nodes 110 that need to deploy the same type of container all mount the directory of the container's image file to the same shared file system, the remote storage device 120 where the shared file system is located configures, for the nodes 110 deploying the same type of container, the same storage space for storing image files. That is to say, when any node 110 among the nodes 110 deploying the same type of container obtains the image file and writes it to the directory of the image file, the image file is written to that storage space, and any node 110 among those nodes 110 can likewise obtain the data in that storage space. Based on this principle, as long as one node 110 among the nodes 110 deploying the same type of container writes the image file into the storage space, the remaining nodes 110 can read the image file in the storage space by checking the data in the directory of their own image files.

There are many ways for the remote storage device 120 where the shared file system is located to configure the same storage space for storing image files for the nodes 110 that deploy the same type of container. For example, when k8s determines which nodes 110 need to deploy the same type of container, k8s can send an indication message to the remote storage device 120 to inform it of the nodes 110 for which the same storage space for image files needs to be allocated (the indication message can carry the identifiers of these nodes 110); this remote storage device 120 is the remote storage device 120 where the shared file system to which the directory of the container's image file in these nodes 110 is mounted is located. Subsequently, when a node 110 communicates with the remote storage device 120 by executing the mount command for the image file, the node 110 can inform the remote storage device 120 of its own node identifier, so that the remote storage device 120 can determine, through the obtained identifiers of the nodes 110, the nodes 110 for which the same storage space for image files needs to be allocated, and allocate the same storage space for them.

For another example, when these nodes 110 communicate with the remote storage device 120 by executing the mount command for the image file, they can inform the remote storage device 120 of the identifier of the image file, so that the remote storage device 120 can determine, through the image file identifiers sent by the nodes 110, which nodes 110 need to store the same image file in the directory of the container's image file; the nodes 110 that send the same image file identifier are the nodes 110 for which the same storage space for storing image files needs to be allocated. Only two of these methods are listed here; this application is equally applicable to other methods that enable the remote storage device 120 where the shared file system is located to configure, for the nodes 110 deploying the same type of container, the same storage space for storing the data in the container's files.
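The second method can be sketched as follows: the remote storage device groups nodes by the image file identifier reported at mount time and allocates one storage space per identifier. All identifiers and the space-naming scheme are hypothetical.

```python
from collections import defaultdict

image_to_nodes = defaultdict(set)  # image file identifier -> node identifiers
image_to_space = {}                # image file identifier -> storage space id

def on_mount(node_id: str, image_id: str) -> str:
    # Nodes reporting the same image file identifier share one storage space.
    image_to_nodes[image_id].add(node_id)
    if image_id not in image_to_space:
        image_to_space[image_id] = f"space-for-{image_id}"  # allocate once
    return image_to_space[image_id]

# Both nodes deploy the same type of container, so they get the same space.
assert on_mount("110M", "image-B") == on_mount("110N", "image-B")
```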
At this point, the containers in node 110M and node 110N have been created. After creation, a container runs the application deployed on it, and the user can also operate in the container through a client deployed on the user side; for example, the user can view data, modify data, and save data in the container. The client here can be understood as client software used to control the container, or as a client device used to operate the container (that is, a client in hardware form).
Step 306: After the container in node 110M is created, node 110M writes temporary data, intermediate data and other such data generated during container operation into the directory of the rootfs, and writes data that needs to be persisted generated during container operation into the directory of the PVC.

While the container is running, the application deployed in the container performs services such as database services, voice call services, and video encoding and decoding services, and some data is generated in the course of these services. According to the existing configuration, the application can write the temporary data or intermediate data that does not need to be stored long term (that is, the data belonging to the rootfs) into the directory of the rootfs, and write the data that needs to be persisted into the directory of the PVC.
When the processor 111 in node 110M detects that the application needs to write data to the directory of the rootfs, the processor 111 in node 110M writes the data to the file system A to which the directory of the rootfs is mounted; the processor 111 in node 110M can send the data to the remote storage device 120A to be stored in file system A. The manner in which the processor 111 in node 110M writes data to the file system A to which the directory of the rootfs is mounted is similar to the manner in which the processor 111 in node 110M writes the image file to the file system B to which the directory of the image file is mounted, and will not be repeated here.

When the processor 111 in node 110M detects that the application needs to write data to the directory of the PVC, the processor 111 in node 110M writes the data to the file system C to which the directory of the PVC is mounted; the processor 111 in node 110M can send the data to the remote storage device 120C to be stored in file system C. The manner in which the processor 111 in node 110M writes data to the file system C to which the directory of the PVC is mounted is likewise similar to the manner in which the processor 111 in node 110M writes the image file to the file system B to which the directory of the image file is mounted, and will not be repeated here.
In the scenario where the function of accessing the remote file system is offloaded to the acceleration device 114, the acceleration device 114 can replace the processor 111 in node 110M in writing data into the file system A to which the directory of the rootfs is mounted. The manner in which the acceleration device 114 in node 110M writes data into the file system A to which the directory of the rootfs is mounted is similar to the manner in which the processor 111 in node 110M does so, except that the execution subject is different; for details, refer to the above description, which will not be repeated here.

Similarly, the acceleration device 114 can replace the processor 111 in node 110M in writing data into the file system C to which the directory of the PVC is mounted. The manner in which the acceleration device 114 in node 110M writes data into the file system C to which the directory of the PVC is mounted is similar to the manner in which the processor 111 in node 110M does so, except that the execution subject is different; for details, refer to the above description, which will not be repeated here.
Correspondingly, when the application needs to call the data under the directory of the rootfs, the processor 111 or the acceleration device 114 in node 110M can also obtain the data from the file system A to which the directory of the rootfs is mounted and load it to the local node 110M for the application to call. Likewise, when the application needs to call the data under the directory of the PVC, the processor 111 or the acceleration device 114 in node 110M can obtain the data from the file system C to which the directory of the PVC is mounted and load it to the local node 110M for the application to call. In both cases, having the acceleration device 114 perform the operation is suitable for the scenario in which the function of accessing the remote file system is offloaded to the acceleration device 114.
While the container is running, the user can also perform some operations in the container, such as data modification and data saving. According to their own needs, the user can save some data as data in the rootfs under the directory of the rootfs, or save some data as data in the PVC under the directory of the PVC.

When the processor 111 in node 110M detects that the user needs to write data to the directory of the PVC, the processor 111 in node 110M writes the data to the file system C to which the directory of the PVC is mounted; the processor 111 in node 110M may send the data to the remote storage device 120C to be stored in file system C. The manner in which the processor 111 in node 110M writes the data to the file system C to which the directory of the PVC is mounted is similar to the manner in which the processor 111 in node 110M writes the image file to the file system B to which the directory of the image file is mounted, and will not be described in detail here.

In the offloading scenario, the acceleration device 114 in node 110M can replace the processor 111 in node 110M in writing the data into the file system C to which the directory of the PVC is mounted, the difference again being only the execution subject.
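The routing of container data described in step 306 can be summarized in a short sketch, assuming the example directory names from step 304 and a placeholder for the network transfer.

```python
# Data written under a container file directory is forwarded to the remote
# file system mounted on that directory (rootfs -> file system A,
# image -> file system B, PVC -> file system C). The directory names are
# the illustrative ones used earlier.
ROUTES = {
    "rootfs-1": "file system A",
    "image-1": "file system B",
    "PVC-1": "file system C",
}

def write_container_data(directory: str, data: bytes,
                         send=lambda fs, d: None) -> None:
    # send() stands in for the transfer performed by the processor 111 or,
    # in the offloading scenario, by the acceleration device 114.
    send(ROUTES[directory], data)

write_container_data("rootfs-1", b"temporary data")  # non-persistent data
write_container_data("PVC-1", b"data to persist")    # persistent data
```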
Step 307: After the container is created, node 110N writes temporary data and other such data generated during container operation into the directory of the rootfs, and writes data that needs to be persisted during container operation into the directory of the PVC. The manner in which node 110N performs step 307 is similar to the manner in which node 110M performs step 306; for details, refer to the above description, which will not be repeated here.
The embodiment of the present application also provides a container creation device, which is used to execute the method executed by node 110N in the method embodiment shown in FIG. 3; for relevant features, refer to the method embodiment, which will not be described again here. As shown in FIG. 6, the container creation device 600 includes a first mounting module 601, a first acquisition module 602, and a first creation module 603.

The first mounting module 601 is used to mount the directory of the image file of the container to the first file system on the remote storage device. The first acquisition module 602 is used to acquire the image file from the first file system. The first creation module 603 is used to create the container on the first node based on the image file.

Optionally, the first mounting module 601 may further mount the directory of the root file of the container to the second file system on the remote storage device, and mount the directory of the persistent volume claim (PVC) of the container to the third file system on the remote storage device.
The embodiment of the present application also provides another container creation device, which is used to execute the method executed by node 110M in the method embodiment shown in FIG. 3. As shown in FIG. 7, the container creation device 700 includes a second mounting module 701, a second acquisition module 702, and, optionally, a second creation module 703.

The second mounting module 701 is used to mount the directory of the image file of the container to the first file system. The second acquisition module 702 is used to acquire the image file from the image repository and store the image file in the first file system. The second creation module 703 may acquire the image file from the first file system and create the container on the second node based on the image file.

In a possible implementation, when storing the image file in the first file system, the second acquisition module 702 stores the incremental data of the image file in the first file system, where the incremental data is the data of the image file that differs from the other image files already stored in the first file system. In a possible implementation, when acquiring the image file from the image repository, the second acquisition module 702 acquires the incremental data of the image file from the image repository. In a possible implementation, when acquiring the image file from the image repository and storing it in the first file system, the second acquisition module 702 decompresses the compressed image file obtained from the image repository and stores the decompressed image file in the first file system.
It should be noted that the division of modules in the embodiments of the present application is schematic and is only a division by logical function; there may be other division manners in actual implementation. The functional modules in the embodiments of the present application may be integrated into one processing module, or each module may exist physically separately, or two or more modules may be integrated into one module. The above integrated modules may be implemented in the form of hardware or in the form of software functional modules.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented by software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium, and the semiconductor medium may be a solid state drive (SSD).

The embodiments of the present application may be provided as methods, systems, or computer program products. Therefore, the present application may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, and the like) containing computer-usable program code.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that implements the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram. These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are executed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.

Abstract

一种容器创建方法、系统及节点,第一节点能够在创建容器之前,将容器的镜像文件的目录挂载到远端存储设备上的第一文件系统,建立镜像文件的目录与第一文件系统的关联,这样第一节点写入到镜像文件的目录下的镜像文件会存储在第一文件系统中。其中,远端存储设备为独立于第一节点的存储设备。第一节点在需要创建容器时,先从第一文件系统获取镜像文件,基于镜像文件创建第一容器。第一节点上容器的镜像文件的目录能够挂载到远端存储设备上的第一文件系统,这样镜像文件可以保存在远端存储设备上,能够减少镜像文件对第一节点的本地存储空间的占用。

Description

一种容器创建方法、系统及节点
相关申请的交叉引用
本申请要求在2022年09月29日提交中国专利局、申请号为202211205094.7、申请名称为“一种容器创建方法、系统及节点”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及存储技术领域,尤其涉及一种容器创建方法、系统及节点。
背景技术
容器是基于镜像(image)文件创建,镜像文件提供容器运行时所需的程序、库、资源、配置参数等文件。
节点需要创建容器时,每个节点均需要从镜像仓库中拉取镜像文件,并将该镜像文件写入到该节点中镜像文件的目录下。该节点中镜像文件的目录描述了该镜像文件在该节点中的存储位置。通常该节点中镜像文件的目录所指示的存储位置位于该节点的本地存储设备中。该节点的本地存储设备是指节点中的硬盘等存储设备。
由于每个节点均需要执行一次从镜像仓库中拉取镜像文件的操作,会造成该镜像仓库的压力。另外,镜像文件直接保存在节点的本地存储设备中,会占用节点自身的存储空间,压缩了节点自身的存储空间。
发明内容
本申请提供一种容器创建方法、系统及节点,用以减少容器创建过程中对节点本地存储空间的占用。
第一方面,本申请实施例提供了一种容器创建方法,在该方法中,第一节点能够在创建容器之前,将容器的镜像文件的目录挂载到远端存储设备上的第一文件系统,建立镜像文件的目录与第一文件系统的关联,这样第一节点写入到镜像文件的目录下的镜像文件会存储在第一文件系统中。其中,该远端存储设备为独立于第一节点的存储设备。该远端存储设备是第一节点之外的存储设备,例如,该远端存储设备可以与该第一节点通过网络连接。
第一节点在需要创建容器时,可以先从第一文件系统获取镜像文件,基于镜像文件在第一节点上创建容器。该第一节点对应于本申请实施例中的节点110N。
通过上述方法,第一节点的镜像文件的目录能够挂载到远端存储设备上的第一文件系统,这样镜像文件可以保存在远端存储设备上,能够减少镜像文件对第一节点的本地存储空间的占用。
在一种可能的实施方式中,与第一节点类似,第二节点也可以将容器的镜像文件的目录挂载到第一文件系统。这里以第二节点为首个将镜像文件存储到第一文件系统的节点为例进行说明。第二节点从镜像仓库中获取镜像文件,将镜像文件存储第一文件系统。该第二节点对应于本申请实施例中的节点110M。
通过上述方法,第二节点从镜像仓库拉取镜像文件,并将镜像文件写入到第一文件系统中,这样第一节点在需要创建容器时,无需再从镜像仓库拉取镜像文件,仅需要从第一文件系统获取该镜像文件即可,减少了需部署容器的节点从镜像仓库拉取镜像文件的次数,实现镜像文件在多个节点之间的共享。
在一种可能的实施方式中,第二节点在需要创建容器时,从第一文件系统获取镜像文件,基于镜像文件在第二节点上创建容器。
通过上述方法,第二节点在创建容器时,也可以从第一文件系统中获取镜像文件,进一步保证了该镜像文件在第一节点以及第二节点之间的共享。
在一种可能的实施方式中,第二节点将镜像文件存储在第一文件系统时,第二节点可以将镜像文件中的增量数据存储在第一文件系统中,增量数据为第一文件系统中已存储的其他镜像文件与镜像文件不同的数据。
通过上述方法,第二节点无需将完整的镜像文件保存在第一文件系统中,仅需要在第一文件系统中保存一些未保存的数据,减少对远端存储设备中存储空间的占用,减少第二节点与第一文件系统之间需要交互的数据量。
在一种可能的实施方式中,第二节点从镜像仓库中获取的镜像文件时,可以从镜像仓库中获取的镜 像文件中的增量数据,增量数据为第一文件系统中已存储的其他镜像文件与镜像文件不同的数据。
通过上述方法,第二节点与镜像仓库之间仅需交互镜像文件的增量数据,能够有效减少第二节点与镜像仓库之间的传输的数据量,提升镜像文件的拉取速度。
在一种可能的实施方式中,第一节点与第二节点可以位于不同的数据中心,第一节点与第二节点也可以位于同一数据中心。
通过上述方法,第一节点与第二节点的部署方式较为灵活,适用于不同的场景。
在一种可能的实施方式中,对于远端存储设备与第一节点的部署位置,远端存储设备与第一节点可以位于同一数据中心。远端存储设备与第一节点也可以位于不同的数据中心。类似的,对于远端存储设备与第二节点的部署位置,远端存储设备与第二节点可以位于同一数据中心。远端存储设备与第二节点也可以位于不同的数据中心。
通过上述方法,远端存储设备、第一节点、以及第二节点的部署方式较为灵活,适用于不同的场景。
在一种可能的实施方式中,第一节点从第一文件系统,获取镜像文件时,可以将与第一文件系统之间的交互操作卸载到第一节点的DPU上。也就是说,第一节点的DPU可以实现对第一文件系统的访问。例如,第一节点中DPU从第一文件系统获取镜像文件。
通过上述方法,第一节点中的DPU可以访问第一文件系统,减少第一文件系统的访问对第一节点中处理器的占用。
在一种可能的实施方式中,除了将镜像文件的目录挂载到远端存储设备的第一文件系统,第一节点也可以将容器的其他文件的目录挂载在其他远端存储设备的文件系统上。第一节点也可以将容器的其他文件的目录挂载在该远端存储设备的其他文件系统上。也就是说,容器的不同文件的目录可以挂载到不同的文件系统上,这些不同的文件系统可以在相同的远端存储设备上,也可以在不同的远端设备上。例如,第一节点将容器的根文件的目录挂载到独立于第一节点的存储设备上的第二文件系统。这里的存储设备可以与第一文件系统所在的远端存储设备是同一个设备,也可以是不同的设备。又例如,第一节点将容器的持续化卷PVC的目录挂载到独立于第一节点的存储设备上的第三文件系统。这里的存储设备可以与第一文件系统所在的远端存储设备是同一个设备,也可以是不同的设备。
通过上述方法,第一节点将容器的其他文件的目录挂载在远端存储设备的文件系统上,使得容器的其他文件也可以保存在文件系统上,进一步减少对第一节点本地存储空间的占用。
在一种可能的实施方式中,第一节点将容器的其他文件的目录挂载在远端存储设备的文件系统的情况下,第一节点的DPU可以用于访问容器的其他文件的目录所挂载在的文件系统,例如,第一节点的DPU可以访问第二文件系统以及第三文件系统。
通过上述方法,第一节点中的DPU实现对远端存储设备的访问,以实现容器其他文件的读写,进一步减少容器其他文件的读写操作对第一节点中处理器的占用。
在一种可能的实施方式中,第二节点从镜像仓库中获取的镜像文件,将镜像文件存储第一文件系统时,第二节点从镜像仓库中获取的压缩后的镜像文件。第二节点对压缩后的镜像文件进行解压,将解压获得的镜像文件存储在第一文件系统。
通过上述方法,第二节点可以实现镜像文件的解压缩,便于后续在创建容器时,能够直接从第一文件系统读取解压缩后的镜像文件,提升容器创建效率。
第二方面,本申请实施例提供了一种容器创建系统,有益效果可以参见第一方面的相关描述,此处不再赘述。该容器创建系统中包括第一远端存设备以及第一节点。第一远端存储设备为独立于第一节点的存储设备。
第一远端存储设备上部署有第一文件系统。第一节点在需要创建容器时,可以将容器的镜像文件的目录挂载到第一文件系统,从第一文件系统获取镜像文件,基于镜像文件在第一节点上创建容器。
在一种可能的实施方式中,该系统还包括第二节点以及镜像仓库。
镜像仓库保存有镜像文件。第二节点可以将容器的镜像文件的目录挂载到第一文件系统,还可以从镜像仓库中获取的镜像文件,将镜像文件存储在第一文件系统。
在一种可能的实施方式中,第二节点还可以从第一文件系统获取镜像文件,基于镜像文件创建容器。
在一种可能的实施方式中,第二节点在将镜像文件存储第一文件系统时,可以将镜像文件中的增量数据存储在第一文件系统中,增量数据为第一文件系统中已存储的其他镜像文件与镜像文件不同的数据。
在一种可能的实施方式中,镜像仓库可以向第二节点发送该镜像文件中的增量数据,第二节点可以 获取镜像文件中的增量数据,增量数据为第一文件系统中已存储的其他镜像文件与镜像文件不同的数据。
在一种可能的实施方式中,第一节点与第二节点可以位于不同的数据中心,也可以位于同一数据中心。
在一种可能的实施方式中,第一远端存储设备与第一节点或第二节点可以位于同一数据中心。第一远端存储设备与第一节点或第二节点也可以位于不同的数据中心。
在一种可能的实施方式中,第一节点可以将第一文件系统的访问操作卸载到DPU上,例如,第一节点从第一文件系统,获取镜像文件时,第一节点中的DPU可以从第一文件系统获取镜像文件。
在一种可能的实施方式中,该系统还包括第二远端存储设备以及第三远端存储设备;第二远端存储设备以及第三远端存储设备为独立于第一节点的存储设备。第二远端存储设备部署有第二文件系统。第三远端存储设备部署有第三文件系统。
第一节点还可以将容器的根文件的目录挂载到二文件系统,将容器的持续化卷PVC的目录挂载到第三文件系统。这里的远端存储设备可以与第一文件系统所在的远端存设备为相同的设备,也可以为不同的设备。
在上述说明中,第一文件系统、第二文件系统、第三文件系统位于不同的远端存储设备(也即分别位于第一远端存储设备、第二远端存储设备以及第三远端存储设备)为例进行说明的。在实际应用中,第一文件系统、第二文件系统、第三文件系统中的部分或全部也可以位于同一远端存储设备。
在一种可能的实施方式中,第一节点的DPU访问第二文件系统以及第三文件系统。
在一种可能的实施方式中,第二节点从镜像仓库中获取的镜像文件,将镜像文件存储第一文件系统时,第二节点从镜像仓库中获取的压缩后的镜像文件;在对压缩后的镜像文件进行解压后,将解压获得的镜像文件存储在第一文件系统。
第三方面,本申请提供了一种容器创建装置,该容器创建装置具有实现上述第一方面的方法实例中第一节点行为的功能,有益效果可以参见第一方面的描述此处不再赘述。功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。硬件或软件包括一个或多个与上述功能相对应的模块。在一个可能的设计中,容器创建装置的结构中包括第一挂载模块、第一获取模块、第一创建模块,这些模块可以执行上述第一方面方法示例中的相应功能,具体参见方法示例中的详细描述,此处不做赘述。
第四方面,本申请提供了另一种容器创建装置,该容器创建装置具有实现上述第一方面的方法实例中第二节点行为的功能,有益效果可以参见第一方面的描述此处不再赘述。功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。硬件或软件包括一个或多个与上述功能相对应的模块。在一个可能的设计中,容器创建装置的结构中包括第二挂载模块、第二获取模块、第二创建模块,这些模块可以执行上述第一方面以及第一方面的各个可能的实施方式中的方法中的相应功能,具体参见方法示例中的详细描述,此处不做赘述。
第五方面,本申请还提供了一种容器创建节点,该容器创建节点可以为第一方面以及第一方面的各个可能的实施方式中所述的方法实例中的第一节点或第二节点。该存储器用于存储计算机程序指令。处理器具有实现上述第一方面或第一方面任一种可能的实现方式所述的方法实例中行为的功能,有益效果可以参见第一方面的描述此处不再赘述。该容器创建节点还可以包括DPU,该DPU可以用于访问第一文件系统、第二文件系统或第三文件系统。
第六方面,本申请还提供一种计算机可读存储介质,所述计算机可读存储介质中存储有指令,当其在计算机上运行时,使得计算机执行上述第一方面以及第一方面的各个可能的实施方式中所述的方法。
第七方面,本申请还提供一种计算机芯片,芯片与存储器相连,芯片用于读取并执行存储器中存储的软件程序,执行上述第一方面以及第一方面的各个可能的实现方式中的方法。
第八方面,本申请还提供一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行上述第一方面以及第一方面的各个可能的实施方式中所述的方法。
本申请在上述各方面提供的实现方式的基础上,还可以进行进一步组合以提供更多实现。
附图说明
图1为本申请提供的一种系统的架构示意图;
图2为本申请提供的一种节点的结构示意图;
图3为本申请提供的一种容器创建方法示意图;
图4为本申请提供的一种挂载信息配置界面示意图;
图5为本申请提供的一种overlay系统结构示意图;
图6~图7为本申请提供的一种容器创建装置的结构示意图。
具体实施方式
在对本申请实施例提供的一种容器创建方法、系统及节点介绍之前,对本申请涉及的一些概念进行说明:
(1)、虚拟化技术、容器(container)
虚拟化(virtualization)是一种资源管理技术,通过虚拟化技术将节点的各种物理资源,如服务器、网络、内存及存储等,予以抽象、转换后呈现出来。
容器是虚拟化技术中的一种,容器是通过虚拟化技术模拟的一种独立的运行环境,容器类似于一个轻量级的沙盒,实现对容器外部的软件以及硬件进行屏蔽,容器运行在节点上,实质上可以看成一种特殊的进程。
(2)、kubernetes(k8s)
k8s是开源的容器集群管理系统,k8s构建了一个容器的调度服务。k8s面向用户能够使得用户通过k8s进行容器集群的管理。k8s可以运行在用于部署容器的节点中,也可以部署在该节点之外的设备中。
借助k8s,用户无需进行复杂的配置工作。用户只需在k8s的客户端中设置一些必要参数,如容器需要基于的镜像文件的标识、容器仓(pod)的数量。其中,pod由工作在同一工作节点上的一组容器构成。在本申请实施例中该必要参数还包括容器文件的挂载信息,以是指容器的不同文件的目录所挂载的远端文件系统。
k8s会根据用户设置的必要参数自动选取合适的节点来执行具体的容器集群调度处理工作。
需要说明的是,在本申请实施例中仅是以k8s对容器进行管理为例进行说明,在实际应用中,也可以采用其他、能够用于实现容器管理的系统。
(3)、容器的三种文件
容器有三种不同属性的文件,分别为根文件(root file system,rootfs)、镜像(image)文件、以及持久化卷(persistent volume claim,PVC)。在本申请实施例中,允许容器的三种不同属性的文件中的任意文件存储在远端存储设备上部署的文件系统中(为了方便说明,远端存储设备上部署的文件系统称为远端文件系统)上。
下面分别对这三种文件进行说明:
1)、镜像文件
镜像是容器运行时的只读模板,每个镜像由一系列的层(layers)组成,能够提供容器运行时所需的程序、库、资源、配置参数等文件。鉴于镜像具备层级结构,在镜像以文件形式存在时,不同层可对应一个子文件,每个子文件还可以包括多个子文件。基于镜像的这种构成,镜像文件也被称为镜像文件系统。镜像是一个特殊的文件系统。
镜像仓库用于存储大量镜像,负责镜像的存储、管理和分发。某一个节点在需要创建容器时,从镜像仓库中拉取所需的镜像,利用所拉取的镜像创建容器,启动容器中的应用。
在本申请实施例中,镜像文件可以由一个节点从镜像仓库中拉出,存储在远端文件系统,基于远端文件系统中的镜像文件创建容器。该远端文件系统可以为共享文件系统,也即该节点所属的集群中的其他节点在该镜像文件存放在共享文件系统后,能够从该共享文件系统中获取该镜像文件,基于该镜像文件创建容器。
镜像仓库可以部署在多个设备构成的集群中,也就是说,集群面向节点提供一个或多个镜像文件。该镜像仓库也可以部署在单个设备上,也就是说,由该设备面向节点提供一个或多个镜像文件。本申请实施例并不限定镜像仓库的具体部署方式,在下文中,为了便于描述,利用镜像仓库指代部署有镜像仓库的集群或者单个设备。
2)、rootfs
由于一些rootfs在以文件形式存在时,该文件中可以包括子文件,子文件内部也可以包括子文件。rootfs也被称为根文件系统。
rootfs是容器的工作目录,主要用于存放临时数据、中间数据等。这些临时数据、中间数据包括用 户在容器上进行操作时需要暂时存储的数据、以及容器中的应用在运行过程产生的一些需临时存储的数据。
根文件中数据的生命周期与容器的生命周期一致,也就是说,当容器被注销,根文件也会随着被删除。
在本申请实施例中,容器在创建完成后,容器的rootfs存储在远端文件系统,容器的rootfs不会被存储在该容器所在节点的本地存储设备中,也即不会占用节点自己的本地存储设备的存储空间,以减少容器对节点自身的存储空间的占用。
3)、PVC
PVC是容器的数据卷,PVC用于存放需要持久化的数据,PVC中的数据的生命周期长于容器,即容器实例消亡之后,PVC中的数据依然存在,不会丢失。PVC中的数据包括用户在容器进行操作时,操作写入到PCV文件的数据,还包括容器的应用在运行过程中产生的一些需要持久化存储的数据。
PVC的存在也使得容器在发生故障时,一些数据不会被丢失。后续将故障的容器迁移后,重新创建的、用于替换故障容器的容器仍能继续使用故障容器的PVC。
在本申请实施例中,容器创建完成后,PVC可以存储在远端文件系统。该远端文件系统可以为共享文件系统,也即该节点所属的集群中的其他节点在该PVC存放在共享文件系统后,能够从该共享文件系统中获取该PVC。这种情况下,当容器发生故障时,在集群的一个节点上可以拉起一个新的容器用于代替该故障容器,实现容器的故障迁移,新的容器可以从共享文件系统中获取该PVC中的数据,还能够继续向该PVC中写入数据。
(4)、远端存储设备、远端文件系统
在本申请实施例中,远端存储设备是一种具备存储功能的设备。这里特别强调,所谓“远端”存储设备是指独立于节点的存储设备。该远端存储设备部署在节点的外部,与该节点是通过网络连接的。相应的,本地存储设备是指该节点自身的存储设备,如通过系统总线与节点连接的硬盘等。
远端存储设备侧的数据是以文件的概念组织的。每个文件具备唯一的文件名。通过这些文件分组,将同组的文件放置在一个目录中,在一个目录之下还可以放置其他文件,或目录(也称为子目录),形成了具备树状结构的“文件系统”。对于该树状结构中的任一文件,从该树状结构的根节点逐级向下,直至定位到该文件。文件系统是指对远端存储设备的数据访问是文件级别的访问。
为了方便说明,将远端存储设备上部署的文件系统称为远端文件系统。在本申请实施例中涉及两种类型的远端文件系统。一种为共享型远端文件系统,另一种为独享型远端文件系统。
对于共享型远端文件系统,能够被多个节点共享使用,也即该各个节点均与该共享远端文件系统建立了连接,各个节点能够基于网络协议与部署该共享远端文件系统的远端存储设备能够进行通信,任一节点被允许从该远端存储设备部署的共享型远端文件系统中写入数据,如写入镜像文件或者写入PVC。任一节点被允许能够从该远端存储设备部署的共享型远端文件系统中读取数据,如读取其他节点已写入的PVC,又如,读取该节点之前在PVC中写入数据。又例如,读取其他节点之前在PVC中写入数据。
对于独享型远端文件系统,是专属于一个节点(或一个容器),以供该节点(或一个容器)在该独享型远端文件系统中写入一些只属于该节点(或一个容器)的数据。
该节点被允许从该远端存储设备部署的独享型远端文件系统中写入数据,如写入rootfs。该节点被允许能够从该远端存储设备部署的独享型远端文件系统中读取数据,如读取该节点之前在rootfs中写入数据。
为了能够进一步提高独享型远端文件系统中的数据读写速度,该独享型远端文件系统中的文件可以基于键值对(key value,KV)的结构存储。该键值对中的键(key)为文件的文件名。该键值对中的值(value)为该文件。
需要说明的是,在本申请实施例中以部署远端文件系统的存储设备称为远端存储设备,但并不是说,远端文件系统是部署在一个存储设备中。鉴于实际应用中,一个远端文件系统可以被部署多个存储节点上,形成分布式的文件系统。将该多个存储节点作为一个整体,该包括该多个存储节点的整体可以理解为远端存储设备。也就是说,在本申请实施例中远端存储设备可以理解为一个存储设备,也可以理解为包括多个存储节点的系统。
(5)、rootfs的目录、镜像文件的目录、以及PVC的目录
容器的文件的目录,表述了该文件在容器所在节点的存储位置。与容器的三种文件对应,容器的文 件的目录分别为rootfs的目录、镜像文件的目录、以及PVC的目录。
rootfs的目录描述了该rootfs在节点的存储位置,镜像文件的目录描述了镜像文件在节点的存储位置。PVC的目录描述了该PVC在节点的存储位置。
rootfs的目录可以理解为节点中的一个文件夹或者一个文件夹的名称,在该文件夹中需要存储rootfs。类似的,镜像文件的目录可以理解为节点中的一个文件夹或者一个文件夹的名称。PVC的目录可以理解为节点中的一个文件夹或者一个文件夹的名称。
以rootfs为例,对于节点来说,只要知道rootfs的目录,可以知道该rootfs记录在哪一个文件夹中。该rootfs的目录可以是用户自行配置的,也即用户可以配置节点中用于存储rootfs的文件夹的名称。该rootfs的目录也可以是由容器的配置文件记录的,用户只需查看配置文件即可确定该存储rootfs的文件夹的名称,进而确定存储rootfs的文件夹。
以rootfs为例,在本申请实施例中通过挂载(mount),允许将rootfs的目录与一个远端文件系统关联在一起,或者可以称为将rootfs的目录挂载在一个远端文件系统中。借助挂载,rootfs的目录下的rootfs实际上是存储在远端文件系统中。而在节点侧,在需要在rootfs中写入数据时,在节点侧需要将数据写入到该rootfs的目录中时,该节点将该数据存储在于该rootfs的目录关联的远端文件系统中。在需要展示rootfs的数据,该节点可以从远端文件系统将数据读到节点本地。
如图1所示,为本申请实施例提供的一种容器创建系统的架构示意图,该容器创建系统100包括至少一个节点110、以及远端存储设备120。可选的,该系统100还包括镜像仓库130。
本申请实施例中并不限定该远端存储设备120的数量,可以为一个,也可以为多个。每个远端存储设备120上部署有一个文件系统或多个文件系统。在图1中示例性的展示了三个远端存储设备120,该三个远端存储设备120分别为远端存储设备120A、远端存储设备120B、以及远端存储设备120C。
其中,远端存储设备120A部署有文件系统A,文件系统A为一个独享型远端文件系统。远端存储设备120B部署有文件系统B,文件系统B为一个共享型远端文件系统。远端存储设备120C部署有文件系统C,文件系统C为一个共享型远端文件系统。
节点110可以为计算设备,包括但不限于个人电脑、服务器、手机、平板电脑或者智能车等。该节点110也可以为虚拟机。
当该容器创建系统100中包括多个节点110时,本申请并不限定该多个节点110的部署位置,该多个节点110可以部署在同一数据中心中,也可以部署在不同数据中心中。同样的,本申请也不限定节点110与远端存储设备120的部署位置。节点110和远端存储设备120可以为位于同一数据中心,也可以位于不同数据中心。
对于任一用于部署容器的节点110,该节点110能够实现容器的文件的目录的挂载,将容器的任一种文件的目录挂载到远端文件系统中。例如,节点110可以将容器的rootfs的目录挂载到文件系统A,将容器的镜像文件的目录挂载到文件系统B,将容器的PVC挂载到文件系统C中。
节点110从镜像仓库130中获取容器的镜像文件,将该镜像文件存储到该镜像文件的目录中(实际上是存储在该镜像文件的目录所挂载的远端文件系统中),基于该镜像文件创建容器。
另外,对于需要部署同一类型容器的节点110,这些节点110中容器的镜像文件的目录可以挂载到同一个共享型远端文件系统中。这样,只需其中一个节点110将镜像文件写入到镜像文件的目录中,其他节点110可以从该共享型远端文件系统获取该镜像文件,基于该镜像文件创建容器。也即其他节点110无需重复地从镜像仓库130中拉起该镜像文件。
容器在创建之后,容器运行过程中产生的临时数据或中间数据可以写入到rootfs的目录中(也即写入到rootfs目录下的rootfs中),也就是说,容器运行过程中产生的临时数据或中间数据将传输至该rootfs的目录所挂载的远端文件系统中。容器运行过程中产生的需要持久化存储的数据可以写入到PVC的目录中(也即写入到PVC目录下的PVC中),节点110可以将写入到PVC的目录下的数据传输至该PVC的目录所挂载的文件系统中。
对于任一远端存储设备120,远端存储设备120上部署有远端文件系统,向节点110提供了存储空间,本申请实施例并不限定该远端存储设备120的具体形态,该远端存储设备120可以表现为一个包括多个存储节点110的系统,也可以表现为存储器。
在本申请实施例中,节点110侧能够提供容器的文件的目录与远端文件系统的挂载功能,使得容器的文件可以存储在远端文件系统中,占用远端存储设备120的存储空间,避免了容器的对节点110本地 的存储设备的占用。
另外,在本申请实施例中,节点110从镜像仓库130获取镜像文件时,可以只获取增量数据。所谓增量数据是指节点110已获取的镜像文件(节点110已获取的镜像文件为节点110已存储在远端文件系统中的镜像文件)与当前需要获取的镜像文件不同的数据(也即差异数据)。这样能够减少节点110与镜像仓库130之间交互的数据量,提升镜像文件的传输速率。而节点110将镜像文件存储在远端文件系统时,也可以只存储增量数据。远端文件系统不需要存储大量的重复数据,能够提升远端文件系统的存储空间利用率,节点110与远端文件系统之间交互的数据量也会减少,加速节点110与远端文件系统之间的交互效率。
具体到节点110内部,下面对节点110的内部结构进行说明:
如图2所示,节点110包括I/O接口113、处理器111、存储器112、以及加速装置114。I/O接口113、处理器111、存储器112、以及加速装置114之间可通过系统总线连接,该系统总线可以为快捷外围部件互连标准(peripheral component interconnect express,PCIe)总线,也可以为计算快速互联(compute express link,CXL)、通用串行总线(universal serial bus,USB)协议或其他协议的总线。
图2示例性的展示了一种连接方式,图2中,加速装置114可以直接插在节点110的主板上的卡槽中,通过PCIe总线115与处理器111交换数据。
I/O接口113用于与位于节点110外部的设备通信。例如,通过I/O接口113接收外部设备发送的容器创建指令,通过I/O接口113从镜像仓库130中获取镜像文件,将镜像文件、rootfs或PCV文件通过I/O接口113发送给远端存储设备120。
处理器111是节点110的运算核心和控制核心,它可以是中央处理器(central processing unit,CPU),也可以是其他特定的集成电路。处理器还可以是其他通用处理器、数字信号处理器(digital signal processing,DSP)、专用集成电路(application specific integrated circuit,ASIC)、现场可编程门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。
存储器112通常用来存放节点110操作系统中各种正在运行的计算机程序指令以及数据等。为了提高处理器111的访问速度,存储器112需要具备访问速度快的优点。存储器112通常采用动态随机存取存储器112(dynamic random access memory,DRAM)作为存储器112。除了DRAM之外,存储器112还可以是其他随机存取存储器112,例如静态随机存取存储器112(Static random access memory,SRAM)等。另外,存储器112也可以是只读存储器112(read only memory,ROM)。而对于只读存储器112,举例来说,可以是可编程只读存储器112(programmable read only memory,PROM)、可抹除可编程只读存储器112(erasable programmable read only memory,EPROM)等。存储器112还可以为闪存介质(FLASH)、硬盘驱动器(hard disk drive,HDD)或固态驱动器(solid state disk,SSD)等。
本实施例并不限定存储器112的数量。处理器111通过双倍速率(double data rate,DDR)总线或者其他类型的总线和存储器112相连。存储器112理解为节点110的内存(internal memory),内存又称为主存(main memory)。
处理器111可以通过调用该存储器112中的计算机程序指令执行如下文中图3所示的实施例中节点110需执行的全部方法。处理器111也可以通过调用该存储器112中的计算机程序指令执行如下文中图3所示的实施例中节点110需执行的部分方法。例如,在本申请实施例中处理器111可以完成容器的文件的目录与远端文件系统的挂载操作、以及容器的创建操作。而对于节点110访问远端文件系统的操作由加速装置114执行。访问远端文件系统的操作是指从远端文件系统读取数据或向远端文件系统写入数据。
加速装置114包括数据处理单元(data process unit,DPU)1141,可选的,该加速装置114还包括存储器1142、供电电路等,DPU1141与存储器1142通过系统总线连接,该系统总线可以为基于PCIe的线路,也可以为CXL、USB协议或其他协议的总线。
DPU1141为加速装置114的主要运算单元,是加速装置114的核心单元,DPU1141承担了加速装置114的主要功能。例如节点110的部分功能可以卸载到DPU1141上,由DPU1141处理数据,执行节点110交由加速装置114的任务。DPU1141通过调用该存储器1142中的计算机程序指令执行如下文中图3所示的实施例中节点110需执行的一些方法。DPU1141可以访问远端文件系统,从远端文件系统读取数据或将数据写入到远端文件系统。这里的数据可为rootfs中的数据、镜像文件或PVC中的数据。
下面结合附图3对本申请实施提供的容器创建方法进行说明,在图3所示的实施例中以在节点110M以及节点110N上创建容器进行说明,图3仅示例性的描述了在节点110M以及节点110N分别创建一个容器的过程。在节点110上创建多个容器的方式与在节点110上创建一个容器的方法类似,此处不再赘述。图3所示的容器创建方法包括三部分,第一部分为容器的文件的目录与远端文件系统挂载的过程,参见步骤300~步骤302,第二部分为容器创建过程,参见步骤303~步骤305,第三部分为容器运行过程中,容器的文件的写入以及读取过程,参见步骤306~步骤307。
步骤300:用户配置容器文件的挂载信息,触发容器的文件挂载流程。其中,容器文件的挂载信息描述容器的三种文件与远端文件系统的关联关系。
下面列举三种用户配置容器文件的挂载信息的方式:
方式一:利用容器集群管理系统配置容器文件的挂载信息。
以k8s为例,用户能够通过k8s对容器进行管理,用户在需要部署容器时,可以在部署在用户侧的k8s侧的客户端配置一些必要参数。例如,用户可以配置容器所需基于的镜像文件,用户在k8s在配置镜像文件时,可以输入能够唯一标识该镜像文件的标识,该镜像文件的标识可以是镜像仓库130为该镜像文件配置的,也可以是镜像文件设计人员在将该镜像文件存放在镜像仓库130时为该镜像文件设置的。又例如,用户可以配置容器仓的数量。其中,每个容器仓中容器的数量是预先配置的。
又例如,用户可以在k8s配置容器文件的挂载信息,针对容器的rootfs、镜像文件以及PVC的任一种,用户可以配置该文件的目录、该文件的目录所挂载的远端文件系统的类型、该文件的目录所挂载的远端文件系统名。
如图4所示,为k8s为用户提供的一种配置容器文件的挂载信息的界面示意图,在图4中,提供了针对容器的rootfs的挂载信息的接口,其中,针对容器的rootfs的挂载信息包括该rootfs的目录、该rootfs的目录所挂载的远端文件系统的类型、该rootfs的目录所挂载的远端文件系统名,可选的,该挂载信息中还可以包括远端文件系统的入口地址。远端文件系统的入口地址为该远端存储设备120的地址,该入口地址用于节点110与该远端文件系统通过网络进行数据传输。该入口地址可为网络协议(internet protocol,IP)地址,也可以为媒体存储控制地址(media access control addres,MAC),本申请实施例并不限定该入口地址的具体类型,凡是能够实现与该远端文件系所在的远端存储地址进行通信的地址均是用于本申请实施例。
提供了针对容器的镜像文件的挂载信息的接口,其中,包括该镜像文件的目录、该镜像文件的目录所挂载的远端文件系统的类型、该镜像文件的目录所挂载的远端文件系统名。可选的,该挂载信息中还可包括远端文件系统的入口地址。远端文件系统的入口地址用于节点110与该远端文件系统进行数据传输。
提供了针对容器的PVC的挂载信息的接口,其中,包括该PVC的目录、该PVC的目录所挂载的远端文件系统的类型、该PVC的目录所挂载的远端文件系统名。可选的,该挂载信息中还可包括远端文件系统的入口地址。远端文件系统的入口地址用于节点110与该远端文件系统进行数据传输。
在本申请实施例中涉及了两种不同类型的远端文件系统,为了与已有的网络文件系统(network file system,NFS)进行区分,可以为这两种不同类型的远端文件系统预先设计相应的类型名。例如,将共享型远端文件系统的类型名设计为SFS,将共享型远端文件系统的类型名设计EFS。两种不同类型的远端文件系统的类型名是节点110能够识别的。在实际应用中为了能够使得节点110能够识别这两种不同类型的远端文件系统,可以通过更新节点110中的容器存储接口(container storage interface,CSI)插件,使得节点110具备识别这两种不同类型的远端文件系统的类型名的功能,能够自动执行携带其中任一类型的远端文件系统的类型名挂载指令。
在图4中针对容器的rootfs的挂载信息中,rootfs的目录为rootfs-1,表明存储rootfs的文件夹的名称为rootfs-1;该rootfs的目录所挂载的远端文件系统的类型为EFS,表明该rootfs的目录需要挂载到一个独享型远端文件系统中、该rootfs的目录所挂载的远端文件系统名为文件系统A,表明该独享型远端文件系统的文件名为文件系统A。远端文件系统的入口地址为10.10.0.1,表明远端文件系统的入口地址为10.10.0.1,节点110能够基于该入口地址通过网络从远端文件系统读取rootfs中的数据或者将数据写入到该远端文件系统。
针对容器的镜像文件的挂载信息中,镜像文件的目录为image-1,表明存储镜像文件的文件夹的名 称为image-1;该镜像文件的目录所挂载的远端文件系统的类型为SFS,表明该镜像文件的目录需要挂载到一个共享型远端文件系统中;该镜像文件的目录所挂载的远端文件系统名为文件系统B,表明该共享型远端文件系统的文件名为文件系统B。远端文件系统的入口地址为10.10.0.2,表明远端文件系统的入口地址为10.10.0.2,节点110能够基于该入口地址通过网络从远端文件系统读取镜像文件或者将镜像文件写入到远端文件系统。
针对容器的PVC的挂载信息中,PVC的目录为PVC-1,表明存储PVC的文件夹的名称为PVC-1;该镜像文件的目录所挂载的远端文件系统的类型为SFS,表明该镜像文件的目录需要挂载到一个共享型远端文件系统中;该镜像文件的目录所挂载的远端文件系统名为文件系统C,表明该共享型远端文件系统的文件名为文件系统C。远端文件系统的入口地址为10.10.0.3,表明远端文件系统的入口地址为10.10.0.3,节点110能够基于该入口地址通过网络从远端文件系统读取PVC中的数据或者将PVC中的数据写入到远端文件系统。
用户在k8S的客户端配置完成后,k8s可以根据用户的配置编排多个容器,确定在哪些节点110上部署pod,每个节点110部署多少个pod。k8s向所确定节点110发送容器挂载请求,该容器挂载请求中携带了容器文件的挂载信息,以请求节点110完成容器的文件的目录与远端文件系统的挂载,触发容器的文件挂载流程(也即步骤301~步骤303)。例如,k8s可以向节点110M和节点110N下发容器挂载请求,该容器挂载请求中可以携带图4所配置的容器文件的配置信息。
此外,k8s还可以向所确定节点110中的部分节点110发送镜像拉取请求,该镜像拉取请求用于请求节点110从镜像仓库130中拉取镜像文件,该镜像拉取请求中携带了镜像文件的标识。以k8s确定了在节点110M以及节点110N中部署pod为例,k8s不需要向节点110M和节点110N分别发送镜像拉取请求,仅需向节点110M或节点110N发送镜像拉取请求。
当然,在实际应用中k8s侧可以如图4所示向用户提供容器的三种文件的挂载信息的配置功能,也可以只向用户提供其中一种或两种文件的挂载信息的配置功能。
方式二:用户通过更新容器的配置文件配置,容器文件的挂载信息。
在节点110侧,节点110保存有容器的配置文件,容器的配置文件记录了容器创建的一些参数,该容器的配置文件可以包括容器中一种或多种文件的挂载信息。容器的配置文件中针对任一中文件的挂载信息属于预设的信息。这些预设的信息的允许进行变更的。
举例来说,容器的配置文件中包括预设的针对容器的镜像文件的挂载信息、以及针对容器的rootfs的挂载信息。针对容器的镜像文件的挂载信息。
预设的针对容器的rootfs的挂载信息包括预设的该rootfs的目录、预设的该rootfs的目录所挂载的远端文件系统的类型、预设的该rootfs的目录所挂载的远端文件系统名,可选的,该挂载信息中还可以包括预设的远端文件系统的入口地址。
用户可以对该针对容器的rootfs的挂载信息进行修改,如修改该rootfs的目录、该rootfs的目录所挂载的远端文件系统的类型、该rootfs的目录所挂载的远端文件系统名等信息。
例如,用户可以将该rootfs的目录修改为rootfs-A,将该rootfs的目录所挂载的远端文件系统的类型修改为EFS,将该rootfs的目录所挂载的远端文件系统名修改为文件系统A。
预设的针对容器的镜像文件的挂载信息包括预设的该镜像文件的目录、预设的该镜像文件的目录所挂载的远端文件系统的类型、预设的该镜像文件的目录所挂载的远端文件系统名。可选的,该挂载信息中还可包括远端文件系统的入口地址。
例如,用户可以将该镜像文件的目录修改为image-B将该镜像文件的目录所挂载的远端文件系统的类型修改为SFS,将该镜像文件的目录所挂载的远端文件系统名修改为文件系统B。
这样,当后续节点110需要创建容器时会调取该修改后的挂载信息,将容器的文件的目录挂载到所指定的文件系统上。
方式三:直接向节点110下发挂载指令,该挂载指令中携带了容器文件的挂载信息。
用户直接向节点110M、节点110N下发挂载指令,如用户可以直接操作该节点110,通过该节点110M、节点110N外接的输入输出设备键入挂载指令。
挂载指令的格式如下:mount-t远端文件系统的入口地址远端文件系统类型远端文件系统名文件的目录。
例如,用户键入如下三个挂载指令:
挂载命令1、mount-t 10.10.0.1 EFS文件系统A rootfs-A。
挂载命令2、mount-t 10.10.0.2 SFS文件系统B image-B。
挂载命令3、mount-t 10.10.0.3 SFS文件系统C PVC-C。
其中,挂载命令1指示名为rootfs-A的文件夹挂载到名为文件系统A、入口地址为10.10.0.1的独享型远端文件系统。挂载命令2指示名为image-B的文件夹挂载到名为文件系统B、入口地址为10.10.0.2的共享型远端文件系统。挂载命令3指示名为PVC-B的文件夹挂载到名为文件系统C、入口地址为10.10.0.3的共享型远端文件系统。
用户在键入这三个挂载指令后,触发容器的文件挂载流程(也即步骤301~步骤302)。
上述方式仅是举例说明的,在实际应用中,用户在配置容器文件的挂载信息时,可以采用上述方式的任一种;也可以采用上述方式中的多种,例如,用户可以在容器集群管理系统配置容器PVC的挂载信息,并在节点110侧通过修改容器的配置信息配置容器的roofs的挂载信息以及镜像文件的挂载信息。当然,本申请实施例也可以采用除上述三种方式之外的方式配置容器的挂载信息。
下面以节点110M以及节点110N创建容器为例进行说明。
步骤301:节点110M将容器的rootfs的目录挂载到文件系统A,将容器的镜像文件的目录挂载到文件系统B,将容器的PVC的目录挂载到文件系统C。
针对步骤300中提及的方式一,节点110M在接收容器挂载请求后,节点110M(如节点110M中的处理器111)可以自动执行挂载指令,节点110M自动执行的挂载指令即为与前述提及的挂载指令1、挂载指令2、以及挂载指令3类似的卸载指令。
针对步骤300中提及的方式二,节点110M获取用户修改后的容器的文件的挂载信息,节点110M(如节点110M中的处理器111)可以自动执行挂载指令,节点110M自动执行的挂载指令即为与前述提及的挂载指令1、挂载指令2、以及挂载指令3类似的卸载指令。
针对步骤300中提及的方式三,节点110M在检测到用户键入的卸载指令后,节点110M(如节点110M中的处理器111)可以自动执行挂载指令,节点110M自动执行的挂载指令即为前述提及的挂载指令1、挂载指令2、以及挂载指令3。
节点110M通过执行这些挂载指令,使得节点110M与部署远端文件系统的远端存储设备120建立连接(节点110M会与远端存储设备120进行通信,以告知远端存储设备120后续会向远端存储设备120中写入一些数据),建立了容器的文件的目录与远端文件系统之间的关联关系,使得容器的文件的目录下的文件可以写入到与其关联的远端文件系统中。
节点110M通过执行挂载指令1,在节点110M与部署文件系统A的远端存储系统120A之间建立了连接,将容器的rootfs的目录挂载到文件系统A中,建立了容器的rootfs的目录与文件系统A的关联关系。
节点110M通过执行挂载指令2,在节点110M与部署文件系统B的远端存储系统120B之间建立了连接,将容器的镜像文件的目录挂载到文件系统B中,建立了容器的镜像文件的目录与文件系统B的关联关系。
节点110M通过执行挂载指令3,在节点110M与部署文件系统C的远端存储系统120C之间建立了连接,将容器的PVC的目录挂载到文件系统C中,建立了容器的PVC的目录与文件系统C的关联关系。
步骤302:节点110N将容器的rootfs的目录挂载到文件系统A,将容器的镜像文件的目录挂载到文件系统B,将容器的PVC的目录挂载到文件系统C。节点110N执行步骤302的方式与节点110M执行步骤301的方式类似,具体可以参见前述内容,此处不再赘述。
步骤303:节点110M从镜像仓库130获取镜像文件,将镜像文件写入到镜像文件的目录下。
针对步骤300中提及的方式1,节点110M(如节点110M中的处理器111)能够接收k8s发送的镜像拉取请求,节点110M(如节点110M中的处理器111)在接收到该镜像拉取请求后,根据该镜像拉取请求中携带的镜像文件的标识从镜像仓库130中拉取镜像文件;节点110M(如节点110M中的处理器111或节点110M中的加速装置114)将镜像文件写入到镜像文件的目录。
节点110M在将镜像文件写入到镜像文件的目录的过程中实质上节点110M将镜像文件写入到与镜像文件的目录所挂载的文件系统B的过程。
节点110M(如节点110M中的处理器111)中可以部署有文件系统B的客户端,该文件系统B的 客户端可以运行在节点110M中的处理器111上,这样,在节点110M中在将镜像文件写入到镜像文件的目录时,该文件系统B客户端可以与远端存储设备120B通信,将该镜像文件传输至远端存储设备120B中,存储在文件系统B中。
为了减少对节点110M的处理器111的占用,节点110M可以将访问远端文件系统的功能卸载到加速装置114中,也即是说,由加速装置114与各个远端存储设备120进行通信,访问远端文件系统。
在步骤303中,当节点110M的处理器111在从镜像仓库130中获取镜像文件后,加速装置114可以将镜像文件写入到镜像文件的目录。加速装置114可以将该镜像文件写入到与镜像文件的目录所挂载的文件系统B中。
在这种将访问远端文件系统的功能卸载到加速装置114的场景中,加速装置114中部署有文件系统B的客户端,该文件系统B的客户端可以运行在加速装置114的DPU1141上,这样,在节点110M中在将镜像文件写入到镜像文件的目录时,该文件系统B的客户端可以与远端存储设备120B通信,将该镜像文件传输至远端存储设备120B中,存储在文件系统B中。
节点110M在从镜像仓库130拉取镜像文件时,可以只获取该镜像文件的增量数据,该增量数据是指当前需要拉取的镜像文件与节点110M已保存的镜像文件之间的差异数据。在本申请实施例由于节点110M的镜像文件的目录与文件系统B挂载,说明节点110M已保存的镜像文件实质上是存储在文件系统B中的。故而,节点110M已保存的镜像文件是指文件系统B中保存的镜像文件。那么,增量数据是指当前需要拉取的镜像文件与文件系统B已保存的镜像文件之间的差异数据。本申请实施例并不限定该增量数据的粒度。由于镜像文件是层级结构,也即镜像文件包括多层,而每一层数据可以划分为多个块数据。该增量数据可以是镜像文件中的一层或多层,也可以是该镜像文件中的一个或多个块数据,多个块数据可以是镜像文件中一层数据中的多个块数据,也可以是镜像文件中不同层数据中的多个块数据。
本申请实施例并不限定节点110M从镜像仓库130只获取该镜像文件的增量数据的方式。例如,节点110M侧可以记录文件系统B中已保存的镜像文件的标识,或者节点110M可以与文件系统B交互获取文件系统B中已保存的镜像文件的标识。其中,从设备角度,文件系统B中已保存的镜像文件是指远端存储设备120B已保存的镜像文件。
节点110M从镜像仓库130拉取镜像文件时,可以向镜像仓库130发送已保存的镜像文件的标识以及当前需要拉取的进行文件的标识。镜像仓库130根据已保存的镜像文件的标识以及当前需要拉取的进行文件的标识确定当前需要拉取的镜像文件的增量数据。镜像仓库130将该增量数据发送给节点110M。
类似的,节点110M将镜像文件写入到镜像文件的目录下,也即节点110M将镜像文件写到文件系统B中,节点110M也可以只将该镜像文件的增量数据保存在文件系统B中。在这种场景下,存在两种可能的情况:
情况一、节点110M从镜像仓库130中拉取了该镜像文件的增量数据。
在这种情况下,节点110M直接将镜像文件的增量数据写入到镜像文件的目录下,也即直接将镜像文件的增量数据保存在文件系统B中。
情况二、节点110M从镜像仓库130中拉取了该镜像文件。
在这种情况下,节点110M从镜像文件中拉取了整个镜像文件。节点110M可以查看文件系统B(如节点110M可以调取与文件系统B关联的镜像文件的目录下的镜像文件),确定文件系统B中已保存的镜像文件与当前拉取的镜像文件之间的差异数据,也即确定镜像文件的增量数据。将镜像文件的增量数据写入到镜像文件的目录下。节点110M也可以直接将镜像文件写入到镜像文件的目录下,也即将镜像文件发送到文件系统B。远端存储设备120B在接收到该镜像文件后,可以确定文件系统B中已保存的镜像文件与当前接收到的镜像文件的差异数据,也即确定镜像文件的增量数据,远端存储设备120B只保存该镜像文件的增量数据。上述节点110M将取该镜像文件的增量数据保存在文件系统B中的方式仅是举例,本申请实施例并不限定节点110M将取该镜像文件的增量数据保存在文件系统B中的方式。
下面以增量数据为块数据为例,介绍一种节点110M从镜像仓库130拉取镜像文件的增量数据,并在文件系统中保存该镜像文件的增量数据的方式,该方式包括如下步骤:
步骤1.节点110M向镜像仓库130发送镜像请求,该镜像请求中携带了需要拉取的镜像文件的标识。
步骤2.镜像仓库130收到镜像请求后,根据该镜像文件的标识确定需要拉取的镜像文件,向节点 110M发送镜像文件的摘要信息发送给节点110M。其中,摘要信息用于指示该镜像文件中的内容。该镜像文件中的内容包括但不限于:镜像文件包括的各个层,以及每一层的指纹信息。
每一层的指纹信息可以理解为数据的标识,根据该指纹信息能够确定该层中所包括的数据。在本申请实施例中,每一层的指纹信息可以是以块数据为粒度的,也即该层中的每个块数据对应一个块数据的指纹信息。每一层的指纹信息包括该层中各个块数据的指纹信息。
例如,镜像文件中的一层的数据有1兆字节(MByte,MB),那么首先对它进行划分,划分为1024个大小为1千字节(Kilobyte,kB)的块数据。基于哈希算法,为每一个块数据计算一个指纹信息。这里每层数据的划分方式仅是举例,本申请实施例并不限定块数据的划分方式。这里指纹信息的计算方式也仅是举例,本申请实施例也不限定指纹信息的计算方式。
步骤3.节点110M收到镜像的摘要信息后,将该摘要信息发送到远端存储设备120B。
步骤4.远端存储设备120B收到该摘要信息后,可以根据该摘星信息中的指纹信息确定哪一些指纹信息对应的块数据已存储在文件系统B中,哪一些指纹信息对应的块数据未存储在文件系统B中,这些未存储在文件系统B中的块数据即为镜像文件的增量数据。
步骤5.远端存储设备120B生成增量数据的指示信息,该增量数据的指示信息用于指示该镜像文件的增量数据。
本申请实施例并不限定增量数据的指示信息指示该镜像文件的增量数据的方式,例如,该增量数据的指示信息包括未存储在文件系统B中的块数据的指纹信息。又例如,该增量数据的指示信息指示了该镜像文件中的每一个块数据块是否已保存在远端存储设备120B中。
步骤6.远端存储设备120B将增量数据的指示信息发送给节点110M,然后节点110M将增量数据的指示信息发送给镜像仓库130。
步骤7.镜像仓库130收到增量数据的指示信息后,可以向节点110M发送镜像文件。
这里镜像仓库130发送的镜像文件中已存储在文件系统B中的块数据可以用于该块数据的指纹信息代替,未存储在文件系统B中的块数据即为该块数据本身。
步骤8.节点110M将镜像文件写入到该镜像文件的目录中,也即节点110M将镜像文件发送给远端存储设备120B。
步骤9.远端存储设备120B在接收到该镜像文件后,保存该镜像文件。
在前述说明中,是以k8s通过向节点110M下发镜像拉取请求,以使得节点110M执行步骤303为例进行说明的。在实际应用中,用户也可以直接在节点110M中键入用于指示拉取镜像文件的指令,以使得节点110M能够在用户的触发下执行步骤303。本申请实施例并不限定触发节点110M执行步骤303的方式,凡是能够使得节点110M执行步骤303的方式均适用于本申请实施例。
步骤304:节点110M(如节点110M中的处理器111)基于镜像文件在节点110M上创建容器。
情况一、节点110M在将镜像文件写入到镜像文件的目录下后,k8s可以向节点110M发送容器创建请求,用于请求节点110M创建容器,该容器创建请求中携带有该节点110M中需要部署的pod的数量。节点110M在接收到该容器创建请求后,可以自动执行容器创建指令(如docker run指令)可以将该镜像文件中所需的数据加载到本地,通过运行该镜像文件中的程序、调用镜像文件中的库以及资源、完成镜像文件中配置参数的配置等操作,创建容器。
情况二、节点110M在将镜像文件写入到镜像文件的目录下后,用户可以在节点110M中键入容器创建指令,以直接指示节点110M创建容器,节点110M在检测到容器创建指令后,可以执行该容器创建指令,将该镜像文件中所需的数据加载到本地,通过运行该镜像文件中的程序、调用镜像文件中的库以及资源、完成镜像文件中配置参数的配置等操作,创建容器。
在步骤301中仅是在节点110中仅是实现通用的容器的文件的目录与远端文件系统的挂载,步骤301的容器的文件的目录并非针对某一个具体的容器。在步骤304创建容器的过程中,需要将步骤301中已配置的容器的文件的目录关联到所创建的容器上。使得步骤301中的容器的文件的目录是针对创建的容器的文件的目录。
节点110M在创建容器的过程中为所创建的容器配置该容器的文件的目录。例如,节点110M将卸载指令1中名为rootfs-A容器的rootfs的目录关联为rootfs的文件的目录rootfs-1,节点110M将卸载指令2中名为image-B容器的镜像文件的目录关联为镜像的文件的目录image-1,节点110M将卸载指令 3中名为PVC-B容器的镜像文件的目录关联为镜像的文件的目录PVC-1。
在上述两种情况中,将该镜像文件加载到本地的过程可以由节点110M的处理器111执行,也可以由节点110M中的加速装置114执行,以减少对处理器111的占用。在加载镜像文件时,处理器111或加速装置114(如加速装置114中的DPU1141)与远端存储设备120B通信,获取该镜像文件。
通常,为了节约存储空间,镜像仓库130中的镜像文件为压缩后的镜像文件,也就是说,节点110M获取的镜像文件是压缩后的镜像文件,在将镜像文件写入到镜像文件的目录之前,节点110M(如节点110M的处理器111或者加速装置114)将压缩后的镜像文件进行解压,在解压后,再将解压后的获得的镜像文件写入到镜像文件的目录下,也即将解压后获得的镜像文件存储与该镜像文件的目录挂载的远端文件系统中。
为了进一步减少对处理器111或者加速装置114的消耗,解压的操作也可以有远端存储设备120B执行,也就是说,当节点110M中的处理器111或加速装置114在将该压缩后的镜像文件写入到镜像文件的目录所挂载的文件系统B,远端存储设备120B可以对该压缩后的镜像文件进行解压。
这样,当节点110M需要创建容器时,节点110M的处理器111或加速装置114(如加速装置114中的DPU1141)将该解压后的镜像文件中所需的数据加载到本地,基于加载的数据创建容器。
节点110M在创建容器时,节点110M可以基于叠置(overlay)系统创建节点110M上需要部署的容器。
下面先对overlay系统进行说明,overlay系统是一种特殊的文件系统,它是一个多层文件系统,overlay系统的层状结果可以参见图5。图5中右侧部分展示了该overlay系统中的三个层级目录,该三个层级目录分别为挂载点(merged)、读写层(upperdir)和只读层(lowerdir)。
其中,upperdir和lowerdir可以挂载到相同或者不同的文件系统。目录merged中下的数据upperdir和lowerdir两个目录中的数据组合。其中upperdir中的文件或者目录会覆盖lowerdir层中的同名文件或者同名目录。比如在merged目录看到的file2是upperdir目录中的file2,而不是lowerdir中的。
这三个目录整体对外呈现为overlay文件系统,以mount的方式挂载overlay文件系统时,看到的挂载点是merged目录,即upperdir和lowerdir是看不到的。当在挂载点操作数据时,实际是在upperdir以及lowerdir中进行操作数据。有以下几种常见操作:
1. Read operation, that is, reading data. For example, when the data of file1 is read, file1 is read from lowerdir. For another example, when file2 is read, file2 is read from upperdir.
2. Write operation, that is, writing data. For example, when data is written to file1, file1 is first read from lowerdir and the data in file1 is modified; afterwards, the modified file1 is saved by creating file1 in upperdir.
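The read and write semantics just described can be illustrated with a small Python model. This is a toy of the copy-up behavior, not the kernel overlay implementation, and the dictionaries standing in for upperdir and lowerdir are assumptions of the sketch:

```python
class OverlayView:
    """Toy overlay: reads prefer upperdir and fall back to lowerdir; writes copy
    the file up from lowerdir, modify it, and save the result in upperdir."""

    def __init__(self, lowerdir: dict[str, bytes]):
        self.lowerdir = lowerdir                # read-only layer
        self.upperdir: dict[str, bytes] = {}    # read-write layer

    def read(self, name: str) -> bytes:
        # A file in upperdir overrides the same-named file in lowerdir.
        if name in self.upperdir:
            return self.upperdir[name]
        return self.lowerdir[name]

    def write(self, name: str, data: bytes) -> None:
        # Copy-up: read from lowerdir if needed, modify, create the file in upperdir.
        base = self.upperdir.get(name, self.lowerdir.get(name, b""))
        self.upperdir[name] = base + data


merged = OverlayView(lowerdir={"file1": b"from the image"})
merged.write("file1", b" + container change")
assert merged.read("file1") == b"from the image + container change"
assert merged.lowerdir["file1"] == b"from the image"  # lowerdir is never modified
```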
For a container, the files required by the container may be stored in the overlay system. For example, the image files may be stored in lowerdir to ensure that the image files are not modified, while the container's modifications to the image and the creation and writing of temporary files (that is, rootfs) are all placed in the upperdir layer. In the embodiments of the present application, lowerdir corresponds to the directory of the container's images. That is, lowerdir is associated with the directory of the image file, and since the directory of the image file is mounted to a remote file system, lowerdir is in essence associated with the remote file system. upperdir corresponds to the directory of the container's rootfs. That is, upperdir is associated with the directory of the rootfs, and since the directory of the rootfs is mounted to a remote file system, upperdir is in essence associated with the remote file system.
Step 305: Node 110N obtains the image file and creates a container on node 110N. The way node 110N performs step 305 is similar to the way node 110M performs step 304.
It is worth noting that, since node 110M writes the image file into file system B to which the directory of the image file is mounted, and since the directory of the image file of the container in node 110N is also mounted to file system B, node 110M writing the image file into file system B is equivalent to writing the image file into the directory of the image file of the container in node 110N. Moreover, since file system B is a shared remote file system, node 110N can directly load the image file from file system B to the local node, and create the container by running the programs in the image file, invoking the libraries and resources in the image file, completing the configuration of the configuration parameters in the image file, and other operations.
The embodiments of the present application allow multiple nodes 110 to mount the directory of a certain type of container file to the same shared file system. In this way, when a container on one of the multiple nodes 110 writes data of that file into the directory of that file, the data is written into the shared file system, and the remaining nodes 110 among the multiple nodes 110 can obtain the data of that file in the shared file system. Take as an example the case where the nodes 110 that need to deploy the same type of container all mount the directory of the container's image file to the same shared file system. In this case, the remote storage device 120 where the shared file system resides configures, for the nodes 110 deploying the same type of container, the same segment of storage space for storing image files. That is, after any one of the nodes 110 deploying the same type of container obtains the image file and writes it into the directory of the image file, the image file is written into that storage space, and any one of the nodes 110 deploying the same type of container can likewise obtain the data in that storage space. Based on this principle, for an image file, as long as one of the nodes 110 deploying the same type of container writes the image file into that storage space, the remaining nodes 110 deploying the same type of container can read the image file in that storage space by inspecting the data under the directories of their own image files.
There are many ways in which the remote storage device 120 where the shared file system resides configures, for the nodes 110 deploying the same type of container, the same segment of storage space for storing image files; a sketch of the second way follows this paragraph. For example, when k8s determines which nodes 110 need to deploy the same type of container, k8s may send an indication message to the remote storage device 120, informing the remote storage device 120 of the nodes 110 that need to be allocated the same segment of storage space for image files (the indication message may carry the identifiers of the nodes 110), where the remote storage device 120 is the one where the shared file system to which the directories of the image files of the containers in these nodes 110 are mounted resides. Subsequently, when a node 110 communicates with the remote storage device 120 by executing the mount command for the image file, the node 110 may inform the remote storage device 120 of its own node identifier, so that the remote storage device 120 can determine, from the obtained identifiers of the nodes 110, the nodes 110 that need to be allocated the same segment of storage space for image files, and allocate the same segment of storage space to them. For another example, when these nodes 110 communicate with the remote storage device 120 by executing the mount command for the image file, these nodes 110 may inform the remote storage device 120 of the identifier of the image file, so that the remote storage device 120 can determine, from the image file identifiers sent by the nodes 110, which nodes 110 have container image file directories that need to store the same image file; the nodes 110 that send the same image file identifier are the nodes 110 that need to be allocated the same segment of storage space for image files. Only two of the ways are listed here; the present application is likewise applicable to other ways that enable the remote storage device 120 where the shared file system resides to configure, for the nodes 110 deploying the same type of container, the same segment of storage space for storing the data of the container files.
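As a minimal sketch of the second way above (grouping nodes by the image file identifier reported at mount time), where the class, method, and naming scheme are assumptions of the example rather than anything prescribed by the present application:

```python
class SharedSpaceAllocator:
    """Toy allocator on the remote storage device 120: nodes 110 that report the
    same image file identifier when executing the mount command are allocated the
    same segment of storage space for that image file."""

    def __init__(self) -> None:
        self.space_by_image: dict[str, str] = {}       # image id -> storage space
        self.nodes_by_image: dict[str, set[str]] = {}  # image id -> node ids

    def on_mount(self, node_id: str, image_id: str) -> str:
        # The first node reporting an image identifier triggers the allocation;
        # every later node with the same identifier is mapped to the same space.
        space = self.space_by_image.setdefault(image_id, f"space-{image_id}")
        self.nodes_by_image.setdefault(image_id, set()).add(node_id)
        return space


allocator = SharedSpaceAllocator()
s1 = allocator.on_mount("node-110M", "image-B")
s2 = allocator.on_mount("node-110N", "image-B")
assert s1 == s2  # both nodes share one segment of storage space for image-B
```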
At this point, the containers in node 110M and node 110N have been created. After a container is created, the container runs the application deployed on it, and the user may also operate in the container through a client deployed on the user side; for example, the user may view data, modify data, and save data in the container. The client here can be understood as client software for operating the container or a client device for operating the container (that is, a client in hardware form).
Step 306: After the container in node 110M is created, node 110M writes data such as temporary data and intermediate data generated during the running of the container into the directory of the rootfs, and writes data generated during the running of the container that needs to be persisted into the directory of the PVC file.
While the container is running, the application deployed in the container performs services such as database services, voice call services, and video encoding/decoding services. Data is generated while the application performs these services. According to the existing configuration, the application may write the data that does not need to be stored for a long time, such as temporary data or intermediate data (that is, the data in the rootfs), into the directory of the rootfs, and write the data that needs to be persisted into the directory of the PVC file.
When detecting that the application needs to write data into the directory of the rootfs, the processor 111 in node 110M writes the data into file system A to which the directory of the rootfs is mounted. The processor 111 in node 110M may send the rootfs data to the remote storage device 120A for storage in file system A. The way the processor 111 in node 110M writes data into file system A to which the directory of the rootfs is mounted is similar to the way the processor 111 of node 110M writes the image file into file system B to which the directory of the image file is mounted, and is not repeated here.
When detecting that the application needs to write data into the directory of the PVC, the processor 111 in node 110M writes the data into file system C to which the directory of the PVC is mounted. The processor 111 in node 110M may send the data to the remote storage device 120C for storage in file system C. The way the processor 111 in node 110M writes data into file system C to which the directory of the PVC is mounted is similar to the way the processor 111 of node 110M writes the image file into file system B to which the directory of the image file is mounted, and is not repeated here.
In the scenario where the function of accessing the remote file system is offloaded to the acceleration apparatus 114, the acceleration apparatus 114 may write data into file system A to which the directory of the rootfs is mounted in place of the processor 111 in node 110M. The way the acceleration apparatus 114 in node 110M writes data into file system A to which the directory of the rootfs is mounted is similar to the way the processor 111 in node 110M does so, the difference being the executing entity; for details, see the foregoing description, which is not repeated here.
The acceleration apparatus 114 may also write data into file system C to which the directory of the PVC is mounted in place of the processor 111 in node 110M. The way the acceleration apparatus 114 in node 110M writes data into file system C to which the directory of the PVC is mounted is similar to the way the processor 111 in node 110 does so, the difference being the executing entity; for details, see the foregoing description, which is not repeated here.
Correspondingly, when the application needs to invoke the data under the directory of the rootfs, the processor 111 or the acceleration apparatus 114 in node 110M may obtain the data from file system A to which the directory of the rootfs is mounted and load it to node 110M locally for the application to invoke; performing this operation by the acceleration apparatus 114 applies to the scenario where the function of accessing the remote file system is offloaded to the acceleration apparatus 114.
When the application needs to invoke the data under the directory of the PVC, the processor 111 or the acceleration apparatus 114 in node 110M may obtain the data from file system C to which the directory of the PVC is mounted and load it to node 110M locally for the application to invoke; performing this operation by the acceleration apparatus 114 applies to the scenario where the function of accessing the remote file system is offloaded to the acceleration apparatus 114.
While the container is running, the user may also perform some operations in the container, such as data modification operations and data saving operations. According to their own needs, the user may save some data as data in the rootfs into the directory of the rootfs, or save some data as data in the PVC into the directory of the PVC.
When detecting the user's operation of writing data into the directory of the PVC, the processor 111 in node 110M writes the data into file system C to which the directory of the PVC is mounted. The processor 111 in node 110M may send the data to the remote storage device 120C for storage in file system C. The way the processor 111 in node 110M writes data into file system C to which the directory of the PVC is mounted is similar to the way the processor 111 of node 110M writes the image file into file system B to which the directory of the image file is mounted, and is not repeated here.
In the scenario where the function of accessing the remote file system is offloaded to the acceleration apparatus 114, the acceleration apparatus 114 in node 110M may write data into file system C to which the directory of the PVC is mounted in place of the processor 111 in node 110M. The way the acceleration apparatus 114 in node 110M writes data into file system C to which the directory of the PVC is mounted is similar to the way the processor 111 in node 110M does so, the difference being the executing entity.
Step 307: After the container is created, node 110N writes data such as temporary data generated during the running of the container into the directory of the rootfs, and writes data generated during the running of the container that needs to be persisted into the directory of the PVC file. The way node 110N performs step 307 is similar to the way node 110M performs step 306; for details, see the foregoing description, which is not repeated here.
Based on the same inventive concept as the method embodiments, an embodiment of the present application further provides a container creation apparatus, which is configured to perform the method performed by node 110N in the method embodiment shown in Figure 3 above; for related features, see the method embodiment above, which is not repeated here. As shown in Figure 6, the container creation apparatus 600 includes a first mounting module 601, a first acquisition module 602, and a first creation module 603.
The first mounting module 601 is configured to mount the directory of the image file of the container to the first file system on the remote storage device.
The first acquisition module 602 is configured to obtain the image file from the first file system.
The first creation module 603 is configured to create the container on the first node based on the image file.
In a possible implementation, the first mounting module 601 may mount the directory of the root file of the container to the second file system on a remote storage device, and mount the directory of the persistent volume claim (PVC) of the container to the third file system on a remote storage device.
Based on the same inventive concept as the method embodiments, an embodiment of the present application further provides another container creation apparatus, which is configured to perform the method performed by node 110M in the method embodiment shown in Figure 3 above; for related features, see the method embodiment above, which is not repeated here. As shown in Figure 7, the container creation apparatus 700 includes a second mounting module 701 and a second acquisition module 702, and optionally further includes a second creation module 703.
The second mounting module 701 is configured to mount the directory of the image file of the container to the first file system.
The second acquisition module 702 is configured to obtain the image file from the image repository and store the image file in the first file system.
In a possible implementation, the second creation module 703 may obtain the image file from the first file system and create the container on the second node based on the image file.
In a possible implementation, when storing the image file in the first file system, the second acquisition module 702 stores the incremental data of the image file in the first file system, where the incremental data is the data in which the image file differs from other image files already stored in the first file system.
In a possible implementation, when obtaining the image file from the image repository, the second acquisition module 702 obtains the incremental data of the image file from the image repository, where the incremental data is the data in which the image file differs from other image files already stored in the first file system.
In a possible implementation, when obtaining the image file from the image repository and storing the image file in the first file system, the second acquisition module 702 obtains the compressed image file from the image repository, decompresses the compressed image file, and stores the decompressed image file in the first file system.
It should be noted that the division of modules in the embodiments of the present application is schematic and is merely a logical function division; there may be other division ways in actual implementation. The functional modules in the embodiments of the present application may be integrated into one processing module, or each module may exist physically alone, or two or more modules may be integrated into one module. The above integrated modules may be implemented in the form of hardware or in the form of software functional modules.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any other combination. When implemented by software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the processes or functions according to the embodiments of the present invention are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (such as infrared, radio, or microwave) means. The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device such as a server or a data center containing one or more collections of usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium. The semiconductor medium may be a solid state drive (SSD).
Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Therefore, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and variations to the present application without departing from the scope of the present application. Thus, if these modifications and variations of the present application fall within the scope of the claims of the present application and their technical equivalents, the present application is also intended to include these changes and variations.

Claims (24)

  1. A container creation method, wherein the method comprises:
    mounting, by a first node, a directory of an image file of a container to a first file system on a remote storage device, wherein the remote storage device is a storage device independent of the first node;
    obtaining, by the first node, the image file from the first file system; and
    creating, by the first node, the container on the first node based on the image file.
  2. The method according to claim 1, wherein the method further comprises:
    mounting, by a second node, the directory of the image file to the first file system; and
    obtaining, by the second node, the image file from an image repository, and storing the image file in the first file system.
  3. The method according to claim 2, wherein the method further comprises:
    obtaining, by the second node, the image file from the first file system, and creating the container on the second node based on the image file.
  4. The method according to claim 2 or 3, wherein the storing, by the second node, the image file in the first file system comprises: storing, by the second node, incremental data of the image file in the first file system, wherein the incremental data is data in which the image file differs from other image files already stored in the first file system.
  5. The method according to claim 2 or 3, wherein the obtaining, by the second node, the image file from the image repository comprises:
    obtaining, by the second node, incremental data of the image file from the image repository, wherein the incremental data is data in which the image file differs from other image files already stored in the first file system.
  6. The method according to any one of claims 2 to 5, wherein the first node and the second node are located in different data centers.
  7. The method according to any one of claims 2 to 6, wherein the remote storage device and the first node are located in the same data center.
  8. The method according to any one of claims 2 to 6, wherein the remote storage device and the first node are located in different data centers.
  9. The method according to any one of claims 1 to 8, wherein the obtaining, by the first node, the image file from the first file system comprises:
    obtaining, by a data processing unit (DPU) in the first node, the image file from the first file system.
  10. The method according to any one of claims 1 to 9, wherein before the first node creates the container based on the image file, the method further comprises:
    mounting, by the first node, a directory of a root file of the container to a second file system, wherein the device where the second file system resides is a storage device independent of the first node; and
    mounting, by the first node, a directory of a persistent volume claim (PVC) of the container to a third file system, wherein the device where the third file system resides is a storage device independent of the first node.
  11. The method according to claim 10, wherein the method further comprises:
    accessing, by a DPU of the first node, the second file system and the third file system.
  12. A container creation system, wherein the system comprises a first remote storage device and a first node, and the first remote storage device is a storage device independent of the first node;
    the first remote storage device is configured to deploy a first file system; and
    the first node is configured to mount a directory of an image file of a container to the first file system, obtain the image file from the first file system, and create the container on the first node based on the image file.
  13. The system according to claim 12, wherein the system further comprises a second node and an image repository;
    the image repository is configured to store the image file; and
    the second node is configured to mount the directory of the image file to the first file system, obtain the image file from the image repository, and store the image file in the first file system.
  14. The system according to claim 13, wherein the second node is further configured to obtain the image file from the first file system and create the container on the second node based on the image file.
  15. The system according to claim 13 or 14, wherein when storing the image file in the first file system, the second node is configured to:
    store incremental data of the image file in the first file system, wherein the incremental data is data in which the image file differs from other image files already stored in the first file system.
  16. The system according to claim 13 or 14, wherein when obtaining the image file from the image repository, the second node is configured to:
    obtain incremental data of the image file from the image repository, wherein the incremental data is data in which the image file differs from other image files already stored in the first file system.
  17. The system according to any one of claims 13 to 16, wherein the first node and the second node are located in different data centers.
  18. The system according to any one of claims 12 to 17, wherein the first remote storage device and the first node are located in the same data center.
  19. The system according to any one of claims 12 to 17, wherein the first remote storage device and the first node are located in different data centers.
  20. The system according to any one of claims 12 to 19, wherein when obtaining the image file from the first file system, the first node is configured to:
    obtain, by a data processing unit (DPU) in the first node, the image file from the first file system.
  21. The system according to any one of claims 12 to 20, wherein the system further comprises a second remote storage device and a third remote storage device, and the second remote storage device and the third remote storage device are storage devices independent of the first node;
    the second remote storage device is configured to deploy a second file system;
    the third remote storage device is configured to deploy a third file system; and
    the first node is further configured to:
    mount a directory of a root file of the container to the second file system; and
    mount a directory of a persistent volume claim (PVC) of the container to the third file system.
  22. The system according to claim 21, wherein the first node is further configured to:
    access, by a DPU of the first node, the second file system and the third file system.
  23. A container creation node, wherein the node comprises a processor and a memory, the memory is configured to store computer program instructions, and the processor is configured to perform the method according to any one of claims 1 to 11.
  24. A computer storage medium, wherein the computer-readable storage medium stores computer-executable instructions, and the computer-executable instructions are configured to cause a computer to perform the method according to any one of claims 1 to 11.
PCT/CN2023/116187 2022-09-29 2023-08-31 Container creation method, system and node WO2024066904A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211205094.7A CN117827363A (zh) 2022-09-29 2022-09-29 Container creation method, system and node
CN202211205094.7 2022-09-29

Publications (1)

Publication Number Publication Date
WO2024066904A1 true WO2024066904A1 (zh) 2024-04-04

Family

ID=90476052

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/116187 WO2024066904A1 (zh) 2022-09-29 2023-08-31 Container creation method, system and node

Country Status (2)

Country Link
CN (1) CN117827363A (zh)
WO (1) WO2024066904A1 (zh)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110704162A (zh) * 2019-09-27 2020-01-17 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for a physical machine to share container images
CN113391875A (zh) * 2020-03-13 2021-09-14 华为技术有限公司 Container deployment method and apparatus
CN114860344A (zh) * 2022-05-26 2022-08-05 中国工商银行股份有限公司 Container startup method and apparatus, computer device and storage medium

Also Published As

Publication number Publication date
CN117827363A (zh) 2024-04-05
