WO2023045256A1 - Image pulling method and system, computer device, and readable storage medium - Google Patents

Image pulling method and system, computer device, and readable storage medium

Info

Publication number
WO2023045256A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
service
node
cluster
mirror
Prior art date
Application number
PCT/CN2022/078481
Other languages
English (en)
French (fr)
Inventor
王继玉
Original Assignee
苏州浪潮智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司
Publication of WO2023045256A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/60 Software deployment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/71 Version control; Configuration management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching

Definitions

  • The present application relates to the technical field of artificial intelligence, and in particular to an image pulling method and system, a computer device, and a readable storage medium.
  • The present application mainly applies to AI clusters that use images for training.
  • AI training is image-based, and models are trained in containers. A distributed task requires multiple containers to train simultaneously, and these containers may run on one or more servers.
  • AI clusters are largely built from microservices, support containerized deployment, and support scheduling and management by Kubernetes. Kubernetes, also known as k8s, is an open-source system for automating the deployment, scaling, and management of containerized applications. k8s is fault tolerant; even so, if a service's image or the image registry has a problem, the containers running each service may be affected and the AI cluster may malfunction. Managing and maintaining the AI cluster's images and image registry is therefore critical.
  • A large AI cluster may hold dozens to hundreds of different images, ranging in size from tens of MB to tens of GB.
  • Each training task must pull its own framework image to the local node before it can start a container for model training.
  • Such highly concurrent or high-frequency image pulling not only takes a great deal of time but also increases the cluster load, consumes cluster resources such as network bandwidth, and degrades cluster performance.
  • In view of this, the present application proposes an image pulling method and system, a computer device, and a readable storage medium, which optimize the image pulling stage of AI training tasks and improve image pulling efficiency and the resource utilization of the AI cluster.
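  • To this end, one aspect of the embodiments of the present application provides an image pulling method, which includes the following steps:
  • in response to deploying an image P2P service in an AI cluster, obtaining an image P2P service component package written based on Ansible and configuration information of the AI cluster, and modifying a deployment configuration file in the image P2P service component package based on the configuration information, wherein the image P2P service component package further includes an image P2P service component;
  • distributing, based on the deployment configuration file, the images of the image P2P service component and the corresponding image files to each node in the AI cluster, and pushing the images served by each node to a Harbor registry;
  • writing a YAML file, deploying each node to a control node or a worker node based on k8s and the YAML file, and mounting a storage system path on each node;
  • in response to the AI cluster receiving an AI task, scheduling the AI task to a worker node by means of the control node, so that the worker node pulls an image from the Harbor registry or from a node providing the P2P service.
  • In some embodiments, obtaining the configuration information of the AI cluster and modifying the deployment configuration file based on the configuration information includes:
  • obtaining the network bandwidth, disk space, and image usage scenario of each node; determining a bandwidth limit of the image P2P service from the network bandwidth, a disk cleanup policy from the disk space, and an image P2P service policy from the image usage scenario; and modifying the deployment configuration file based on the determined bandwidth limit, disk cleanup policy, and image P2P service policy.
  • In some embodiments, the image P2P service policy includes:
  • an image P2P service policy for the scenario in which images are used infrequently, one for the scenario in which images are used frequently but without high concurrency, one for the scenario in which different images are used with high concurrency, and one for the scenario in which images are used frequently and with high concurrency;
  • Determining the image P2P service policy based on the image usage scenario includes selecting, from these policies, the policy that corresponds to the scenario.
  • In some embodiments, distributing the images of the image P2P service component and the corresponding image files to each node in the AI cluster based on the deployment configuration file includes distributing the super node image and its image file to a designated node, and the client image and its client file to the remaining nodes, so that these nodes provide the image P2P service.
  • In some embodiments, after the images served by the nodes are pushed to the Harbor registry, the method further includes configuring an interception policy for the nodes; the worker node then pulls the image from the Harbor registry or from a node providing the P2P service based on the interception policy.
  • In some embodiments, the method further comprises:
  • detecting, based on heartbeats, whether the image P2P service of each node is normal, and, if the image P2P service of a node is abnormal, restarting it to restore the image P2P service of that node.
  • In some embodiments, the method further comprises:
  • in response to the image being updated, pushing the updated image to the Harbor registry, and operating the control node through k8s so that it pulls the updated image from the Harbor registry.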
  • Another aspect of the embodiments of the present application provides an image pulling system, which includes:
  • an acquisition module configured to, in response to deploying an image P2P service in an AI cluster, obtain an image P2P service component package written based on Ansible and configuration information of the AI cluster, and modify the deployment configuration file in the image P2P service component package based on the configuration information;
  • a distribution module configured to distribute, based on the deployment configuration file, the images of the image P2P service component and the corresponding image files to each node in the AI cluster, and to push the images served by each node to a Harbor registry;
  • a deployment module configured to write a YAML file, deploy each node to a control node or a worker node based on k8s and the YAML file, and mount a storage system path on each node;
  • a pull module configured to, in response to the AI cluster receiving an AI task, schedule the AI task to a worker node by means of the control node, so that the worker node pulls an image from the Harbor registry or from a node providing the P2P service.
  • A further aspect provides a computer device including at least one processor and a memory, the memory storing a computer program that can run on the processor and that, when executed by the processor, implements the steps of the above method.
  • A further aspect provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the above method.
  • The present application has at least the following beneficial technical effects: the image pulling stage of AI training tasks is optimized; one-click containerized deployment of the platform's image P2P service is achieved through k8s; storage systems such as NFS and BeeGFS can be attached to store cached images and information, which improves image pulling efficiency; and the image P2P service is managed through k8s, which improves the fault tolerance and stability of the service.
  • FIG. 1 is a block diagram of an embodiment of the image pulling method provided by the present application;
  • FIG. 2 is a schematic diagram of an embodiment of the image pulling system provided by the present application;
  • FIG. 3 is a schematic structural diagram of an embodiment of the computer device provided by the present application;
  • FIG. 4 is a schematic structural diagram of an embodiment of the computer-readable storage medium provided by the present application.
  • Ansible: an automated operations and maintenance tool developed in Python. It combines the strengths of many such tools (puppet, chef, func, fabric) and supports batch system configuration, batch program deployment, and batch command execution.
  • p2p: peer-to-peer. Data is no longer transmitted through a server but directly between network nodes.
  • peer: in a p2p network, a peer is both a provider and a consumer of resources.
  • piece: a piece is a part of the image to be pulled and can be regarded as an image fragment.
  • When the image P2P service downloads an image, it does not transfer the whole image at once but downloads it piece by piece.
  • DaemonSet: runs a daemon pod in Kubernetes which, by default, runs on every node of the cluster managed by Kubernetes.
  • yaml: a language designed specifically for writing configuration files.
  • A first aspect of the embodiments of the present application proposes an embodiment of an image pulling method. As shown in FIG. 1, it includes the following steps:
  • S101: in response to deploying an image P2P service in an AI cluster, obtaining an image P2P service component package written based on Ansible and configuration information of the AI cluster, and modifying a deployment configuration file in the image P2P service component package based on the configuration information, wherein the image P2P service component package further includes an image P2P service component;
  • S103: distributing, based on the deployment configuration file, the images of the image P2P service component and the corresponding image files to each node in the AI cluster, and pushing the images served by each node to a Harbor registry;
  • S105: writing a YAML file, deploying each node to a control node or a worker node based on k8s and the YAML file, and mounting a storage system path on each node;
  • S107: in response to the AI cluster receiving an AI task, scheduling the AI task to a worker node by means of the control node, so that the worker node pulls an image from the Harbor registry or from a node providing the P2P service.
  • Specifically, the image P2P service component package is written in advance using Ansible and includes the image P2P service component and the deployment configuration file.
  • The deployment configuration file includes the steps for distributing and deploying the image P2P service component and the related configuration files. The image P2P service component is deployed into the AI cluster in containerized form based on k8s.
  • When the image P2P service is deployed in the AI cluster, the image P2P service component package and the configuration information of the AI cluster are obtained, and the deployment configuration file in the package is modified based on the configuration information.
  • Based on the deployment configuration file, the images of the image P2P service component and the corresponding image files are distributed to each node in the AI cluster. The images include a super node image and a client image: the super node image and its image configuration file are distributed to a designated node, which then provides the super node service.
  • The client image and its client image configuration file are distributed to the remaining nodes of the cluster, which then provide the client service, and the images served by each node are pushed to the Harbor registry.
  • k8s has a master-slave distributed architecture consisting of Master Nodes and Worker Nodes.
  • A Master Node is a control node, responsible for scheduling and managing the cluster;
  • a Worker Node is a worker node, responsible for running the containers of business applications.
  • When the AI cluster receives an AI task, the task is first delivered to the control node, which then schedules it to the appropriate worker node. After receiving its AI task, each worker node pulls the image from the Harbor registry or from a node providing the P2P service.
  • The embodiments of the present application optimize the image pulling stage of AI training tasks; one-click containerized deployment of the platform's image P2P service is achieved through k8s; storage systems such as NFS and BeeGFS can be attached to store cached images and information, which improves image pulling efficiency; and the image P2P service is managed through k8s, which improves the fault tolerance and stability of the service.
  • In some embodiments, obtaining the configuration information of the AI cluster and modifying the deployment configuration file based on the configuration information includes:
  • obtaining the network bandwidth, disk space, and image usage scenario of each node; determining a bandwidth limit of the image P2P service from the network bandwidth, a disk cleanup policy from the disk space, and an image P2P service policy from the image usage scenario; and modifying the deployment configuration file based on the determined bandwidth limit, disk cleanup policy, and image P2P service policy.
  • In some embodiments, the image P2P service policy includes:
  • an image P2P service policy for the scenario in which images are used infrequently, one for the scenario in which images are used frequently but without high concurrency, one for the scenario in which different images are used with high concurrency, and one for the scenario in which images are used frequently and with high concurrency;
  • Determining the image P2P service policy based on the image usage scenario includes selecting, from these policies, the policy that corresponds to the scenario.
  • Specifically, when the image P2P service is deployed in the AI cluster, the network bandwidth, disk space, and image usage scenario of the cluster are determined.
  • The bandwidth limit of the image P2P service is determined based on the network bandwidth. For example, when a cluster client pulls an image over P2P, it pulls at the bandwidth configured for the image P2P service at deployment time. The higher the bandwidth the cluster supports, the faster the pull; the lower the configured bandwidth, the longer the pull takes. The configured bandwidth generally does not exceed the maximum network bandwidth of the cluster. With sufficient bandwidth, downloading the layers of a large image of tens of GB generally takes very little time. A successful pull, however, involves not only downloading the image layers but also extracting each layer locally.
  • The scenarios in which the cluster uses images are divided into: infrequent use; frequent use without high concurrency; high-concurrency use of different images; and frequent, high-concurrency use.
  • The image P2P service policy for each scenario is as follows:
  • Image P2P service policy for the infrequent-use scenario: if the same image is not used again within a specified time, the image is not frequently used, and there is no need to keep the node peer's download service and the corresponding task process running. If the peer's download service has been shut down and the image is used again later, the image is pulled directly from the super node's cache source; once another peer has pulled the image, that peer in turn provides a download service for the specified time. This policy of enabling the service upon use and reclaiming it when idle improves the resource utilization of the AI cluster;
  • Image P2P service policy for the frequent, non-high-concurrency scenario: if the image is used frequently but not with high concurrency, every peer that has cached it provides the P2P download function, and the service time can be adjusted as needed. Because each use brings a new node that provides the download service for the specified time, reclaiming the peer that started its download service earliest does not prevent later peers from providing the P2P service, and reclaiming resources improves the overall resource utilization of the AI cluster;
  • Image P2P service policy for the frequent, high-concurrency scenario: if the image is used with high concurrency, the P2P function of each peer is fully exploited. In a large cluster, node peers that have pulled the image keep their download service running, and the peer that started its download service earliest is reclaimed once the specified time elapses. If all peers have been reclaimed when a high-concurrency burst occurs, the peers downloading the image at the same time provide the P2P service to one another, serving downloads while pulling image pieces, so the image P2P function is unaffected.
  • The embodiments of the present application provide different image P2P service policies for different usage scenarios, which ensures that images are downloaded to the local node efficiently and stably and improves the cluster's resource utilization and image download efficiency.
  • In some embodiments, distributing the images of the image P2P service component and the corresponding image files to each node in the AI cluster based on the deployment configuration file includes distributing the super node image and its image file to a designated node, and the client image and its client file to the remaining nodes, so that these nodes provide the image P2P service.
  • In some embodiments, after the images served by the nodes are pushed to the Harbor registry, the method further includes configuring an interception policy for the nodes; the worker node then pulls the image from the Harbor registry or from a node providing the P2P service based on the interception policy.
  • Specifically, an interception policy is configured for the nodes so that the image P2P service intercepts only the specified Harbor registry and does not intercept other, unspecified image registries. Furthermore, the interception policy of the node where the image P2P client resides is configured not to intercept images of the Harbor registry. For example, when a worker node receives an AI task and starts pulling an image, the node's image P2P client intercepts the pull and has the image pulled from the image P2P network rather than directly from the Harbor registry.
  • In some embodiments, the method further comprises:
  • detecting, based on heartbeats, whether the image P2P service of each node is normal, and, if the image P2P service of a node is abnormal, restarting it to restore the image P2P service of that node.
  • In some embodiments, the method further comprises:
  • in response to the image being updated, pushing the updated image to the Harbor registry, and operating the control node through k8s so that it pulls the updated image from the Harbor registry.
  • Specifically, if the functionality of the image P2P service is modified, only the P2P image needs to be modified. The modified image is pushed to the Harbor registry, and the control node of the cluster is then operated based on the k8s YAML file so that it pulls the modified image from the Harbor registry. The super node and client pods are operated step by step through k8s so that they re-pull the P2P image from the Harbor registry and use the new image to deploy the containerized service.
  • An embodiment of the present application also provides an image pulling system, which includes:
  • an acquisition module 110 configured to, in response to deploying an image P2P service in an AI cluster, obtain an image P2P service component package written based on Ansible and configuration information of the AI cluster, and modify, based on the configuration information,
  • the deployment configuration file in the image P2P service component package, wherein the image P2P service component package further includes an image P2P service component;
  • a distribution module 120 configured to distribute, based on the deployment configuration file, the images of the image P2P service component and the corresponding image files to each node in the AI cluster, and to push the images served by each node to a Harbor registry;
  • a deployment module 130 configured to write a YAML file, deploy each node to a control node or a worker node based on k8s and the YAML file, and mount a storage system path on each node;
  • a pull module 140 configured to, in response to the AI cluster receiving an AI task, schedule the AI task to a worker node by means of the control node, so that the worker node pulls an image from the Harbor registry or from a node providing the P2P service.
  • An embodiment of the present application also provides a computer device 20, which includes a processor 210 and a memory 220. The memory 220 stores a computer program 221 executable on the processor; when the processor 210 executes the program, the steps of the method described above are performed.
  • An embodiment of the present application also provides a computer-readable storage medium 30.
  • The computer-readable storage medium 30 stores a computer program 310 that performs the method described above when executed by a processor.
  • The storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.

Abstract

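The present application discloses an image pulling method and system, a computer device, and a readable storage medium. The method includes: when an image P2P service is deployed in an AI cluster, obtaining an image P2P service component package and configuration information of the cluster, and modifying a deployment configuration file; distributing the images of the image P2P service component to the nodes of the cluster based on the deployment configuration file, and pushing the images served by the nodes to a Harbor registry; writing a YAML file, deploying each node to a control node or a worker node based on k8s and the YAML file, and mounting a storage system path on the nodes; and, in response to the cluster receiving a task, scheduling the task to a worker node by means of the control node, so that the worker node pulls an image from the Harbor registry or from a node providing the P2P service. The present application improves image pulling efficiency and the resource utilization of the cluster.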

Description

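Image pulling method and system, computer device, and readable storage medium
This application claims priority to Chinese patent application No. 202111104441.2, filed with the China Patent Office on September 22, 2021 and entitled "Image pulling method and system, computer device, and readable storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of artificial intelligence, and in particular to an image pulling method and system, a computer device, and a readable storage medium.
Background
The present application mainly applies to AI (Artificial Intelligence) clusters that use images for training. At present, AI training is image-based, and models are trained in containers. A distributed task requires multiple containers to train simultaneously, and these containers may run on one or more servers. AI clusters are also largely built from microservices, support containerized deployment, and support scheduling and management by Kubernetes. Kubernetes, also known as k8s, is an open-source system for automating the deployment, scaling, and management of containerized applications. k8s is fault tolerant; even so, if a service's image or the image registry has a problem, the containers running each service may be affected and the AI cluster may malfunction. Managing and maintaining the AI cluster's images and image registry is therefore critical.
At present, model training requires images with different deep learning frameworks installed. A large AI cluster may hold dozens to hundreds of different images, ranging in size from tens of MB to tens of GB. In a large cluster with high concurrency or many users training models, every training task must pull its own framework image to the local node before it can start a container for training. Such highly concurrent or high-frequency image pulling not only takes a great deal of time but also increases the cluster load, consumes cluster resources such as network bandwidth, and degrades cluster performance.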
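Summary
In view of this, the present application proposes an image pulling method and system, a computer device, and a readable storage medium, which optimize the image pulling stage of AI training tasks and improve image pulling efficiency and the resource utilization of the AI cluster.
To this end, one aspect of the embodiments of the present application provides an image pulling method, which includes the following steps:
in response to deploying an image P2P service in an AI cluster, obtaining an image P2P service component package written based on Ansible and configuration information of the AI cluster, and modifying a deployment configuration file in the image P2P service component package based on the configuration information, wherein the image P2P service component package further includes an image P2P service component;
distributing, based on the deployment configuration file, the images of the image P2P service component and the corresponding image files to each node in the AI cluster, and pushing the images served by each node to a Harbor registry;
writing a YAML file, deploying each node to a control node or a worker node based on k8s and the YAML file, and mounting a storage system path on each node;
in response to the AI cluster receiving an AI task, scheduling the AI task to a worker node by means of the control node, so that the worker node pulls an image from the Harbor registry or from a node providing the P2P service.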
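In some embodiments, obtaining the configuration information of the AI cluster and modifying the deployment configuration file based on the configuration information includes:
obtaining the network bandwidth, disk space, and image usage scenario of each node in the AI cluster;
determining a bandwidth limit of the image P2P service based on the network bandwidth, a disk cleanup policy based on the disk space, and an image P2P service policy based on the image usage scenario;
modifying the deployment configuration file based on the determined bandwidth limit of the image P2P service, the disk cleanup policy, and the image P2P service policy.
In some embodiments, the image P2P service policy includes:
an image P2P service policy for the scenario in which images are used infrequently, one for the scenario in which images are used frequently but without high concurrency, one for the scenario in which different images are used with high concurrency, and one for the scenario in which images are used frequently and with high concurrency;
determining the image P2P service policy based on the image usage scenario includes:
selecting, based on the image usage scenario, the image P2P service policy for the corresponding scenario from the image P2P service policies.
In some embodiments, distributing the images of the image P2P service component and the corresponding image files to each node in the AI cluster based on the deployment configuration file includes:
distributing, based on the deployment configuration file, the super node image of the image P2P service component and the corresponding image file to a designated node in the AI cluster so that the designated node provides the image P2P service, and distributing the client image of the image P2P service component and the corresponding client file to the remaining nodes in the AI cluster so that the remaining nodes provide the image P2P service.
In some embodiments, after the images served by the nodes are pushed to the Harbor registry, the method further includes configuring an interception policy for the nodes; and causing the worker node to pull the image from the Harbor registry or from a node providing the P2P service includes:
causing the worker node to pull the image from the Harbor registry or from a node providing the P2P service based on the interception policy.
In some embodiments, the method further includes:
detecting, based on heartbeats, whether the image P2P service of each node is normal, and, if the image P2P service of a node is abnormal, restarting the image P2P service of that node to restore it.
In some embodiments, the method further includes:
in response to the image being updated, pushing the updated image to the Harbor registry, and operating the control node based on k8s so that it pulls the updated image from the Harbor registry.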
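Another aspect of the embodiments of the present application provides an image pulling system, which includes:
an acquisition module configured to, in response to deploying an image P2P service in an AI cluster, obtain an image P2P service component package written based on Ansible and configuration information of the AI cluster, and modify the deployment configuration file in the image P2P service component package based on the configuration information, wherein the image P2P service component package further includes an image P2P service component;
a distribution module configured to distribute, based on the deployment configuration file, the images of the image P2P service component and the corresponding image files to each node in the AI cluster, and to push the images served by each node to a Harbor registry;
a deployment module configured to write a YAML file, deploy each node to a control node or a worker node based on k8s and the YAML file, and mount a storage system path on each node;
a pull module configured to, in response to the AI cluster receiving an AI task, schedule the AI task to a worker node by means of the control node, so that the worker node pulls an image from the Harbor registry or from a node providing the P2P service.
A further aspect of the embodiments of the present application provides a computer device including at least one processor and a memory, the memory storing a computer program that can run on the processor and that, when executed by the processor, implements the steps of the method described above.
Yet another aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the method described above.
The present application has at least the following beneficial technical effects: the image pulling stage of AI training tasks is optimized; one-click containerized deployment of the platform's image P2P service is achieved through k8s; storage systems such as NFS and BeeGFS can be attached to store cached images and information, which improves image pulling efficiency; and the image P2P service is managed through k8s, which improves the fault tolerance and stability of the service.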
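Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of the present application, and a person of ordinary skill in the art may derive other embodiments from them without creative effort.
FIG. 1 is a block diagram of an embodiment of the image pulling method provided by the present application;
FIG. 2 is a schematic diagram of an embodiment of the image pulling system provided by the present application;
FIG. 3 is a schematic structural diagram of an embodiment of the computer device provided by the present application;
FIG. 4 is a schematic structural diagram of an embodiment of the computer-readable storage medium provided by the present application.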
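Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to specific embodiments and the accompanying drawings.
For a better understanding of the embodiments, the technical terms involved are explained first.
Ansible: an automated operations and maintenance tool developed in Python. It combines the strengths of many such tools (puppet, chef, func, fabric) and supports batch system configuration, batch program deployment, and batch command execution.
p2p: peer-to-peer. Data is no longer transmitted through a server but directly between network nodes.
peer: in a p2p network, a peer is both a provider and a consumer of resources.
piece: a piece is a part of the image to be pulled and can be regarded as an image fragment. When the image P2P service downloads an image, it does not transfer the whole image at once but downloads it piece by piece.
DaemonSet: runs a daemon pod in Kubernetes which, by default, runs on every node of the cluster managed by Kubernetes.
yaml: a language designed specifically for writing configuration files.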
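To make the "piece" term concrete, the following is a minimal Python sketch of splitting an image blob into fixed-size pieces and reassembling it. The function names and the fixed piece size are assumptions made for illustration; the application does not specify how pieces are sized or ordered.

```python
from typing import List


def split_into_pieces(blob: bytes, piece_size: int) -> List[bytes]:
    """Split a blob into consecutive pieces of at most piece_size bytes.

    Hypothetical helper: a fixed piece size is an assumption of this sketch.
    """
    if piece_size <= 0:
        raise ValueError("piece_size must be positive")
    return [blob[i:i + piece_size] for i in range(0, len(blob), piece_size)]


def reassemble(pieces: List[bytes]) -> bytes:
    """Concatenate pieces, in order, back into the original blob."""
    return b"".join(pieces)
```

Because each piece is independent, different pieces can in principle be fetched from different peers and joined once all of them have arrived.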
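It should be noted that all uses of "first" and "second" in the embodiments of the present application serve to distinguish two entities or parameters that have the same name but are not identical. "First" and "second" are used only for convenience of expression and should not be construed as limiting the embodiments; subsequent embodiments do not repeat this note.
To this end, a first aspect of the embodiments of the present application proposes an embodiment of an image pulling method. As shown in FIG. 1, it includes the following steps:
S101: in response to deploying an image P2P service in an AI cluster, obtaining an image P2P service component package written based on Ansible and configuration information of the AI cluster, and modifying a deployment configuration file in the image P2P service component package based on the configuration information, wherein the image P2P service component package further includes an image P2P service component;
S103: distributing, based on the deployment configuration file, the images of the image P2P service component and the corresponding image files to each node in the AI cluster, and pushing the images served by each node to a Harbor registry;
S105: writing a YAML file, deploying each node to a control node or a worker node based on k8s and the YAML file, and mounting a storage system path on each node;
S107: in response to the AI cluster receiving an AI task, scheduling the AI task to a worker node by means of the control node, so that the worker node pulls an image from the Harbor registry or from a node providing the P2P service.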
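Specifically, the image P2P service component package is written in advance using Ansible. The package includes the image P2P service component and the deployment configuration file; the deployment configuration file includes the steps for distributing and deploying the component and the related configuration files. The image P2P service component is deployed into the AI cluster in containerized form based on k8s.
When the image P2P service is deployed in the AI cluster, the image P2P service component package and the configuration information of the AI cluster are obtained, and the deployment configuration file in the package is modified based on the configuration information. Based on the deployment configuration file, the images of the image P2P service component and the corresponding image files are distributed to each node in the AI cluster. The images include a super node image and a client image: the super node image and its image configuration file are distributed to a designated node, which then provides the super node service, and the client image and its client image configuration file are distributed to the remaining nodes of the cluster, which then provide the client service. The images served by each node are pushed to the Harbor registry.
k8s has a master-slave distributed architecture consisting of Master Nodes and Worker Nodes. A Master Node is a control node, responsible for scheduling and managing the cluster; a Worker Node is a worker node, responsible for running the containers of business applications.
A YAML file for a k8s DaemonSet application is written according to the image usage scenario, and, based on the k8s YAML file, each node that has received the images is deployed to a control node or a worker node; that is, the super node service and the client service are each deployed to a control node or a worker node. The super node service and the client service may be deployed on the same server or on different servers. In a large cluster environment (one with many nodes), the two are generally deployed separately to improve the performance and efficiency of the super node service. Optionally, the super node service is deployed to a worker node and the client service to a control node. A storage system path is mounted on the nodes in the cluster; the mounted storage system stores the image cache data and information and may be NFS, BeeGFS, or a similar storage system.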
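As a rough illustration of this deployment step, the Python sketch below builds a DaemonSet manifest as a plain dictionary and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. The manifest fields (`apps/v1`, `DaemonSet`, `nodeSelector`, `volumeMounts`, `volumes[].nfs`) are standard Kubernetes fields; the service name, image reference, node label, NFS server, and paths are hypothetical placeholders and do not come from the application.

```python
import json
from typing import Dict


def build_daemonset(name: str, image: str, node_selector: Dict[str, str],
                    nfs_server: str, nfs_path: str, mount_path: str) -> dict:
    """Build a DaemonSet manifest that runs one pod on every selected node
    and mounts a shared NFS export for cached image data."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Restrict the pods to nodes carrying this label.
                    "nodeSelector": node_selector,
                    "containers": [{
                        "name": name,
                        "image": image,
                        "volumeMounts": [
                            {"name": "p2p-cache", "mountPath": mount_path},
                        ],
                    }],
                    "volumes": [{
                        "name": "p2p-cache",
                        "nfs": {"server": nfs_server, "path": nfs_path},
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    manifest = build_daemonset(
        name="image-p2p-client",
        image="harbor.example.local/p2p/client:1.0",
        node_selector={"p2p-role": "client"},
        nfs_server="nfs.example.local",
        nfs_path="/exports/p2p-cache",
        mount_path="/var/cache/p2p",
    )
    print(json.dumps(manifest, indent=2))
```

A second manifest with a different name, image, and node label would cover the super node service, which matches the option of deploying the two services on separate nodes.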
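When the AI cluster receives an AI task, the task is first delivered to the control node, which then schedules it to the appropriate worker node. After receiving its AI task, each worker node pulls the image from the Harbor registry or from a node providing the P2P service.
The embodiments of the present application optimize the image pulling stage of AI training tasks; one-click containerized deployment of the platform's image P2P service is achieved through k8s; storage systems such as NFS and BeeGFS can be attached to store cached images and information, which improves image pulling efficiency; and the image P2P service is managed through k8s, which improves the fault tolerance and stability of the service.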
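In some embodiments, obtaining the configuration information of the AI cluster and modifying the deployment configuration file based on the configuration information includes:
obtaining the network bandwidth, disk space, and image usage scenario of each node in the AI cluster;
determining a bandwidth limit of the image P2P service based on the network bandwidth, a disk cleanup policy based on the disk space, and an image P2P service policy based on the image usage scenario;
modifying the deployment configuration file based on the determined bandwidth limit of the image P2P service, the disk cleanup policy, and the image P2P service policy.
In some embodiments, the image P2P service policy includes:
an image P2P service policy for the scenario in which images are used infrequently, one for the scenario in which images are used frequently but without high concurrency, one for the scenario in which different images are used with high concurrency, and one for the scenario in which images are used frequently and with high concurrency;
determining the image P2P service policy based on the image usage scenario includes:
selecting, based on the image usage scenario, the image P2P service policy for the corresponding scenario from the image P2P service policies.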
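The selection step can be pictured as a lookup keyed by the usage scenario. The Python sketch below assumes that a scenario is characterized by two boolean observations (whether the image is used frequently, and whether it is pulled with high concurrency); that reduction, the enum names, and the policy labels are all assumptions of this sketch, since the application names the four scenarios but does not say how a cluster is classified into one of them.

```python
from enum import Enum


class UsageScenario(Enum):
    INFREQUENT = "infrequent use"
    FREQUENT_LOW_CONCURRENCY = "frequent use without high concurrency"
    HIGH_CONCURRENCY_DIFFERENT_IMAGES = "high-concurrency use of different images"
    HIGH_CONCURRENCY_FREQUENT = "frequent, high-concurrency use"


def classify_scenario(frequent: bool, high_concurrency: bool) -> UsageScenario:
    """Map two observations onto the four scenarios (assumed mapping)."""
    if high_concurrency:
        if frequent:
            return UsageScenario.HIGH_CONCURRENCY_FREQUENT
        return UsageScenario.HIGH_CONCURRENCY_DIFFERENT_IMAGES
    if frequent:
        return UsageScenario.FREQUENT_LOW_CONCURRENCY
    return UsageScenario.INFREQUENT


# Hypothetical policy labels; real policies would carry concrete settings.
POLICY_BY_SCENARIO = {
    UsageScenario.INFREQUENT: "enable-on-use-reclaim-when-idle",
    UsageScenario.FREQUENT_LOW_CONCURRENCY: "cached-peers-serve-with-adjustable-time",
    UsageScenario.HIGH_CONCURRENCY_DIFFERENT_IMAGES: "peer-first-then-super-node",
    UsageScenario.HIGH_CONCURRENCY_FREQUENT: "peers-serve-while-pulling",
}


def select_policy(frequent: bool, high_concurrency: bool) -> str:
    """Select the policy label for the observed usage scenario."""
    return POLICY_BY_SCENARIO[classify_scenario(frequent, high_concurrency)]
```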
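Specifically, when the image P2P service is deployed in the AI cluster, the network bandwidth, disk space, and image usage scenario of the cluster are determined.
The bandwidth limit of the image P2P service is determined based on the network bandwidth. For example, when a cluster client pulls an image over P2P, it pulls at the bandwidth configured for the image P2P service at deployment time. The higher the bandwidth the cluster supports, the faster the pull; the lower the configured bandwidth, the longer the pull takes. The configured bandwidth generally does not exceed the maximum network bandwidth of the cluster. With sufficient bandwidth, downloading the layers of a large image of tens of GB generally takes very little time. A successful pull, however, involves not only downloading the image layers but also extracting each layer locally, and with sufficient bandwidth the extraction takes longer than the download. Optionally, to speed up image pulling in the AI cluster and shorten pull time, images can be built with a Dockerfile and the operations in each layer of the Dockerfile optimized.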
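The relationship between the configured limit, the cluster maximum, and the total pull time can be sketched in a few lines of Python. The function names and units (MB and MB/s) are assumptions of this sketch; the only points taken from the text are that the configured limit should not exceed the cluster maximum and that a pull consists of a download phase followed by layer extraction.

```python
def effective_bandwidth_limit(requested_mb_s: float, cluster_max_mb_s: float) -> float:
    """Cap the configured P2P bandwidth at the cluster's maximum bandwidth."""
    if requested_mb_s <= 0 or cluster_max_mb_s <= 0:
        raise ValueError("bandwidth values must be positive")
    return min(requested_mb_s, cluster_max_mb_s)


def estimated_pull_seconds(image_size_mb: float, bandwidth_mb_s: float,
                           extract_seconds: float = 0.0) -> float:
    """Estimate total pull time as download time plus layer extraction time."""
    if bandwidth_mb_s <= 0:
        raise ValueError("bandwidth must be positive")
    return image_size_mb / bandwidth_mb_s + extract_seconds
```

For a 20 GB image (20480 MB) at 1024 MB/s, the download phase alone is about 20 seconds, so any extraction time longer than that dominates the pull, which is the situation the paragraph above describes.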
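The disk cleanup policy is determined based on the disk space. While the image P2P service is in use, a cleanup task is triggered if the temporary image files cached by the service exceed a configured threshold. The cleanup uses the cache date and usage time of the cached files: the earliest-cached and unused temporary image files are removed first to free disk space.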
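A cleanup pass of this kind might look like the Python sketch below. The `CacheFile` record, the ordering rule (never-used files first, then earliest cached), and the choice to stop as soon as usage drops to the threshold are assumptions of this sketch; the text only states that the earliest-cached and unused temporary files are cleaned first once a threshold is exceeded.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CacheFile:
    name: str
    size_bytes: int
    cached_at: float                # timestamp when the file was cached
    last_used_at: Optional[float]   # None if the file was never used


def select_files_to_clean(files: List[CacheFile], threshold_bytes: int) -> List[CacheFile]:
    """Return the files to delete so that total cache size is at most threshold_bytes."""
    total = sum(f.size_bytes for f in files)
    if total <= threshold_bytes:
        return []  # cleanup is not triggered
    # Never-used files sort first (False < True), then by earliest cache time.
    ordered = sorted(files, key=lambda f: (f.last_used_at is not None, f.cached_at))
    victims: List[CacheFile] = []
    for f in ordered:
        if total <= threshold_bytes:
            break
        victims.append(f)
        total -= f.size_bytes
    return victims
```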
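Different image P2P service policies are configured for the different scenarios in which the cluster uses images.
The scenarios are first divided into: infrequent use; frequent use without high concurrency; high-concurrency use of different images; and frequent, high-concurrency use. The image P2P service policy for each scenario is as follows:
1) Image P2P service policy for the infrequent-use scenario: if the same image is not used again within a specified time, the image is not frequently used, and there is no need to keep the node peer's download service and the corresponding task process running. If the peer's download service has been shut down and the image is used again later, the image is pulled directly from the super node's cache source; once another peer has pulled the image, that peer in turn provides a download service for the specified time. This policy of enabling the service upon use and reclaiming it when idle improves the resource utilization of the AI cluster;
2) Image P2P service policy for the frequent, non-high-concurrency scenario: if the image is used frequently but not with high concurrency, every peer that has cached it provides the P2P download function, and the service time can be adjusted as needed. Because each use brings a new node that provides the download service for the specified time, reclaiming the peer that started its download service earliest does not prevent later peers from providing the P2P service, and reclaiming resources improves the overall resource utilization of the AI cluster;
3) Image P2P service policy for the high-concurrency, different-images scenario: different images are pulled concurrently. Within the specified time, if a peer holding the image exists, it continues to provide the P2P service; if no such peer exists, the image is pulled from the super node. The most demanding case is when no peer provides the P2P download service and the super node has not cached the image either. Under high concurrency, the super node then caches the image while serving its pieces for download, so the super node absorbs the concurrent load and relieves the pressure on the cluster's Harbor registry. During the pull, each peer also starts the P2P download service for its own image. Every new environment encounters this situation; the longer the environment is in use and the more frequently images are used, the more pronounced the advantage of the image P2P service.
4) Image P2P service policy for the frequent, high-concurrency scenario: if the image is used with high concurrency, the P2P function of each peer is fully exploited. In a large cluster, node peers that have pulled the image keep their download service running, and the peer that started its download service earliest is reclaimed once the specified time elapses. If all peers have been reclaimed when a high-concurrency burst occurs, the peers downloading the image at the same time provide the P2P service to one another, serving downloads while pulling image pieces, so the image P2P function is unaffected.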
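The fallback order that runs through these policies (a live peer first, then the super node's cache, then the super node fetching from the Harbor registry while it serves pieces) can be sketched as a small Python decision function. The function name, the return labels, and the rule of simply taking the first available peer are assumptions of this sketch and are not specified in the application.

```python
from typing import List, Tuple


def choose_pull_source(peers_with_image: List[str],
                       supernode_has_cache: bool) -> Tuple[str, str]:
    """Pick where a client fetches pieces from, as (mode, source)."""
    if peers_with_image:
        # A peer still serving the image exists: use the P2P path.
        return ("peer", peers_with_image[0])
    if supernode_has_cache:
        # No live peer, but the super node already holds the image.
        return ("supernode-cache", "supernode")
    # Worst case: the super node fetches from the Harbor registry and
    # serves pieces while it is still caching.
    return ("supernode-back-to-source", "supernode")
```

Only the last branch touches the Harbor registry, and it does so once through the super node rather than once per client, which is how the concurrent load on the registry is reduced.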
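The embodiments of the present application provide different image P2P service policies for different usage scenarios, which ensures that images are downloaded to the local node efficiently and stably and improves the cluster's resource utilization and image download efficiency.
In some embodiments, distributing the images of the image P2P service component and the corresponding image files to each node in the AI cluster based on the deployment configuration file includes:
distributing, based on the deployment configuration file, the super node image of the image P2P service component and the corresponding image file to a designated node in the AI cluster so that the designated node provides the image P2P service, and distributing the client image of the image P2P service component and the corresponding client file to the remaining nodes in the AI cluster so that the remaining nodes provide the image P2P service.
In some embodiments, after the images served by the nodes are pushed to the Harbor registry, the method further includes configuring an interception policy for the nodes; and causing the worker node to pull the image from the Harbor registry or from a node providing the P2P service includes:
causing the worker node to pull the image from the Harbor registry or from a node providing the P2P service based on the interception policy.
Specifically, to achieve VIP high availability for Harbor and the image P2P service, an interception policy is configured for the nodes so that the image P2P service intercepts only the specified Harbor registry and does not intercept other, unspecified image registries. Furthermore, the interception policy of the node where the image P2P client resides is configured not to intercept images of the Harbor registry. For example, when a worker node receives an AI task and starts pulling an image, the node's image P2P client intercepts the pull and has the image pulled from the image P2P network rather than directly from the Harbor registry.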
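One way to picture "intercept only the specified registry" is a host check on the image reference, as in the Python sketch below. The host-detection rule follows the common Docker convention (the first path component is a registry host if it contains a dot or a colon or equals `localhost`; otherwise the reference defaults to `docker.io`). The function names and the example host are hypothetical, and how the actual P2P client applies its interception policy is not described in the application.

```python
from typing import Set


def registry_host(image_ref: str) -> str:
    """Extract the registry host from an image reference."""
    first, sep, _rest = image_ref.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first
    return "docker.io"  # no explicit registry host in the reference


def should_intercept(image_ref: str, intercepted_hosts: Set[str]) -> bool:
    """Route a pull through the P2P network only for configured registries."""
    return registry_host(image_ref) in intercepted_hosts
```

With `intercepted_hosts = {"harbor.example.local"}`, a pull of `harbor.example.local/ai/pytorch:1.9` would go through the P2P network, while a pull from any other registry would go straight to that registry.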
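In some embodiments, the method further includes:
detecting, based on heartbeats, whether the image P2P service of each node is normal, and, if the image P2P service of a node is abnormal, restarting the image P2P service of that node to restore it.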
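A heartbeat check with restart can be sketched as follows in Python. The `HeartbeatMonitor` class, the timeout rule (a node is abnormal if no heartbeat arrived within `timeout_s`), and the injected `restart` callback are assumptions of this sketch; in the described system the restart itself would be carried out through k8s, which is not modeled here.

```python
import time
from typing import Callable, Dict, List, Optional


class HeartbeatMonitor:
    """Track the last heartbeat per node and restart services that go silent."""

    def __init__(self, timeout_s: float, restart: Callable[[str], None]):
        self._timeout_s = timeout_s
        self._restart = restart
        self._last_seen: Dict[str, float] = {}

    def beat(self, node: str, now: Optional[float] = None) -> None:
        """Record a heartbeat from a node."""
        self._last_seen[node] = time.time() if now is None else now

    def check(self, now: Optional[float] = None) -> List[str]:
        """Restart the service on every node whose heartbeat has timed out."""
        current = time.time() if now is None else now
        restarted: List[str] = []
        for node, seen in list(self._last_seen.items()):
            if current - seen > self._timeout_s:
                self._restart(node)
                # Give the restarted service a fresh grace period.
                self._last_seen[node] = current
                restarted.append(node)
        return restarted
```

Passing `now` explicitly keeps the sketch deterministic for testing; in normal use both methods fall back to the wall clock.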
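In some embodiments, the method further includes:
in response to the image being updated, pushing the updated image to the Harbor registry, and operating the control node based on k8s so that it pulls the updated image from the Harbor registry.
Specifically, if the functionality of the image P2P service is modified, only the P2P image needs to be modified. The modified image is pushed to the Harbor registry, and the control node of the cluster is then operated based on the k8s YAML file so that it pulls the modified image from the Harbor registry. The super node and client pods are operated step by step through k8s so that they re-pull the P2P image from the Harbor registry and use the new image to deploy the containerized service.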
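Based on the same inventive concept, according to another aspect of the present application and as shown in FIG. 2, an embodiment of the present application also provides an image pulling system, which includes:
an acquisition module 110 configured to, in response to deploying an image P2P service in an AI cluster, obtain an image P2P service component package written based on Ansible and configuration information of the AI cluster, and modify the deployment configuration file in the image P2P service component package based on the configuration information, wherein the image P2P service component package further includes an image P2P service component;
a distribution module 120 configured to distribute, based on the deployment configuration file, the images of the image P2P service component and the corresponding image files to each node in the AI cluster, and to push the images served by each node to a Harbor registry;
a deployment module 130 configured to write a YAML file, deploy each node to a control node or a worker node based on k8s and the YAML file, and mount a storage system path on each node;
a pull module 140 configured to, in response to the AI cluster receiving an AI task, schedule the AI task to a worker node by means of the control node, so that the worker node pulls an image from the Harbor registry or from a node providing the P2P service.
Based on the same inventive concept, according to another aspect of the present application and as shown in FIG. 3, an embodiment of the present application also provides a computer device 20, which includes a processor 210 and a memory 220. The memory 220 stores a computer program 221 executable on the processor; when the processor 210 executes the program, the steps of the method described above are performed.
Based on the same inventive concept, according to another aspect of the present application and as shown in FIG. 4, an embodiment of the present application also provides a computer-readable storage medium 30, which stores a computer program 310 that performs the method described above when executed by a processor.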
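Finally, it should be noted that a person of ordinary skill in the art will understand that all or part of the processes in the above method embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like. The computer program embodiments can achieve the same or similar effects as any corresponding method embodiment described above.
Those skilled in the art will also appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described generally in terms of their functionality. Whether such functionality is implemented as software or as hardware depends on the particular application and the design constraints imposed on the overall system. Those skilled in the art may implement the functionality in various ways for each particular application, but such implementation decisions should not be interpreted as departing from the scope disclosed by the embodiments of the present application.
The above are exemplary embodiments disclosed by the present application. It should be noted, however, that various changes and modifications may be made without departing from the scope of the disclosed embodiments as defined by the claims. The functions, steps, and/or actions of the method claims according to the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements disclosed in the embodiments may be described or claimed in the singular, the plural is also contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular form "a" is intended to include the plural form as well, unless the context clearly supports an exception. It should also be understood that "and/or" as used herein refers to any and all possible combinations of one or more of the associated listed items.
The serial numbers of the embodiments disclosed above are for description only and do not indicate the relative merits of the embodiments.
A person of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
A person of ordinary skill in the art should understand that the discussion of any of the above embodiments is exemplary only and is not intended to imply that the scope disclosed by the embodiments of the present application (including the claims) is limited to these examples. Within the spirit of the embodiments of the present application, technical features of the above embodiments or of different embodiments may also be combined, and there are many other variations of the different aspects of the embodiments as described above, which are not provided in detail for the sake of brevity. Therefore, any omission, modification, equivalent replacement, or improvement made within the spirit and principles of the embodiments of the present application shall fall within their scope of protection.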

Claims (10)

  1. 一种镜像的拉取方法,其特征在于,包括:
    响应于在AI集群部署镜像p2p服务,获取基于ansible编写的镜像p2p服务组件包以及所述AI集群的配置信息,并基于所述配置信息修改所述镜像p2p服务组件包中的部署配置文件,其中,所述镜像p2p服务组件包还包括镜像p2p服务组件;
    基于所述部署配置文件将镜像p2p服务组件的镜像和对应的镜像文件分发到AI集群中的各个节点,并将各个所述节点所服务的镜像推送到harbor仓库;
    编写yaml文件,并基于k8s以及所述yaml文件将各个所述节点分别部署到控制节点或工作节点,并将各个所述节点挂载存储系统路径;
    响应于AI集群接收到AI任务,基于所述控制节点将所述AI任务调度到工作节点,使所述工作节点从所述harbor仓库或提供p2p服务的节点拉取镜像。
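  2. The method according to claim 1, characterized in that acquiring the configuration information of the AI cluster and modifying the deployment configuration file based on the configuration information comprises:
    acquiring the network bandwidth, disk space, and image usage scenario of each of the nodes in the AI cluster;
    determining a bandwidth limit of the image p2p service based on the network bandwidth, determining a disk cleanup policy based on the disk space, and determining an image p2p service policy based on the image usage scenario;
    modifying the deployment configuration file based on the determined bandwidth limit of the image p2p service, disk cleanup policy, and image p2p service policy.
  3. The method according to claim 2, characterized in that the image p2p service policy comprises:
    an image p2p service policy for a scenario in which images are used infrequently, an image p2p service policy for a scenario in which images are used frequently without high concurrency, an image p2p service policy for a scenario in which different images are used with high concurrency, and an image p2p service policy for a scenario in which images are used frequently with high concurrency;
    determining the image p2p service policy based on the image usage scenario comprises:
    selecting, based on the image usage scenario, the image p2p service policy of the corresponding scenario from the image p2p service policies.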
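  4. The method according to claim 1, characterized in that distributing, based on the deployment configuration file, the images of the image p2p service components and the corresponding image files to each node in the AI cluster comprises:
    distributing, based on the deployment configuration file, a super node image of the image p2p service components and the corresponding image file to a designated node in the AI cluster so that the designated node provides the image p2p service, and distributing a client image of the image p2p service components and the corresponding client file to the remaining nodes in the AI cluster so that the remaining nodes provide the image p2p service.
  5. The method according to claim 1, characterized in that, after pushing the images served by the nodes to the harbor repository, the method further comprises: configuring an interception policy for the nodes; and causing the worker node to pull an image from the harbor repository or from a node providing the p2p service comprises:
    causing the worker node to pull an image, based on the interception policy, from the harbor repository or from a node providing the p2p service.
  6. The method according to claim 1, characterized by further comprising:
    detecting, based on heartbeats, whether the image p2p service of each node is normal, and if the image p2p service of a node is abnormal, restarting the image p2p service of the node to restore the image p2p service of the node.
  7. The method according to claim 1, characterized by further comprising:
    in response to the image being updated, pushing the updated image to the harbor repository, and operating the control node based on k8s to pull the updated image from the harbor repository.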
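  8. An image pulling system, characterized by comprising:
    an acquisition module, configured to, in response to deploying an image p2p service in an AI cluster, acquire an image p2p service component package written based on ansible and configuration information of the AI cluster, and modify a deployment configuration file in the image p2p service component package based on the configuration information, wherein the image p2p service component package further includes image p2p service components;
    a distribution module, configured to distribute, based on the deployment configuration file, the images of the image p2p service components and the corresponding image files to each node in the AI cluster, and to push the images served by each of the nodes to a harbor repository;
    a deployment module, configured to write a yaml file, deploy each of the nodes respectively to a control node or a worker node based on k8s and the yaml file, and mount a storage system path on each of the nodes;
    a pulling module, configured to, in response to the AI cluster receiving an AI task, schedule the AI task to a worker node based on the control node, so that the worker node pulls an image from the harbor repository or from a node providing the p2p service.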
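  9. A computer device, comprising:
    at least one processor; and
    a memory storing a computer program executable on the processor, characterized in that the processor performs the steps of the method according to any one of claims 1 to 7 when executing the program.
  10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, performs the steps of the method according to any one of claims 1 to 7.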
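
The configuration step recited in claims 2 and 3 can be illustrated with the following minimal Python sketch. The policy names, the dictionary keys, the rule of limiting the p2p service to half of the node bandwidth, and the 100 GB disk threshold are hypothetical values chosen for illustration and do not appear in the present application.

```python
# Hypothetical policy names for the four image usage scenarios of claim 3.
SCENARIO_POLICIES = {
    "infrequent": "infrequent-use",
    "frequent_low_concurrency": "frequent-non-concurrent",
    "concurrent_different_images": "concurrent-different-images",
    "frequent_high_concurrency": "frequent-concurrent",
}


def build_deploy_config(bandwidth_mbps, disk_free_gb, scenario):
    """Derive deployment settings from a node's configuration information.

    All thresholds below are assumptions for illustration only.
    """
    if scenario not in SCENARIO_POLICIES:
        raise ValueError(f"unknown image usage scenario: {scenario}")
    return {
        # Assumed: cap the p2p service at half of the node's bandwidth.
        "rate_limit_mbps": bandwidth_mbps // 2,
        # Assumed: clean the cache more eagerly when little disk space is free.
        "disk_cleanup": "aggressive" if disk_free_gb < 100 else "lazy",
        # Select the policy matching the image usage scenario.
        "p2p_policy": SCENARIO_POLICIES[scenario],
    }
```

The returned dictionary stands in for the values that would be written into the deployment configuration file of the image p2p service component package.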
PCT/CN2022/078481 2021-09-22 2022-02-28 Image pulling method, system, computer device, and readable storage medium WO2023045256A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111104441.2A CN113568624A (zh) 2021-09-22 2021-09-22 Image pulling method, system, computer device, and readable storage medium
CN202111104441.2 2021-09-22

Publications (1)

Publication Number Publication Date
WO2023045256A1 (zh) 2023-03-30

Family

ID=78173884

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/078481 WO2023045256A1 (zh) 2021-09-22 2022-02-28 Image pulling method, system, computer device, and readable storage medium

Country Status (2)

Country Link
CN (1) CN113568624A (zh)
WO (1) WO2023045256A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
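CN117033325A (zh) * 2023-10-08 2023-11-10 恒生电子股份有限公司 Image file preheating and pulling method and device
CN117270886A (zh) * 2023-11-17 2023-12-22 浪潮通用软件有限公司 Microservice system development and deployment method, device, and medium
CN117369952A (zh) * 2023-12-08 2024-01-09 中电云计算技术有限公司 Cluster processing method, apparatus, device, and storage medium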

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
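CN113568624A (zh) * 2021-09-22 2021-10-29 苏州浪潮智能科技有限公司 Image pulling method, system, computer device, and readable storage medium
US11729051B2 (en) 2022-01-12 2023-08-15 Red Hat, Inc. Automated deployment of control nodes at remote locations
CN114390106B (zh) * 2022-03-24 2022-07-05 广州医科大学附属第五医院 Scheduling method, scheduler, and scheduling system based on Kubernetes container resources
CN115051846B (zh) * 2022-06-07 2023-11-10 北京天融信网络安全技术有限公司 Deployment method for a k8s cluster based on a hyper-converged platform, and electronic device
CN116614517B (zh) * 2023-04-26 2023-09-29 江苏博云科技股份有限公司 Container image preheating and distribution method for edge computing scenarios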

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170124114A1 (en) * 2015-10-28 2017-05-04 Beijing Baidu Netcom Science And Technology, Ltd. Method and Device for Pulling Virtual Machine Mirror File
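CN111736956A (zh) * 2020-06-29 2020-10-02 苏州浪潮智能科技有限公司 Container service deployment method, apparatus, device, and readable storage medium
CN113568624A (zh) * 2021-09-22 2021-10-29 苏州浪潮智能科技有限公司 Image pulling method, system, computer device, and readable storage medium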

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20180115282A (ko) * 2016-02-23 2018-10-22 엔체인 홀딩스 리미티드 Method and system for efficient transfer of entities on a peer-to-peer distributed ledger using the blockchain
US20190303187A1 (en) * 2018-03-29 2019-10-03 The United States Of America As Represented By The Secretary Of The Navy Methods, devices, and systems for distributing software to and deploying software in a target environment
CN108694053A (zh) * 2018-05-14 2018-10-23 平安科技(深圳)有限公司 Method and terminal device for automatically building a Kubernetes master node based on the Ansible tool
CN108809722B (zh) * 2018-06-13 2022-03-22 郑州云海信息技术有限公司 Method, apparatus, and storage medium for deploying a Kubernetes cluster
CN113064600B (zh) * 2021-04-20 2022-12-02 支付宝(杭州)信息技术有限公司 Method and apparatus for deploying an application

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ANONYMOUS: " p2p-How to Save the Achilles' Heel of k8s Image Distribution", BIG BROTHER FISH, 24 July 2018 (2018-07-24), XP093055546, Retrieved from the Internet <URL:www.cnblogs.com/goldenfish/p/9358908.html> [retrieved on 20230619] *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
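CN117033325A (zh) * 2023-10-08 2023-11-10 恒生电子股份有限公司 Image file preheating and pulling method and device
CN117033325B (zh) * 2023-10-08 2023-12-26 恒生电子股份有限公司 Image file preheating and pulling method and device
CN117270886A (zh) * 2023-11-17 2023-12-22 浪潮通用软件有限公司 Microservice system development and deployment method, device, and medium
CN117270886B (zh) * 2023-11-17 2024-02-06 浪潮通用软件有限公司 Microservice system development and deployment method, device, and medium
CN117369952A (zh) * 2023-12-08 2024-01-09 中电云计算技术有限公司 Cluster processing method, apparatus, device, and storage medium
CN117369952B (zh) * 2023-12-08 2024-03-15 中电云计算技术有限公司 Cluster processing method, apparatus, device, and storage medium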

Also Published As

Publication number Publication date
CN113568624A (zh) 2021-10-29

Similar Documents

Publication Publication Date Title
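WO2023045256A1 (zh) Image pulling method, system, computer device, and readable storage medium
CN108924217B (zh) Automated deployment method for a distributed cloud system
CN102521044B (zh) Distributed task scheduling method and system based on message middleware
CN111290834B (zh) Method, apparatus, and device for achieving high service availability based on a cloud management platform
RU2417416C2 (ru) Solution deployment in a server farm
CN104935672B (zh) Method and device for implementing high availability of a load balancing service
US7937716B2 (en) Managing collections of appliances
US20090063650A1 (en) Managing Collections of Appliances
CN105607954A (zh) Method and apparatus for online migration of stateful containers
CN105959390A (zh) Unified management system and method for microservices
CN108255592A (zh) Quartz cluster scheduled task processing system and method
CN105630589A (zh) Distributed process scheduling system, and process scheduling and execution methods
CN112667362B (zh) Method and system for deploying a Kubernetes virtual machine cluster on Kubernetes
CN113742031A (zh) Node state information acquisition method, apparatus, electronic device, and readable storage medium
CN111314212B (zh) API gateway and control method based on Netty and a plug-in mechanism
CN103973725A (zh) Distributed coordination method and coordinator
CN105610947A (zh) Method, apparatus, and system for implementing a highly available distributed queue service
WO2019076236A1 (zh) Data synchronization method, apparatus, super controller, domain controller, and storage medium
CN113268337B (zh) Method and system for Pod scheduling in a Kubernetes cluster
CN108667639A (zh) Resource management method and management server in a private cloud environment
CN108228393A (zh) Scalable method for implementing high availability for big data
CN109639773A (zh) Dynamically constructed distributed data cluster control system and method thereof
CN113918281A (zh) Method for improving the scaling efficiency of container cloud resources
CN110764918A (zh) Master node management method in a container cluster
CN113835834A (zh) Capacity expansion method and system for compute nodes based on a k8s container cluster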

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22871322

Country of ref document: EP

Kind code of ref document: A1