WO2021143590A1 - Distributed container image build scheduling system and method - Google Patents
- Publication number: WO2021143590A1 (PCT/CN2021/070429)
- Authority: WO, WIPO (PCT)
- Prior art keywords: node, construction, scheduler, task, build
Classifications
- G06F: ELECTRIC DIGITAL DATA PROCESSING (section G: PHYSICS; class G06: COMPUTING; CALCULATING OR COUNTING)
- G06F9/45558: Hypervisor-specific management and integration aspects
- G06F9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/547: Remote procedure calls [RPC]; Web services
- G06F8/63: Image based installation; Cloning; Build to order
- G06F9/5044: Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering hardware capabilities
- G06F2009/45562: Creating, deleting, cloning virtual machine instances
- G06F2009/45587: Isolation or security of virtual machine instances
Definitions
- The invention relates to the technical field of cloud computing, and in particular to a distributed container image build scheduling system and method.
- Cloud computing platforms offer a wide variety of services. Different users need to build container images that match their own program dependencies, and these images must be rebuilt as the programs iterate, so image build tasks keep arriving over time. Dispatching these tasks to a distributed image build system for processing is an urgent problem; in particular, how to allocate them reasonably to suitable build nodes so that image builds run efficiently still needs to be optimized.
- The present invention provides a distributed container image build scheduling system and method.
- The present invention provides a distributed container image build scheduling system, including a management node and build nodes;
- the management node manages the generation and scheduling of build tasks for distributed container images;
- the management node includes a console and a scheduler;
- the console obtains task parameters, creates tasks, sends them to the scheduler, and reports the image build status feedback to the user;
- the scheduler receives tasks from the console and generates task messages, sends each task message to the appropriate build node for execution, and waits for and receives the execution result from that build node;
- the build nodes execute tasks issued by the management node, and each build node includes an image builder;
- the image builder receives and executes task messages sent by the scheduler and returns the execution result to the scheduler.
- The scheduler is a Web server; the image builder is a gRPC server.
- The scheduling system includes at least 3 management nodes and multiple build nodes.
- An embodiment of the present invention also provides a distributed container image build scheduling method, including:
- initializing the management node and the build nodes: the management node's scheduler is put into the blocking listen state of its Web server, and each build node's image builder is put into the blocking listen state of its gRPC server;
- the console creates an image build task, and the scheduler checks it and generates a task message;
- the scheduler sends the task message to the best working node and generates a task object;
- the best working node executes the task and sends the build status feedback to the management node;
- the management node informs the user of the build status feedback;
- the management node and the build nodes re-enter the initialized state.
- Putting the management node's scheduler into the Web server blocking listen state includes:
- the scheduler is started on the management node and checks whether the scheduler configuration file exists. If it does, the scheduler reads the IP addresses and gRPC service port numbers of all build nodes from the file and saves them in the cache;
- the configuration file contains a list of build node IP addresses and gRPC service port numbers;
- the scheduler sets the number of parallel threads in the cache, denoted workers, and sets the task list to empty;
- the scheduler checks whether the TLS public and private keys used to encrypt Web and gRPC communication exist on the current management node. If they do, the scheduler uses them to start its internal TLS-authenticated Web server in the blocking listen state, opens the console, and waits for the user to create a task;
- if the scheduler finds that the configuration file or the TLS public and private keys for Web and gRPC communication encryption do not exist, it records the failure reason in a local message, sends the failure information to the operations and maintenance engineer, and exits;
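A minimal Python sketch of this startup check follows. It assumes the configuration file is a JSON list of objects with `ip` and `port` keys and that the TLS keys are stored as files at caller-supplied paths; neither format is specified in the text. `notify_ops` is a hypothetical callback standing in for the local record and the message to the operations and maintenance engineer, and the initial `workers` value of 5 is taken from the embodiment. The sketch checks all three files up front, which reaches the same outcome as checking them in sequence, and it leaves out starting the TLS Web server itself.

```python
import json
import os
from dataclasses import dataclass, field


@dataclass
class SchedulerCache:
    # (ip, grpc_port) pairs read from the scheduler configuration file
    build_nodes: list
    # number of parallel threads; the embodiment uses 5 as the initial value
    workers: int = 5
    # the task list starts out empty
    tasks: list = field(default_factory=list)


def init_scheduler(config_path, tls_public_key_path, tls_private_key_path,
                   notify_ops):
    """Return a SchedulerCache, or None after reporting why startup failed.

    The JSON layout of the config file and the key file paths are assumptions.
    """
    missing = [p for p in (config_path, tls_public_key_path,
                           tls_private_key_path) if not os.path.exists(p)]
    if missing:
        # Record the failure reason and hand it to the (hypothetical) notifier.
        notify_ops("missing startup files: " + ", ".join(missing))
        return None
    with open(config_path, encoding="utf-8") as f:
        entries = json.load(f)
    nodes = [(e["ip"], int(e["port"])) for e in entries]
    return SchedulerCache(build_nodes=nodes)
```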
- Putting the build node's image builder into the gRPC server blocking listen state includes:
- the image builder is started on the build node and tries to connect to the container engine on that node. If the connection succeeds, the image builder sets the build node state variable in the build node cache to true, the build node health variable to healthy, and the image build count variable to zero;
- the image builder checks whether the TLS public and private keys used to encrypt gRPC communication exist on the current build node. If they do, it uses them to start its internal TLS-authenticated gRPC server in the blocking listen state and waits for the scheduler to query its node information or issue tasks;
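The builder-side initialization can be sketched the same way. `connect_engine` is a hypothetical callable that reports whether the local container engine is reachable. The text does not say what the cache holds when the connection fails, so the defaults here (state false, unhealthy) are an assumption, and the sketch only reports whether the TLS keys exist instead of actually starting a gRPC server.

```python
import os
from dataclasses import dataclass


@dataclass
class BuildNodeCache:
    state: bool = False          # build node state variable
    health: str = "unhealthy"    # build node health variable
    build_number: int = 0        # image build count variable


def init_image_builder(connect_engine, tls_public_key_path,
                       tls_private_key_path):
    """Prepare the build node cache; return (cache, ready_to_serve).

    connect_engine is a hypothetical callable returning True when the
    container engine on this node can be reached.
    """
    cache = BuildNodeCache()
    if connect_engine():
        cache.state = True
        cache.health = "healthy"
        cache.build_number = 0
    tls_ok = (os.path.exists(tls_public_key_path)
              and os.path.exists(tls_private_key_path))
    # When tls_ok is True, a real builder would start its TLS-authenticated
    # gRPC server here and block while waiting for the scheduler.
    return cache, tls_ok
```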
- The console creating an image build task, with the scheduler checking it and generating a task message, includes:
- the console serializes the task into a struct object and sends it to the scheduler;
- the scheduler checks whether the struct object is valid, as follows:
- the scheduler first checks whether the heterogeneous resource value in the struct object is true. If it is any other value or empty, the scheduler generates the error message "Illegal construction of heterogeneous resource value", sends it to the console to notify the user, and cancels task generation; the scheduler then re-enters the blocking state and waits for the user to create a task from the console;
- if the heterogeneous resource value is true, the scheduler checks whether the development framework in the struct object is empty. If it is, the scheduler generates the error message "Building the image development framework is empty", sends it to the console to notify the user, cancels task generation, re-enters the blocking state, and waits for the user to create a task from the console;
- if the development framework is not empty, the scheduler checks whether the test mode value in the struct object is true. If it is any other value or empty, the scheduler generates the error message "Illegal test mode value", sends it to the console to notify the user, cancels task generation, re-enters the blocking state, and waits for the user to create a task from the console;
- if the test mode value is true, the task check passes and the scheduler generates a task message from the task content.
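The three validity checks map onto a short function. The dictionary keys are hypothetical names for the struct object's fields; the error messages are the ones quoted in the text, and the checks run in the stated order so the first failing check determines the message.

```python
def validate_task(task):
    """Return (ok, message) for a deserialized task struct.

    task is a dict; the key names are hypothetical stand-ins for the
    fields of the struct object sent by the console.
    """
    # 1. The heterogeneous resource value must be exactly true.
    if task.get("heterogeneous_resource") is not True:
        return False, "Illegal construction of heterogeneous resource value"
    # 2. The development framework must not be empty.
    if not task.get("development_framework"):
        return False, "Building the image development framework is empty"
    # 3. The test mode value must be exactly true.
    if task.get("test_mode") is not True:
        return False, "Illegal test mode value"
    return True, "task check passed"
```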
- Determining the build nodes to which the task message can be sent includes:
- the scheduler on the management node creates an active build node table in the cache, recording each build node's IP address, gRPC service port, and communication status;
- the scheduler contacts the gRPC service of every build node in its cache through the node's IP address and gRPC service port, and saves each node it reaches successfully to the active build node table in the cache;
- the scheduler determines whether the number of nodes in the active build node table is greater than workers. If it is greater, the scheduler leaves workers unchanged; if it is less, the scheduler sets workers to the number of active build nodes. It then starts multiple threads and fetches, in parallel, the resource information and status information of the build nodes in the active build node table;
- the build node resource information includes the number of CPUs, CPU frequency, total memory, free memory, and system load of the build node;
- the build node status information includes the build node state variable, the build node health variable, and the image build count variable;
- the scheduler checks in turn whether the state variable of each build node in the active build node table is true; if it is true, nothing is done; if it is false, the node is deleted from the active build node table;
- the scheduler then checks in turn whether the health variable of each build node in the active build node table is healthy; if it is, nothing is done; otherwise the node is deleted from the active build node table;
- the build nodes remaining in the active build node table are those to which the task message can be sent.
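A sketch of building the active build node table follows. `can_connect` and `fetch_info` are hypothetical stand-ins for the gRPC calls and are not part of any real library. The sketch probes nodes one by one, uses `concurrent.futures.ThreadPoolExecutor` with `workers` threads for the parallel fetch, and then applies the state filter followed by the health filter.

```python
from concurrent.futures import ThreadPoolExecutor


def build_active_table(nodes, workers, can_connect, fetch_info):
    """Return (active_table, workers) following the steps above.

    nodes is a list of (ip, port); can_connect(ip, port) -> bool and
    fetch_info(ip, port) -> dict are hypothetical gRPC stand-ins.
    """
    # Keep only the nodes whose gRPC service answered.
    active = [node for node in nodes if can_connect(*node)]
    if len(active) < workers:
        workers = len(active)
    # ThreadPoolExecutor rejects max_workers=0, so use at least one thread.
    with ThreadPoolExecutor(max_workers=max(workers, 1)) as pool:
        infos = list(pool.map(lambda node: fetch_info(*node), active))
    table = [dict(info, ip=ip, port=port)
             for (ip, port), info in zip(active, infos)]
    # Drop nodes whose state variable is not true, then unhealthy nodes.
    table = [row for row in table if row["state"] is True]
    table = [row for row in table if row["health"] == "healthy"]
    return table, workers
```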
- Selecting the best working node from these build nodes includes computing, for each node in the active build node table, a performance score and a task score from the following quantities:
- PerformScore is the performance score
- TaskScore is the task score
- CpuWeight is the CPU weight
- LoadWeight is the load weight
- MemoryWeight is the memory weight
- TaskWeight is the task weight
- CpuCores is the number of CPUs of the build node
- CpuFrequency is the CPU frequency of the build node
- TotalMemory is the total memory of the build node
- FreeMemory is the free memory of the build node
- BuildNumber is the image build count variable of the build node
- SystemLoad is the system load of the build node
- MaxCpuCores is the CPU count of the build node with the most CPUs in the active build node table
- MaxCpuFrequency is the CPU frequency of the build node with the highest CPU frequency in the active build node table
- MaxFreeMemory is the free memory of the build node with the most free memory in the active build node table
- MaxBuildNumber is the image build count of the build node with the most image builds in the active build node table
- The scheduler records the sum of the performance score and the task score in the cache as the build node's final total score;
- the scheduler selects the build node with the highest total score as the best working node.
- The scheduler sending the task message to the best working node and generating the task object includes:
- the scheduler obtains the IP address and gRPC service port of the best working node;
- using the management node's TLS public and private keys, the scheduler encrypts the task message and sends it over gRPC to the best working node at that IP address and gRPC service port; at the same time, the scheduler generates the task object asynchronously;
- the task object contains the task message, the scheduling node, and the build status, where the scheduling node is the best working node and the build status is set to false;
- the scheduler enters the blocking state and waits for the best working node to send its build status feedback.
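The dispatch step and the task object can be sketched as below. `send_grpc` is a hypothetical placeholder for the TLS-encrypted gRPC call, and for simplicity the task object is created after the send returns rather than asynchronously as the text describes.

```python
from dataclasses import dataclass


@dataclass
class TaskObject:
    task_message: dict
    scheduling_node: tuple       # (ip, grpc_port) of the best working node
    build_status: bool = False   # set to True once the build completes


def dispatch(task_message, best_node, send_grpc):
    """Send the task to the best working node and record a task object.

    send_grpc(ip, port, message) is a hypothetical stand-in for the
    TLS-encrypted gRPC call; it is not a real library function.
    """
    ip, port = best_node
    send_grpc(ip, port, task_message)
    return TaskObject(task_message=task_message, scheduling_node=best_node)
```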
- The best working node executing the task and sending the build status feedback to the management node includes:
- the image builder of the best working node checks whether, in the build node cache, the build node state variable is true and the build node health variable is healthy. If the state variable is false or the health variable is not healthy, it uses the values of these two variables as the build status feedback, marks it with an error tag, decrements the image build count variable in the build node cache by 1, and sends the build status feedback to the management node;
- otherwise, the image builder increments the image build count variable in the build node cache by 1, extracts from the task message the image name, GPU value, development framework name, development framework version, third-party dependencies, heterogeneous resource value, and test mode value, and calls the container engine to build the image with these parameters;
- if the build succeeds, the container engine's result is used as the build status feedback and marked with a completion tag; the image builder then decrements the image build count variable in the build node cache by 1 and sends the build status feedback to the management node;
- if the build fails, the container engine's error message is used as the build status feedback and marked with an error tag; the image builder then decrements the image build count variable in the build node cache by 1 and sends the build status feedback to the management node.
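The builder's side of task execution can be sketched as follows. `build_image` is a hypothetical wrapper around the container engine returning `(succeeded, output)`, the parameter keys are illustrative names, and the tags are plain strings. The error path decrements the counter as the text describes, even though the counter was not incremented on that path.

```python
def execute_task(cache, task_message, build_image, send_feedback):
    """Run one build on the best working node.

    cache is a dict with "state", "health" and "build_number" keys;
    build_image(params) -> (succeeded, output) and send_feedback(feedback)
    are hypothetical callables.
    """
    if cache["state"] is not True or cache["health"] != "healthy":
        feedback = {"tag": "error",
                    "content": {"state": cache["state"],
                                "health": cache["health"]}}
        cache["build_number"] -= 1  # decrement as described in the text
        send_feedback(feedback)
        return feedback
    cache["build_number"] += 1
    # Extract the build parameters named in the text (key names assumed).
    params = {k: task_message.get(k) for k in (
        "image_name", "gpu", "framework", "framework_version",
        "dependencies", "heterogeneous_resource", "test_mode")}
    succeeded, output = build_image(params)
    feedback = {"tag": "complete" if succeeded else "error",
                "content": output}
    cache["build_number"] -= 1
    send_feedback(feedback)
    return feedback
```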
- The management node notifying the user of the build status feedback includes:
- after receiving the build status feedback, the scheduler on the management node first checks whether it contains an error tag. If it does not, the scheduler sets the build status in the task object to true, indicating that the build is complete;
- if it does, the scheduler checks whether the error concerns the build node's health variable. If so, it reschedules the task; if not, it sends the build status feedback to the console to inform the user of the error and its content.
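Handling the feedback on the management node reduces to a three-way branch. `reschedule` and `notify_console` are hypothetical callbacks, and the sketch treats an error as a health problem when the feedback carries a `health` value other than `healthy`, which is an assumption about how the two-variable error content is encoded.

```python
def handle_feedback(feedback, task_obj, reschedule, notify_console):
    """Process build status feedback on the management node.

    task_obj is a dict with a "build_status" key; reschedule and
    notify_console are hypothetical callbacks.
    """
    if feedback["tag"] != "error":
        task_obj["build_status"] = True  # no error tag: build complete
        return "complete"
    content = feedback["content"]
    if isinstance(content, dict) and content.get("health", "healthy") != "healthy":
        reschedule(task_obj)  # build node health problem: schedule again
        return "rescheduled"
    notify_console(content)   # any other error is reported to the user
    return "reported"
```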
- The present invention achieves the following technical effects:
- The invention provides a distributed container image build system that uses distributed nodes to speed up container image builds, relieving the performance bottleneck and maintenance difficulty of batch-building container images on a single node.
- The scheduler allocates build tasks reasonably and balances the task load across build nodes, making efficient use of build node resources, keeping the system stable, and improving container image build efficiency.
- FIG. 1 is an architecture diagram of the distributed container image build system according to an embodiment of the present invention;
- FIG. 2 is a workflow chart of the management node in the distributed container image build system according to an embodiment of the present invention;
- FIG. 3 is a workflow chart of the best working node in the distributed container image build system according to an embodiment of the present invention.
- The present invention provides a distributed container image build scheduling system which, referring to FIG. 1, includes build nodes and a management node. The management node manages the generation and scheduling of build tasks for distributed container images and includes a console and a scheduler.
- The console obtains the parameters required by the user, including development dependency libraries, development framework, version, heterogeneous resource value, and test mode value, generates tasks from these parameters and sends them to the scheduler, and reports the image build status feedback to the user.
- The scheduler, which is a Web server, receives tasks from the console, filters them, sends them to the appropriate build nodes for execution, and waits for and receives the execution results from those nodes.
- The build nodes execute tasks issued by the management node, and each build node includes an image builder.
- The image builder is a gRPC server that receives tasks from the scheduler, executes them, and returns the results to the scheduler when execution completes.
- At least 3 management nodes and multiple build nodes are deployed, and the number of nodes is scaled out or in as needed.
- Existing cloud computing techniques such as heartbeat detection and virtual IP (VIP) allow seamless failover between management nodes, keeping the system highly available and avoiding a single point of failure.
- Another aspect of the present invention provides a distributed container image build scheduling method, including:
- The image builder is started on the build node.
- The image builder first tries to connect to the container engine on the current build node. If the connection succeeds, it sets the build node state variable in the build node cache to true, the build node health variable to healthy, and the image build count variable to zero.
- The image builder checks whether the TLS public and private keys used to encrypt gRPC communication exist on the current build node. If they do, it uses them to start its internal TLS-authenticated gRPC server in the blocking listen state and waits for the scheduler to query its node information or issue tasks.
- The scheduler first checks whether the scheduler configuration file exists.
- The configuration file contains a list of build node IP addresses and gRPC service port numbers. If the file exists, the scheduler reads the IP addresses and gRPC service port numbers of all build nodes from it and saves them in the cache; it then sets the number of parallel threads in the cache, denoted workers, to an initial value of 5 (scaled up or down as needed in production), and sets the task list to empty.
- The scheduler checks whether the TLS public and private keys used to encrypt Web and gRPC communication exist on the current management node. If they do, it uses them to start its internal TLS-authenticated Web server in the blocking listen state, opens the console, and waits for the user to create a task.
- If the scheduler finds that the configuration file or the TLS public and private keys for Web and gRPC communication encryption do not exist, it records the failure reason in a local message, emails the failure information to the operations and maintenance engineer, and exits.
- The user creates a task on the management node through the console. Before creating it, the user enters the name of the image to build, the GPU value, the development framework name, the development framework version, third-party dependencies, the heterogeneous resource value, and the test mode value, then imports them and creates the task.
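The task the user fills in could be modeled as a small record and serialized before it is sent to the scheduler. The field names and types (for example, an integer GPU value) are assumptions, and JSON is used only for illustration; the text says only that the console serializes the task into a struct object.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class BuildTask:
    image_name: str
    gpu: int                       # GPU value used by the image (type assumed)
    development_framework: str
    framework_version: str
    dependencies: list = field(default_factory=list)  # third-party dependencies
    heterogeneous_resource: bool = True
    test_mode: bool = True


def serialize_task(task: BuildTask) -> str:
    """Serialize the task for the scheduler (JSON is an assumption)."""
    return json.dumps(asdict(task))
```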
- The console serializes the task into a struct object and sends it to the scheduler, which checks whether the struct object is valid. Specifically:
- the scheduler first checks whether the heterogeneous resource value in the struct object is true. If it is any other value or empty, the scheduler generates the error message "Illegal construction of heterogeneous resource value", sends it to the console to notify the user, and cancels task generation; the scheduler then re-enters the blocking state and waits for the user to create a task from the console;
- if the heterogeneous resource value is true, the scheduler checks whether the development framework in the struct object is empty. If it is, the scheduler generates the error message "Building the image development framework is empty", sends it to the console to notify the user, cancels task generation, re-enters the blocking state, and waits for the user to create a task from the console;
- if the development framework is not empty, the scheduler checks whether the test mode value in the struct object is true. If it is any other value or empty, the scheduler generates the error message "Illegal test mode value", sends it to the console to notify the user, cancels task generation, re-enters the blocking state, and waits for the user to create a task from the console;
- if the test mode value is true, the task check passes, the scheduler generates a task message from the task content, and the scheduler executes step (3).
- The scheduler on the management node first creates an active build node table in the cache.
- The table records each build node's IP address, gRPC service port, and communication status. The scheduler then asynchronously tests communication with the gRPC service of every build node held in the scheduler cache from step (1), using each node's IP address and gRPC service port, and saves each node that communicates successfully to the active build node table in the cache.
- If the number of active build nodes is less than workers, the scheduler sets workers to the number of active build nodes. It then starts multiple threads to fetch, in parallel, the resource information of the build nodes in the active build node table and their status information in the cache described in step (1). The resource information includes the number of CPUs, CPU frequency, total memory, free memory, and system load of the build node; the status information includes the build node state variable, the build node health variable, and the image build count variable. The number of threads used for this parallel fetch is the cache's parallel thread count workers (that is, workers threads together process all reachable build nodes and fetch their resource and status information), and the results are stored in the cache;
- The scheduler checks in turn whether the state variable in each node's status information in the active build node table is false; if it is true, nothing is done; if it is false, the node is deleted from the active build node table. After all nodes have been checked, the scheduler checks in turn whether each node's health variable in the active build node table is healthy; if it is, nothing is done; otherwise the node is deleted from the table. After all nodes have been checked, the scheduler executes step (4).
- The scheduler first obtains, across all build nodes in the active build node table, the largest CPU count, the highest CPU frequency, the largest amount of free memory, and the largest image build count, recorded as MaxCpuCores, MaxCpuFrequency, MaxFreeMemory, and MaxBuildNumber, respectively;
- the CPU count, CPU frequency, total memory, free memory, image build count variable, and system load of each build node in the active build node table, obtained in step (3), are recorded as CpuCores, CpuFrequency, TotalMemory, FreeMemory, BuildNumber, and SystemLoad, respectively.
- PerformScore is the performance score
- TaskScore is the task score
- CpuWeight is the CPU weight
- LoadWeight is the load weight
- MemoryWeight is the memory weight
- TaskWeight is the task weight
- CpuCores is the number of CPUs of the build node
- CpuFrequency is the CPU frequency of the build node
- TotalMemory is the total memory of the build node
- FreeMemory is the free memory of the build node
- BuildNumber is the image build count variable of the build node
- SystemLoad is the system load of the build node
- MaxCpuCores is the CPU count of the build node with the most CPUs in the active build node table
- MaxCpuFrequency is the CPU frequency of the build node with the highest CPU frequency in the active build node table
- MaxFreeMemory is the free memory of the build node with the most free memory in the active build node table
- MaxBuildNumber is the image build count of the build node with the most image builds in the active build node table
- The default value of CpuWeight is 2; the default value of LoadWeight is 2; the default value of MemoryWeight is 3; the default value of TaskWeight is 3.
- After calculating the performance score and the task score, the scheduler records their sum in the cache as the final total score.
- The scheduler selects the build node with the highest total score in the active build node table as the best working node and executes step (5).
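The scoring formulas themselves are not reproduced in this text, so the sketch below uses a hypothetical combination: each resource is normalized by the corresponding Max* value, free memory is also compared with the node's own total memory, load is taken per CPU core, and the task score falls as BuildNumber approaches MaxBuildNumber. Only the variable names, the default weights (2, 2, 3, 3), and the rule of picking the highest total score come from the text.

```python
def ratio(value, maximum):
    # Guard against division by zero (e.g. no node has any builds yet).
    return value / maximum if maximum else 0.0


def pick_best_node(table, cpu_w=2, load_w=2, mem_w=3, task_w=3):
    """Return the row of `table` with the highest total score.

    Each row is a dict with CpuCores, CpuFrequency, TotalMemory, FreeMemory,
    BuildNumber and SystemLoad. The weight defaults come from the text; how
    the terms are combined is a hypothetical stand-in for the missing
    formulas. Each row also receives a TotalScore key, mirroring the cache.
    """
    max_cores = max(r["CpuCores"] for r in table)
    max_freq = max(r["CpuFrequency"] for r in table)
    max_free = max(r["FreeMemory"] for r in table)
    max_builds = max(r["BuildNumber"] for r in table)
    best, best_total = None, float("-inf")
    for r in table:
        perform = (cpu_w * (ratio(r["CpuCores"], max_cores)
                            + ratio(r["CpuFrequency"], max_freq)) / 2
                   + mem_w * (ratio(r["FreeMemory"], max_free)
                              + ratio(r["FreeMemory"], r["TotalMemory"])) / 2
                   - load_w * ratio(r["SystemLoad"], r["CpuCores"]))
        task = task_w * (1 - ratio(r["BuildNumber"], max_builds))
        r["TotalScore"] = perform + task
        if r["TotalScore"] > best_total:
            best, best_total = r, r["TotalScore"]
    return best
```

Because every resource term is divided by the table-wide maximum, the scores stay comparable across heterogeneous nodes, and the weights alone decide how much CPU, memory, load, and queued builds matter.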
- The scheduler first obtains the IP address and gRPC service port of the best working node set in step (4) from the cache generated in step (1).
- Using the management node's TLS public and private keys from step (1), the scheduler encrypts the task message generated in step (2) and sends it over gRPC to the best working node at that IP address and gRPC service port.
- At the same time, the scheduler generates the task object asynchronously.
- The task object contains the task message, the scheduling node, and the build status.
- The scheduler sets the task message to the one generated in step (2), sets the scheduling node to the best working node from step (4), and sets the build status to false. The scheduler then enters the blocking state and waits for the best working node to send its build status feedback.
- the mirror builder on the best working node executes step (6).
- After receiving the task message from the management node, the image builder first checks the build node cache. It confirms that the best working node's build node state variable is set to true and that its build node health variable is set to healthy. If the state variable is false or the health variable is unhealthy, the image builder takes the contents of these two variables as the build status feedback content, marks it with an error label, and executes step (7).
- Otherwise, the image builder increments the image build count variable in the build node cache by 1. It then extracts the following from the task message: the image name, GPU value, development framework name, development framework version, third-party dependencies, heterogeneous resource value, and test mode value. Finally, it calls the container engine to build the image according to these parameters.
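- After the image build completes, the image builder takes the container engine's build result as the build status feedback content, marks it with a completion label, and executes step (7).
- If the container engine encounters an error and terminates the build, the image builder takes the container engine's error message as the build status feedback content, marks it with an error label, and executes step (7).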
- In step (7), the image builder first decrements the image build count variable in the build node cache by 1, then sends the build status feedback content to the management node.
- The scheduler on the management node receives the build status feedback content and first checks whether it contains an error label. If it does not, the scheduler sets the build status in the task object from step (5) to true to indicate that the build is complete. Otherwise, the scheduler checks whether the error information indicates a problem with the build node's health variable. If so, the task returns to step (3) for rescheduling. If not, the build status feedback content is sent to the console to inform the user of the error message and content.
- Finally, the image builder on the build node returns to the blocking listening state described in step (1) to wait for new tasks. The scheduler on the management node likewise returns to the blocking listening state described in step (1) to wait for new tasks to be created.
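The equations for PerformScore and TaskScore did not survive extraction, so the sketch below does not reproduce them. It only illustrates the shape of the node-selection step under an assumed scoring rule. Each resource is normalized against the table-wide maxima (MaxCpuCores, MaxCpuFrequency, MaxFreeMemory, MaxBuildNumber), weighted with the default weights, and summed. The node with the highest PerformScore + TaskScore becomes the best working node. The scoring expressions, the `BuildNode` record and `select_best_node` are hypothetical, and TotalMemory is carried along but not used by this assumed rule.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Default weights given in the description.
CPU_WEIGHT = 2
LOAD_WEIGHT = 2
MEMORY_WEIGHT = 3
TASK_WEIGHT = 3


@dataclass
class BuildNode:
    """Hypothetical record for one row of the active build node table."""
    ip: str
    port: int
    cpu_cores: int
    cpu_frequency: float  # CPU clock frequency, e.g. in GHz
    total_memory: float   # not used by the assumed scores below
    free_memory: float
    build_number: int     # image build count variable
    system_load: float


def _ratio(value: float, maximum: float) -> float:
    # Normalize against a table-wide maximum; a zero maximum yields 0.
    return value / maximum if maximum > 0 else 0.0


def scores(node: BuildNode, max_cores: int, max_freq: float,
           max_free_mem: float, max_builds: int) -> Tuple[float, float]:
    """ASSUMED scoring rule, not the patent's lost equations."""
    cpu = _ratio(node.cpu_cores, max_cores) * _ratio(node.cpu_frequency, max_freq)
    memory = _ratio(node.free_memory, max_free_mem)
    # Load per core, capped at 1 so an overloaded node contributes 0, not a negative term.
    load = 1.0 - min(_ratio(node.system_load, node.cpu_cores), 1.0)
    perform = CPU_WEIGHT * cpu + MEMORY_WEIGHT * memory + LOAD_WEIGHT * load
    # Fewer concurrent image builds give a higher task score.
    task = TASK_WEIGHT * (1.0 - _ratio(node.build_number, max_builds))
    return perform, task


def select_best_node(table: List[BuildNode]) -> Tuple[BuildNode, Dict[str, float]]:
    """Return the best working node and the total score cached per node."""
    if not table:
        raise ValueError("active build node table is empty")
    max_cores = max(n.cpu_cores for n in table)
    max_freq = max(n.cpu_frequency for n in table)
    max_free_mem = max(n.free_memory for n in table)
    max_builds = max(n.build_number for n in table)
    totals: Dict[str, float] = {}
    for n in table:
        perform, task = scores(n, max_cores, max_freq, max_free_mem, max_builds)
        totals[f"{n.ip}:{n.port}"] = perform + task  # final total score
    best = max(table, key=lambda n: totals[f"{n.ip}:{n.port}"])
    return best, totals
```

Because every term is divided by a maximum taken over the current table, scores are only comparable within a single scheduling round. This is consistent with the text recomputing them each time step (4) runs.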
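Steps (6) and (7) on the image builder can be sketched as follows. The build node cache is modelled as a small dataclass. The container engine call is replaced by a hypothetical `build_fn` callable that returns a result string or raises on failure; no real container-engine API is implied. Sending the feedback to the management node is left to the caller.

One deliberate deviation from the text: step (7) decrements the image build count on every path, including the pre-check failure path where it was never incremented. The sketch decrements only after a matching increment, so the counter cannot drift below zero.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Task message fields extracted in step (6); key names are hypothetical.
TASK_FIELDS = ("image_name", "gpu", "framework", "framework_version",
               "dependencies", "heterogeneous", "test_mode")


@dataclass
class NodeCache:
    """Build node cache as initialised in step (1)."""
    state: bool = True     # build node state variable
    healthy: bool = True   # build node health variable
    build_number: int = 0  # image build count variable


def run_task(cache: NodeCache, task_message: dict,
             build_fn: Callable[..., str]) -> Tuple[str, str]:
    """Return (label, content): the build status feedback for step (7)."""
    if not cache.state or not cache.healthy:
        # Report the two variables themselves, tagged with an error label.
        return "error", f"state={cache.state}, healthy={cache.healthy}"
    cache.build_number += 1
    try:
        params = {k: task_message[k] for k in TASK_FIELDS}
        return "complete", build_fn(**params)  # stand-in for the container engine
    except Exception as exc:  # an engine error (or missing field) aborts the build
        return "error", str(exc)
    finally:
        cache.build_number -= 1  # step (7): one fewer build in progress
```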
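On the scheduler side, steps (5) and (8) amount to a small state machine around the task object. The sketch below assumes the build status feedback arrives as a label plus content. It also assumes a hypothetical `health_problem` flag that tells the scheduler whether the error came from the node's health variable, since the text does not say how that is encoded. `TaskObject`, `Feedback` and `handle_feedback` are illustrative names. The returned strings stand in for the three outcomes the text describes.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    COMPLETE = "complete"
    ERROR = "error"


@dataclass
class Feedback:
    """Build status feedback content sent by the image builder (assumed shape)."""
    label: Label
    content: str
    health_problem: bool = False  # hypothetical: error caused by the node's health variable


@dataclass
class TaskObject:
    """Task object generated asynchronously in step (5)."""
    task_message: dict
    scheduling_node: str        # best working node, e.g. "ip:port"
    build_status: bool = False  # false until a completion label arrives


def handle_feedback(task: TaskObject, feedback: Feedback) -> str:
    """Step (8): decide what the scheduler does with the feedback."""
    if feedback.label is Label.COMPLETE:
        task.build_status = True  # build finished
        return "complete"
    if feedback.health_problem:
        return "reschedule"       # go back to step (3)
    return "notify_console"       # show the error message and content to the user
```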
Claims (12)
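- A distributed container image build scheduling system, characterized by comprising management nodes and build nodes. The management node manages the generation and scheduling of distributed container image build tasks and comprises a console and a scheduler. The console obtains task parameters, creates tasks and sends them to the scheduler, and feeds the image build status feedback content back to the user. The scheduler receives tasks sent from the console and generates task messages, sends the task messages to the corresponding build nodes for execution, and waits for and receives the task execution results from those build nodes. The build nodes execute tasks issued by the management node, and each build node comprises an image builder. The image builder receives and executes the task messages sent from the scheduler and returns the execution results to the scheduler.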
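- The distributed container image build scheduling system according to claim 1, characterized in that the scheduler is a Web server and the image builder is a gRPC server.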
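- The distributed container image build scheduling system according to claim 1, characterized in that the scheduling system comprises no fewer than three management nodes and a plurality of build nodes.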
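- The distributed container image build scheduling system according to claim 3, characterized in that only one management node is in the working state at any given time, and management nodes are switched seamlessly using heartbeat detection and VIP (virtual IP) technology.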
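- A distributed container image build scheduling method, characterized by comprising the following steps. First, the management nodes and build nodes according to any one of claims 1 to 4 are initialized: the management node's scheduler is placed in a Web server blocking listening state, and the build node's image builder is placed in a gRPC server blocking listening state. The console according to any one of claims 1 to 4 creates an image build task, which the scheduler validates to generate a task message. The build nodes to which the task message can be sent are determined, and the best working node is selected from them. The scheduler sends the task message to the best working node and generates a task object. The best working node executes the task and sends the build status feedback content to the management node. The management node informs the user of the build status feedback content. Finally, the management node and build nodes return to the initialized state.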
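- The distributed container image build scheduling method according to claim 5, characterized in that placing the management node's scheduler in a Web server blocking listening state comprises the following. The scheduler is started on the management node and checks whether the scheduler configuration file exists; the file contains a list of build node IP addresses and gRPC service port numbers. If the file exists, the scheduler reads the IP addresses and gRPC service port numbers of all build nodes and saves them to the cache. The scheduler sets the number of parallel threads in the cache, denoted workers, and sets the task list to empty. The scheduler then checks whether the TLS public and private keys used to encrypt Web and gRPC communication exist on the current management node. If they do, it uses them to start its own internal TLS-authenticated Web server in blocking listening mode and opens the console to wait for the user to create tasks. If the configuration file or the TLS keys do not exist, the scheduler records the cause of failure, generates a message and logs it locally, sends the fault information to the operations and maintenance engineer, and exits. Placing the build node's image builder in a gRPC server blocking listening state comprises the following. The image builder is started on the build node and attempts to connect to the container engine on the current build node. If the connection succeeds, the image builder sets the build node state variable in the build node cache to true, the build node health variable to healthy, and the image build count variable to zero. It then checks whether the TLS public and private keys used to encrypt gRPC communication exist on the current build node. If they do, it uses them to start its own internal TLS-authenticated gRPC server in blocking listening mode and waits for the scheduler to obtain information about its node or to issue tasks. If the connection to the container engine fails or the TLS keys do not exist, the image builder records the cause of failure, generates a message and logs it locally, sends the fault information to the operations and maintenance engineer, and exits.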
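- The distributed container image build scheduling method according to claim 5, characterized in that the console creating an image build task, which the scheduler validates to generate a task message, comprises the following. A task is created through the console on the management node. Before creation, the following values are set and written: the image name to build, the GPU value, the name and version of the development framework, third-party dependencies, the heterogeneous resource value, and the test mode value. The console serializes the task into a structure object and sends it to the scheduler, which validates the object in four checks. (1) If the image name is empty, the scheduler generates the error message "Build image name is empty". (2) If the image name is not empty, the scheduler checks whether the heterogeneous resource value is true; any other value, or an empty value, produces the error message "Build heterogeneous resource value is illegal". (3) If the heterogeneous resource value is true, the scheduler checks whether the development framework is empty; if so, it generates the error message "Build image development framework is empty". (4) If the development framework is not empty, the scheduler checks whether the test mode value is true; any other value, or an empty value, produces the error message "Test mode value is illegal". Whenever a check fails, the scheduler sends the error message to the console to inform the user, cancels task generation, and re-enters the blocking state to wait for the user to create a task from the console. If the test mode value is true, the task passes validation and the scheduler generates a task message from the task content.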
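- The distributed container image build scheduling method according to claim 5, characterized in that determining the build nodes to which the task message can be sent comprises the following. The scheduler on the management node creates an active build node table in the cache, recording build node IP, gRPC service port and communication status. Using the build node IP and gRPC service port of every build node in its cache, the scheduler tests communication with each node's gRPC service and saves each node that communicates successfully to the active build node table. The scheduler then compares the number of active build nodes with workers. If the number is greater, no action is taken; if it is smaller, the scheduler sets workers to the number of active build nodes. The scheduler starts multiple threads to obtain, in parallel, the resource information and state information of the build nodes in the active build node table. The resource information comprises the number of CPUs, CPU clock frequency, total memory, free memory, and system load of the build node. The state information comprises the build node state variable, build node health variable and image build count variable. The scheduler checks each build node in the table in turn: if its state variable is true, no action is taken; if it is false, the node is deleted from the table. After all build nodes have been checked, the scheduler checks each remaining build node in turn: if its health variable is set to healthy, no action is taken; otherwise the node is deleted from the table. After all build nodes have been checked, the build nodes remaining in the active build node table are the build nodes to which the task message can be sent.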
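- The distributed container image build scheduling method according to claim 8, characterized in that selecting the best working node from the determined build nodes comprises calculating the performance score and task score of all build nodes in the active build node table: where PerformScore is the performance score, TaskScore the task score, CpuWeight the CPU weight, LoadWeight the load weight, MemoryWeight the memory weight, and TaskWeight the task weight. CpuCores is the number of CPUs of the build node, CpuFrequency its CPU clock frequency, TotalMemory its total memory, FreeMemory its free memory, BuildNumber the image build count variable, and SystemLoad the system load. MaxCpuCores is the CPU count of the build node with the most CPUs among all build nodes in the active build node table. MaxCpuFrequency is the CPU clock frequency of the build node with the highest CPU clock frequency among those nodes. MaxFreeMemory is the free memory of the build node with the most free memory among those nodes. MaxBuildNumber is the image build count of the build node with the most image builds among those nodes. The scheduler records the sum of the performance score and the task score in the cache as that build node's final total score, and selects the build node with the highest total score as the best working node.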
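- The distributed container image build scheduling method according to claim 9, characterized in that the scheduler sending the task message to the best working node and generating a task object comprises the following. The scheduler obtains the IP address and gRPC service port of the best working node. Using that IP address and gRPC service port, it encrypts the task message with the management node's TLS public and private keys and sends it to the best working node via gRPC. At the same time, the scheduler asynchronously generates a task object containing the task message, the scheduling node and the build status; the scheduling node is the best working node, and the build status is set to false. The scheduler then enters the blocking state and waits for the best working node to send the build status feedback content.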
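- The distributed container image build scheduling method according to claim 10, characterized in that the best working node executing the task and sending the build status feedback content to the management node comprises the following. The image builder of the best working node checks the build node cache to determine whether the node's build node state variable is set to true and its build node health variable is set to healthy. If the state variable is false or the health variable is unhealthy, the contents of these two variables become the build status feedback content, marked with an error label. The image builder then decrements the image build count variable in the build node cache by 1 and sends the feedback content to the management node. Otherwise, the image builder increments the image build count variable in the build node cache by 1. It extracts the following from the task message: the image name, GPU value, development framework name and version, third-party dependencies, heterogeneous resource value and test mode value. It then calls the container engine to build the image according to these parameters. After the image build completes, the container engine's build result becomes the build status feedback content, marked with a completion label. The image builder decrements the image build count variable by 1 and sends the feedback content to the management node. If the container engine encounters an error during the build and terminates it, the container engine's error message becomes the build status feedback content, marked with an error label. The image builder decrements the image build count variable by 1 and sends the feedback content to the management node.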
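- The distributed container image build scheduling method according to claim 11, characterized in that the management node informing the user of the build status feedback content comprises the following. After receiving the build status feedback content, the scheduler on the management node first checks whether it contains an error label. If it does not, the scheduler sets the build status in the task object to true, indicating that the build is complete. Otherwise, the scheduler checks whether the error information indicates a problem with the build node health variable. If so, the task is rescheduled; if not, the build status feedback content is sent to the console to inform the user of the error message and content.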
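The validation chain in claim 7 checks the structure object in a fixed order and stops at the first failure. The following is a minimal sketch, assuming the structure object is a plain dict with hypothetical keys (`image_name`, `heterogeneous`, `framework`, `test_mode`). It returns the error message destined for the console, or `None` when the task passes and a task message can be generated. The strict `is True` comparisons mirror the rule that any value other than true, including an empty one, is illegal.

```python
from typing import Optional


def validate_task(task: dict) -> Optional[str]:
    """Return the first validation error from claim 7, or None if the task is valid."""
    if not task.get("image_name"):
        return "Build image name is empty"
    if task.get("heterogeneous") is not True:  # any other value or empty is illegal
        return "Build heterogeneous resource value is illegal"
    if not task.get("framework"):
        return "Build image development framework is empty"
    if task.get("test_mode") is not True:      # any other value or empty is illegal
        return "Test mode value is illegal"
    return None  # validation passed; the scheduler can now build the task message
```

Because the checks short-circuit, a task with several problems reports only the first one in this order, matching the claim's sequential wording.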
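Claim 8 narrows the configured build nodes in three passes:

1. A gRPC communication test.
2. A parallel fetch of resource and state information using at most `workers` threads.
3. Removal of nodes whose state variable is false or whose health variable is not healthy.

The sketch below assumes hypothetical `probe` and `fetch_info` callables standing in for the gRPC calls. It also assumes the state information arrives as a dict with `state` and `healthy` keys. The parallel fetch uses Python's standard `concurrent.futures.ThreadPoolExecutor`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Node = Tuple[str, int]  # (build node IP, gRPC service port)


def sendable_nodes(nodes: List[Node], workers: int,
                   probe: Callable[[Node], bool],
                   fetch_info: Callable[[Node], Dict]) -> Tuple[List[Tuple[Node, Dict]], int]:
    """Return the filtered active build node table and the adjusted workers value."""
    # Keep only nodes whose gRPC service answered the communication test.
    active = [n for n in nodes if probe(n)]
    if not active:
        return [], workers  # nothing to schedule on
    # Claim 8: if fewer active nodes than workers, lower workers to that count.
    workers = min(workers, len(active))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        infos = list(pool.map(fetch_info, active))  # results keep the order of `active`
    table = list(zip(active, infos))
    # First pass: drop nodes whose state variable is false.
    table = [(n, i) for n, i in table if i["state"]]
    # Second pass: drop nodes whose health variable is not healthy.
    table = [(n, i) for n, i in table if i["healthy"]]
    return table, workers
```

Returning the fetched information alongside each node lets the next step (scoring) reuse it without a second round of gRPC calls.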
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/597,890 US11656902B2 (en) | 2020-01-14 | 2021-01-06 | Distributed container image construction scheduling system and method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010034706.5A CN111240806B (zh) | 2020-01-14 | 2020-01-14 | Distributed container image construction scheduling method |
CN202010034706.5 | 2020-01-14 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021143590A1 (zh) | 2021-07-22 |
Family ID: 70876163
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2021/070429 WO2021143590A1 (zh) | 2020-01-14 | 2021-01-06 | Distributed container image construction scheduling system and method |
Country Status (3)
Country | Link |
---|---|
US (1) | US11656902B2 (zh) |
CN (1) | CN111240806B (zh) |
WO (1) | WO2021143590A1 (zh) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
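CN111240806B (zh) * | 2020-01-14 | 2022-04-01 | 南京邮电大学 | Distributed container image construction scheduling method |
CN112114834A (zh) * | 2020-09-27 | 2020-12-22 | 北京深之度科技有限公司 | System image building method and system, computing device, and storage medium |
CN113377665B (zh) * | 2021-06-25 | 2024-08-06 | 北京百度网讯科技有限公司 | Container technology-based testing method and apparatus, electronic device, and storage medium |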
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
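CN106790483A (zh) * | 2016-12-13 | 2017-05-31 | 武汉邮电科学研究院 | Container technology-based Hadoop cluster system and rapid construction method |
CN108737548A (zh) * | 2018-05-24 | 2018-11-02 | 南京邮电大学 | Distributed web microservice container cluster architecture system and implementation method thereof |
CN110275761A (zh) * | 2018-03-16 | 2019-09-24 | 华为技术有限公司 | Scheduling method and apparatus, and master node |
US20190347121A1 (en) * | 2018-05-11 | 2019-11-14 | International Business Machines Corporation | Distributed container image repository service |
CN110647580A (zh) * | 2019-09-05 | 2020-01-03 | 南京邮电大学 | Distributed container cluster image management master node, slave node, system and method |
CN111240806A (zh) * | 2020-01-14 | 2020-06-05 | 南京邮电大学 | Distributed container image construction scheduling system and method |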
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5983386B2 (ja) * | 2012-12-14 | 2016-08-31 | 富士通株式会社 | Computer, communication control method, and communication control program |
US9715416B2 (en) * | 2015-06-03 | 2017-07-25 | Intel Corporation | Adaptive queued locking for control of speculative execution |
CN105245373B (zh) * | 2015-10-12 | 2017-08-04 | 天津市普迅电力信息技术有限公司 | Method for building and operating a container cloud platform system |
CN105245369B (zh) * | 2015-10-12 | 2017-05-17 | 天津市普迅电力信息技术有限公司 | Component publishing container method supporting multiple transport protocols |
CN106453492B (zh) * | 2016-08-30 | 2019-05-14 | 浙江大学 | Container scheduling method for Docker container cloud platforms based on fuzzy pattern recognition |
CN108259418B (zh) * | 2016-12-28 | 2021-08-24 | 中移(苏州)软件技术有限公司 | System and method for function hosting services |
CN107797767B (zh) * | 2017-09-30 | 2019-07-26 | 南京卓盛云信息科技有限公司 | Container technology-based deployment of a distributed storage system and storage method thereof |
US10977066B2 (en) * | 2018-04-06 | 2021-04-13 | Red Hat, Inc. | Virtual machine to container conversion and optimization |
WO2019134323A1 (zh) * | 2018-05-10 | 2019-07-11 | 深圳晶泰科技有限公司 | Scientific computing workflow management system |
WO2020014712A1 (en) * | 2018-07-13 | 2020-01-16 | Pubwise, LLLP | Digital advertising platform with demand path optimization |
CN112408006A (zh) | 2020-12-03 | 2021-02-26 | 安化孚达泰科智能装备有限公司 | Mobile automatic distributor for a plate-chain dryer trolley and method thereof |
- 2020-01-14: CN202010034706.5A (CN), patent CN111240806B/zh, Active
- 2021-01-06: PCT/CN2021/070429 (WO), patent WO2021143590A1/zh, Application Filing
- 2021-01-06: US17/597,890 (US), patent US11656902B2/en, Active
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
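CN113760453A (zh) * | 2021-08-04 | 2021-12-07 | 南方电网科学研究院有限责任公司 | Container image distribution system and container image push, pull, and delete methods |
CN113760453B (zh) * | 2021-08-04 | 2024-05-28 | 南方电网科学研究院有限责任公司 | Container image distribution system and container image push, pull, and delete methods |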
Also Published As
Publication number | Publication date |
---|---|
CN111240806B (zh) | 2022-04-01 |
US20220171652A1 (en) | 2022-06-02 |
US11656902B2 (en) | 2023-05-23 |
CN111240806A (zh) | 2020-06-05 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21740891 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21740891 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 15/03/2023) |