CN113407355A - Method, system, equipment and storage medium for process cooperation in cluster


Info

Publication number
CN113407355A
Authority
CN
China
Prior art keywords
host
cluster
broadcast
resource consumption
resource
Legal status
Pending
Application number
CN202110952191.1A
Other languages
Chinese (zh)
Inventor
张宇
Current Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202110952191.1A
Publication of CN113407355A
Priority to PCT/CN2022/078097 (WO2023019904A1)
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • G06F9/5088 Techniques for rebalancing the load in a distributed system involving task migration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Hardware Redundancy (AREA)

Abstract

The invention provides a method, system, device and storage medium for process cooperation in a cluster. The method comprises: configuring all hosts in the same network segment, establishing communication between each host and the other hosts, and monitoring the resource usage of each host; in response to the resource usage of a first host exceeding a threshold, calculating the resource consumption of each process on the first host and determining the process with the greatest resource consumption; initiating a migration broadcast to the cluster and determining whether a second host in the cluster responds to the broadcast; and, in response to a second host in the cluster responding to the broadcast, sending the process with the greatest resource consumption to the second host. Because a resource usage threshold is set, a server that reaches the threshold is handled immediately, so server downtime, or even a cluster-wide avalanche, is avoided and server availability is improved. Because communication takes place by broadcast over the network built between the hosts, the interaction frequency is reduced.

Description

Method, system, equipment and storage medium for process cooperation in cluster
Technical Field
The present invention relates to the field of distributed operating systems, and more particularly, to a method, system, device, and storage medium for process collaboration in a cluster.
Background
Operating systems currently run on separate, individual hosts, and each manages only the processes of its own host. A standalone host manages its processes uniformly according to their states and the system's resource situation. During scheduling, it allocates resources only to local processes and distributes them across the local CPUs, so scheduling is bound to the local physical architecture. When system resources are exhausted, the operating system selects a process that consumes a large amount of resources and kills it so that the operating system itself can keep working normally. Thus, with limited resources, a single operating system can no longer allocate or schedule resources once they are exhausted, and its only option is to kill some processes.
The industry has proposed various solutions to this problem. One approach evaluates the application as a whole: the peak resource consumption of the service is calculated, and the application and the operating system architecture are then assessed together to ensure that the application is not killed and keeps running normally at peak times. However, once the application's consumption exceeds the evaluated value, it may still be killed.
The main existing solution is operating system high availability: several operating systems form an operating system cluster configured as a high-availability cluster. In this mode, a master host is elected among the operating systems and coordinates the operation of the whole cluster. The hosts communicate with each other through a heartbeat mechanism so that each can monitor whether the others are still alive. Once a host goes offline or crashes, the host that can no longer work normally is taken offline, and the master transfers the services that were running on it to another host. The resources shared among the servers are mainly: 1. file sharing, based on the network file system (NFS); and 2. network resources, with multiple Linux hosts interacting with one another in the same local subnet. In this way, the availability of the whole cluster is ensured.
With this technology, a master host is elected within the cluster and holds absolute control, which leads to the following problems. 1. If the master fails, a new master must be elected by the other machines. If, at that moment, the other machines cannot communicate with one another because of the master's failure, a "split-brain" situation occurs: instead of the remaining servers electing a single new master, the cluster splits into several clusters, each with its own master. Once this happens, the whole cluster goes down. 2. If a host fails because its load is too high, a new host takes over its tasks; the load on the new host then rises sharply and it may fail as well, so the tasks migrate to a third host. Hosts failing one after another in this way can bring down the whole cluster, which is generally called the avalanche problem. 3. The scheme relies too heavily on the heartbeat mechanism, which can only show that a host is still alive, not that its services are still working properly. For example, if a process deadlocks, the server itself keeps working but the process does not: the cluster as a whole is still alive while the service has stopped.
Disclosure of Invention
In view of this, an object of the embodiments of the present invention is to provide a method, system, computer device, and computer-readable storage medium for process cooperation in a cluster. The whole operating system cluster is treated as a single entity, and no master host needs to be elected or designated, which reduces the risk of split-brain. A threshold is set, and the process with the highest resource consumption on a server that reaches the threshold is transferred to another host to run; when no other host can run the process, its resource access is limited instead. This avoids server crashes and even avalanches and improves server availability. Interaction takes place by broadcast, which reduces the interaction frequency.
Based on the above object, one aspect of the embodiments of the present invention provides a method for process cooperation in a cluster, comprising the following steps: configuring all hosts in the same network segment, establishing communication between each host and the other hosts, and monitoring the resource usage of each host; in response to the resource usage of a first host exceeding a threshold, calculating the resource consumption of each process on the first host and determining the process with the greatest resource consumption; initiating a migration broadcast to the cluster and determining whether a second host in the cluster responds to the broadcast; and, in response to a second host in the cluster responding to the broadcast, sending the process with the greatest resource consumption to the second host.
In some embodiments, the method further comprises: in response to no second host in the cluster responding to the broadcast, limiting the resource usage of the process with the greatest resource consumption on the first host.
In some embodiments, sending the process with the greatest resource consumption to the second host comprises: in response to a plurality of second hosts responding to the broadcast, acquiring the current resource utilization of each second host and sending the process with the greatest resource consumption to the second host with the lowest current resource utilization.
In some embodiments, sending the process with the greatest resource consumption to the second host comprises: suspending the process, compressing its memory state, and sending the compressed process memory image together with the process executable file to the second host.
In some embodiments, monitoring the resource usage of each host comprises: acquiring the CPU utilization, memory utilization, and file system utilization of the host; performing a weighted calculation with preset CPU, memory, and file system utilization weights to obtain the resource usage of the host; and comparing the resource usage with the threshold.
In some embodiments, the method further comprises: setting standby hosts in the cluster, configuring the hosts currently in use into a host list, and setting the standby hosts to a dynamic-addition mode.
In some embodiments, the method further comprises: in response to no second host in the cluster responding to the broadcast, sending the process with the greatest resource consumption to a standby host for execution.
In another aspect of the embodiments of the present invention, a system for process cooperation in a cluster is provided, comprising: a configuration module configured to configure all hosts in the same network segment, establish communication between each host and the other hosts, and monitor the resource usage of each host; a computing module configured to, in response to the resource usage of a first host exceeding a threshold, calculate the resource consumption of each process on the first host and determine the process with the greatest resource consumption; a broadcasting module configured to initiate a migration broadcast to the cluster and determine whether a second host in the cluster responds to the broadcast; and a sending module configured to, in response to a second host in the cluster responding to the broadcast, send the process with the greatest resource consumption to the second host.
In another aspect of the embodiments of the present invention, there is also provided a computer device, including: at least one processor; and a memory storing computer instructions executable on the processor, the instructions when executed by the processor implementing the steps of the method as above.
In a further aspect of the embodiments of the present invention, a computer-readable storage medium is also provided, storing a computer program that, when executed by a processor, implements the steps of the above method.
The invention has the following beneficial technical effects: the whole operating system cluster is treated as a single entity, and no master host needs to be elected or designated, which reduces the risk of split-brain; by setting a resource usage threshold, the process with the highest resource consumption on a server that reaches the threshold is transferred to another host to run, and when no other host can run the process, its resource access is limited instead, which avoids server crashes and even avalanches and improves server availability; and interaction takes place by broadcast, which reduces the interaction frequency.
Drawings
To describe the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may obtain other embodiments from these drawings without creative effort.
FIG. 1 is a diagram illustrating an embodiment of a method for process cooperation in a cluster according to the present invention;
FIG. 2 is a diagram illustrating an embodiment of a system for process collaboration in a cluster according to the present invention;
FIG. 3 is a schematic hardware structure diagram of an embodiment of a computer device for process cooperation in a cluster according to the present invention;
FIG. 4 is a schematic diagram of an embodiment of a computer storage medium for process cooperation in a cluster.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that the expressions "first" and "second" in the embodiments of the present invention are used to distinguish two non-identical entities with the same name, or two non-identical parameters. "First" and "second" are used merely for convenience of description and should not be construed as limiting the embodiments of the present invention; this is not repeated in the following embodiments.
In a first aspect of the embodiments of the present invention, an embodiment of a method for process cooperation in a cluster is provided. Fig. 1 is a schematic diagram illustrating an embodiment of a method for process cooperation in a cluster according to the present invention. As shown in fig. 1, the embodiment of the present invention includes the following steps:
S1: configure all hosts in the same network segment, establish communication between each host and the other hosts, and monitor the resource usage of each host;
S2: in response to the resource usage of a first host exceeding a threshold, calculate the resource consumption of each process on the first host and determine the process with the greatest resource consumption;
S3: initiate a migration broadcast to the cluster and determine whether a second host in the cluster responds to the broadcast;
S4: in response to a second host in the cluster responding to the broadcast, send the process with the greatest resource consumption to the second host; and
S5: configure the cluster to one of three working modes: dynamic detection, fixed host list, or primary/standby host list.
First, all hosts are configured in the same network segment, communication is established between each host and the other hosts, and the resource usage of each host is monitored.
The hosts are formed into a cluster by building a local area network (LAN), with all their network segments configured into the same LAN. For example, once the cluster is configured as the 192.168.11.0/24 segment, every host configured in that segment is a cluster host, and the hosts can ping each other. The cluster is configured by modifying the configuration file /etc/rpcha. The configuration contains the cluster network segment address, the cluster working mode, and the cluster resource usage threshold. The network segment address is given as network address plus subnet mask; the working mode can be set to dynamic detection, fixed host list, or primary/standby host list; and the resource usage threshold is the minimum resource usage at which the operating system starts process migration.
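The text names the configuration file /etc/rpcha and lists what it contains, but not its format. As a minimal sketch, assuming a plain key = value layout with the hypothetical keys network, mode, and threshold, the configuration could be parsed and the segment-based membership rule checked with Python's standard ipaddress module:

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical identifiers for the three working modes described in the text.
VALID_MODES = {"dynamic", "fixed", "primary_standby"}


@dataclass
class ClusterConfig:
    network: ipaddress.IPv4Network  # cluster segment, e.g. 192.168.11.0/24
    mode: str                       # one of VALID_MODES
    threshold: float                # migration threshold as a percentage (80 means 80%)


def parse_config(text: str) -> ClusterConfig:
    """Parse an assumed key = value layout; the real /etc/rpcha format is not specified."""
    values = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    # Accepts both "192.168.11.0/24" and "192.168.11.0/255.255.255.0" (address + mask).
    network = ipaddress.IPv4Network(values["network"])
    mode = values["mode"]
    if mode not in VALID_MODES:
        raise ValueError(f"unknown cluster mode: {mode}")
    threshold = float(values["threshold"])
    if not 0 <= threshold <= 100:
        raise ValueError("threshold must be a percentage between 0 and 100")
    return ClusterConfig(network, mode, threshold)


def is_cluster_host(cfg: ClusterConfig, ip: str) -> bool:
    """Every host whose address lies inside the configured segment is a cluster member."""
    return ipaddress.ip_address(ip) in cfg.network


cfg = parse_config("network = 192.168.11.0/255.255.255.0\nmode = dynamic\nthreshold = 80\n")
print(cfg.network, is_cluster_host(cfg, "192.168.11.23"), is_cluster_host(cfg, "192.168.12.1"))
# 192.168.11.0/24 True False
```

Defining membership purely by segment means no host list has to be distributed in dynamic mode; the other two modes would layer an explicit list on top of this check.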
The servers are configured in the same network segment, and the cluster is defined by that segment, so the hosts in the segment can communicate with one another. No primary or secondary machine is designated: every machine has equal status, and every host in the segment is a member of the cluster.
In the present application, the cluster presents a consistent network IP to the outside: after configuration is completed, an external IP can be configured for the cluster. This IP address points to the cluster, and which host actually responds is decided internally by the cluster. For example, if the IP points to host A in cluster ABC, it can be repointed to another host immediately after host A goes down. The cluster therefore looks the same from outside and all accesses through the IP keep working; users do not notice that a host switch has taken place internally, which improves user satisfaction. Further, the IP may be implemented as a floating IP.
There are three cluster working modes, each of which determines how the servers in the cluster interact:
Dynamic detection: hosts in the cluster network are discovered dynamically. Because only hosts configured in the network segment are regarded as cluster nodes, dynamic detection works as follows: whenever a new host joins the cluster, it broadcasts a message into the LAN containing its address and its name, and every existing host in the cluster that receives the broadcast automatically records the new node's basic information. For example, if the current cluster has 10 hosts and a node needs to be added, the server only needs to be configured into the corresponding network; it then joins the cluster through the cluster initialization command (rpcha-init). The command broadcasts over the LAN to notify the other hosts, each host records the IP and name of the new node, and the cluster grows to 11 hosts.
Fixed host list: this mode is the opposite of dynamic detection. Host nodes cannot be added dynamically; only a fixed set of nodes is configured as host-list nodes. Changing a node, for example bringing it online or taking it offline, requires editing the configuration file uniformly. This mode is rigid in operation and cannot add or remove hosts dynamically, but it offers finer control, which suits host clusters whose membership is clear and stable.
Primary/standby host list: this mode combines the two configurations above into a primary/standby mechanism. The primary hosts use the fixed configuration and the standby hosts use dynamic addition. For example, a cluster may use 5 servers as primary hosts, which is sufficient under normal conditions. The standby machines are in the same network segment but are not in the host list, so they are added dynamically: to add a new standby, the server is first configured into the network segment and then initialized with the initialization command, as in the dynamic detection scheme.
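How a host reacts to a join broadcast under each of the three working modes can be sketched as a small membership table. The JSON payload, the mode names, and the class below are assumptions for illustration (the text only says the broadcast carries the new host's address and name), and the actual LAN broadcast transport is omitted:

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


def make_join_message(name: str, ip: str) -> bytes:
    """Payload a new host broadcasts when it joins (assumed JSON format)."""
    return json.dumps({"type": "join", "name": name, "ip": ip}).encode()


def parse_join_message(data: bytes) -> Tuple[str, str]:
    msg = json.loads(data)
    if msg.get("type") != "join":
        raise ValueError("not a join message")
    return msg["name"], msg["ip"]


@dataclass
class Membership:
    mode: str                                              # "dynamic", "fixed" or "primary_standby" (hypothetical names)
    primaries: Set[str] = field(default_factory=set)       # fixed host list from the configuration file
    dynamic: Dict[str, str] = field(default_factory=dict)  # name -> IP learned from join broadcasts

    def on_join(self, data: bytes) -> bool:
        """Record a join broadcast if the working mode allows it; return True if recorded."""
        name, ip = parse_join_message(data)
        if self.mode == "fixed" or name in self.primaries:
            return False  # the fixed list changes only by editing the configuration
        self.dynamic[name] = ip  # dynamic mode: a new node; primary/standby mode: a new standby
        return True

    def members(self) -> Set[str]:
        return self.primaries | set(self.dynamic)

    def standbys(self) -> Set[str]:
        return set(self.dynamic) if self.mode == "primary_standby" else set()


# The 10-host example from the text: one join broadcast grows the cluster to 11 hosts.
cluster = Membership("dynamic", dynamic={f"node{i}": f"192.168.11.{i}" for i in range(1, 11)})
cluster.on_join(make_join_message("node11", "192.168.11.11"))
print(len(cluster.members()))  # 11
```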
In some embodiments, monitoring the resource usage of each host comprises: acquiring the CPU utilization, memory utilization, and file system utilization of the host; performing a weighted calculation with preset CPU, memory, and file system utilization weights to obtain the resource usage of the host; and comparing the resource usage with the threshold.
When a host's resource usage reaches the threshold, it can no longer take on new processes. The threshold is configured as a percentage; for example, 80% is configured as 80. Resource usage is computed as a weighted average. For example, with a CPU utilization weight of 5, a memory utilization weight of 3, and a file system utilization weight of 2, and with CPU utilization at 100%, memory utilization at 60%, and file system utilization at 90%, the weighted value is (5 × 100 + 3 × 60 + 2 × 90) / 10 = 86. With a threshold of 80, process migration is then required. The same algorithm is used to find the most resource-intensive process: the weighted consumption of each process is calculated, and the process with the highest value is selected for migration.
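The weighted calculation above can be sketched directly in code. This sketch assumes that per-process consumption is expressed in the same percentage terms as host utilization (the text does not say how it is measured), and it treats "reaching" the threshold (>=) as the trigger, since the text speaks of both reaching and exceeding it:

```python
from typing import Dict

# Example weights from the text: CPU 5, memory 3, file system 2.
WEIGHTS: Dict[str, float] = {"cpu": 5, "mem": 3, "fs": 2}


def weighted_usage(usage: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted average of utilization percentages (0-100) for CPU, memory and file system."""
    return sum(weights[k] * usage[k] for k in weights) / sum(weights.values())


def needs_migration(host_usage: Dict[str, float], threshold: float) -> bool:
    """True once the host's weighted usage reaches the configured threshold."""
    return weighted_usage(host_usage) >= threshold


def pick_process(procs: Dict[int, Dict[str, float]]) -> int:
    """Return the PID with the highest weighted consumption (the migration candidate)."""
    return max(procs, key=lambda pid: weighted_usage(procs[pid]))


host = {"cpu": 100, "mem": 60, "fs": 90}
print(weighted_usage(host))       # 86.0, i.e. (5*100 + 3*60 + 2*90) / 10
print(needs_migration(host, 80))  # True
```

Reusing one scoring function for hosts and processes keeps the two decisions (when to migrate, and what to migrate) on the same scale.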
In response to the resource usage of the first host exceeding the threshold, the resource consumption of each process on the first host is calculated, and the process with the greatest resource consumption is determined.
A migration broadcast is then initiated to the cluster, and it is determined whether any second host in the cluster responds. Once the process to be migrated has been selected, a broadcast is sent into the cluster, and any server that receives it and can take on a new process responds.
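The text leaves open how a receiving server decides that it "can process new processes". A minimal sketch of the receiving side, assuming homogeneous hosts, a hypothetical JSON message format, and an admission rule under which the projected load must stay below the threshold, might look like this. The reply carries the responder's current utilization so that the initiator can choose among several responders:

```python
import json
from typing import Optional


def make_migration_request(origin: str, pid: int, demand: float) -> bytes:
    """Broadcast payload announcing a process that needs a new host (assumed format)."""
    return json.dumps({"type": "migrate", "origin": origin, "pid": pid, "demand": demand}).encode()


def handle_migration_request(data: bytes, me: str, my_usage: float, threshold: float) -> Optional[bytes]:
    """Return a reply if this host can take the process; None means stay silent."""
    msg = json.loads(data)
    if msg.get("type") != "migrate" or msg["origin"] == me:
        return None  # ignore unrelated traffic and our own broadcast
    # Assumed admission rule for homogeneous hosts: the projected load must stay below the threshold.
    if my_usage + msg["demand"] >= threshold:
        return None
    reply = {"type": "accept", "host": me, "pid": msg["pid"], "usage": my_usage}
    return json.dumps(reply).encode()


req = make_migration_request("hostA", 4242, 30.0)
print(handle_migration_request(req, "hostB", 40.0, 80.0) is not None)  # True: 40 + 30 < 80
print(handle_migration_request(req, "hostC", 60.0, 80.0))              # None: 60 + 30 >= 80
```

Staying silent rather than sending a refusal keeps with the broadcast approach of communicating only when necessary.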
In response to a second host in the cluster responding to the broadcast, the process with the greatest resource consumption is sent to that second host.
In some embodiments, sending the process with the greatest resource consumption to the second host comprises: in response to a plurality of second hosts responding to the broadcast, acquiring the current resource utilization of each second host and sending the process with the greatest resource consumption to the second host with the lowest current resource utilization.
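To decide that no second host responded, the initiator has to stop waiting at some point; the text does not say how long it waits. The sketch below assumes a fixed response window and picks the least-utilized host among the replies that arrived in time:

```python
from typing import Iterable, Optional, Tuple

# (arrival time in seconds, host name, host's current utilization in percent)
Reply = Tuple[float, str, float]


def choose_target(replies: Iterable[Reply], sent_at: float, window: float) -> Optional[str]:
    """Pick the least-utilized host among replies received within the response window.

    Returns None when nobody answered in time, i.e. no second host responded.
    """
    in_time = [(usage, host) for arrived, host, usage in replies
               if sent_at <= arrived <= sent_at + window]
    if not in_time:
        return None
    return min(in_time)[1]  # lowest utilization first; ties broken by host name


replies = [(1.0, "hostB", 40.0), (1.5, "hostC", 20.0), (5.0, "hostD", 5.0)]
print(choose_target(replies, sent_at=0.9, window=2.0))  # hostC (hostD replied too late)
```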
In some embodiments, sending the process with the greatest resource consumption to the second host comprises: suspending the process, compressing its memory state, and sending the compressed process memory image together with the process executable file to the second host. After receiving a response, the host that initiated the broadcast suspends the process, compresses its memory state, and sends the compressed memory image and the executable file to the new host by file copy over the cluster LAN; the new host then continues running the process.
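Capturing a process's memory state is outside the scope of this sketch; on Linux, a checkpoint/restore tool such as CRIU is one possible way to obtain it, though the text does not name a tool. Assuming the image is already available as bytes, the suspend-compress-send step could package it with the executable as follows. The bundle layout, the zlib compression, and the SHA-256 integrity check are all assumptions:

```python
import hashlib
import json
import os
import signal
import zlib
from typing import Tuple


def suspend(pid: int) -> None:
    """Freeze the process before its state is captured (POSIX only)."""
    os.kill(pid, signal.SIGSTOP)


def pack_bundle(name: str, memory_image: bytes, executable: bytes) -> bytes:
    """Bundle layout (assumed): 4-byte header length, JSON header, compressed image, executable."""
    compressed = zlib.compress(memory_image, 6)
    header = json.dumps({
        "name": name,
        "image_len": len(compressed),
        "exe_len": len(executable),
        "sha256": hashlib.sha256(memory_image + executable).hexdigest(),
    }).encode()
    return len(header).to_bytes(4, "big") + header + compressed + executable


def unpack_bundle(bundle: bytes) -> Tuple[str, bytes, bytes]:
    """Inverse of pack_bundle on the receiving host; verifies the checksum."""
    hlen = int.from_bytes(bundle[:4], "big")
    header = json.loads(bundle[4:4 + hlen])
    body = bundle[4 + hlen:]
    image = zlib.decompress(body[:header["image_len"]])
    exe = body[header["image_len"]:header["image_len"] + header["exe_len"]]
    if hashlib.sha256(image + exe).hexdigest() != header["sha256"]:
        raise ValueError("bundle corrupted in transit")
    return header["name"], image, exe
```

The receiving host would write the image and executable back to disk and hand them to whatever restore mechanism is used; the checksum lets it refuse a bundle damaged during the LAN copy.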
In some embodiments, the method further comprises: in response to no second host in the cluster responding to the broadcast, limiting the resource usage of the process with the greatest resource consumption on the first host. If no host can run the process, the resource-limiting mechanism (cgroup) is used to restrict the process's CPU, memory, and file system usage, and a warning is issued to the user stating that the cluster is currently running in a degraded, low-speed state and is waiting for manual intervention to adjust the cluster resources or reconfigure the processes.
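The text only says "cgroup". A minimal sketch, assuming the cgroup v2 unified hierarchy, where cpu.max takes "quota period" in microseconds, memory.max takes a byte count, and writing a PID to cgroup.procs moves that process into the group. Creating the group directory (for example a hypothetical /sys/fs/cgroup/rpcha-limited) requires root and the cpu and memory controllers enabled in the parent's cgroup.subtree_control. File system I/O could be capped through io.max, which needs the block device's major:minor numbers, so it is omitted here:

```python
import logging
from pathlib import Path
from typing import Dict


def limit_settings(cpu_percent: float, memory_bytes: int, period_us: int = 100_000) -> Dict[str, str]:
    """cgroup v2 file contents; cpu_percent is a percentage of one CPU."""
    quota_us = int(period_us * cpu_percent / 100)
    return {
        "cpu.max": f"{quota_us} {period_us}",  # at most quota_us of CPU time per period_us
        "memory.max": str(memory_bytes),       # hard memory ceiling in bytes
    }


def confine(group_dir: Path, pid: int, settings: Dict[str, str]) -> None:
    """Write limits into an existing cgroup directory, then move the process into it."""
    for name, value in settings.items():
        (group_dir / name).write_text(value + "\n")
    (group_dir / "cgroup.procs").write_text(f"{pid}\n")
    logging.warning("cluster is running in a degraded, low-speed state: pid %d is "
                    "resource-limited; add hosts or reconfigure processes", pid)


print(limit_settings(50, 512 * 1024 * 1024))
# {'cpu.max': '50000 100000', 'memory.max': '536870912'}
```

Usage on a real host would be along the lines of confine(Path("/sys/fs/cgroup/rpcha-limited"), pid, limit_settings(50, 512 * 1024 * 1024)), where the group name is hypothetical.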
In some embodiments, the method further comprises: setting standby hosts in the cluster, configuring the hosts currently in use into a host list, and setting the standby hosts to a dynamic-addition mode.
In some embodiments, the method further comprises: in response to no second host in the cluster responding to the broadcast, sending the process with the greatest resource consumption to a standby host for execution.
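The text presents the standby-host fallback and resource limiting as separate embodiments. One way to combine them, assuming a standby is tried before falling back to local limiting and standbys are used in list order, is the following decision sketch (names are hypothetical):

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Action(Enum):
    MIGRATE = "migrate"  # hand the process to the least-loaded responding host
    STANDBY = "standby"  # no responder: hand it to a standby host
    LIMIT = "limit"      # nowhere to go: keep it locally under cgroup limits and warn


def plan_overload(responders: Dict[str, float], standbys: List[str]) -> Tuple[Action, Optional[str]]:
    """Decide where the heaviest process goes.

    responders: hosts that answered the migration broadcast, mapped to current utilization
    standbys:   available standby hosts in preference order (assumed)
    """
    if responders:
        return Action.MIGRATE, min(responders, key=lambda h: (responders[h], h))
    if standbys:
        return Action.STANDBY, standbys[0]
    return Action.LIMIT, None


print(plan_overload({}, ["standby1"]))  # (<Action.STANDBY: 'standby'>, 'standby1')
```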
The invention treats the whole operating system cluster as a single entity: no master host needs to be elected, and tasks are not distributed by a master. By default, each operating system runs its tasks on its own physical device. Only when a certain condition is reached does a server pause a process that can no longer continue locally, freeze the process's current state, and hand it to another server that has the resources to execute it. To ensure that the process can keep running, if no other host can run it, a sandbox mechanism is started: the local machine continues to run the process, but its resource access is limited through the operating system's resource-limiting mechanism (cgroup), and an alert is sent to the server administrator warning that the server is in a limited state and that more hosts are needed. Limiting resource access in this way ensures that the server at least keeps running without crashing, which prevents avalanches.
The invention interacts by broadcast rather than relying on a heartbeat mechanism. A heartbeat mechanism requires persistent connections and frequent interaction, whereas broadcasts are infrequent: no communication is needed under normal conditions, and the operating systems respond to one another only when interaction is required. This reduces the interaction frequency, so interaction happens only when necessary.
It should be particularly noted that the steps in the above embodiments of the method for process cooperation in a cluster may be interchanged, replaced, added, or deleted. Methods for process cooperation in a cluster obtained through such reasonable permutations and combinations therefore also fall within the scope of the present invention, and the scope of the present invention should not be limited to the embodiments.
Based on the above object, a second aspect of the embodiments of the present invention provides a system for process cooperation in a cluster. As shown in fig. 2, the system 200 includes the following modules: the configuration module is configured to configure all the hosts to the same network segment, establish communication between each host and other hosts, and monitor the resource configuration condition of each host; the computing module is configured to respond to the condition that the resource configuration of a first host exceeds a threshold value, compute the resource consumption of each process in the first host and determine the process with the maximum resource consumption; the broadcasting module is configured to initiate a migration broadcast to a cluster and judge whether a second host exists in the cluster to respond to the broadcast; a sending module configured to send, in response to a second host existing in the cluster responding to the broadcast, a process with the largest resource consumption to the second host; and the cluster working mode configuration module is used for configuring the cluster working mode into three working modes, namely a dynamic detection working mode, a fixed host list working mode and a main host and standby host list working mode.
In some embodiments, the system further comprises a restriction module configured to: in response to no second host in the cluster responding to the broadcast, limiting resource usage by a process in the first host having a greatest resource consumption.
In some embodiments, the sending module is configured to: and responding to the broadcast responded by a plurality of second hosts, acquiring the current resource utilization rate of each second host, and sending the process with the maximum resource consumption to the second host with the lowest current resource utilization rate.
In some embodiments, the sending module is configured to suspend the process with the largest resource consumption, compress the memory state of the process, and send the compressed process memory image and the process executable file to the second host.
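The following is a minimal sketch of the suspend-and-package step. It stops the process with `SIGSTOP` and bundles a zlib-compressed memory image together with the executable into one in-memory tar archive. How the memory image itself is captured (for example with a checkpoint tool such as CRIU) is outside the sketch: the image is taken as given bytes. The archive layout and function names are assumptions for illustration.

```python
import io
import os
import signal
import tarfile
import zlib


def suspend(pid: int) -> None:
    """Freeze the process so its memory stops changing (POSIX only)."""
    os.kill(pid, signal.SIGSTOP)


def build_migration_bundle(memory_image: bytes, executable: bytes, name: str) -> bytes:
    """Pack a compressed memory image plus the executable into one tar blob."""
    members = {
        f"{name}.img.zz": zlib.compress(memory_image, 6),  # compressed memory image
        name: executable,                                   # process executable file
    }
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for member_name, data in members.items():
            info = tarfile.TarInfo(member_name)
            info.size = len(data)
            info.mode = 0o755 if member_name == name else 0o644
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def unpack_migration_bundle(blob: bytes, name: str):
    """Inverse of build_migration_bundle: return (memory_image, executable)."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r") as tar:
        image = zlib.decompress(tar.extractfile(f"{name}.img.zz").read())
        executable = tar.extractfile(name).read()
    return image, executable
```

The receiving host would unpack the bundle and hand the image to whatever restore mechanism matches the capture tool.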
In some embodiments, the configuration module is configured to acquire the CPU utilization, memory utilization, and file system utilization of the host, compute a weighted sum using a preset CPU utilization weight, a preset memory utilization weight, and a preset file system utilization weight to obtain the resource configuration of the host, and compare the resource configuration with the threshold.
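The weighted evaluation can be written as a normalized weighted sum of the three utilizations. The weights 0.5, 0.3, and 0.2 and the threshold 0.8 below are illustrative defaults, not values from the text. The file system helper uses the standard `shutil.disk_usage`.

```python
import shutil


def fs_utilization(path="/"):
    """Fraction of the file system at `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def resource_score(cpu, mem, fs, w_cpu=0.5, w_mem=0.3, w_fs=0.2):
    """Weighted resource score from utilizations in [0, 1] (weights are assumptions)."""
    total = w_cpu + w_mem + w_fs
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Dividing by the weight sum keeps the score in [0, 1].
    return (w_cpu * cpu + w_mem * mem + w_fs * fs) / total


def exceeds_threshold(cpu, mem, fs, threshold=0.8, **weights):
    """True when the host's weighted score is above the threshold."""
    return resource_score(cpu, mem, fs, **weights) > threshold
```

With CPU at 0.9, memory at 0.8, and file system at 0.5, the score is 0.45 + 0.24 + 0.10 = 0.79, which stays just under a threshold of 0.8.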
In some embodiments, the system further comprises a backup module configured to set a standby host in the cluster, configure the hosts currently in use into a host list, and set the standby host to a dynamic increase mode.
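The text does not define the dynamic increase mode. The following sketch assumes it means that a standby host is added to the active host list only when it is actually needed. `ClusterConfig` and `promote_standby` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClusterConfig:
    hosts: List[str] = field(default_factory=list)    # hosts currently in use
    standby: List[str] = field(default_factory=list)  # standby hosts
    dynamic_increase: bool = True                     # assumed: add standby hosts on demand

    def promote_standby(self) -> Optional[str]:
        """Move the first standby host into the host list and return it."""
        if not (self.dynamic_increase and self.standby):
            return None
        host = self.standby.pop(0)
        self.hosts.append(host)
        return host
```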
In some embodiments, the system further comprises a second sending module configured to, in response to no second host in the cluster responding to the broadcast, send the process with the largest resource consumption to the standby host for execution.
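Taken together, the embodiments above give an order of fallbacks: migrate to the least-loaded responder if any host answered, otherwise use a standby host, and otherwise limit the process locally. The sketch below encodes that order. The reply format (a mapping from host to utilization) and the action labels are assumptions.

```python
def plan_migration(replies, standby_hosts):
    """Decide where the heaviest process goes.

    replies: mapping host -> current utilization for hosts that answered the
    migration broadcast. Returns ("migrate", host), ("standby", host) or
    ("limit", None).
    """
    if replies:
        # Lowest utilization wins; ties are broken by host name.
        host = min(replies, key=lambda h: (replies[h], h))
        return ("migrate", host)
    if standby_hosts:
        return ("standby", standby_hosts[0])
    return ("limit", None)
```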
The embodiment of the invention can be implemented by an operating system command tool, a cluster synchronization device, an image compression and backup device, and a configuration device. The operating system command tool interacts with the user, who invokes it to configure the cluster. The cluster synchronization device handles communication within the cluster, mainly sending and receiving broadcasts and exchanging messages among the hosts. The image compression and backup device compresses the process to be migrated and, after synchronization is completed through the cluster synchronization device, backs up the image and starts a new process on another server in the cluster. The configuration device reads and parses the configuration, stores the parsed result, and, by interacting through the synchronization device, checks whether the configurations of the other servers in the cluster are consistent.
A virtual subnet may be configured so that the servers in the cluster are in the same network segment. After this is done, any server in the cluster may be chosen, and initialization is performed with the operating system command tool rpc_tools --init. The command tool first reads the configuration via the configuration device. After reading is finished, it initiates a broadcast to the cluster through the cluster synchronization device. The machines receiving the broadcast respond to the initialization operation, and each initializes its local configuration through its local configuration device. After each server completes initialization, it initiates a broadcast through its cluster synchronization device. At this point, the configuration of the whole cluster is complete.
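The configuration device is described as checking whether the configurations of the other servers are consistent. One simple way to do this is to compare fingerprints of a canonical serialization rather than whole files. The sketch below assumes each server broadcasts such a digest after initializing. The function names and the SHA-256/JSON choice are illustrative assumptions.

```python
import hashlib
import json


def config_digest(config: dict) -> str:
    """Stable fingerprint of a parsed configuration (key order does not matter)."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def inconsistent_peers(local: dict, peer_digests: dict) -> list:
    """Return the peers whose reported digest differs from the local one."""
    mine = config_digest(local)
    return sorted(host for host, digest in peer_digests.items() if digest != mine)
```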
After configuration is complete, processes may be started on the server cluster. The user logs in to a selected server and starts a process with the command line tool, for example, a demo process. By default, the process is started on the local machine. If the server reaches the configured performance threshold, the process with the largest resource consumption is selected, and a broadcast is initiated to find a host that can host it. After receiving the broadcast, the other servers reply through their synchronization devices whether they can host the process. The server that initiated the hosting broadcast then selects a hosting host. After the selection, the current process and its server state are compressed by the image compression and backup device and transmitted to the new host, and the process on the local machine is killed. If no host can be found, the resource usage of the process is restricted through resource limiting, and alarm information is issued.
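The hosting round trip can be sketched as a request broadcast by the overloaded server and per-host replies. The text does not specify a wire format. The sketch below assumes JSON messages over UDP broadcast on a hypothetical port 50505, and it assumes a receiver offers to host when its own score is below the threshold. The message fields are illustrative. `send_broadcast` uses the standard `SO_BROADCAST` socket option and needs a real network, so the tests exercise only the message handling.

```python
import json
import socket
from typing import Optional

PORT = 50505  # hypothetical port for the cluster synchronization device


def encode(msg: dict) -> bytes:
    return json.dumps(msg, sort_keys=True).encode("utf-8")


def decode(data: bytes) -> dict:
    return json.loads(data.decode("utf-8"))


def hosting_request(origin: str, pid: int, score: float) -> bytes:
    """Broadcast payload asking which hosts can take over process `pid`."""
    return encode({"type": "host_request", "origin": origin, "pid": pid, "score": score})


def handle_request(data: bytes, me: str, my_score: float, threshold: float) -> Optional[bytes]:
    """Receiver side: reply whether this host can host the process."""
    msg = decode(data)
    if msg.get("type") != "host_request" or msg.get("origin") == me:
        return None  # ignore other message types and our own broadcast
    return encode({"type": "host_reply", "host": me, "pid": msg["pid"],
                   "can_host": my_score < threshold, "utilization": my_score})


def send_broadcast(payload: bytes, port: int = PORT) -> None:
    """Send one UDP broadcast on the local segment (needs a real network)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(payload, ("255.255.255.255", port))
```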
In view of the above object, a third aspect of the embodiments of the present invention provides a computer device, including: at least one processor; and a memory storing computer instructions executable on the processor, the instructions being executed by the processor to perform the following steps: S1, configuring all hosts in the same network segment, establishing communication between each host and the other hosts, and monitoring the resource configuration of each host; S2, in response to the resource configuration of a first host exceeding a threshold, calculating the resource consumption of each process in the first host and determining the process with the largest resource consumption; S3, initiating a migration broadcast to the cluster and determining whether a second host in the cluster responds to the broadcast; and S4, in response to a second host in the cluster responding to the broadcast, sending the process with the largest resource consumption to the second host.
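Steps S2 to S4, together with the resource-limiting fallback, can be sketched as one control pass on the first host. Network setup (S1), the broadcast transport, the transfer, and the limiting mechanism are injected as callables, so the sketch assumes nothing about them. All names here are hypothetical.

```python
from typing import Callable, Dict, Optional


def cooperate_once(host_score: float, threshold: float,
                   process_usage: Dict[int, float],
                   broadcast: Callable[[int], Dict[str, float]],
                   send: Callable[[int, str], None],
                   limit: Callable[[int], None]) -> Optional[str]:
    """One pass of S2 to S4 on the first host (S1 setup is assumed done).

    Returns the chosen second host, or None if nothing was migrated.
    """
    if host_score <= threshold or not process_usage:
        return None  # S2 trigger not met
    # S2: the process with the largest resource consumption
    pid = max(process_usage, key=process_usage.get)
    # S3: migration broadcast; replies map responding host -> utilization
    replies = broadcast(pid)
    if not replies:
        limit(pid)  # fallback from the embodiments: restrict the process
        return None
    # S4: send to the least-loaded responder
    target = min(replies, key=lambda h: (replies[h], h))
    send(pid, target)
    return target
```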
In some embodiments, the steps further comprise: in response to no second host in the cluster responding to the broadcast, limiting the resource usage of the process with the largest resource consumption in the first host.
In some embodiments, sending the process with the largest resource consumption to the second host includes: in response to a plurality of second hosts responding to the broadcast, acquiring the current resource utilization of each second host, and sending the process with the largest resource consumption to the second host with the lowest current resource utilization.
In some embodiments, sending the process with the largest resource consumption to the second host includes: suspending the process with the largest resource consumption, compressing the memory state of the process, and sending the compressed process memory image and the process executable file to the second host.
In some embodiments, monitoring the resource configuration of each host includes: acquiring the CPU utilization, memory utilization, and file system utilization of the host, computing a weighted sum using a preset CPU utilization weight, a preset memory utilization weight, and a preset file system utilization weight to obtain the resource configuration of the host, and comparing the resource configuration with the threshold.
In some embodiments, the steps further comprise: setting a standby host in the cluster, configuring the hosts currently in use into a host list, and setting the standby host to a dynamic increase mode.
In some embodiments, the steps further comprise: in response to no second host in the cluster responding to the broadcast, sending the process with the largest resource consumption to the standby host for execution.
Fig. 3 is a schematic hardware structural diagram of an embodiment of a computer device for process cooperation in a cluster according to the present invention.
Taking the device shown in fig. 3 as an example, the device includes a processor 301 and a memory 302.
The processor 301 and the memory 302 may be connected by a bus or by other means; fig. 3 takes a bus connection as an example.
The memory 302, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the method for process cooperation in a cluster in the embodiments of the present application. The processor 301 performs the various functional applications and data processing of the server by running the non-volatile software programs, instructions, and modules stored in the memory 302; that is, it implements the method for process cooperation in a cluster of the above method embodiments.
The memory 302 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created through use of the method for process cooperation in a cluster, and the like. Further, the memory 302 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 302 optionally includes memory located remotely from the processor 301, which may be connected to the local module via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Computer instructions 303 corresponding to one or more methods for process cooperation in a cluster are stored in the memory 302 and, when executed by the processor 301, perform the method for process cooperation in a cluster in any of the above method embodiments.
Any embodiment of the computer device executing the method for process cooperation in a cluster can achieve the same or similar effects as any corresponding embodiment of the method.
The invention also provides a computer readable storage medium storing a computer program which, when executed by a processor, performs the method as above.
FIG. 4 is a schematic diagram of an embodiment of a computer storage medium for cooperating processes in a cluster according to the present invention. Taking the computer storage medium as shown in fig. 4 as an example, the computer readable storage medium 401 stores a computer program 402 which, when executed by a processor, performs the method as described above.
Finally, it should be noted that those of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The program of the method for process cooperation in a cluster may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium of the program may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like. The embodiments of the computer program may achieve the same or similar effects as any of the above method embodiments.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The serial numbers of the embodiments disclosed in the embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, of embodiments of the invention is limited to these examples; within the idea of an embodiment of the invention, also technical features in the above embodiment or in different embodiments may be combined and there are many other variations of the different aspects of the embodiments of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims (9)

1. A method for process cooperation in a cluster is characterized by comprising the following steps:
all the hosts are configured in the same network segment, the communication between each host and other hosts is established, and the resource configuration condition of each host is monitored;
in response to the resource configuration of a first host exceeding a threshold, calculating the resource consumption of each process in the first host, and determining the process with the largest resource consumption;
initiating a migration broadcast to the cluster, and determining whether a second host in the cluster responds to the broadcast;
in response to a second host in the cluster responding to the broadcast, sending the process with the largest resource consumption to the second host, wherein sending the process with the largest resource consumption to the second host comprises: suspending the process with the largest resource consumption, compressing the memory state of the process into an image, and sending the compressed process memory image and the process executable file to the second host; and
configuring the cluster working mode, with three working modes available: dynamic detection, fixed host list, and primary/standby host list.
2. The method of claim 1, further comprising:
in response to no second host in the cluster responding to the broadcast, limiting the resource usage of the process with the largest resource consumption in the first host.
3. The method according to claim 1, wherein sending the process with the largest resource consumption to the second host comprises:
in response to a plurality of second hosts responding to the broadcast, acquiring the current resource utilization of each second host, and sending the process with the largest resource consumption to the second host with the lowest current resource utilization.
4. The method of claim 1, wherein monitoring the resource configuration of each host comprises:
acquiring the CPU utilization, memory utilization, and file system utilization of the host, computing a weighted sum using a preset CPU utilization weight, a preset memory utilization weight, and a preset file system utilization weight to obtain the resource configuration of the host, and comparing the resource configuration with the threshold.
5. The method of claim 1, further comprising:
setting a standby host in the cluster, configuring the hosts currently in use into a host list, and setting the standby host to a dynamic increase mode.
6. The method of claim 5, further comprising:
in response to no second host in the cluster responding to the broadcast, sending the process with the largest resource consumption to the standby host for execution.
7. A system for process cooperation in a cluster, comprising:
a configuration module configured to configure all the hosts in the same network segment, establish communication between each host and the other hosts, and monitor the resource configuration of each host;
a computing module configured to, in response to the resource configuration of a first host exceeding a threshold, compute the resource consumption of each process in the first host and determine the process with the largest resource consumption;
a broadcasting module configured to initiate a migration broadcast to the cluster and determine whether a second host in the cluster responds to the broadcast;
a sending module configured to, in response to a second host in the cluster responding to the broadcast, send the process with the largest resource consumption to the second host, wherein sending the process with the largest resource consumption to the second host comprises: suspending the process with the largest resource consumption, compressing the memory state of the process into an image, and sending the compressed process memory image and the process executable file to the second host; and
a cluster working mode configuration module configured to configure the cluster working mode, with three working modes available: dynamic detection, fixed host list, and primary/standby host list.
8. A computer device, comprising:
at least one processor; and
a memory storing computer instructions executable on the processor, the instructions when executed by the processor implementing the steps of the method of any one of claims 1 to 6.
9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
CN202110952191.1A 2021-08-19 2021-08-19 Method, system, equipment and storage medium for process cooperation in cluster Pending CN113407355A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110952191.1A CN113407355A (en) 2021-08-19 2021-08-19 Method, system, equipment and storage medium for process cooperation in cluster
PCT/CN2022/078097 WO2023019904A1 (en) 2021-08-19 2022-02-25 Method and system for process cooperation in cluster, device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110952191.1A CN113407355A (en) 2021-08-19 2021-08-19 Method, system, equipment and storage medium for process cooperation in cluster

Publications (1)

Publication Number Publication Date
CN113407355A true CN113407355A (en) 2021-09-17

Family

ID=77688792

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110952191.1A Pending CN113407355A (en) 2021-08-19 2021-08-19 Method, system, equipment and storage medium for process cooperation in cluster

Country Status (2)

Country Link
CN (1) CN113407355A (en)
WO (1) WO2023019904A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023019904A1 (en) * 2021-08-19 2023-02-23 苏州浪潮智能科技有限公司 Method and system for process cooperation in cluster, device, and storage medium

Citations (6)

Publication number Priority date Publication date Assignee Title
CN101924693A (en) * 2009-04-01 2010-12-22 威睿公司 Method and system for migrating processes between virtual machines
CN104035823A (en) * 2014-06-17 2014-09-10 华为技术有限公司 Load balancing method and device
CN105260241A (en) * 2015-10-23 2016-01-20 南京理工大学 Mutual cooperation method for processes in cluster system
US9621643B1 (en) * 2015-07-31 2017-04-11 Parallels IP Holdings GmbH System and method for joining containers running on multiple nodes of a cluster
CN111614746A (en) * 2020-05-15 2020-09-01 北京金山云网络技术有限公司 Load balancing method and device of cloud host cluster and server
CN112416520A (en) * 2020-11-21 2021-02-26 广州西麦科技股份有限公司 Intelligent resource scheduling method based on vSphere

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
US8489744B2 (en) * 2009-06-29 2013-07-16 Red Hat Israel, Ltd. Selecting a host from a host cluster for live migration of a virtual machine
CN105528330B (en) * 2014-09-30 2019-05-28 杭州华为数字技术有限公司 The method, apparatus of load balancing is gathered together and many-core processor
CN105955809B (en) * 2016-04-25 2020-06-26 深圳市万普拉斯科技有限公司 Thread scheduling method and system
CN111813521A (en) * 2020-07-01 2020-10-23 Oppo广东移动通信有限公司 Thread scheduling method and device, storage medium and electronic equipment
CN113407355A (en) * 2021-08-19 2021-09-17 苏州浪潮智能科技有限公司 Method, system, equipment and storage medium for process cooperation in cluster

Non-Patent Citations (1)

Title
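蒋文康 (Jiang Wenkang): "Research on Autonomous Load Balancing in a Cluster Environment" (集群环境下自主负载均衡的研究), China Master's Theses Full-text Database, Information Science and Technology Series *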

Also Published As

Publication number Publication date
WO2023019904A1 (en) 2023-02-23

Similar Documents

Publication Publication Date Title
US10609159B2 (en) Providing higher workload resiliency in clustered systems based on health heuristics
CN106331098B (en) Server cluster system
CN108430116B (en) Disconnected network reconnection method, medium, device and computing equipment
CN106302565B (en) Scheduling method and system of service server
US10728099B2 (en) Method for processing virtual machine cluster and computer system
CN109151045B (en) Distributed cloud system and monitoring method
US8732312B2 (en) Computing system and computing system management method
US11150946B2 (en) Method and system for processing communication channel
CN111800285B (en) Instance migration method and device and electronic equipment
CN110764963A (en) Service exception handling method, device and equipment
CN110928637A (en) Load balancing method and system
CN111935244B (en) Service request processing system and super-integration all-in-one machine
CN113407355A (en) Method, system, equipment and storage medium for process cooperation in cluster
CN113765690A (en) Cluster switching method, system, device, terminal, server and storage medium
CN115280288A (en) Server system and method of managing server system
US8036105B2 (en) Monitoring a problem condition in a communications system
US9973569B2 (en) System, method and computing apparatus to manage process in cloud infrastructure
Das et al. LIMOCE: live migration of containers in the edge
CN116192885A (en) High-availability cluster architecture artificial intelligent experiment cloud platform data processing method and system
JP2017027166A (en) Operation management unit, operation management program, and information processing system
CN115314361A (en) Server cluster management method and related components thereof
CN108234215B (en) Gateway creating method and device, computer equipment and storage medium
CN115145782A (en) Server switching method, mooseFS system and storage medium
WO2020103627A1 (en) Service self-healing method and device based on virtual machine disaster recovery, and storage medium
CN111901421A (en) Data processing method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210917