CN111614701B

CN111614701B - Distributed cluster and container state switching method and device

Info

Publication number: CN111614701B
Application number: CN201910131705.XA
Authority: CN
Inventors: 陶琪
Original assignee: Hangzhou Hikvision Digital Technology Co Ltd
Current assignee: Hangzhou Hikvision Digital Technology Co Ltd
Priority date: 2019-02-22
Filing date: 2019-02-22
Publication date: 2022-09-02
Anticipated expiration: 2039-02-22
Also published as: CN111614701A

Abstract

Embodiments of the invention provide a distributed cluster and a container state switching method and device. In the distributed cluster, a container engine is configured in each node, and the state of a container engine is either an available state or a non-available state. The execution flow of every container engine in the non-available state is similar. Taking one such container engine as an example: it monitors whether a container engine in the available state fails; if a failure occurs, it judges, based on a specified record in a preset storage area, whether there is a failed available-state container engine that has not yet been taken over; and if so, it switches its own state to the available state. In this scheme no manual intervention is needed: the container engine in the non-available state switches its own state automatically, which shortens the switching time.

Description

Distributed cluster and container state switching method and device

Technical Field

The present invention relates to the field of distributed cluster technologies, and in particular, to a distributed cluster and a method and an apparatus for switching a container status.

Background

A related distributed cluster generally includes a master node and a standby node, and the master node may provide services such as data query and data storage. If the master node fails, manual intervention is usually required to switch the services from the master node to the standby node, which then provides the services.

However, in this scheme the services are switched by manual intervention, so the switching takes a long time.

Disclosure of Invention

Embodiments of the present invention provide a distributed cluster, and a method and an apparatus for switching a container status, so as to shorten time consumed for switching.

In order to achieve the above object, an embodiment of the present invention provides a distributed cluster, where the cluster includes multiple nodes, and each node is configured with a container engine, where the container engines include a container engine in an available state and a container engine in a non-available state;

the container engine in the non-available state is configured to monitor whether the container engine in the available state fails; if a failure occurs, to judge, based on a specified record in a preset storage area, whether there is a failed available-state container engine that has not been taken over; and if so, to switch its own state to the available state.

Optionally, the container engine in the non-available state is further configured to determine whether a container engine identifier exists in the record, and if the container engine identifier does not exist in the record, switch the state of the container engine in the non-available state to an available state, and add the identifier of the container engine switched to the available state to the record.

Optionally, the container engine in the non-available state is further configured to determine whether the number of container engine identifiers in the record is smaller than the number of failed container engines in the available state; and if so, to switch its own state to the available state and add its own identifier to the record.

Optionally, the container engine in the available state is configured to perform load balancing processing on the container engines in the available states in the distributed cluster.

Optionally, the container engine in the non-available state is further configured to access the specified record in the preset storage area through a probe in the container engine.

Optionally, the cluster further includes a shared device, the preset storage area is located in the shared device, and each container engine is in communication connection with the shared device;

the container engine in the non-available state is further configured to access the specified record in the shared device when it detects that the container engine in the available state has failed.

To achieve the above object, an embodiment of the present invention further provides a container state switching method, which is applied to a first node in a distributed cluster, where a first container engine is configured in the first node, and a state of the first container engine is: an available state or a non-available state; the method comprises the following steps:

monitoring whether a container engine in an available state in the cluster fails when the state of the first container engine is in a non-available state;

if a failure occurs, judging, based on a specified record in a preset storage area, whether there is a failed available-state container engine that has not been taken over;

if so, switching the state of the first container engine to an available state.

Optionally, the judging, based on the specified record in the preset storage area, whether there is a failed available-state container engine that has not been taken over includes:

judging whether a container engine identifier exists in a specified record in a preset storage area or not;

if not, indicating that there is a failed container engine that is not superseded;

the method further comprises the following steps:

adding the identifier of the first container engine in the record if the container engine identifier does not exist in the record.

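Optionally, the judging, based on the specified record in the preset storage area, whether there is a failed available-state container engine that has not been taken over includes:

judging whether the number of container engine identifiers in the specified record in the preset storage area is less than the number of failed container engines in the available state;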

if so, indicating that there is a failed container engine that is not superseded;

the method further comprises the following steps:

in case the number of container engine identifications in the record is smaller than the number of container engines in a failed available state, adding the identification of the first container engine in the record.

Optionally, the method further includes:

and when the state of the first container engine is an available state, performing load balancing processing on the container engines in the available states in the distributed cluster.

Optionally, the first container engine includes a first probe therein; the method further comprises the following steps:

and accessing the specified record in the preset storage area through the first probe.

To achieve the above object, an embodiment of the present invention further provides a container state switching apparatus, which is applied to a first node in a distributed cluster, where a first container engine is configured in the first node, and a state of the first container engine is: an available state or a non-available state; the device comprises:

a monitoring module, configured to monitor whether a container engine in an available state in the cluster fails when a state of the first container engine is a non-available state; if the fault occurs, triggering a judgment module;

the judging module is used for judging, based on the specified record in the preset storage area, whether there is a failed available-state container engine that has not been taken over; if so, triggering a switching module;

and the switching module is used for switching the state of the first container engine into an available state.

Optionally, the determining module is specifically configured to:

judging whether a container engine identifier exists in a specified record in a preset storage area or not; if not, indicating that there is a failed container engine that is not superseded;

the device further comprises:

a first adding module, configured to add, in the record, an identifier of the first container engine if the identifier of the container engine does not exist in the record.

Optionally, the determining module is specifically configured to:

judging whether the number of the container engine identifications in the specified record in the preset storage area is less than the number of the container engines in the available state with faults; if so, indicating that there is a failed container engine that is not superseded;

the device further comprises:

a second adding module, configured to add, in the record, an identifier of the first container engine if the number of container engine identifiers in the record is smaller than the number of container engines in the failed available state.

Optionally, the apparatus further comprises:

and the processing module is used for carrying out load balancing processing on the container engines in the available states in the distributed cluster under the condition that the state of the first container engine is the available state.

Optionally, the first container engine includes a first probe therein; the device further comprises:

and the access module is used for accessing the specified record in the preset storage area through the first probe.

In order to achieve the above object, an embodiment of the present invention further provides an electronic device, including a processor and a memory;

a memory for storing a computer program;

and the processor is used for realizing any container state switching method when executing the program stored in the memory.

In order to achieve the above object, an embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the computer-readable storage medium implements any one of the above container state switching methods.

In the distributed cluster provided by the embodiment of the invention, a container engine is configured in each node, and the state of a container engine is either an available state or a non-available state. The execution flow of every container engine in the non-available state is similar. Taking one such container engine as an example: it monitors whether a container engine in the available state fails; if a failure occurs, it judges, based on a specified record in a preset storage area, whether there is a failed available-state container engine that has not been taken over; and if so, it switches its own state to the available state. In this scheme no manual intervention is needed: the container engine in the non-available state switches its own state automatically, which shortens the switching time.

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1a is a schematic structural diagram of a distributed cluster according to an embodiment of the present invention;

Fig. 1b is a schematic structural diagram of another distributed cluster according to an embodiment of the present invention;

Fig. 2 is a schematic flow chart of a container state switching method according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a container state switching device according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In order to solve the foregoing technical problem, embodiments of the present invention provide a distributed cluster, and a method and an apparatus for switching a container state. The distributed cluster provided in embodiments of the present invention is described in detail first.

Fig. 1a is a schematic structural diagram of a distributed cluster provided in an embodiment of the present invention. The distributed cluster includes a plurality of nodes: node 1, node 2, …, node N. Each node is configured with a container engine, which may also be referred to as a container. A container may be understood as a lightweight, stand-alone executable software package that contains what is needed to run the software, such as code, system tools, and system libraries. For example, the container engine may be Docker, rkt of CoreOS (an operating system), or the like, which is not specifically limited.

For convenience of description, the container engine configured in node 1 is referred to as container 1, the container engine configured in node 2 is referred to as container 2, …, and the container engine configured in node N is referred to as container N. The container engines include container engines in an available state and container engines in a non-available state; in other words, a container engine is in one of two states, available or non-available. In one case, only one container engine in the available state exists in the distributed cluster, and that single container engine provides services such as data reading and writing, which reduces repeated data writes and data read-write errors. In another case, a plurality of container engines in the available state exist in the distributed cluster and together provide services such as data reading and writing. This improves processing efficiency on the one hand and cluster stability on the other: for example, when one container engine fails, the other container engines can still provide data read-write services.

In this embodiment, the execution flow of each container engine in the non-available state is similar, and one container engine in the non-available state is taken as an example for description below:

the container engine in the non-available state is configured to monitor whether the container engine in the available state fails; if a failure occurs, to judge, based on a specified record in a preset storage area, whether there is a failed available-state container engine that has not been taken over; and if so, to switch its own state to the available state.

In one case, only the non-available container engine monitors the available container engine for a failure; in another case, the container engines can monitor each other to monitor whether the other side has a fault.

For example, each container engine may include a probe, and the container engines may monitor one another through the probes. Alternatively, each container engine may include a monitoring service, which may be a process, and the container engines monitor one another through the monitoring service. In one case, the container engines configured in the nodes are all the same kind of container engine, which makes mutual monitoring easier. If a failure of the container engine in the available state is detected, the other container engines may enter a contention mode, that is, compete to become the new container engine in the available state.
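The monitoring step can be illustrated with a minimal Python sketch. The source does not say how a probe decides that an engine has failed; the heartbeat timestamp, the `timeout` value, and the names `EngineStatus` and `failed_available_engines` are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class EngineStatus:
    engine_id: str
    available: bool        # True if the engine is in the available state
    last_heartbeat: float  # time of the last probe reply (assumed mechanism)


def failed_available_engines(statuses, now, timeout=3.0):
    """Return IDs of available-state engines whose last reply is older than timeout."""
    return [
        s.engine_id
        for s in statuses
        if s.available and now - s.last_heartbeat > timeout
    ]


statuses = [
    EngineStatus("container-1", False, 100.0),
    EngineStatus("container-2", True, 95.0),
    EngineStatus("container-4", True, 99.5),
]
failed = failed_available_engines(statuses, now=100.0)
print(failed)
```

Only engines in the available state are considered here, which corresponds to the case where non-available engines monitor the available ones; mutual monitoring would drop the `s.available` filter.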

As an embodiment, if there is only one available container engine in the distributed cluster, the non-available container engine may determine whether there is a new available container engine based on a specified record in the preset storage area. For example, it may be determined whether a container engine identifier exists in a specified record in the preset storage area, and if not, it indicates that there is no container engine in a new available state, that is, there is a container engine in a failed available state that is not taken over; in this case, the state of the container engine in the non-available state may be switched to the available state, and the self identifier of the container engine switched to the available state may be added to the record.

In one embodiment, the container engine in the non-available state may access a specified record in a preset storage area in the case where it is monitored that the container engine in the available state fails, and determine whether there is a failed container engine in the available state that has not been taken over based on the record.

In another embodiment, each container engine may periodically or aperiodically access a designated record in a preset storage area; in this embodiment, the container engine in the non-available state may determine whether there is a container engine in the failed available state that is not taken over according to the latest record obtained by accessing when the container engine in the available state is monitored to be failed.

Referring to Fig. 1a, assume that container 2 is the container engine in the available state and that the other containers monitor whether container 2 fails. Taking container 1 as an example, assume that container 1 detects a failure of container 2; container 1 then accesses the specified record in the preset storage area. If the identifier of container 3 already exists in the record, container 3 has already preempted the role and become the available-state container engine. If no container engine identifier exists in the record, container 1 switches its own state to the available state and adds its identifier to the record, so that other containers accessing the record learn that container 1 has already become the available-state container engine.

The container engine identifier may be an ID, an IP address, or the like of the container, and is not particularly limited.
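For the single available-engine case, the decision can be sketched in a few lines of Python. The record is modelled as an in-memory list of identifiers, and `try_take_over` is a hypothetical name; the source does not prescribe a data structure or any locking between competing engines.

```python
def try_take_over(record, self_id):
    """Switch to the available state only if no identifier is in the record.

    Returns True if this engine took over, False if another engine already had.
    """
    if record:                 # an identifier exists: the failed engine was taken over
        return False
    record.append(self_id)     # publish our identifier so other engines can see it
    return True


record = []
print(try_take_over(record, "container-1"))  # container 1 finds the record empty
print(try_take_over(record, "container-3"))  # container 3 finds container 1's identifier
print(record)
```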

As another embodiment, if there are multiple container engines in available states in the distributed cluster, the container engine in a non-available state may determine whether the number of container engine identifiers in the designated record of the preset storage area is less than the number of container engines in a failed available state; if so, indicating that there is a failed container engine that is not superseded; in this case, the state of the container engine in the non-available state is switched to the available state, and the self identifier of the container engine switched to the available state is added to the record.

Still referring to Fig. 1a, assume that containers 2 and 4 are container engines in the available state and that the other containers monitor whether containers 2 and 4 fail. Taking container 1 as an example, assume that container 1 detects that container 2 has failed, that is, the number of failed container engines in the available state is 1; container 1 then accesses the specified record in the preset storage area. If the identifier of container 3 already exists in the record, the number of container engine identifiers in the record (1) is not less than the number of failed container engines in the available state (1); in other words, container 3 has already preempted the role and taken over from container 2 (the failed available-state container engine).

If no container engine identifier exists in the record, the number of container engine identifiers in the record (0) is smaller than the number of failed container engines in the available state (1). Container 1 then switches its own state to the available state and adds its identifier to the record, so that other containers accessing the record learn that container 1 has already taken over from container 2 as an available-state container engine.

As another example, assume that containers 2 and 4 are container engines in the available state and that the other containers monitor whether containers 2 and 4 fail. Taking container 1 as an example, assume that container 1 detects that both container 2 and container 4 have failed, that is, the number of failed container engines in the available state is 2; container 1 then accesses the specified record in the preset storage area. If only the identifier of container 3 exists in the record, the number of container engine identifiers in the record (1) is smaller than the number of failed container engines in the available state (2), so there is still a failed available-state container engine that has not been taken over. Container 1 switches its own state to the available state and adds its identifier to the record. In this way, other containers accessing the record learn that container 3 and container 1 have already taken over from container 2 and container 4 as available-state container engines.

If the identifiers of both container 3 and container 5 exist in the record, the number of container engine identifiers in the record (2) is not less than the number of failed container engines in the available state (2), which indicates that container 3 and container 5 have already taken over from container 2 and container 4 as available-state container engines.
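The four situations walked through above reduce to one comparison between the number of identifiers in the record and the number of failed available-state engines. The following Python sketch replays them from the point of view of container 1; the function names and the list-based record are illustrative assumptions.

```python
def has_untaken_failure(record_ids, failed_count):
    """True if fewer engines have registered than available-state engines have failed."""
    return len(record_ids) < failed_count


def contend(record_ids, failed_count, self_id):
    """Take over (and register) only when a failed engine is still not taken over."""
    if has_untaken_failure(record_ids, failed_count):
        record_ids.append(self_id)
        return True
    return False


# (identifiers already in the record, number of failed available-state engines)
scenarios = [
    ([], 1),
    (["container-3"], 1),
    (["container-3"], 2),
    (["container-3", "container-5"], 2),
]

results = []
for ids, failed_count in scenarios:
    record = list(ids)  # copy so each scenario starts from its own record
    switched = contend(record, failed_count, "container-1")
    results.append((switched, record))
    print(failed_count, ids, "->", switched, record)
```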

As described above, in one case, each container engine may include a probe therein, and each container engine may access the specified record in the preset storage area through the probe in its own container engine.

Alternatively, a service for accessing the preset storage area may be included in each container engine, and the service may be a process, and each container engine accesses the specified record in the preset storage area through the service.

The container engine in the available state may process service requests from a user device. The service request may be, for example, a query request or a storage request, which is not specifically limited. For example, the user device may send a query request to the distributed cluster, in which case the container engine in the available state looks up the data corresponding to the query request in the distributed cluster and returns the data found to the user device.

For another example, the user device may send a storage request to the distributed cluster, in which case the available state container engine allocates a storage address for the data to be stored, stores the data to be stored to the storage address, and returns the storage address to the user device.
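A toy Python model of these two request types is shown below. The in-memory dictionary, the `addr-N` address format, and the class and method names are assumptions for illustration; the source does not describe how addresses are allocated or where the data physically resides.

```python
import itertools


class AvailableEngine:
    """Toy stand-in for a container engine in the available state."""

    def __init__(self):
        self._store = {}                  # storage address -> stored data
        self._counter = itertools.count()

    def handle_store(self, data):
        """Allocate an address, store the data there, and return the address."""
        address = f"addr-{next(self._counter)}"
        self._store[address] = data
        return address

    def handle_query(self, address):
        """Return the data stored at the address, or None if nothing is there."""
        return self._store.get(address)


engine = AvailableEngine()
addr = engine.handle_store(b"frame-0001")
print(addr, engine.handle_query(addr))
```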

If the distributed cluster includes a plurality of container engines in the available state, these container engines can also perform load balancing among themselves. Taking one available-state container engine as an example: when it receives a service request from a user, it can determine the memory usage of itself and of the other available-state container engines, and process the service request if its own memory usage is the lowest. Alternatively, it can determine the number of service requests that it and the other available-state container engines are currently processing, and process the service request if it is currently processing the fewest.
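Both load-balancing criteria amount to "handle the request if I am the least loaded". A minimal Python sketch follows; the metric values are made up, `should_process` is a hypothetical name, and breaking ties by engine identifier is an assumption the source does not make.

```python
def should_process(self_id, load_by_engine):
    """Return True if self_id has the lowest load among available-state engines.

    load_by_engine maps engine ID to a load metric, e.g. memory usage or the
    number of in-flight requests. Ties are broken by engine ID (an assumption).
    """
    least_loaded = min(load_by_engine, key=lambda eid: (load_by_engine[eid], eid))
    return least_loaded == self_id


memory_mb = {"container-2": 512, "container-4": 300}
in_flight = {"container-2": 3, "container-4": 7}

print(should_process("container-4", memory_mb))  # lowest memory usage
print(should_process("container-4", in_flight))  # not the fewest in-flight requests
```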

As an implementation manner, referring to Fig. 1b, the distributed cluster may further include a shared device, the preset storage area is located in the shared device, and each container engine is communicatively connected to the shared device. For example, the shared device may be a hard disk or a file system shared by the plurality of nodes, for example NFS (Network File System), which is not specifically limited.

In this embodiment, the container engine in the non-available state may access the specified record in the shared device when it detects that the container engine in the available state has failed, and judge, based on the record, whether there is a failed available-state container engine that has not been taken over.

Alternatively, the container engine in the non-available state may also access the specified record in the shared device periodically or aperiodically; when it detects that the container engine in the available state has failed, it judges, according to the latest record obtained by such access, whether there is a failed available-state container engine that has not been taken over.
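When the preset storage area lives on a shared file system, the specified record could be as simple as a small file that every engine can read and write. The Python sketch below assumes a JSON file named `takeover_record.json` and uses a temporary directory as a stand-in for the shared mount; both are illustrative choices, and the source does not describe a file format or how concurrent writers are serialised.

```python
import json
import os
import tempfile


def read_record(path):
    """Return the list of engine identifiers in the record (empty if no file yet)."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def write_record(path, ids):
    """Overwrite the record with the given list of engine identifiers."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(ids, f)


shared_dir = tempfile.mkdtemp()  # stands in for the shared device's mount point
record_path = os.path.join(shared_dir, "takeover_record.json")

ids = read_record(record_path)
if not ids:                      # no engine has taken over yet
    write_record(record_path, ids + ["container-1"])

print(read_record(record_path))
```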

In one case, for a container engine, it can continuously monitor other container engines, and can continuously access the designated records in the preset storage area, and the order in which the container engine executes the steps is not limited.

In the distributed cluster provided by the embodiment of the invention, a container engine is configured in each node, and the state of a container engine is either an available state or a non-available state. The execution flow of every container engine in the non-available state is similar. Taking one such container engine as an example: it monitors whether a container engine in the available state fails; if a failure occurs, it judges, based on a specified record in a preset storage area, whether there is a failed available-state container engine that has not been taken over; and if so, it switches its own state to the available state. Therefore, in the first aspect, no manual intervention is needed in this scheme: the container engine in the non-available state switches its own state automatically, the switching time is shortened, and the user is essentially unaware of the switching process.

In the second aspect, if a secondary failure occurs (for example, container 2 in the available state fails, container 1 switches its own state to the available state, and container 1 then also fails), the other container engines continue to compete to become the new available-state container engine; manual intervention is still not needed, and the switching time remains short.

In a third aspect, in some related schemes, a master application program is installed in a node, and master-slave switching is realized through the master application program.

Corresponding to the distributed cluster embodiment, the embodiment of the invention also provides a container state switching method and device. The method and the device are applied to a first node in the distributed cluster. The first node can be any node in the distributed cluster; for convenience of description, the executing entity is called the first node. A container engine is configured in each node of the distributed cluster, and for convenience of description, the container engine configured in the first node is referred to as a first container engine. The state of the first container engine is either an available state or a non-available state.

As shown in fig. 2, the method may include the steps of:

S201: monitoring whether a container engine in an available state in the cluster fails when the state of the first container engine is a non-available state; if a failure occurs, S202 is performed.

The container engine may also be referred to as a container, which may be understood as a lightweight independently executable software package containing data required to run software, such as code, system tools, system libraries, and the like. For example, the container engine may be Docker, rkt of CoreOS (an operating system), or the like, and is not limited in particular.

In one case, only one container engine in the available state exists in the distributed cluster, and that single container engine provides services such as data reading and writing, which reduces repeated data writes and data read-write errors. In another case, a plurality of container engines in the available state exist in the distributed cluster and together provide services such as data reading and writing. This improves processing efficiency on the one hand and cluster stability on the other: for example, when one container engine fails, the other container engines can still provide data read-write services.

In one case, only the non-available container engine monitors the available container engine for a failure; in another case, the container engines can monitor each other to monitor whether the other side fails.

For example, each container engine may include a probe, and the container engines may be monitored by the probe. Alternatively, a monitoring service may be included in each container engine, and the monitoring service may be a process, and the container engines monitor each other through the monitoring service. In one case, the container engines configured in each node are the same container engine, so that mutual monitoring among the container engines is facilitated.

If the first container engine detects a failure of the available container engine, S202 is performed.

S202: based on the specified record in the preset storage area, it is determined whether there is a container engine in a failed available state that has not been taken over. If so, S203 is performed.

In one embodiment, the first container engine may access a specified record in a preset storage area in the case where it is monitored that the container engine in the available state fails, and determine whether there is a failed container engine in the available state that has not been taken over based on the record.

In another embodiment, each container engine may periodically or aperiodically access a designated record in a preset storage area; in this embodiment, the first container engine may determine, when it is detected that the container engine in the available state fails, whether there is a container engine in the failed available state that is not replaced, according to the latest record obtained by accessing.

In one embodiment, the first container engine includes a first probe therein; the first container engine may access the specified records in the preset storage area through the first probe.

As an implementation manner, referring to Fig. 1b, the cluster may further include a shared device, the preset storage area is located in the shared device, and each container engine is communicatively connected to the shared device. For example, the shared device may be a hard disk or a file system shared by the plurality of nodes, for example NFS (Network File System), which is not specifically limited.

In this embodiment, the first container engine may access a specified record in the shared device when the failure of the container engine in the available state is monitored, and determine whether there is a failed container engine in the available state that is not taken over based on the record.

Alternatively, the first container engine may also access specified records in the shared device periodically or aperiodically; the container engine in the non-available state can judge whether the unsuccessfully-replaced container engine in the available state exists according to the latest record obtained by accessing under the condition that the container engine in the available state is monitored to be in the fault state.
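A minimal Python sketch of how the specified record might be kept as a file on the shared device. The JSON format, the file name `available_engines.json`, and the helper names are assumptions for illustration; a temporary directory stands in for the shared mount (for example an NFS export).

```python
import json
import os
import tempfile


def read_record(path):
    """Return the list of container engine identifiers stored in the record."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return []  # no engine has written an identifier yet


def write_record(path, identifiers):
    """Write to a temporary file, then rename it over the record.

    The rename keeps a reader on the same filesystem from observing a
    partially written record.
    """
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(identifiers, f)
    os.replace(tmp_path, path)


shared_dir = tempfile.mkdtemp()  # stand-in for the shared device
record_path = os.path.join(shared_dir, "available_engines.json")

before = read_record(record_path)
write_record(record_path, ["container-3"])
after = read_record(record_path)
```

An engine that polls periodically would call `read_record` on a timer and keep the latest result; an engine that reads on demand would call it only after detecting a failure.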

S203: the state of the first container engine is switched to an available state.

As an embodiment, if there is only one available-state container engine in the distributed cluster, S202 may include: determining, based on the specified record in the preset storage area, whether a new available-state container engine already exists. For example, it may be determined whether a container engine identifier exists in the specified record; if not, there is no new available-state container engine, that is, the failed available-state container engine has not been taken over. In this case, the state of the first container engine may be switched to the available state, and the identifier of the first container engine may be added to the record.

Referring to fig. 1a, assume that container 2 is the available-state container engine and that the other containers monitor whether container 2 fails. Assume that container 1 is the first container engine: container 1 detects that container 2 has failed and accesses the specified record in the preset storage area. If the identifier of container 3 already exists in the record, container 3 has already preempted the role of available-state container engine. If no container engine identifier exists in the record, container 1 switches its own state to the available state and adds the identifier of container 1 to the record, so that other containers accessing the record learn that container 1 has already become the available-state container engine.
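The single-engine case can be sketched in Python as follows. The function name `try_become_available` and the in-memory list standing in for the specified record are assumptions for illustration; the sketch shows only the decision rule, with containers 1 and 3 competing after container 2 fails.

```python
def try_become_available(record, my_id):
    """Switch only if the record holds no container engine identifier.

    `record` is an in-memory list standing in for the specified record.
    Returns True if the caller should switch to the available state.
    """
    if record:  # another engine has already taken over
        return False
    record.append(my_id)  # announce the take-over to the other engines
    return True


record = []  # container 2 has failed; nobody has taken over yet
container_1_switches = try_become_available(record, "container-1")
container_3_switches = try_become_available(record, "container-3")
```

Here container 1 checks first and wins, so container 3 stays in the non-available state. With a record that is truly shared between nodes, the check and the append would presumably have to be performed as one atomic step (for instance under a lock), a detail the embodiment does not describe.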

As another embodiment, if there are multiple available-state container engines in the distributed cluster, S202 may include: determining whether the number of container engine identifiers in the specified record of the preset storage area is less than the number of failed available-state container engines. If so, there is a failed container engine that has not been taken over; in this case, the state of the first container engine is switched to the available state and the identifier of the first container engine is added to the record.

Still referring to fig. 1a, assume that containers 2 and 4 are available-state container engines and that the other containers monitor whether containers 2 and 4 fail. Assume that container 1 is the first container engine: container 1 detects that container 2 has failed, that is, the number of failed available-state container engines is 1, and container 1 accesses the specified record in the preset storage area. If the identifier of container 3 already exists in the record, the number of container engine identifiers in the record (1) is not less than the number of failed available-state container engines (1); that is, container 3 has already preempted the role of available-state container engine in place of container 2 (the failed available-state container engine).

As another example, assume again that containers 2 and 4 are available-state container engines and that the other containers monitor whether containers 2 and 4 fail. Assume that container 1 is the first container engine: container 1 detects that both container 2 and container 4 have failed, that is, the number of failed available-state container engines is 2, and container 1 accesses the specified record in the preset storage area. If the identifier of container 3 already exists in the record, the number of container engine identifiers in the record (1) is less than the number of failed available-state container engines (2), so there is still a failed available-state container engine that has not been taken over. Container 1 therefore switches its state to the available state and adds the identifier of container 1 to the record. In this way, other containers accessing the record learn that container 3 and container 1 have already taken over from container 2 and container 4 as available-state container engines.
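The count comparison used when several available-state container engines exist can be sketched as below. The function name `try_take_over` and the list-based record are hypothetical; the two scenarios mirror the two examples above.

```python
def try_take_over(record, failed_count, my_id):
    """Take over only if fewer identifiers are recorded than engines failed.

    `record` is an in-memory list standing in for the specified record;
    `failed_count` is the number of failed available-state engines that
    the caller has detected.
    """
    if len(record) < failed_count:
        record.append(my_id)
        return True
    return False


# Container 2 failed and container 3 is already recorded: nothing to take over.
record_one_failure = ["container-3"]
switched_one = try_take_over(record_one_failure, 1, "container-1")

# Containers 2 and 4 failed, only container 3 is recorded: one role is open.
record_two_failures = ["container-3"]
switched_two = try_take_over(record_two_failures, 2, "container-1")
```

The single-engine embodiment corresponds to calling this function with `failed_count` equal to 1.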

If the state of the first container engine is the available state, the first container engine may process service requests from user equipment. The service request may be, for example, a query request or a storage request, which is not specifically limited. For example, the user equipment may send a query request to the distributed cluster, in which case the first container engine looks up the data corresponding to the query request in the distributed cluster and returns the data found to the user equipment.

For another example, the user device may send a storage request to the distributed cluster, in which case the first container engine allocates a storage address for the data to be stored, stores the data to be stored to the storage address, and returns the storage address to the user device.

If the state of the first container engine is the available state and the distributed cluster includes multiple available-state container engines, the first container engine may also perform load balancing across the available-state container engines in the distributed cluster. For example, when a service request from a user is received, the first container engine may determine its own memory occupation and that of the other available-state container engines; if the memory occupation of the first container engine is the smallest, the first container engine processes the service request. Alternatively, the first container engine may determine the number of service requests that it and the other available-state container engines are currently processing; if the first container engine is processing the smallest number of service requests, the first container engine processes the service request.

In one case, the first container engine may continuously monitor the other container engines and may continuously access the specified record in the preset storage area; the order in which these steps are performed is not limited.
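Both load-balancing criteria can be sketched in Python as follows. The `EngineLoad` structure, its field names, and the sample figures are assumptions for illustration; how an engine learns the load of its peers is not specified by the embodiment.

```python
from dataclasses import dataclass


@dataclass
class EngineLoad:
    engine_id: str
    memory_bytes: int     # current memory occupation of the engine
    active_requests: int  # service requests currently being processed


def handles_by_memory(self_id, loads):
    """True if the engine self_id has the smallest memory occupation."""
    return min(loads, key=lambda load: load.memory_bytes).engine_id == self_id


def handles_by_requests(self_id, loads):
    """True if the engine self_id is processing the fewest service requests."""
    return min(loads, key=lambda load: load.active_requests).engine_id == self_id


loads = [
    EngineLoad("container-1", memory_bytes=300, active_requests=2),
    EngineLoad("container-3", memory_bytes=200, active_requests=7),
]
by_memory = handles_by_memory("container-1", loads)
by_requests = handles_by_requests("container-1", loads)
```

With these sample figures, container 1 would leave the request to container 3 under the memory criterion but process it under the request-count criterion. On a tie, `min` picks the first entry in the list, so a real deployment would need an agreed ordering.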

By applying this embodiment of the invention, if the first container engine is in the non-available state, it monitors whether an available-state container engine in the cluster fails; if a failure occurs, it determines, based on a specified record in a preset storage area, whether there is a failed available-state container engine that has not been taken over; if so, the first container engine switches to the available state. Therefore, in this scheme, in a first aspect, no manual intervention is needed: the first container engine switches to the available state automatically, the switching time is shortened, and the user is essentially unaware of the switching process.

In a second aspect, if a secondary failure occurs, for example the first container engine itself fails after switching to the available state, the other container engines continue to compete to become the new available-state container engine. Manual intervention is still not needed, and the switching time remains relatively short.

Corresponding to the foregoing method embodiment, an embodiment of the present invention further provides a container state switching apparatus, which is applied to a first node in a distributed cluster, where the first node is configured with a first container engine, and a state of the first container engine is: an available state or an unavailable state. As shown in fig. 3, the apparatus includes:

a monitoring module 301, configured to monitor whether a container engine in an available state in the cluster fails when the state of the first container engine is a non-available state; if a fault occurs, the judging module 302 is triggered;

a judging module 302, configured to judge, based on a specified record in a preset storage area, whether there is a failed available-state container engine that has not been taken over; if so, the switching module 303 is triggered;

a switching module 303, configured to switch a state of the first container engine to an available state.
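The triggering chain between the three modules can be sketched in Python as below. The class `ContainerStateSwitcher`, the `State` enum, and the two injected callables are hypothetical; they only illustrate that module 301 runs in the non-available state and triggers 302, which in turn triggers 303.

```python
from enum import Enum


class State(Enum):
    AVAILABLE = "available"
    NON_AVAILABLE = "non-available"


class ContainerStateSwitcher:
    """Hypothetical wiring of modules 301, 302 and 303."""

    def __init__(self, find_failed, has_untaken_failure):
        self.state = State.NON_AVAILABLE
        self.find_failed = find_failed                  # used by module 301
        self.has_untaken_failure = has_untaken_failure  # used by module 302

    def monitoring_module(self):
        """301: runs only in the non-available state; triggers 302 on failure."""
        if self.state is not State.NON_AVAILABLE:
            return
        failed = self.find_failed()
        if failed:
            self.judging_module(failed)

    def judging_module(self, failed):
        """302: triggers 303 if a failed engine has not been taken over."""
        if self.has_untaken_failure(failed):
            self.switching_module()

    def switching_module(self):
        """303: switch the first container engine to the available state."""
        self.state = State.AVAILABLE


record = []  # no identifier recorded yet
device = ContainerStateSwitcher(
    find_failed=lambda: ["container-2"],
    has_untaken_failure=lambda failed: len(record) < len(failed),
)
device.monitoring_module()

idle = ContainerStateSwitcher(
    find_failed=lambda: [],
    has_untaken_failure=lambda failed: True,
)
idle.monitoring_module()
```

Injecting the failure detection and the record check as callables keeps the sketch independent of how monitoring and storage are actually implemented.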

As an embodiment, the determining module 302 may specifically be configured to: judging whether a container engine identifier exists in a specified record in a preset storage area or not; if not, indicating that there is a failed container engine that is not superseded;

in this embodiment, the apparatus may further include: a first adding module (not shown in the figure) for adding the identifier of the first container engine in the record if the identifier of the container engine does not exist in the record.

As an embodiment, the determining module 302 may specifically be configured to: judging whether the number of the container engine identifications in the specified record in the preset storage area is less than the number of the container engines in the available state with faults; if so, indicating that there is a failed container engine that is not superseded;

in this embodiment, the apparatus may further include: a second adding module (not shown in the figure) for adding the identification of the first container engine in the record in case the number of container engine identifications in the record is smaller than the number of container engines in the failed available state.

As an embodiment, the apparatus further comprises:

a processing module (not shown in the figure), configured to perform load balancing processing on the container engines in the available states in the distributed cluster when the state of the first container engine is an available state.

In one embodiment, the first container engine includes a first probe therein; the device further comprises:

and an accessing module (not shown in the figure) for accessing the specified record in the preset storage area through the first probe.

By applying this embodiment of the invention, if the first container engine is in the non-available state, it monitors whether an available-state container engine in the cluster fails; if a failure occurs, it determines, based on a specified record in a preset storage area, whether there is a failed available-state container engine that has not been taken over; if so, the first container engine switches to the available state. Therefore, in this scheme, no manual intervention is needed: the first container engine switches to the available state automatically, and the switching time is shortened.

An embodiment of the present invention further provides an electronic device, as shown in fig. 4, including a processor 401 and a memory 402;

a memory 402 for storing a computer program;

the processor 401 is configured to implement any of the above-described container state switching methods when executing the program stored in the memory 402.

The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.

The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

The embodiment of the invention also provides a computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the computer program realizes any one of the container state switching methods.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the method embodiment, the apparatus embodiment, the device embodiment, and the computer-readable storage medium embodiment, since they are substantially similar to the distributed cluster embodiment, the description is relatively simple, and the relevant points can be referred to the partial description of the distributed cluster embodiment.

The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A distributed cluster is characterized in that the cluster comprises a plurality of nodes, each node is provided with a container engine, and the container engines comprise a container engine in an available state and a container engine in a non-available state;

the non-available-state container engine is used for monitoring whether the available-state container engine fails; if a failure occurs, judging, based on a specified record in a preset storage area, whether there is a failed available-state container engine that has not been taken over; if so, switching the state of the non-available-state container engine to the available state;

wherein, the container engine identifier of the container engine of the new available state is recorded in the specified record.

2. The distributed cluster according to claim 1, wherein the container engine in the unavailable state is further configured to determine whether a container engine identifier exists in the record, and if not, switch the state of the container engine in the unavailable state to the available state, and add the identifier of the container engine switched to the available state to the record.

3. The distributed cluster of claim 1, wherein the non-available-state container engine is further configured to determine whether the number of container engine identifiers in the record is less than the number of failed available-state container engines; and if it is less, switch the state of the non-available-state container engine to the available state, and add the identifier of the container engine switched to the available state to the record.

4. The distributed cluster of claim 3, wherein the available state container engines are configured to load balance the available state container engines in the distributed cluster.

5. The distributed cluster of claim 1, wherein the unavailable container engine is further configured to access the specified records in the predetermined storage area through a probe in its own container engine.

6. The distributed cluster of claim 1, wherein the cluster further comprises a shared device, wherein the predetermined storage area is located in the shared device, and wherein each container engine is communicatively connected to the shared device;

the non-available state container engine is further configured to access a specified record in the shared device when the failure of the available state container engine is monitored.

7. A container state switching method is applied to a first node in a distributed cluster, wherein a first container engine is configured in the first node, and the state of the first container engine is as follows: an available state or a non-available state; the method comprises the following steps:

monitoring whether a container engine in an available state in the cluster fails if the state of the first container engine is in a non-available state;

if so, switching the state of the first container engine to an available state;

wherein, the container engine identification of the container engine with the new available state is recorded in the specified record.

8. The method of claim 7, wherein determining whether there is a failed container engine that is not superseded based on a specified record in a preset storage area comprises:

the method further comprises the following steps:

adding the identity of the first container engine in the record if the container engine identity is not present in the record.

9. The method of claim 7, wherein determining whether there is a failed container engine that is not superseded based on a specified record in a preset storage area comprises:

if so, indicating that there is a failed container engine that has not been taken over;

the method further comprises the following steps:

10. The method of claim 9, further comprising:

11. The method of claim 7, wherein the first container engine includes a first probe therein; the method further comprises the following steps:

12. A container state switching device is applied to a first node in a distributed cluster, wherein a first container engine is configured in the first node, and a state of the first container engine is as follows: an available state or a non-available state; the device comprises:

a switching module for switching the state of the first container engine to an available state;

13. The apparatus of claim 12, wherein the determining module is specifically configured to:

the device further comprises:

14. The apparatus of claim 12, wherein the determining module is specifically configured to:

the device further comprises:

a second adding module, configured to add the identifier of the first container engine to the record if the number of container engine identifiers in the record is less than the number of container engines in the failed available state.

15. The apparatus of claim 14, further comprising:

16. The apparatus of claim 12, wherein the first container engine comprises a first probe therein; the device further comprises:

17. An electronic device comprising a processor and a memory;

a memory for storing a computer program;

a processor for implementing the method steps of any of claims 7 to 11 when executing a program stored in the memory.

18. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any of the claims 7-11.