CN117914675A - Method and device for constructing distributed cache system - Google Patents


Info

Publication number: CN117914675A
Application number: CN202410023209.3A
Authority: CN (China)
Prior art keywords: node, nodes, root directory, cache system, distributed
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 朱宏, 邓程, 孙钰, 宋文杰, 赵诣欣, 侍晶晶, 兰婷, 杨明
Current Assignee: China United Network Communications Group Co Ltd
Original Assignee: China United Network Communications Group Co Ltd
Application filed by China United Network Communications Group Co Ltd
Priority to CN202410023209.3A
Publication of CN117914675A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a method and a device for constructing a distributed cache system, relates to the technical field of distributed system clusters, and can safely start the distributed cache system and support dynamic node expansion and contraction. The method comprises the following steps: acquiring node change information in a root directory in a first time period, wherein the root directory is used for recording node information of nodes whose initialization is completed; sending the node change information to all nodes under the root directory, so that each node under the root directory changes its corresponding cluster state set, wherein the changed cluster state set comprises the same node information as that under the root directory; and when the number of pieces of node information in the cluster state set of each node under the root directory is greater than or equal to a first preset number, constructing the distributed cache system by using the nodes in the cluster state set.

Description

Method and device for constructing distributed cache system
Technical Field
The present application relates to the field of distributed system clusters, and in particular, to a method and an apparatus for constructing a distributed cache system.
Background
Zookeeper is an open-source distributed application coordination service that supports the creation of directories and data and can notify clients of changes to the directories and data in real time. Zookeeper directories can be divided into temporary directories and persistent directories; a temporary directory has a limited life cycle and is automatically deleted when the client disconnects. In the prior art, the cluster mode mechanism is simple: the cluster nodes do not actually interact with each other, each cluster node fixedly accesses part of the data and uses only a local cache, the number of cluster nodes is generally fixed, and dynamic capacity expansion and contraction of nodes is difficult. The cluster is not fault tolerant, so services cannot drift to another node after a node becomes abnormal.
Disclosure of Invention
The application provides a method and a device for constructing a distributed cache system, which can safely start the distributed cache system and support dynamic node expansion and contraction.
In order to achieve the above purpose, the application adopts the following technical scheme:
In a first aspect, the present application provides a method for constructing a distributed cache system, the method comprising: acquiring node change information in a root directory in a first time period, wherein the root directory is used for recording node information of nodes whose initialization is completed; sending the node change information to all nodes under the root directory, so that each node under the root directory changes its corresponding cluster state set, wherein the changed cluster state set comprises the same node information as that under the root directory; and when the number of pieces of node information in the cluster state set of each node under the root directory is greater than or equal to a first preset number, constructing a distributed cache system by using the nodes in the cluster state set.
Based on the method provided in the first aspect, each node holds information about all nodes under the root directory, and when the node information changes, the other nodes can be notified rapidly, so that dynamic capacity expansion and contraction can be realized. The method supports setting a minimum cluster size when the cluster is started: node information is first added to the root directory, the number of nodes in the root directory is checked, and the distributed cache system is started only when that number is greater than or equal to the first preset number, which ensures a safe start of the cluster.
In a possible implementation manner, the method further includes: initializing a target node; and after the initialization of the target node is completed, writing the node information of the target node into the root directory. The node change information includes the node information of the target node, and the changed cluster state set includes the node information of the target node.
Based on the method provided in the first aspect, each node can be notified when a node is added, so that dynamic addition of nodes is supported.
In a possible implementation manner, the method further includes: deleting the node information of a target node from the root directory when the target node fails and the root directory includes the node information of the target node. The node change information includes the node information of the target node, and the changed cluster state set does not include the node information of the target node.
Based on the method provided in the first aspect, each node can be notified when a node is removed, so that dynamic removal of nodes is supported.
In a possible implementation manner, the node information further includes the cache state of each node under the root directory and the cluster state of each node under the root directory, and after the node information of the target node is written into the root directory, the method further includes: setting the cache state of the node to a non-cache state and setting the cluster state of the node to incomplete. Constructing the distributed cache system by using the nodes in the cluster state set includes: constructing an initial distributed key-value storage system, and adding all nodes under the root directory to the initial distributed key-value storage system to obtain the distributed cache system. The method further includes: changing the cache state of the node from the non-cache state to the normal cache state.
Based on the method provided in the first aspect, a built-in distributed caching capability can be provided, and the distributed cache is bound to the cluster mechanism, so that cluster availability is ensured.
In a possible implementation manner, the method further includes: if the number of nodes in the distributed cache system is greater than or equal to a second preset number and the cache states of the nodes in the distributed cache system are all normal cache states, setting the cluster state of the nodes in the distributed cache system to complete; and caching data in a distributed manner by using the nodes in the distributed cache system.
Based on the method provided in the first aspect, the caching opportunity can be determined according to the readiness of the nodes and the number of nodes, so that the availability of cluster caching is improved.
In a possible implementation manner, after the data is cached in a distributed manner by using the nodes in the distributed cache system, the method further includes: when a node is newly added to the distributed cache system, setting the cache states of all nodes in the distributed cache system to the non-cache state; and after the data stored by each node in the distributed cache system is redistributed to the nodes in the distributed cache system, setting the cache states of all nodes in the distributed cache system to the normal cache state.
In a possible implementation manner, after setting the cluster state of the nodes in the distributed cache system to complete, the method further includes: after a node in the distributed cache system exits, setting the cache states of all nodes in the distributed cache system to the non-cache state; and after the data stored by each node in the distributed cache system is redistributed to the nodes in the distributed cache system, setting the cache states of all nodes in the distributed cache system to the normal cache state.
Based on the method provided in the first aspect, the clustering mechanism is flexible, and the ability to drift service data away from an abnormal node can easily be added.
In a second aspect, the present application provides an apparatus for implementing the above method for constructing a distributed cache system. The apparatus for constructing a distributed cache system includes modules, units, or means for implementing the method, and these modules, units, or means may be implemented by hardware, by software, or by hardware executing corresponding software. The hardware or software includes one or more modules or units corresponding to the functions described above.
In a third aspect, the present application provides an apparatus for constructing a distributed cache system, the apparatus comprising: a processor and a communication interface; the communication interface is coupled to the processor, and the processor is configured to run a computer program or instructions to implement the method for constructing a distributed cache system according to the first aspect or any possible implementation of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium having instructions stored therein which, when run on a terminal, cause the terminal to perform the method for constructing a distributed cache system according to the first aspect or any possible implementation of the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product comprising instructions which, when run on an apparatus for constructing a distributed cache system, cause the apparatus to perform the method for constructing a distributed cache system according to the first aspect or any possible implementation of the first aspect.
In a sixth aspect, embodiments of the present application provide a chip comprising a processor and a communication interface, the communication interface being coupled to the processor, and the processor being configured to run a computer program or instructions to implement the method for constructing a distributed cache system according to the first aspect or any possible implementation of the first aspect.
Specifically, the chip provided in the embodiment of the application further includes a memory, which is used for storing a computer program or instructions.
Drawings
FIG. 1 is a schematic diagram of a distributed cache system according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a method for constructing a distributed cache system according to an embodiment of the present application;
FIG. 3 is a schematic diagram of another method for constructing a distributed cache system according to an embodiment of the present application;
FIG. 4 is a schematic diagram of yet another method for constructing a distributed cache system according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an apparatus for constructing a distributed cache system according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a device hardware structure for constructing a distributed cache system according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a chip according to an embodiment of the present application.
Detailed Description
The method and apparatus for constructing a distributed cache system according to the embodiments of the present application are described in detail below with reference to the accompanying drawings.
The term "and/or" is herein merely an association relationship describing an associated object, meaning that there may be three relationships, e.g., a and/or B, may represent: a exists alone, A and B exist together, and B exists alone.
The terms "first" and "second" and the like in the description and in the drawings are used for distinguishing between different objects or between different processes of the same object and not for describing a particular order of objects.
Furthermore, references to the terms "comprising" and "having" and any variations thereof in the description of the present application are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed but may optionally include other steps or elements not listed or inherent to such process, method, article, or apparatus.
It should be noted that, in the embodiments of the present application, words such as "exemplary" or "for example" are used to mean serving as an example, instance, or illustration. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present application should not be construed as being preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "for example" is intended to present related concepts in a concrete fashion.
The following describes in detail the implementation of the embodiment of the present application with reference to the drawings.
As described in the background, in current cluster schemes the cluster nodes do not actually interact with each other: each cluster node fixedly accesses part of the data and uses only a local cache, the cluster mode mechanism is simple, and the number of nodes is generally fixed. Each node periodically writes its own state into an external system such as a Zookeeper, and during startup a node does not need to read the states of other nodes or wait for other nodes. Such a cluster is not fault tolerant, services cannot drift to other nodes after a node becomes abnormal, and node capacity expansion and contraction is not easy. If the local cache needs to be replaced by a distributed cache, an independently running external cache system such as redis is generally used, which requires separate installation and operation and introduces an additional dependency. In order to solve the above problems, an embodiment of the present application provides a method for constructing a distributed cache system, where the method includes: acquiring node change information in a root directory in a first time period, where the root directory is used for recording node information of nodes whose initialization is completed; sending the node change information to all nodes under the root directory, so that each node under the root directory changes its corresponding cluster state set, where the changed cluster state set includes the same node information as that under the root directory; and when the number of pieces of node information in the cluster state set of each node under the root directory is greater than or equal to a first preset number, constructing a distributed cache system by using the nodes in the cluster state set.
It will be appreciated that the above-described method may be performed by any computing-capable device. For example, the processing may be performed by a server, a computer, a computing device, or the like (hereinafter referred to as the apparatus for constructing a distributed cache system). The server includes, but is not limited to: tower servers, blade servers, rack servers, physical servers, virtual hosts, virtual private servers (Virtual Private Server, VPS), cloud servers, home servers, enterprise servers, and the like.
The method for constructing the distributed cache system according to the embodiment of the present application is described in detail below with reference to the accompanying drawings.
As shown in fig. 1, fig. 1 is a schematic diagram of a distributed cache system according to an embodiment of the present application. The distributed cache system includes a Zookeeper and cluster component.
Zookeeper is an open-source distributed application coordination service that supports the creation of directories and data, and changes to the directories and data can be notified to clients in real time. Zookeeper directories can be divided into temporary directories and persistent directories; a temporary directory has a limited life cycle and is automatically deleted when the client disconnects.
The cluster component is an application component in which the number of running nodes in the system is greater than or equal to 2; the nodes cooperate to process service data, and the cluster generally has load balancing and fault tolerance capabilities. The cluster component comprises a plurality of nodes DW, and the nodes are used for caching data in the constructed distributed cache system. As shown in fig. 1, the cluster includes nodes DW1, DW2, …, DWn.
Specifically, the root directory in the Zookeeper includes a plurality of subdirectories storing node information of the nodes DW, such as the subdirectory '/DW/clusterNodes/DW1' storing the node information of node DW1. Each node DW holds a cluster state set, and the cluster state set includes the node information of all nodes in the root directory. A node DW can listen for changes in the Zookeeper's subdirectories and changes in the Zookeeper's directory data. The distributed cache system is constructed by using an initialization method and a cluster coordination starting method.
As shown in fig. 2, fig. 2 is a method for constructing a distributed cache system according to an embodiment of the present application, where the method for constructing a distributed cache system includes the following steps:
s201: and acquiring node change information in the root directory in the first time period.
The root directory is arranged in the Zookeeper and is used for recording node information of the initialized nodes.
The node change information is used for representing changes in the initialization state of nodes: when the initialization state of a node changes from not completed to completed, the node information of the node is added under the root directory; and when the initialization state of a node changes from completed to not completed, the node information of the node is removed from the root directory.
The user obtains the change of the node information under the root directory, that is, the node change information.
S202: and sending node change information to all nodes under the root directory, so that each node under the root directory changes the corresponding cluster state set.
Each node has a respective cluster state set, and the cluster state set includes node information of all nodes under the root directory, where the node information includes a node name of each node under the root directory, a cache state of each node under the root directory, a cluster state of each node under the root directory, and so on.
And when the node information in the root directory changes, sending the node change information to all nodes under the root directory. All nodes under the root directory change the corresponding cluster state sets according to the node change information, and the node information included in the changed cluster state sets is the same as the node information under the root directory. Specifically, when the number of nodes under the root directory increases, node information is increased in the cluster state sets of all the nodes, and when the number of nodes under the root directory decreases, node information is decreased in the cluster state sets of all the nodes.
In one implementation, a target node is initialized; after the initialization of the target node is completed, the node information of the target node is written into a root directory, the node change information in the root directory comprises the node information of the target node, and each node under the root directory increases the node information of the target node in the corresponding cluster state set.
In another implementation manner, when a target node fails and the root directory includes the node information of the target node, the node information of the target node is deleted from the root directory; the node change information in the root directory includes the node information of the target node, and each node under the root directory deletes the node information of the target node from its corresponding cluster state set, that is, the changed cluster state set does not include the node information of the target node.
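As an illustration of how each node might maintain its cluster state set when node change information arrives, the following Java sketch uses a thread-safe map keyed by node name. The class and method names (ClusterStateSet, onNodeAdded, onNodeRemoved, readyToBuild) are hypothetical and not taken from the patent; the patent only specifies that the changed set must mirror the node information under the root directory, and the map here foreshadows the NodesStatusMap used in the detailed embodiment below.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a per-node cluster state set that mirrors the
// node information recorded under the Zookeeper root directory.
public class ClusterStateSet {

    // key: node name (e.g. "DW1"), value: node state data (true = initialized)
    private final Map<String, Boolean> nodesStatusMap = new ConcurrentHashMap<>();

    // Called when node change information reports a newly added node.
    public void onNodeAdded(String nodeName, boolean initialized) {
        nodesStatusMap.put(nodeName, initialized);
    }

    // Called when node change information reports a removed (failed) node.
    public void onNodeRemoved(String nodeName) {
        nodesStatusMap.remove(nodeName);
    }

    // The construction condition of S203: the number of node information
    // entries is greater than or equal to the first preset number.
    public boolean readyToBuild(int firstPresetNumber) {
        return nodesStatusMap.size() >= firstPresetNumber;
    }
}
```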
S203: when the number of node information in the cluster state set of each node under the root directory is larger than or equal to the first preset number, constructing a distributed cache system by using the nodes in the cluster state set.
The number of the node information in the cluster state set characterizes the number of the nodes which are initialized and added to the root directory, and when the number of the nodes in the root directory is greater than or equal to the first preset number, the nodes in the root directory can be used for carrying out cache initialization, so that the distributed cache system is built by using the nodes in the cluster state set. For example, when the first preset number is 5 and the number of node information in the cluster state set of each node under the root directory is greater than or equal to 5, a distributed cache system is constructed to store data by using the distributed cache system.
In the method shown in fig. 2, node information is added under the root directory first, the number of nodes in the root directory is judged, and when the number is greater than or equal to a first preset number, the distributed cache system is started, so that the safe starting of the cluster can be well ensured. Meanwhile, the built-in distributed caching capability is provided, and the distributed caching is bound with a cluster mechanism, so that the availability of clusters is ensured.
In one implementation, the node information further includes the cache state of each node under the root directory and the cluster state of each node under the root directory. In the cluster state set of each node under the root directory, the cache state of the node is set to a non-cache state, and then the distributed cache system is constructed by using the nodes in the cluster state set. As shown in fig. 3, fig. 3 is a schematic diagram of a method for constructing a distributed cache system according to an embodiment of the present application, where the method includes the following steps:
S301: and constructing an initial distributed key value storage system, and adding all nodes under the root directory into the initial distributed key value storage system to obtain a distributed cache system.
S302: and changing the cache state of the node from the non-cache state to the normal cache state.
The cache state is used for representing whether the node can perform caching. After the node joins the distributed cache system, the cache state of the node is changed from the non-cache state to the normal cache state, which indicates that the node is ready to cache data.
S303: if the number of the nodes in the distributed cache system is greater than or equal to the second preset number and the cache states of the nodes in the distributed cache system are all normal cache states, setting the cluster states of the nodes in the distributed cache system as complete.
The second preset number is the number of nodes required by the preset data caching, and the cluster state is used for representing whether the cluster formed by all the nodes can perform the data caching.
When the number of nodes in the distributed cache system is greater than or equal to the second preset number, it is determined whether the nodes in the distributed cache system are ready to cache data; when the cache states of all the nodes are normal cache states, the cluster state of the nodes is set to complete, indicating that the cluster where the nodes are located can cache data.
S304: and carrying out distributed caching on the data by using nodes in the distributed caching system.
In the process of distributed caching, nodes may be added or removed; when nodes are added or removed, all nodes in the distributed cache system pause distributed caching.
In one implementation, when a node is newly added in the distributed cache system, the cache state of all nodes in the distributed cache system is set to be a non-cache state, the data stored by each node in the distributed cache system is redistributed and stored in all nodes in the distributed cache system, and then the cache state of all nodes in the distributed cache system is set to be a normal cache state. On this basis, the step of S303 is performed again, and it is determined whether or not caching is possible.
In another implementation manner, after a node in the distributed cache system exits, the cache state of all nodes in the distributed cache system is set to be an unbuffered state, the data stored by each node in the distributed cache system is redistributed and stored in all nodes in the distributed cache system, and then the cache state of all nodes in the distributed cache system is set to be a normal cache state. On this basis, the step of S303 is performed again, and it is determined whether or not caching is possible.
The local cache is also called a single-machine cache, and the cached data is only placed in the memory of the local process; for example, a hash table in a Java program is a local cache. Compared with a single-machine local cache, the data of the distributed cache is distributed over a plurality of nodes of the cluster. For example, if there are 10000 pieces of data, a single-machine local cache holds all 10000 pieces, while the distributed cache distributes the data to all nodes of the cluster according to a certain algorithm; if the total number of cluster nodes is 2, 5000 pieces are on node 1 and 5000 pieces are on node 2. Meanwhile, this caching technique does not use an independently running external cache such as redis, but caches the data in the processes of all nodes of the cluster, so it is also called an embedded distributed cache. A third-party open source component, infinispan, may be used for support.
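The embedded distributed cache described here can be illustrated with Infinispan's embedded API. The sketch below is an assumption about how such a cache might be configured: the cluster name, cache name, and DIST_SYNC mode are illustrative choices, and the patent only states that infinispan may be used for support.

```java
import org.infinispan.Cache;
import org.infinispan.configuration.cache.CacheMode;
import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.global.GlobalConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;

public class EmbeddedDistributedCache {
    public static Cache<String, String> start() {
        // Clustered cache manager: the cache lives inside the DW node process,
        // so no external cache service such as redis is required.
        GlobalConfigurationBuilder global = GlobalConfigurationBuilder.defaultClusteredBuilder();
        global.transport().clusterName("DW-cluster");   // illustrative cluster name
        DefaultCacheManager cacheManager = new DefaultCacheManager(global.build());

        // Distributed mode: entries are spread over the cluster members,
        // unlike a local (single-machine) cache.
        ConfigurationBuilder cfg = new ConfigurationBuilder();
        cfg.clustering().cacheMode(CacheMode.DIST_SYNC);
        cacheManager.defineConfiguration("cache_alarm", cfg.build());

        return cacheManager.getCache("cache_alarm");
    }
}
```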
The alarm data is generated by network equipment of a communication network or a related IT system, and the alarm data is collected and processed by an alarm monitoring system in a stipulated format. Each piece of alarm data has a plurality of service attributes, such as home city, home equipment manufacturer, network type, secondary specialty, severity level, alarm title, alarm content, dispatch status, etc. The data volume processed by the alarm monitoring system is relatively large, and an application component generally needs to adopt a cluster mode to deal with the data volume.
As shown in fig. 4, fig. 4 is a schematic diagram of a method for constructing a distributed cache system according to an embodiment of the present application, where the method for constructing a distributed cache system includes the following steps:
S401: and initializing the node by using a node initialization method.
First, the basic environment of the distributed cache system is built using an x86 server or a k8s container, and the Zookeeper software package is installed on the server. The Zookeeper includes a root directory, which is used for storing the state of the cluster and of the individual nodes in the cluster.
The application component is named DW; a DW instance is also called a node, and multiple nodes form an application cluster. The first preset number of the DW application cluster is 5, that is, the minimum number of cluster nodes is 5, and the cluster size may be 10. During operation, the cluster constructed by the application can dynamically expand or contract according to the volume of data to be processed, that is, the number of nodes can be increased or reduced. The 10 nodes are started in sequence. The application component DW can start 10 nodes as pods through k8s scheduling, or start 10 ordinary process nodes on 10 physical machines or virtual machines through scripts.
When node 1 starts up, the cluster configuration of the node is read, where the cluster configuration includes the minimum cluster size initClusterSize, the Zookeeper root directory clusterStatePath, and the node name currentNodeName.
The node is initialized by using the initialization method. Specifically, the minimum cluster size is set to initClusterSize = 5, the Zookeeper root directory is set to clusterStatePath = '/DW/clusterNodes', and currentNodeName is set to the node name, where node 1 is denoted as DW1. The cluster state set of the node is then initialized, and the node information is added to the root directory.
The node cluster state set is initialized. The state variables of the node include isClusterReady, the cluster state, which indicates whether cluster initialization is finished and defaults to false; NodesStatusMap, the node state set, which defaults to empty; and CachesStatusMap, the cache state set, which defaults to empty. Other initialization variables include initClusterSize = 5, clusterStatePath = '/DW/clusterNodes', and currentNodeName = DW1.
The node information is added to the root directory. It is determined whether the root directory clusterStatePath exists on the Zookeeper; if not, a Zookeeper root directory of the persistent type is created. The number of pieces of node information in the root directory clusterStatePath is then read; specifically, the number of subdirectories in the root directory clusterStatePath is obtained, where one subdirectory represents one DW node. If the number of subdirectories is 0, the cluster state isClusterReady is written into the root directory clusterStatePath as false, indicating that the cluster is in an unfinished state. Next, it is determined whether a directory with the node name currentNodeName exists in the root directory clusterStatePath; when the node name of the target node is DW1, it is determined whether '/DW/clusterNodes/DW1' exists. If it does not exist, a Zookeeper directory of the temporary type is created, and the cluster state isClusterReady of the target node is written as false, indicating that the node is in an incomplete state.
By the same method, node 2, node 3, …, node 10 are initialized, and their node names currentNodeName are set to DW2, DW3, …, DW10 respectively.
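A sketch of the registration step in S401 using the plain Apache ZooKeeper client is given below. The class and method names are assumptions for illustration; the patent only specifies the persistent root directory '/DW/clusterNodes' and a temporary (ephemeral) per-node subdirectory whose initial state data is false.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class NodeRegistration {

    public static void register(ZooKeeper zk, String clusterStatePath, String currentNodeName)
            throws KeeperException, InterruptedException {
        // Create the persistent root directory if it does not exist yet,
        // with initial cluster state "false" (cluster not ready).
        if (zk.exists(clusterStatePath, false) == null) {
            zk.create(clusterStatePath, "false".getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }

        // Create the ephemeral subdirectory for this node, e.g. /DW/clusterNodes/DW1,
        // with initial state data "false" (node not yet ready).
        String nodePath = clusterStatePath + "/" + currentNodeName;
        if (zk.exists(nodePath, false) == null) {
            zk.create(nodePath, "false".getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        }
    }
}
```

A client handle could be obtained with, for example, `new ZooKeeper("zk-host:2181", 30000, event -> {})`, where the address and session timeout are placeholders.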
S402: and sending node change information to all nodes under the root directory, so that each node under the root directory changes the corresponding cluster state set.
The DW node creates, under the root directory clusterStatePath, a temporary-type subdirectory representing the node itself. The cluster management component listens for node change information under the root directory clusterStatePath of the Zookeeper; because each subdirectory of the root directory clusterStatePath represents one node of the DW cluster, the node change information is the change information of the subdirectories, and a change of the subdirectories means that a DW cluster node has been added or removed.
Each time a node listens to a change, it compares all subdirectories on the Zookeeper with the local node state set NodesStatusMap. When a new node is found, a record is added to the node state set NodesStatusMap; when a node is deleted, the record whose key is the node name is deleted from NodesStatusMap. The key is the name of the subdirectory, that is, the node name, and the value is the state data in this subdirectory: when there is state data, the value is true, and when there is no state data, the value defaults to false.
When a node is added, the Zookeeper adds a subdirectory representing the node under the root directory clusterStatePath and sends node addition information to all running nodes under the root directory; each node adds a record to NodesStatusMap and registers a listener on the data changes of the new subdirectory, which means that a new node has joined the cluster. When a node goes down, the Zookeeper deletes the subdirectory representing the node under the root directory clusterStatePath and sends node deletion information to all running nodes under the root directory; when the deletion is detected, the record is deleted from NodesStatusMap and the registered listener on the data changes of that subdirectory is removed, which means that a node has exited the cluster.
The cluster management component registers listeners on the data changes of the subdirectories under the root directory clusterStatePath. A data change of a subdirectory means that the state of a certain node has changed; the state data is true or false. If the state of a node is true, the initialization of that node has been completed; if the state is false, the initialization of that node has not been completed or the node has failed.
On the basis of the above steps, subdirectories representing DW1, DW2, DW3, and so on are created under the root directory clusterStatePath, and the initial node states are false. After DW1 through DW10 are added to the root directory, each node has 10 records in NodesStatusMap, with keys DW1, DW2, …, DW10, and the values in NodesStatusMap of all nodes are false. During operation, when node DW9 has not yet observed the node information of node DW10 after node DW10 creates its subdirectory, there may be only 9 records in the NodesStatusMap of node DW9; after node DW9 observes the update, the NodesStatusMap of node DW9 changes to 10 records.
In the above steps, since the relevant Zookeeper methods are safe for multiple clients, the relevant methods of multiple DW nodes do not conflict with one another.
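The listening logic of S402 can be sketched with a child watcher that re-registers itself after each notification and diffs the children against the local NodesStatusMap. Class and field names other than NodesStatusMap and clusterStatePath are hypothetical; for brevity, the sketch only tracks membership, while the per-node state data (true/false) would additionally be read with getData and its own data watch, as the text describes.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ClusterNodesWatcher {

    private final ZooKeeper zk;
    private final String clusterStatePath;
    private final Map<String, Boolean> nodesStatusMap = new ConcurrentHashMap<>();

    public ClusterNodesWatcher(ZooKeeper zk, String clusterStatePath) {
        this.zk = zk;
        this.clusterStatePath = clusterStatePath;
    }

    public void watchChildren() throws KeeperException, InterruptedException {
        // getChildren registers a one-shot watch; re-register it on every callback.
        Watcher watcher = event -> {
            try {
                watchChildren();
            } catch (Exception e) {
                e.printStackTrace();
            }
        };
        List<String> children = zk.getChildren(clusterStatePath, watcher);

        // A new subdirectory means a node joined: add a record (default false).
        for (String nodeName : children) {
            nodesStatusMap.putIfAbsent(nodeName, Boolean.FALSE);
        }
        // A missing subdirectory means a node exited: remove its record.
        Set<String> current = new HashSet<>(children);
        nodesStatusMap.keySet().removeIf(nodeName -> !current.contains(nodeName));
    }
}
```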
S403: when the number of node information in the cluster state set of each node under the root directory is larger than or equal to the first preset number, constructing a distributed cache system by using the nodes in the cluster state set.
After the initialization of the node DW1 is completed, the cluster coordination starting method of the cluster management component is called to construct the distributed cache system. The cluster coordination starting method determines whether the conditions for starting are satisfied; if not, the node keeps waiting in a loop and does not proceed until the requirements of the distributed cache system are met.
Specifically, it is necessary to determine whether the number of nodes in the root directory clusterStatePath is greater than or equal to the first preset number initClusterSize. By way of example, a distributed cache system may be constructed when there are 5 nodes in the root directory clusterStatePath.
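The waiting behaviour of the cluster coordination starting method could look like the following loop: the node polls the number of subdirectories under clusterStatePath and blocks until the first preset number initClusterSize is reached. The one-second polling interval and class name are arbitrary choices for illustration.

```java
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class ClusterCoordinatedStart {

    // Block until the number of registered nodes reaches initClusterSize.
    public static void waitForMinimumClusterSize(ZooKeeper zk, String clusterStatePath,
                                                 int initClusterSize)
            throws KeeperException, InterruptedException {
        while (zk.getChildren(clusterStatePath, false).size() < initClusterSize) {
            Thread.sleep(1000L); // keep waiting; do not proceed with startup yet
        }
        // At this point the distributed cache system may be constructed.
    }
}
```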
Infinispan is used to create an initial distributed key-value storage system named cache_alarm, and the nodes in the root directory clusterStatePath are added to cache_alarm, resulting in the distributed cache system. Infinispan ensures the safety of concurrent creation by multiple DW nodes, so that cache_alarm is created correctly.
In one embodiment, the node DW1 and the node DW2 are created at the same time, and DW1 and DW2 are added to the distributed cache system cache_alarm as its members; after the subsequent DW3, …, DW10 are created, they are also added to the member list of the distributed cache system cache_alarm.
S404: if the number of the nodes in the distributed cache system is greater than or equal to the second preset number and the cache states of the nodes in the distributed cache system are all normal cache states, setting the cluster states of the nodes in the distributed cache system as complete, and carrying out distributed cache on the data by using the nodes in the distributed cache system.
The cluster management component registers a listener for state changes of the distributed cache system cache_alarm. Through the API of infinispan, it is determined whether the size of the member list of cache_alarm is greater than or equal to the second preset number, where the second preset number is initClusterSize, and whether the cache states of cache_alarm in CachesStatusMap are all normal cache states. When the number of nodes in the distributed cache system is greater than or equal to the second preset number and the cache states of the nodes in the distributed cache system are all normal cache states, the state preparation of the node is completed, the state value true is written into the Zookeeper directory '/DW/clusterNodes/DW1', and the nodes in the distributed cache system can be used to cache data in a distributed manner; otherwise, the state value false is written into the Zookeeper directory '/DW/clusterNodes/DW1'. Whether the cluster is ready is checked every five seconds, and the value of isClusterReady is updated accordingly.
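The readiness check described above could be sketched as a periodic task that inspects the member list and CachesStatusMap and writes true or false into this node's directory on the Zookeeper. The scheduling approach and helper names are assumptions; only the five-second interval, the member-list condition, and the '/DW/clusterNodes/DW1' target come from the text.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.zookeeper.ZooKeeper;
import org.infinispan.manager.DefaultCacheManager;

public class ReadinessChecker {

    public static void schedule(ZooKeeper zk, DefaultCacheManager cacheManager,
                                Map<String, Boolean> cachesStatusMap,
                                String nodePath, int initClusterSize) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                // Condition 1: enough members in the cluster view.
                boolean enoughMembers =
                        cacheManager.getMembers().size() >= initClusterSize;
                // Condition 2: cache_alarm reported as being in the normal cache state.
                boolean cachesNormal =
                        Boolean.TRUE.equals(cachesStatusMap.get("cache_alarm"));
                String state = (enoughMembers && cachesNormal) ? "true" : "false";
                // -1 means "any version"; overwrite this node's state data.
                zk.setData(nodePath, state.getBytes(), -1);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }, 0L, 5L, TimeUnit.SECONDS);
    }
}
```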
In the process of data caching, when a node joins or exits the distributed cache system, the distributed cache system pauses caching.
In one implementation, the CacheStarted event of the distributed cache system cache_alarm is monitored; this event indicates that cache_alarm has started successfully. A status record is written into CachesStatusMap with key = 'cache_alarm' and value = true.
In another implementation, the DataRehashed event (DataRehashedEvent) of the distributed cache system cache_alarm is monitored. The DataRehashed event is triggered when a node joins the distributed cache system cache_alarm or exits its member list, indicating that cache_alarm needs to redistribute data because of the node change.
Specifically, when a node is newly added to the distributed cache system, the cache states of all nodes in the distributed cache system are set to the non-cache state, the originally cached data is redistributed across all nodes in the distributed cache system, and then the cache states of all nodes in the distributed cache system are set to the normal cache state. When a node in the distributed cache system exits, the cache states of all nodes in the distributed cache system are set to the non-cache state, the originally cached data is redistributed across all nodes in the distributed cache system, and then the cache states of all nodes in the distributed cache system are set to the normal cache state. For example, if the state value of the DataRehashed event indicates that redistribution is not complete, the value of the record with key = 'cache_alarm' in CachesStatusMap is modified to false; when redistribution is complete, the value of that record is modified to true.
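A sketch of the DataRehashed listener using Infinispan's annotation-based notification API is given below; the map mirrors CachesStatusMap from the text, while the listener class name is hypothetical. DataRehashedEvent.isPre() distinguishes the start of a rehash (data redistribution begins, cache marked as not ready) from its end (redistribution finished, cache marked as ready again).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.infinispan.Cache;
import org.infinispan.notifications.Listener;
import org.infinispan.notifications.cachelistener.annotation.DataRehashed;
import org.infinispan.notifications.cachelistener.event.DataRehashedEvent;

@Listener
public class CacheAlarmListener {

    // Cache state set: key 'cache_alarm', value true = normal cache state.
    private final Map<String, Boolean> cachesStatusMap = new ConcurrentHashMap<>();

    @DataRehashed
    public void onDataRehashed(DataRehashedEvent<?, ?> event) {
        if (event.isPre()) {
            // Redistribution is starting because a member joined or left:
            // switch to the non-cache state.
            cachesStatusMap.put("cache_alarm", Boolean.FALSE);
        } else {
            // Redistribution finished: back to the normal cache state.
            cachesStatusMap.put("cache_alarm", Boolean.TRUE);
        }
    }

    public void registerOn(Cache<?, ?> cacheAlarm) {
        cacheAlarm.addListener(this);
    }
}
```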
On this basis, the cluster status is printed every 30 seconds. The cluster status mainly includes the node names and node states under the root directory, the state value of the distributed cache system cache_alarm in the cache state set CachesStatusMap, and the node names and node states in the node state set NodesStatusMap.
After the distributed cache system has been running for a period of time, a node DW11 is added. During the startup of node DW11, the '/DW/clusterNodes/DW11' directory is created on the Zookeeper after initialization is completed, with an initial value of false. The other nodes DW1 through DW10 that are already running normally each listen to the change and update their node state set NodesStatusMap, adding a record with key = 'DW11' and value = false. At this time, the processing of the existing cluster nodes is not affected. After the DW11 node completes the cluster coordination starting method, one member is added to the distributed cache system cache_alarm, and infinispan redistributes the data of the original 10 nodes over 11 nodes. DW11 then modifies the state value of the '/DW/clusterNodes/DW11' directory on the Zookeeper to true. The other nodes DW1 through DW10 each listen to the change and update the value of 'DW11' in their node state set NodesStatusMap to true.
After the distributed cache system has been running for a period of time, one node is deleted, e.g., DW9 goes down; the Zookeeper deletes the '/DW/clusterNodes/DW9' directory, and the remaining nodes remove the record whose key is DW9 from their node state set NodesStatusMap. At the same time, infinispan removes the DW9 member and redistributes the data of the distributed cache system cache_alarm. The dynamic deletion of the node is thus completed. After the cluster has started, the number of cluster nodes is not limited, and the cluster can continue to run even if only one node remains. However, when the number of nodes in the cluster is less than initClusterSize, a newly added node cannot complete its startup normally and simply keeps waiting, until the total number of nodes is greater than or equal to initClusterSize.
In a specific implementation, the alarm service data volume increases from 10 million (1000W) records per day to 40 million (4000W) records per day. The application component DW with the cluster capability can be upgraded: as the data volume gradually increases, the nodes of the application component DW are dynamically expanded from 5 nodes to 7 nodes and then to 10 nodes, so that the service processing capability is increased seamlessly. The application component itself does not need to be modified.
In the embodiment shown in fig. 4, the application cluster mechanism can well ensure the safe starting of the cluster and the dynamic expansion and contraction capacity of the cluster, provide the built-in distributed caching capability, bind the distributed caching with the cluster mechanism and ensure the availability of the cluster. On the basis, the business data processing can be easily expanded to new nodes according to the change of the clusters, or the business data of abnormal nodes can be drifted to the existing healthy nodes.
The embodiment of the application can divide the functional modules or functional units of the device of the distributed cache system according to the method example, for example, each functional module or functional unit can be divided corresponding to each function, and two or more functions can be integrated in one processing module. The integrated modules may be implemented in hardware, or in software functional modules or functional units. The division of the modules or units in the embodiment of the present application is schematic, which is merely a logic function division, and other division manners may be implemented in practice.
Fig. 5 is a schematic structural diagram of an apparatus for constructing a distributed cache system according to an embodiment of the present application, where the apparatus includes an obtaining module 501 and a processing module 502.
The obtaining module 501 is configured to obtain node change information in a root directory in a first period, where the root directory is configured to record node information after initialization is completed; for example, the obtaining module 501 is configured to perform S201 described above.
The processing module 502 is configured to send node change information to all nodes under the root directory, so that each node under the root directory changes its corresponding cluster state set, where the changed cluster state set includes node information identical to the node information under the root directory; for example, the processing module 502 is configured to execute S202 described above.
The processing module 502 is configured to construct a distributed cache system by using the nodes in the cluster state set when the number of node information in the cluster state set of each node in the root directory is greater than or equal to the first preset number. For example, the processing module 502 is configured to execute S203 described above.
In a possible implementation manner, the processing module 502 is specifically configured to initialize a target node, and to write the node information of the target node into the root directory after the initialization of the target node is completed; the node change information includes the node information of the target node, and the changed cluster state set includes the node information of the target node.
In a possible implementation manner, the processing module 502 is specifically configured to delete the node information of a target node from the root directory when the target node fails and the root directory includes the node information of the target node; the node change information includes the node information of the target node, and the changed cluster state set does not include the node information of the target node.
In a possible implementation manner, the node information further includes the cache state of each node under the root directory and the cluster state of each node under the root directory, and the processing module 502 is specifically configured to, after writing the node information of the target node into the root directory, set the cache state of the node to a non-cache state and set the cluster state of the node to incomplete. The processing module 502 is specifically configured to construct the distributed cache system by using the nodes in the cluster state set, which includes: constructing an initial distributed key-value storage system, and adding all nodes under the root directory to the initial distributed key-value storage system to obtain the distributed cache system. The processing module 502 is specifically configured to change the cache state of the node from the non-cache state to the normal cache state.
In a possible implementation manner, the processing module 502 is specifically configured to set the cluster state of the nodes in the distributed cache system to complete if the number of nodes in the distributed cache system is greater than or equal to a second preset number and the cache states of the nodes in the distributed cache system are all normal cache states, and to cache data in a distributed manner by using the nodes in the distributed cache system.
In a possible implementation manner, the processing module 502 is specifically configured to, after the data is cached in a distributed manner by using the nodes in the distributed cache system, set the cache states of all nodes in the distributed cache system to the non-cache state when a node is newly added to the distributed cache system, and to set the cache states of all nodes in the distributed cache system to the normal cache state after the data stored by each node in the distributed cache system is redistributed to the nodes in the distributed cache system.
In a possible implementation manner, the processing module 502 is specifically configured to, after setting the cluster state of the nodes in the distributed cache system to complete, set the cache states of all nodes in the distributed cache system to the non-cache state after a node in the distributed cache system exits, and to set the cache states of all nodes in the distributed cache system to the normal cache state after the data stored by each node in the distributed cache system is redistributed to the nodes in the distributed cache system.
It will be appreciated that the above-described apparatus for constructing a distributed cache system may also be implemented in hardware. For example, when implemented in hardware, the acquisition module 501 in embodiments of the present application may be integrated on a communication interface and the processing module 502 may be integrated on a processor. For another example, when implemented in hardware, the processing module 502 and the acquisition module 501 in embodiments of the present application are integrated on a processor. The hardware configuration may be as shown in fig. 6.
Fig. 6 shows a schematic diagram of one possible hardware configuration of the apparatus for constructing the distributed cache system according to the above embodiment. The device for constructing the distributed cache system comprises: a processor 602. Optionally, the apparatus for constructing the distributed cache system further includes: a communication interface 603, a memory 601 and a bus 604.
The processor 602 is configured to control and manage the actions of the apparatus for constructing the distributed cache system, for example, to perform the steps performed by the processing module 502 described above, and/or to perform other processes of the techniques described herein. Optionally, the processor 602 may also perform the steps performed by the acquisition module 501 and the processing module 502 described above. The processor 602 may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with the present disclosure. The processor may be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The processor may also be a combination that implements a computing function, for example, a combination comprising one or more microprocessors, or a combination of a DSP and a microprocessor.
The communication interface 603 is used to support communication between the device for building a distributed cache system and other network entities, for example, to perform the steps performed by the acquisition module 501.
The memory 601 is used to store the program code and data of the apparatus for constructing the distributed cache system. For example, the memory 601 may be a memory in the apparatus for constructing the distributed cache system, and the memory may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid-state disk; the memory may also include a combination of the above types of memories.
Bus 604 may be an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus or the like. The bus 604 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in fig. 6, but this does not mean that there is only one bus or only one type of bus.
Fig. 7 is a schematic structural diagram of a chip 70 according to an embodiment of the present application. Chip 70 includes one or more (including two) processors 701. Optionally, the chip 70 further comprises a communication interface 703, a bus 702 and a memory 704.
The processor 701 may implement or execute the various illustrative logical blocks, units, and circuits described in connection with the present disclosure. The processor may be a central processing unit, a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. The processor may also be a combination that implements a computing function, for example, a combination comprising one or more microprocessors, or a combination of a DSP and a microprocessor.
Memory 704 may include read-only memory and random access memory, and provides operating instructions and data to processor 701. A portion of memory 704 may also include non-volatile random access memory (non-volatile random access memory, NVRAM).
In some implementations, the memory 704 stores elements, execution modules or data structures, or a subset thereof, or an extended set thereof.
In an embodiment of the present application, the corresponding operation is performed by calling an operation instruction stored in the memory 704 (the operation instruction may be stored in an operating system).
Memory 704 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as read-only memory, flash memory, hard disk or solid state disk; the memory may also comprise a combination of the above types of memories.
Bus 702 may be an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus or the like. Bus 702 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one line is shown in fig. 7, but this does not mean that there is only one bus or only one type of bus.
From the foregoing description of the embodiments, it will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of functional modules is illustrated, and in practical application, the above-described functional allocation may be implemented by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to implement all or part of the functions described above. The specific working processes of the above-described systems, devices and units may refer to the corresponding processes in the foregoing method embodiments, which are not described herein.
Embodiments of the present application provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the method embodiments described above.
The embodiment of the application also provides a computer readable storage medium, wherein instructions are stored in the computer readable storage medium, and when the instructions run on a computer, the computer is caused to execute the method in the method flow shown in the method embodiment.
The computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (Random Access Memory, RAM), a read-only memory (ROM), an erasable programmable read-only memory (Erasable Programmable Read Only Memory, EPROM), a register, a hard disk, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing, or any other form of computer-readable storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC). In the embodiments of the present application, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Since the apparatus, the computer readable storage medium, and the computer program product in the embodiments of the present application can be applied to the above-mentioned method, the technical effects obtained by the method can also refer to the above-mentioned method embodiments, and the embodiments of the present application are not described herein again.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some interface, indirect coupling or communication connection of devices or units, electrical, mechanical, or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The present application is not limited to the above embodiments, and any changes or substitutions within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application should be subject to the protection scope of the claims.

Claims (10)

1. A method of constructing a distributed cache system, the method comprising:
acquiring node change information in a root directory within a first time period, wherein the root directory is used for recording node information after initialization is completed;
transmitting the node change information to all nodes under the root directory, so that each node under the root directory changes a corresponding cluster state set, wherein the changed cluster state set comprises node information identical to the node information under the root directory;
when the number of node information in the cluster state set of each node under the root directory is greater than or equal to a first preset number, constructing a distributed cache system by using the nodes in the cluster state set.
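For illustration only, the following minimal Python sketch mirrors the flow recited in claim 1: node change information gathered over the first time period is pushed to every node under the root directory, each node updates its cluster state set, and the cache system is built once every set holds at least the first preset number of entries. The class names, field names, and the threshold value are assumptions and do not come from the application.

```python
# Minimal sketch of the flow in claim 1; the class names, field names and the
# value of FIRST_PRESET_NUMBER are illustrative assumptions only.
from dataclasses import dataclass, field

FIRST_PRESET_NUMBER = 3  # assumed "first preset number"

@dataclass
class Node:
    node_id: str
    cluster_state_set: set = field(default_factory=set)

class RootDirectory:
    """Records node information of nodes that have completed initialization."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}

    def broadcast_changes(self, node_changes: set) -> None:
        # Send the node change information to all nodes under the root directory;
        # each node changes its cluster state set to match the root directory.
        for node in self.nodes.values():
            node.cluster_state_set |= node_changes

    def try_build_cache_system(self):
        # Construct the distributed cache system only when every node's cluster
        # state set holds at least FIRST_PRESET_NUMBER entries.
        if self.nodes and all(len(n.cluster_state_set) >= FIRST_PRESET_NUMBER
                              for n in self.nodes.values()):
            return sorted(self.nodes)  # members of the new cache system
        return None
```

A coordinator would call broadcast_changes with the changes collected during the first time period and then check try_build_cache_system to decide whether the cluster can be formed.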
2. The method according to claim 1, wherein the method further comprises:
initializing a target node;
after the initialization of the target node is completed, writing node information of the target node into the root directory;
the node change information comprises the node information of the target node, and the changed cluster state set comprises the node information of the target node.
3. The method according to claim 1, wherein the method further comprises:
in a case where a target node fails and the root directory comprises node information of the target node, deleting the node information of the target node from the root directory;
the node change information comprises the node information of the target node, and the changed cluster state set does not comprise the node information of the target node.
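One possible realization of claims 2 and 3 on top of a ZooKeeper coordination service is sketched below using the kazoo client. The path, the JSON layout of the node information, and the use of an ephemeral znode (so that the entry disappears when the target node fails) are assumptions for illustration, not the application's prescribed implementation.

```python
# Registration per claim 2 with the automatic cleanup of claim 3, sketched on
# ZooKeeper via kazoo; path and field names are illustrative assumptions.
import json
from kazoo.client import KazooClient

ROOT = "/cache_cluster/nodes"  # assumed root directory path

def register_node(zk: KazooClient, node_id: str, host: str, port: int) -> None:
    """Write the node information into the root directory after initialization."""
    zk.ensure_path(ROOT)
    info = json.dumps({"host": host, "port": port,
                       "cache_state": "non_cache",
                       "cluster_state": "incomplete"}).encode("utf-8")
    # An ephemeral znode is deleted by ZooKeeper when the target node's session
    # ends, which corresponds to the deletion described in claim 3.
    zk.create(f"{ROOT}/{node_id}", info, ephemeral=True)

# Assumed usage:
# zk = KazooClient(hosts="127.0.0.1:2181")
# zk.start()
# register_node(zk, "node-1", "10.0.0.1", 6379)
```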
4. The method of claim 2, wherein the node information further comprises a cache state of each node under the root directory and a cluster state of each node under the root directory, and wherein the writing the node information of the target node into the root directory further comprises: setting the cache state of the node to a non-cache state, and setting the cluster state of the node to incomplete;
the constructing a distributed cache system by using the nodes in the cluster state set comprises:
constructing an initial distributed key-value storage system, and adding all nodes under the root directory into the initial distributed key-value storage system to obtain the distributed cache system;
the method further comprises:
changing the cache state of the node from the non-cache state to a normal cache state.
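The cache state and cluster state fields of claim 4, together with the step of adding all nodes under the root directory to an initial distributed key-value storage system, could be modelled as in the following sketch; the enum values and the DistributedKVStore placeholder are assumptions, not the application's actual data structures.

```python
# Sketch of the states in claim 4 and of building the initial distributed
# key-value storage system; all names here are illustrative assumptions.
from enum import Enum

class CacheState(Enum):
    NON_CACHE = "non_cache"
    NORMAL = "normal"

class ClusterState(Enum):
    INCOMPLETE = "incomplete"
    COMPLETE = "complete"

class DistributedKVStore:
    """Stands in for the initial distributed key-value storage system."""
    def __init__(self) -> None:
        self.members: list[str] = []

    def add_node(self, node_id: str) -> None:
        self.members.append(node_id)

def build_cache_system(node_ids: list[str]) -> DistributedKVStore:
    store = DistributedKVStore()          # construct the initial KV store
    for node_id in node_ids:              # add every node under the root directory
        store.add_node(node_id)
    return store                          # the resulting distributed cache system
```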
5. The method according to claim 4, wherein the method further comprises:
if the number of the nodes in the distributed cache system is greater than or equal to a second preset number and the cache states of the nodes in the distributed cache system are all normal cache states, setting the cluster states of the nodes in the distributed cache system to complete;
performing distributed caching of data by using the nodes in the distributed cache system.
6. The method of claim 4, wherein, after the data is distributed and cached by using the nodes in the distributed cache system, the method further comprises:
when a node is newly added to the distributed cache system, setting the cache states of all nodes in the distributed cache system to a non-cache state;
and after the data stored by each node in the distributed cache system is restored onto the nodes in the distributed cache system, setting the cache states of all the nodes in the distributed cache system to normal cache states.
7. The method of claim 4, wherein, after setting the cluster states of the nodes in the distributed cache system to complete, the method further comprises:
after a node in the distributed cache system exits, setting the cache states of all the nodes in the distributed cache system to a non-cache state;
and after the data stored by each node in the distributed cache system is restored onto the nodes in the distributed cache system, setting the cache states of all the nodes in the distributed cache system to normal cache states.
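Claims 5 to 7 describe the same transition for expansion and contraction: a membership change puts every node back into the non-cache state until the stored data has been restored onto the current members, after which the cache states return to normal and the cluster state can be marked complete. Below is a hedged sketch of that transition, with redistribute_data() as a hypothetical placeholder and an assumed value for the second preset number.

```python
# Hedged sketch of the transitions in claims 5 to 7; redistribute_data() and
# SECOND_PRESET_NUMBER are assumptions used only for illustration.
SECOND_PRESET_NUMBER = 3

def redistribute_data(nodes: dict) -> None:
    # Placeholder: restore the data stored by each node onto the current members.
    pass

def on_membership_change(nodes: dict) -> None:
    """Called when a node is newly added to or exits the distributed cache system."""
    for info in nodes.values():
        info["cache_state"] = "non_cache"   # claims 6 and 7: back to non-cache state
    redistribute_data(nodes)
    for info in nodes.values():
        info["cache_state"] = "normal"      # data restored: normal cache state again
    # Claim 5: once enough nodes cache normally, the cluster state is complete.
    if (len(nodes) >= SECOND_PRESET_NUMBER
            and all(i["cache_state"] == "normal" for i in nodes.values())):
        for info in nodes.values():
            info["cluster_state"] = "complete"
```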
8. An apparatus for constructing a distributed cache system, wherein the apparatus comprises an acquisition module and a processing module;
the acquisition module is configured to acquire node change information in a root directory within a first time period, wherein the root directory is used for recording node information after initialization is completed;
the processing module is configured to send the node change information to all nodes under the root directory, so that each node under the root directory changes a corresponding cluster state set, wherein the changed cluster state set comprises node information identical to the node information under the root directory;
and the processing module is further configured to construct a distributed cache system by using the nodes in the cluster state set when the number of node information in the cluster state set of each node under the root directory is greater than or equal to a first preset number.
9. An apparatus for constructing a distributed cache system, comprising: a processor and a communication interface; wherein the communication interface is coupled to the processor, and the processor is configured to run a computer program or instructions to implement the method of constructing a distributed cache system according to any one of claims 1 to 7.
10. A computer-readable storage medium having instructions stored therein which, when executed by a computer, cause the computer to perform the method of constructing a distributed cache system according to any one of claims 1 to 7.
CN202410023209.3A 2024-01-05 2024-01-05 Method and device for constructing distributed cache system Pending CN117914675A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410023209.3A CN117914675A (en) 2024-01-05 2024-01-05 Method and device for constructing distributed cache system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410023209.3A CN117914675A (en) 2024-01-05 2024-01-05 Method and device for constructing distributed cache system

Publications (1)

Publication Number Publication Date
CN117914675A (en) 2024-04-19

Family

ID=90686395

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410023209.3A Pending CN117914675A (en) 2024-01-05 2024-01-05 Method and device for constructing distributed cache system

Country Status (1)

Country Link
CN (1) CN117914675A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination