CN108829720B - Data processing method and device - Google Patents


Info

Publication number
CN108829720B
CN108829720B (application CN201810428264.5A)
Authority
CN
China
Prior art keywords
data, node, master node, slave nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810428264.5A
Other languages
Chinese (zh)
Other versions
CN108829720A (en)
Inventor
孙蔚
李涛
Current Assignee
Qilin Hesheng Network Technology Inc
Original Assignee
Qilin Hesheng Network Technology Inc
Priority date
Filing date
Publication date
Application filed by Qilin Hesheng Network Technology Inc filed Critical Qilin Hesheng Network Technology Inc
Priority to CN201810428264.5A priority Critical patent/CN108829720B/en
Publication of CN108829720A publication Critical patent/CN108829720A/en
Application granted granted Critical
Publication of CN108829720B publication Critical patent/CN108829720B/en

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present application provide a data processing method and a data processing apparatus. The method includes: determining a first master node and a first slave node among a plurality of first nodes included in a data storage cluster by using a distributed consistency algorithm; determining a second master node and a second slave node among a plurality of second nodes included in a data cache cluster by using the distributed consistency algorithm; updating data in the first master node, where the first slave node automatically synchronizes the data in the first master node after the data update of the first master node is completed; and updating data in the second master node according to the data update result of the first master node, where the second slave node automatically synchronizes the data in the second master node after the data update of the second master node is completed. The embodiments of the present application ensure the consistency of the data stored by each data node, avoid a single-point risk of the data, and improve the data access experience of users.

Description

Data processing method and device
Technical Field
The present application relates to the field of data processing, and in particular, to a data processing method and apparatus.
Background
At present, in order to handle data access in global high-concurrency scenarios and ensure that the same data read by users in different regions has the same value, a data storage node in one country is generally used as the main data node. The main data node is responsible for writing data, and the data is synchronized to data nodes in other regions through a message mechanism or a task mechanism, thereby realizing data access in global high-concurrency scenarios and ensuring data consistency.
However, in the above process, when a failure of the main data node causes the stored data to be incorrect, the data synchronized to the other data nodes may also be incorrect; that is, there is a single-point risk for the data. A technical solution is therefore needed to ensure the consistency of the data stored in each data node, avoid a single-point risk of the data, and improve the data access experience of users.
Disclosure of Invention
The embodiments of the present application aim to provide a data processing method and a data processing apparatus that ensure the consistency of the data stored by each data node, avoid a single-point risk of the data, and improve the data access experience of users.
To achieve the above purpose, the embodiments of the present application are implemented as follows:
in a first aspect, an embodiment of the present application provides a data processing method, including:
determining a first master node and a first slave node in a plurality of first nodes included in a data storage cluster by using a distributed consistency algorithm; wherein the data storage cluster comprises a plurality of first nodes, the first nodes being configured to store data;
determining a second master node and a second slave node in a plurality of second nodes included in the data cache cluster by using the distributed consistency algorithm; the data cache cluster comprises a plurality of second nodes, and the second nodes are used for caching data;
updating data in the first master node; wherein the first slave node automatically synchronizes the data in the first master node after the data update of the first master node is completed;
updating data in the second master node according to the data update result of the first master node; wherein the second slave node automatically synchronizes the data in the second master node after the data update of the second master node is completed.
In a second aspect, an embodiment of the present application provides a data processing apparatus, including:
a first determining unit, configured to determine, by using a distributed consistency algorithm, a first master node and a first slave node from among a plurality of first nodes included in a data storage cluster; wherein the data storage cluster comprises a plurality of first nodes, the first nodes being configured to store data;
a second determining unit, configured to determine, by using the distributed consistency algorithm, a second master node and a second slave node in a plurality of second nodes included in the data cache cluster; the data cache cluster comprises a plurality of second nodes, and the second nodes are used for caching data;
a first updating unit, configured to perform a data update in the first master node; wherein the first slave node automatically synchronizes the data in the first master node after the data update of the first master node is completed;
a second updating unit, configured to update data in the second master node according to the data update result of the first master node; wherein the second slave node automatically synchronizes the data in the second master node after the data update of the second master node is completed.
In a third aspect, an embodiment of the present application provides a data processing apparatus, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the data processing method as described in the first aspect above.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the data processing method according to the first aspect.
In the embodiments of the present application, a first master node and a first slave node are determined in a data storage cluster, and a second master node and a second slave node are determined in a data cache cluster. When data is updated, the update is first performed in the first master node and then performed in the second master node according to the data update result of the first master node; the first slave node automatically synchronizes the data in the first master node after the data update of the first master node is completed, and the second slave node automatically synchronizes the data in the second master node after the data update of the second master node is completed. Therefore, the data can be automatically synchronized to the other slave nodes after each update, which ensures the consistency of the data stored by each data node and improves the data access experience of users. In addition, because the first slave node automatically synchronizes the data in the first master node, multiple copies of the data exist, which avoids a single-point risk of the data, prevents users from reading incorrect data, and improves the data access experience of users. Furthermore, the data update operation does not need to be actively executed on each node; because the slave nodes automatically synchronize the data in the master node, the update procedure is simple and the update efficiency is high.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only some embodiments described in the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without any creative effort.
Fig. 1 is a schematic view of an application scenario of a data processing method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a data processing method according to an embodiment of the present application;
fig. 3 is a schematic diagram of determining a master node and a slave node according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a data modification process provided in an embodiment of the present application;
FIG. 5 is a schematic view of a data modification process according to another embodiment of the present application;
FIG. 6 is a schematic diagram of a data query process according to an embodiment of the present application;
fig. 7 is a schematic block diagram of a data processing apparatus according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a data processing device according to an embodiment of the present application.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In order to ensure consistency of data stored in each data node, avoid a single point risk of the data, and improve data access experience of a user, embodiments of the present application provide a data processing method and apparatus, which are described in detail below with reference to the accompanying drawings.
Fig. 1 is a schematic view of an application scenario of a data processing method according to an embodiment of the present application. As shown in fig. 1, the scenario includes a data update cluster 10, a data storage cluster 20, a data cache cluster 30, and a read record cache cluster 40. The data update cluster 10 may include at least one node for performing data updates, where a node may be a server or a server cluster. The data storage cluster 20 comprises at least one first sub-cluster 201; each first sub-cluster 201 comprises at least one first node 202, which is used for storing data and may be a server or a server cluster. The data cache cluster 30 comprises at least one second sub-cluster 301; each second sub-cluster 301 comprises at least one second node 302, which is used for caching data and may be a server or a server cluster. The read record cache cluster 40 comprises at least one third sub-cluster 401; each third sub-cluster 401 comprises at least one third node 402, which is used for caching data read records and may be a server or a server cluster.
It should be noted that the first sub-cluster 201, the second sub-cluster 301, and the third sub-cluster 401 may each be the sub-cluster of a region, where a region may be a country or a province. The first node 202 may store data permanently, while the data cached in the second node 302 and the third node 402 may have a fixed refresh period, such as refreshing the cache every other day.
The data processing method in the embodiment of the present application may be executed by a node in the data update cluster 10 in fig. 1, where the node is used for performing data update. Fig. 2 is a schematic flow chart of a data processing method according to an embodiment of the present application, and as shown in fig. 2, the method includes:
step S202, determining a first master node and a first slave node in a plurality of first nodes included in a data storage cluster by using a distributed consistency algorithm; the data storage cluster comprises a plurality of first nodes, and the first nodes are used for storing data;
step S204, determining a second master node and a second slave node in a plurality of second nodes included in the data cache cluster by using the distributed consistency algorithm; the data cache cluster comprises a plurality of second nodes, and the second nodes are used for caching data;
step S206, updating data in the first master node; wherein the first slave node automatically synchronizes the data in the first master node after the data update of the first master node is completed;
step S208, updating data in the second master node according to the data update result of the first master node; wherein the second slave node automatically synchronizes the data in the second master node after the data update of the second master node is completed.
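As a hedged sketch of the flow in steps S202 to S208 (the `Node` and `Cluster` classes, the one-line election shortcut, and all node names are illustrative inventions for this example, not the patent's implementation; in the embodiments the masters are elected by a raft-style algorithm):

```python
class Node:
    """A storage or cache node holding key-value data."""
    def __init__(self, name):
        self.name = name
        self.data = {}

class Cluster:
    """A cluster whose master is chosen by a consistency algorithm."""
    def __init__(self, nodes):
        self.nodes = nodes
        self.master = None
        self.slaves = []

    def elect_master(self):
        # Stand-in for the raft election described in the text.
        self.master, *self.slaves = self.nodes
        return self.master

    def update(self, key, value):
        # Update the master first; the slaves then auto-synchronize its data.
        self.master.data[key] = value
        for slave in self.slaves:
            slave.data.update(self.master.data)

storage = Cluster([Node("store-A"), Node("store-B"), Node("store-C")])
cache = Cluster([Node("cache-A"), Node("cache-B")])
storage.elect_master()                                # S202
cache.elect_master()                                  # S204
storage.update("name1", "zhangsan")                   # S206
cache.update("name1", storage.master.data["name1"])   # S208
```

After the run, every storage and cache node holds the same value, mirroring the consistency property the embodiment claims.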
In the embodiment of the present application, the data update includes, but is not limited to, data addition, data deletion, data modification, and the like.
In the embodiments of the present application, a first master node and a first slave node are determined in a data storage cluster, and a second master node and a second slave node are determined in a data cache cluster. When data is updated, the update is first performed in the first master node and then performed in the second master node according to the data update result of the first master node; the first slave node automatically synchronizes the data in the first master node after the data update of the first master node is completed, and the second slave node automatically synchronizes the data in the second master node after the data update of the second master node is completed. Therefore, the data can be automatically synchronized to the other slave nodes after each update, which ensures the consistency of the data stored by each data node and improves the data access experience of users. In addition, because the first slave node automatically synchronizes the data in the first master node, multiple copies of the data exist, which avoids a single-point risk of the data, prevents users from reading incorrect data, and improves the data access experience of users. Furthermore, the data update operation does not need to be actively executed on each node; because the slave nodes automatically synchronize the data in the master node, the update procedure is simple and the update efficiency is high.
In the step S202, a first master node and a first slave node may be determined in all first nodes included in the data storage cluster by using a raft distributed consistency algorithm, and in the step S204, a second master node and a second slave node may be determined in all second nodes included in the data cache cluster by using a raft distributed consistency algorithm. Further, a third master node and a third slave node may also be determined among all third nodes included in the read record cache cluster by using a raft distributed consistency algorithm.
Based on the principle of the raft distributed consistency algorithm, the first master node can control each first slave node to perform data updating operation, the second master node can control each second slave node to perform data updating operation, and the third master node can control each third slave node to perform data updating operation. In this embodiment and the following description, the master node is a leader in the raft algorithm, and the slave node is a follower in the raft algorithm.
Fig. 3 is a schematic diagram of determining master and slave nodes according to an embodiment of the present application. As shown in fig. 3, the first sub-clusters and second sub-clusters correspond to different regions. A first master node is determined, according to the raft algorithm, in the first sub-cluster of region A, and the first nodes in the first sub-clusters of regions B and C are first slave nodes. Similarly, a second master node is determined in the second sub-cluster of region A according to the raft algorithm, and the second nodes in the second sub-clusters of regions B and C are second slave nodes. A third master node is determined in one of the third sub-clusters according to the raft algorithm, and the third nodes in the other third sub-clusters are third slave nodes.
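The regional role assignment of Fig. 3 can be illustrated as follows (the sub-cluster layout and the `assign_roles` helper are invented for illustration; in the embodiments the master region is decided by the raft election itself):

```python
# Illustrative only: one node per regional sub-cluster, as in Fig. 3.
sub_clusters = {"A": ["node-A1"], "B": ["node-B1"], "C": ["node-C1"]}

def assign_roles(sub_clusters, master_region):
    """Given the region whose node the algorithm elected as master,
    return (master, slaves): nodes of all other regions become slaves."""
    master = sub_clusters[master_region][0]
    slaves = [n for region, nodes in sub_clusters.items()
              if region != master_region for n in nodes]
    return master, slaves

master, slaves = assign_roles(sub_clusters, "A")
```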
In step S206, a data update is performed in the first master node, where the data update includes data deletion, addition, modification, and the like. Specifically, some data stored in the first master node may be deleted or modified, and some data may be added to the first master node.
Since the first master node and the first slave node are determined based on a distributed consistency algorithm, the first master node may control part or all of the first slave nodes to synchronize part or all of the data in the first master node. Therefore, in this embodiment, in step S206, the first slave node automatically synchronizes the data in the first master node after the data update of the first master node is completed based on the control function of the first master node. The first slave node may automatically synchronize all data in the first master node, or synchronize updated data in the first master node.
In a specific embodiment, three to five first slave nodes automatically synchronize all data in the first master node, so that the data in the first master node is stored in multiple copies, thereby avoiding a single point of risk of data.
In this embodiment of the application, after the first slave node completes synchronization of data in the first master node, it is determined that data update of the data storage cluster is completed, and then, in step S208, data update is performed in the second master node according to a data update result of the first master node, where the data update result of the first master node is the same as the data update result of the second master node, that is, the same data is added, modified, or deleted. In this embodiment, it is preferable that the data cached by each second node (including the second master node and the second slave node) is the same.
In the embodiment, when data is updated, the data is updated in the data cache cluster and the data storage cluster respectively, and the data updating results of the data cache cluster and the data storage cluster are the same, so that a user can conveniently and directly obtain required data from the data cache cluster, the data query experience is improved, and the cluster response speed is optimized to the greatest extent.
In the advertisement scene, when data is updated, the data is updated in the data cache cluster and the data storage cluster respectively, and the data updating results of the data cache cluster and the data storage cluster are the same, so that the reading performance of the advertisement data in the concurrent scene, particularly the reading performance of the hotspot advertisement data, can be improved, and the response speed of the clusters is optimized to the maximum extent.
Based on the foregoing discussion, since the second master node and the second slave node are determined based on a distributed consistency algorithm, the second master node may control part or all of the second slave nodes to synchronize part or all of the data in the second master node. Therefore, in this embodiment, in step S208, the second slave node automatically synchronizes the data in the second master node after the data update of the second master node is completed based on the control function of the second master node. The second slave node may automatically synchronize all data in the second master node, or synchronize updated data in the second master node.
In a specific embodiment, all the second slave nodes automatically synchronize all the data in the second master node, so that the user can query any one of the second slave nodes to obtain the required data.
The raft algorithm is a consistency algorithm for distributed systems; it is fast, convenient to implement, and has a low learning cost. The raft algorithm elects the master node from among the nodes by a voting mechanism and periodically confirms the master node through a heartbeat communication mechanism, so that a new master node is determined after the current master node fails and the cluster keeps operating normally. In the embodiments of the present application, the first master node and the second master node are determined based on the raft algorithm, so that the first master node controls each first slave node to perform the data update operation and the second master node controls each second slave node to perform the data update operation.
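A minimal sketch of the majority-vote idea mentioned above (this deliberately omits terms, logs, heartbeats, and randomized election timeouts, all of which real raft requires; the `majority_vote` function is an invented illustration):

```python
# A candidate becomes master only if it gathers votes from a strict
# majority of the cluster (itself plus the other voters).

def majority_vote(candidate, voters):
    votes = 1  # the candidate votes for itself
    for voter in voters:
        if voter(candidate):
            votes += 1
    cluster_size = len(voters) + 1
    return votes * 2 > cluster_size

# Four other nodes: three grant their vote, one refuses -> elected.
elected = majority_vote("node-A", [lambda c: True, lambda c: True,
                                   lambda c: False, lambda c: True])
# Only one of four grants a vote: no majority in a five-node cluster.
rejected = majority_vote("node-A", [lambda c: False, lambda c: False,
                                    lambda c: False, lambda c: True])
```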
In the embodiments of the present application, updating data in the first master node includes: modifying data in the first master node; correspondingly, the method in this embodiment further includes:
before data modification is performed in the first master node, determining that the read record cache cluster has not cached a read record of the data to be modified whose read time is within a specified time range;
wherein the read record cache cluster is used for caching data read records, and the specified time range is a preset time range immediately before the current time.
Specifically, each node in the read record cache cluster is used for caching data read records. Taking key-value data as an example, whenever data is read from the data storage cluster or the data cache cluster, the read record cache cluster caches the key of the read data together with a corresponding read timestamp; the key and the read timestamp jointly form the data read record.
In this embodiment, before data modification is performed in the first master node, it is determined whether the read record cache cluster has cached a read record of the data to be modified whose read time is within the preset time range before the current time. The data to be modified may be in a key-value format, and the preset time range may be 3 seconds or 5 seconds; preferably, the preset time range is less than or equal to 5 seconds. The read time of the data to be modified may be its latest read time. Whether such a read record is cached can be determined by querying the read record cache cluster for the key of the data to be modified and checking whether its latest read time falls within the preset time range before the current time.
For example, the data to be modified is a user name with the structure key "name1", value "zhangsan", and the data modification changes the value from "zhangsan" to "lisi". The read record with key "name1" is queried in the read record cache cluster; if such a read record is found, it is determined whether its latest read time is before the current time and within 3 seconds of it.
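The check just described can be sketched as follows (the key names and the 3-second window come from the example above; the record layout and the `recently_read` function are invented for illustration):

```python
# Hypothetical read-record lookup: `read_records` maps the key of read
# data to its latest read timestamp, and a modification is blocked when
# that timestamp falls within the preset window before the current time.

WINDOW_SECONDS = 3

def recently_read(read_records, key, now):
    """Return True if `key` was read within WINDOW_SECONDS before `now`."""
    ts = read_records.get(key)
    return ts is not None and 0 <= now - ts <= WINDOW_SECONDS

now = 1_000_000.0
records = {"name1": now - 1.5}   # "name1" was read 1.5 seconds ago
blocked = recently_read(records, "name1", now)   # modification cancelled
safe = not recently_read(records, "name2", now)  # no record: safe to modify
```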
If the above process determines that the read record cache cluster has cached a read record of the data to be modified whose read time is within the specified time range, the data to be modified has been read a short time ago, and the data update action in the first master node is cancelled in order to avoid interference between the data query and the data modification.
If the above process determines that the read record cache cluster has not cached such a read record, the data to be modified has not been read a short time ago, and the data modification is performed in the first master node.
In a specific embodiment, there are multiple pieces of data to be modified corresponding to the first master node; for example, the data to be modified includes the pair with key "name1" and value "zhangsan" and the pair with key "name2" and value "wangwu". Before each piece of data is modified in the first master node, the above determination process is performed to check whether that piece of data has been read a short time ago, until every piece of data to be modified has been checked. Based on the determination results, some or all of the data to be modified may be modified in the first master node; data that is not modified in this round may be modified again after a period of time.
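A sketch of this batch check (the helper and data layout are invented; `recently_read_keys` stands in for the outcome of the read-record query described above):

```python
# Invented sketch: pending modifications whose keys were read recently are
# deferred to a later round; the rest are applied to the master node.

def apply_modifications(master_data, pending, recently_read_keys):
    deferred = {}
    for key, value in pending.items():
        if key in recently_read_keys:
            deferred[key] = value      # skip this round, retry later
        else:
            master_data[key] = value   # modify in the first master node
    return deferred

master = {"name1": "zhangsan", "name2": "wangwu"}
pending = {"name1": "lisi", "name2": "zhaoliu"}
deferred = apply_modifications(master, pending, recently_read_keys={"name1"})
```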
Fig. 4 is a schematic diagram of a data modification process provided in an embodiment of the present application, and as shown in fig. 4, the process includes:
step S402, determining data to be modified and a modification mode corresponding to the data to be modified;
step S404, judging whether the read record cache cluster has cached a read record of the data to be modified whose read time is within the specified time range;
if yes, the data modification process ends; otherwise, step S406 is executed;
step S406, modifying the data to be modified in the first master node according to the above modification mode.
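The steps above can be sketched as a single guarded write (a minimal invented illustration; the function name, data layout, and 3-second default are assumptions):

```python
# Sketch of steps S402-S406: look up the read record, then either cancel
# the modification or apply it in the first master node.

def modify_data(master_data, read_records, key, new_value, now, window=3):
    ts = read_records.get(key)                    # S404: check read record
    if ts is not None and 0 <= now - ts <= window:
        return False                              # recently read: end here
    master_data[key] = new_value                  # S406: modify in master
    return True

master = {"name1": "zhangsan"}
reads = {"name1": 100.0}
cancelled = modify_data(master, reads, "name1", "lisi", now=101.0)  # in window
applied = modify_data(master, reads, "name1", "lisi", now=110.0)    # window over
```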
The essence of a data modification operation is a data write operation, and a read operation and a write operation executed on the same data at close times may interfere with each other. Therefore, in the embodiments of the present application, before data modification is performed in the first master node, it is determined whether the read record cache cluster has cached a read record of the data to be modified whose read time is within the specified time range. If such a record is cached, the data modification action in the first master node is cancelled; if not, the data modification is performed in the first master node. This avoids the influence of the read operation on the write operation of the data to be modified and ensures that the data is updated smoothly.
On the other hand, if recent read operations on the data to be modified were ignored and the data were updated directly, a user reading the data multiple times within a short period might obtain different values, for example the value before modification in one read and the value after modification in the next, which affects data consistency and degrades the data query experience.
Further, updating data in the first master node includes: modifying data in the first master node;
the method in the embodiment of the application further comprises the following steps:
before data modification is performed in the first master node, determining, according to the data read records cached in the read record cache cluster, that the data to be modified is cached in the second master node, and deleting the data to be modified in the second master node; wherein the read record cache cluster is used for caching data read records, and the second slave node automatically synchronizes the data in the second master node after the data deletion in the second master node is completed;
in this case, updating data in the second master node according to the data update result of the first master node includes:
adding the modified data in the second master node according to the data modification result in the first master node.
Specifically, taking the data to be modified as key-value data as an example, determining according to the cached data read records that the data to be modified is cached in the second master node proceeds as follows: it is judged, by the key of the data to be modified, whether the read record cache cluster has cached a read record of that data. If a read record corresponding to the key is found, the data to be modified has been read; if not, it has not been read. Because read data is cached in the second master node, the data to be modified is cached in the second master node exactly when it has been read, and correspondingly is not cached there when it has not been read.
If it is determined that the data to be modified is cached in the second master node, it is deleted in the second master node. It can be understood that after the data deletion in the second master node is completed, each second slave node automatically synchronizes the data in the second master node, for example by synchronizing all of its data, so that the data to be modified is also deleted from the data cache cluster. After the data to be modified has been deleted in the second master node, the modified data is added in the second master node according to the data modification result of the first master node.
Conversely, if the data to be modified is not cached in the second master node, the modified data is added in the second master node directly according to the data modification result of the first master node.
Fig. 5 is a schematic diagram of a data modification process according to another embodiment of the present application, and as shown in fig. 5, the process includes:
step S502, determining data to be modified and a modification mode corresponding to the data to be modified;
step S504, judging, according to the data read records cached in the read record cache cluster, whether the second master node caches the data to be modified;
if so, go to step S506, otherwise, go to step S508.
Step S506, deleting the data to be modified in the second master node, wherein each second slave node automatically synchronizes the data in the second master node after the data deletion in the second master node is completed;
step S508, performing data modification in the first master node according to the data to be modified and its corresponding modification mode;
step S510, adding the modified data to the second master node according to the data modification result in the first master node.
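The flow of steps S502 to S510 can be sketched as follows, under the assumption that the master nodes are modeled as plain key-value dictionaries and the read record cache as a set-like lookup; these representations are illustrative only:

```python
def modify_data(key, new_value, storage_master, cache_master, read_records):
    """Illustrative sketch of the Fig. 5 modification flow."""
    # Step S504: use the read records to judge whether the cache holds the data.
    if key in read_records:
        # Step S506: delete the stale cached copy before modifying.
        cache_master.pop(key, None)
    # Step S508: perform the modification in the first master node (storage).
    storage_master[key] = new_value
    # Step S510: add the modified data to the second master node (cache).
    cache_master[key] = new_value

storage, cache, records = {"k": "old"}, {"k": "old"}, {"k"}
modify_data("k", "new", storage, cache, records)
assert storage["k"] == "new" and cache["k"] == "new"
```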
Through this embodiment, the data to be modified can be deleted from the data cache cluster before the data is modified, so that stale cached data is removed before the update. After the stale cached data is deleted, under the query mechanism that preferentially queries the data cache cluster, a user who performs a query in the window after the stale cache entry is deleted but before the data cache cluster is updated will query the data storage cluster directly and therefore obtain the updated data. This avoids the problem of the user reading pre-update data from the data cache cluster because stale cached data was not deleted.
As can be seen from the foregoing process, a read record cache cluster is provided in the embodiment of the present application. In the embodiment of the present application, a third master node and third slave nodes can also be determined among a plurality of third nodes included in the read record cache cluster by using a distributed consistency algorithm (e.g., the Raft algorithm).
Specifically, the read record cache cluster includes a plurality of third nodes. After the third master node and the third slave nodes are determined, the third slave nodes automatically synchronize the data in the third master node each time the caching of a data read record in the third master node is completed, so that the third master node and each third slave node cache the same data. Each time data is read from the data cache cluster or the data storage cluster, a corresponding data read record is cached in the third master node.
In this embodiment, the third master node caches at most a maximum number of data read records for the same piece of read data, and this maximum number is positively correlated with the heat (read popularity) of the read data; the total number of data read records cached in the third master node does not exceed a preset number threshold.
Specifically, the heat of read data is related to its read frequency: the higher the read frequency, the higher the heat, and in a specific embodiment the heat can be calculated from the read frequency. In this embodiment, the maximum number of data read records that the third master node caches for the same piece of read data is positively correlated with the heat of that data. For example, according to the heat of read data a, the maximum number of data read records corresponding to read data a is determined to be ten thousand; because the heat of read data a is greater than that of read data b, the maximum number corresponding to read data b is determined, according to its heat, to be five thousand. This ensures that more read records are cached for data with higher heat.
Taking key-value data as an example, a data read record includes a key and a read timestamp. In this embodiment, a maximum expiration time may also be set for each key; the expiration time limits the caching duration of the data read records corresponding to that key, and when the caching duration of a data read record reaches the expiration time of the corresponding key, the record is deleted. The expiration time corresponding to a key can be set to be positively correlated with the heat of the read data corresponding to that key.
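A per-key expiration time positively correlated with heat could be sketched as below; the base TTL, the cap, and the use of read count as a heat proxy are illustrative assumptions rather than values given in the embodiment:

```python
def expiration_seconds(read_count, base_ttl=60, cap=3600):
    """Hotter keys (higher read frequency) keep their read records
    longer, up to a fixed cap; all constants are illustrative."""
    return min(base_ttl * max(read_count, 1), cap)

assert expiration_seconds(1) == 60       # cold key: base TTL
assert expiration_seconds(10) == 600     # TTL grows with heat
assert expiration_seconds(100) == 3600   # capped for very hot keys
```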
In this embodiment, the data read records cached by each third node are the same. When the number of data read records cached in a third node reaches the preset number threshold, the data read record with the longest caching duration is deleted first, so that newer data read records are preferentially retained.
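The size cap with oldest-first eviction can be sketched with an insertion-ordered mapping; the class name, threshold value, and record layout are illustrative assumptions:

```python
from collections import OrderedDict

class BoundedReadRecords:
    """Keeps at most `threshold` read records; when full, the record with
    the longest caching duration (oldest insertion) is evicted first."""
    def __init__(self, threshold):
        self.threshold = threshold
        self.records = OrderedDict()  # key -> read timestamp, oldest first

    def add(self, key, read_ts):
        if key in self.records:
            del self.records[key]                 # refresh: re-insert as newest
        elif len(self.records) >= self.threshold:
            self.records.popitem(last=False)      # evict the oldest record
        self.records[key] = read_ts

store = BoundedReadRecords(2)
store.add("a", 1); store.add("b", 2); store.add("c", 3)
assert list(store.records) == ["b", "c"]  # "a" was evicted first
```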
In this embodiment, the total number of data read records cached in the third master node is capped at the preset number threshold, so that the amount of cached data does not exceed the load capacity of the third master node and its normal operation is guaranteed.
As can be seen from the above process, in the embodiment of the present application, the first slave nodes synchronize data in the first master node, the second slave nodes synchronize data in the second master node, and the third slave nodes synchronize data in the third master node. First configuration information may be set to configure that, after each data update in the first master node is completed, first slave nodes exceeding a preset quantity proportion in the data storage cluster automatically synchronize the data in the first master node, where the preset quantity proportion is the ratio of the number of first slave nodes performing data synchronization to the number of all first slave nodes. Second configuration information is used to configure that all second slave nodes in the data cache cluster automatically synchronize the data in the second master node after each data update in the second master node is completed; the second configuration information may also configure that all second slave nodes automatically synchronize the data in the second master node after each data deletion in the second master node is completed. Third configuration information is used to configure that all third slave nodes in the read record cache cluster automatically synchronize the data in the third master node after each caching of a read record in the third master node is completed.
For example, after each data update in the first master node is completed, more than 50% of the first slave nodes in the data storage cluster automatically synchronize the data in the first master node; once more than 50% of the first slave nodes have completed data synchronization, the data update of the data storage cluster is determined to be complete, and the data in the data cache cluster is then updated.
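The more-than-50% completion rule is a simple majority check; the function and parameter names below are illustrative:

```python
def storage_update_complete(synced_slaves, total_slaves):
    """True once more than 50% of the first slave nodes have synchronized
    the first master node's data; only then is the cache cluster updated."""
    return 2 * synced_slaves > total_slaves

assert storage_update_complete(3, 5)        # 3 of 5 is a majority
assert not storage_update_complete(2, 5)    # 2 of 5 is not
```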
In a specific embodiment, after determining that its own data update is completed, the first master node randomly selects, according to the first configuration information, first slave nodes whose number exceeds 50% of all first slave nodes to synchronize its data. Thus, when data updates are performed in the first master node multiple times, the first slave nodes that synchronize each time may differ: for example, when data A is updated in the first master node, it randomly selects first slave nodes a, b, and c to synchronize the update, and when data B is updated next time, it randomly selects first slave nodes b, d, and f to synchronize the update.
Accordingly, after each data update in the first master node (e.g., writing new data), the same data update (e.g., writing the data) is performed on a corresponding portion of the first slave nodes, so that a given piece of data is stored in the first master node and a portion of the first slave nodes. On this basis, the first master node further stores a data routing table, which records, for each piece of data, the identifiers of the first slave nodes in which that data is stored; each first slave node also stores the data routing table. Accordingly, fig. 6 is a schematic diagram of a data query flow provided by an embodiment of the present application, and as shown in fig. 6, the flow includes:
step S602, target data is inquired from a second main node or a second slave node;
step S604, if the target data is found, outputting the target data;
step S606, if the target data is not found, sending a data query request corresponding to the target data to the first master node, wherein the first master node provides a data query function according to the data query request, or the first master node controls a first slave node that stores the target data to provide the data query function;
step S608, obtaining and outputting target data according to the provided data query function;
step S610, caching the target data in the second master node, wherein each second slave node in the data cache cluster automatically synchronizes the data in the second master node after the caching of the target data in the second master node is completed.
Specifically, the target data is the data to be queried, and it is queried in the second nodes first, for example in the second master node or in the second slave node with the highest communication speed, where the communication speed is related to the communication distance. If the target data is found, it is output. If the target data is not found, a data query request corresponding to the target data is sent to the first master node.
When data is updated, the update is performed in the first master node and a portion of the first slave nodes is also controlled to perform the update, so the target data is stored in the first master node and in a portion of the first slave nodes. Therefore, after receiving the data query request, the first master node either provides the data query function itself, or uses the stored data routing table to determine which first slave node stores the target data and controls that first slave node to provide the data query function. The target data is then acquired through the provided data query function.
To facilitate subsequent queries of the target data, the target data is cached in the second master node. Preferably, after the caching of the target data in the second master node is completed, all second slave nodes synchronize the data in the second master node, so that every second slave node caches the target data; subsequent queries can then obtain the target data directly from the data cache cluster.
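The query flow of Fig. 6, together with the data routing table described above, can be sketched as follows; modeling the nodes as dictionaries and the routing table as a key-to-slave-identifier mapping is an illustrative assumption:

```python
def query_target_data(key, cache_master, storage_master, storage_slaves, routing_table):
    # Steps S602/S604: query the data cache cluster first.
    if key in cache_master:
        return cache_master[key]
    # Step S606: fall back to the first master node, which answers itself
    # or routes, via the data routing table, to a first slave node that
    # stores the target data.
    if key in storage_master:
        value = storage_master[key]
    else:
        slave_id = routing_table[key]
        value = storage_slaves[slave_id][key]
    # Step S610: cache the result in the second master node for later queries.
    cache_master[key] = value
    return value

cache, master = {}, {"a": 1}
slaves, routing = {"s1": {"b": 2}}, {"b": "s1"}
assert query_target_data("a", cache, master, slaves, routing) == 1
assert query_target_data("b", cache, master, slaves, routing) == 2
assert cache == {"a": 1, "b": 2}  # both results are now cached
```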
In this embodiment, each first node (including the first master node and the first slave nodes) is further configured to perform first heartbeat communication, each second node (including the second master node and the second slave nodes) is configured to perform second heartbeat communication, and each third node (including the third master node and the third slave nodes) is configured to perform third heartbeat communication. Whether an unavailable first node exists is judged according to the first heartbeat communication result, whether an unavailable second node exists is judged according to the second heartbeat communication result, and whether an unavailable third node exists is judged according to the third heartbeat communication result. If an unavailable first node exists, the data stored by it is synchronized to other available first nodes; if an unavailable second node exists, the data cached by it is synchronized to other available second nodes; and if an unavailable third node exists, the data cached by it is synchronized to other available third nodes.
Specifically, a first node that does not return heartbeat response data within a specified time is determined to be unavailable; similarly, a second node that does not return heartbeat response data within the specified time is determined to be unavailable, as is a third node. All the data of an unavailable first node may be synchronized to an available first node, and likewise all the data of an unavailable second node may be synchronized to an available second node, and all the data of an unavailable third node to an available third node.
In this embodiment, the data storage cluster is determined to be unavailable when more than 50% of its nodes are unavailable, the data cache cluster is determined to be unavailable when more than 50% of its nodes are unavailable, and the read record cache cluster is determined to be unavailable when more than 50% of its nodes are unavailable. When a cluster is unavailable, data cannot be written to it, but data can still be read.
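The heartbeat-timeout and majority-unavailability rules above can be sketched as follows; the node names, timestamps, and timeout value are illustrative assumptions:

```python
def find_unavailable(last_heartbeat, now, timeout):
    """Nodes that have not returned heartbeat response data within the
    specified time are judged unavailable."""
    return {node for node, ts in last_heartbeat.items() if now - ts > timeout}

def cluster_unavailable(total_nodes, unavailable_count):
    """A cluster is judged unavailable once more than 50% of its nodes are."""
    return 2 * unavailable_count > total_nodes

down = find_unavailable({"n1": 100, "n2": 92}, now=101, timeout=5)
assert down == {"n2"}                     # n2 missed its heartbeat window
assert cluster_unavailable(5, 3)          # 3 of 5 nodes down: unavailable
assert not cluster_unavailable(5, 2)      # 2 of 5 down: still available
```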
In this embodiment, when the first master node becomes unavailable, a new master node may be elected among the available first nodes according to the Raft algorithm; because each first node stores the data routing table, any first node can be elected as the master node. Similarly, when the second master node becomes unavailable, a new master node can be elected among the available second nodes according to the Raft algorithm, and when the third master node becomes unavailable, a new master node can be elected among the available third nodes according to the Raft algorithm.
In this embodiment, the data cache cluster may be a Redis cache cluster, where Redis is a high-performance caching service middleware.
In this embodiment, the timestamps of each first node, each second node, and each third node are synchronized by using an HLC (Hybrid Logical Clock), so that the timestamps across the clusters are kept consistent and high data consistency is guaranteed. Specifically, the HLC combines a physical clock with a logical clock, which ensures that the timestamps generated at a single point increase monotonically, mitigates the problem of clock skew between different nodes, and keeps the skew between nodes within a specified error range as far as possible, thereby ensuring consistent timestamps across server nodes in every region.
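A minimal hybrid logical clock can be sketched as below; this is a generic HLC sketch (send and receive rules over a physical/logical timestamp pair), not the patent's specific implementation, and the method names are illustrative:

```python
class HybridLogicalClock:
    """Combines a physical clock with a logical counter so that locally
    generated timestamps increase monotonically even when the physical
    clock stalls or repeats."""
    def __init__(self):
        self.l = 0  # largest physical component seen so far
        self.c = 0  # logical counter to break ties

    def now(self, physical):
        """Generate a timestamp for a local or send event."""
        if physical > self.l:
            self.l, self.c = physical, 0
        else:
            self.c += 1
        return (self.l, self.c)

    def update(self, physical, remote):
        """Merge a remote timestamp on message receipt."""
        rl, rc = remote
        m = max(self.l, rl, physical)
        if m == self.l == rl:
            self.c = max(self.c, rc) + 1
        elif m == self.l:
            self.c += 1
        elif m == rl:
            self.c = rc + 1
        else:
            self.c = 0
        self.l = m
        return (self.l, self.c)

hlc = HybridLogicalClock()
assert hlc.now(10) == (10, 0)
assert hlc.now(10) == (10, 1)   # stalled physical clock: counter advances
assert hlc.now(12) == (12, 0)
assert hlc.update(11, (15, 3)) == (15, 4)  # remote clock is ahead
```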
In summary, the method in this embodiment can ensure the consistency of node data across regions in a globally high-concurrency environment when data changes. For example, when advertisement data is updated, the method updates the data in both the data cache cluster and the data storage cluster, so that advertisement data in all regions of the world is updated synchronously within a short time such as 3 seconds, and users in all regions see the same advertisement data. Similarly, when user data such as red packet data and points data is updated, the method can update the user data in all regions within a short time, so that the red packet data and points data seen by users worldwide are the same.
Corresponding to the above data processing method, an embodiment of the present application further provides a data processing apparatus, and fig. 7 is a schematic diagram of module composition of the data processing apparatus provided in the embodiment of the present application, as shown in fig. 7, the apparatus includes:
a first determining unit 71, configured to determine, by using a distributed consistency algorithm, a first master node and a first slave node from among a plurality of first nodes included in the data storage cluster; wherein the data storage cluster comprises a plurality of first nodes, the first nodes being configured to store data;
a second determining unit 72, configured to determine, by using the distributed consistency algorithm, a second master node and a second slave node in a plurality of second nodes included in the data cache cluster; the data cache cluster comprises a plurality of second nodes, and the second nodes are used for caching data;
a first updating unit 73, configured to perform data updating in the first master node; after the data of the first main node is updated, the first slave node automatically synchronizes the data in the first main node;
a second updating unit 74, configured to update data in the second host node according to a data update result of the first host node; and the second slave node automatically synchronizes the data in the second master node after the data update of the second master node is completed.
Optionally, the first updating unit 73 is specifically configured to: performing data modification in the first host node;
the device further comprises:
a cache determining unit, configured to determine, before data modification is performed in the first master node, that no read record of the data to be modified with a read time within a specified time range is cached in the read record cache cluster;
wherein the read record cache cluster is used for caching data read records, and the specified time range is a preset time range extending back from the current time.
Optionally, the first updating unit 73 is specifically configured to: performing data modification in the first host node;
the device further comprises:
a data deleting unit, configured to determine, before data modification is performed in the first host node, that data to be modified is cached in the second host node according to a data reading record cached in a reading record cache cluster, and delete the data to be modified in the second host node; the reading record cache cluster is used for caching data reading records; the second slave node automatically synchronizes the data in the second master node after the data in the second master node is deleted;
the second updating unit 74 is specifically configured to:
and adding the modified data in the second host node according to the data modification result in the first host node.
Optionally, the apparatus further comprises:
a third determining unit, configured to determine, by using the distributed consistency algorithm, a third master node and a third slave node in a plurality of third nodes included in the read record cache cluster;
wherein the read record cache cluster comprises a plurality of third nodes, and the third slave nodes automatically synchronize the data in the third master node after the caching of a data read record in the third master node is completed; the third master node caches at most a maximum number of data read records corresponding to the same piece of read data, and the maximum number is positively correlated with the heat of the read data.
Optionally, the apparatus further comprises:
a first setting unit for setting first configuration information; the first configuration information is used for configuring that after data updating in the first main node is completed each time, first slave nodes exceeding a preset quantity proportion exist in the data storage cluster, and data in the first main node are automatically synchronized; the preset quantity proportion is the proportion of the quantity of the first slave nodes for data synchronization to the quantity of all the first slave nodes;
a second setting unit for setting second configuration information; the second configuration information is used for configuring that all second slave nodes in the data cache cluster automatically synchronize data in the second master node after the data in the second master node is updated every time;
a third setting unit configured to set third configuration information; and the third configuration information is used for configuring that all third slave nodes in the read record cache cluster automatically synchronize the data in the third master node after the read record of the cached data in the third master node is completed each time.
Optionally, the apparatus further includes a data querying unit, configured to:
querying target data from the second master node or the second slave node;
if the target data is not inquired, sending a data inquiry request corresponding to the target data to the first main node, wherein the first main node provides a data inquiry function according to the data inquiry request, or the first main node controls a first slave node which stores the target data to provide the data inquiry function;
acquiring the target data according to the provided data query function;
and caching the target data in the second main node, wherein the second slave node in the data cache cluster automatically synchronizes the data in the second main node after the second main node finishes caching the target data.
Optionally, the apparatus further comprises:
and the clock setting unit is used for setting the timestamp synchronization of each first node, each second node and each third node by using a hybrid logic clock HLC.
In the embodiment of the application, a first master node and a first slave node are determined in a data storage cluster, a second master node and a second slave node are determined in a data cache cluster, when data updating is performed, data updating is performed in the first master node firstly, then data updating is performed in the second master node according to a data updating result of the first master node, wherein the data in the first master node is automatically synchronized by the first slave node after the data updating of the first master node is completed, and the data in the second master node is automatically synchronized by the second slave node after the data updating of the second master node is completed. Therefore, by the embodiment of the application, the data can be automatically synchronized to other slave nodes after the data is updated, so that the consistency of the data stored by each data node is ensured, and the data access experience of a user is improved. In addition, the first slave node can automatically synchronize the data in the first master node, so that multiple copies of the data can exist, single-point risks of the data can be avoided, a user is prevented from reading wrong data, and data access experience of the user is improved. In addition, in the embodiment of the application, data updating operation does not need to be actively executed in each node, and the slave nodes can automatically synchronize data in the master node, so that the data updating mode is simple, and the data updating efficiency is high.
Further, based on the data processing method, an embodiment of the present application further provides a data processing device, and fig. 8 is a schematic structural diagram of the data processing device provided in the embodiment of the present application.
As shown in fig. 8, the data processing device may vary considerably in configuration or performance, and may include one or more processors 801 and a memory 802, in which one or more applications or data may be stored. The memory 802 may be transient storage or persistent storage. An application program stored in the memory 802 may include one or more modules (not shown), each of which may include a series of computer-executable instructions for the data processing device. Further, the processor 801 may be configured to communicate with the memory 802 and to execute, on the data processing device, the series of computer-executable instructions in the memory 802. The data processing device may also include one or more power supplies 803, one or more wired or wireless network interfaces 804, one or more input/output interfaces 805, one or more keyboards 806, and the like.
In a specific embodiment, the data processing apparatus includes a processor, a memory, and a computer program stored on the memory and executable on the processor, and when the computer program is executed by the processor, the computer program implements the processes of the above data processing method embodiment, and specifically includes the following steps:
determining a first master node and a first slave node in a plurality of first nodes included in a data storage cluster by using a distributed consistency algorithm; wherein the data storage cluster comprises a plurality of first nodes, the first nodes being configured to store data;
determining a second master node and a second slave node in a plurality of second nodes included in the data cache cluster by using the distributed consistency algorithm; the data cache cluster comprises a plurality of second nodes, and the second nodes are used for caching data;
updating data in the first main node; after the data of the first main node is updated, the first slave node automatically synchronizes the data in the first main node;
updating data in the second main node according to the data updating result of the first main node; and the second slave node automatically synchronizes the data in the second master node after the data update of the second master node is completed.
Optionally, the updating data in the first host node includes: performing data modification in the first host node;
further comprising:
before data modification is performed in the first master node, determining that no read record of the data to be modified with a read time within a specified time range is cached in the read record cache cluster;
wherein the read record cache cluster is used for caching data read records, and the specified time range is a preset time range extending back from the current time.
Optionally, the updating data in the first host node includes: performing data modification in the first host node;
further comprising:
before data modification is carried out in the first host node, determining that data to be modified is cached in the second host node according to a data reading record cached in a reading record caching cluster, and deleting the data to be modified in the second host node; the reading record cache cluster is used for caching data reading records; the second slave node automatically synchronizes the data in the second master node after the data in the second master node is deleted;
the updating data in the second host node according to the data updating result of the first host node includes:
and adding the modified data in the second host node according to the data modification result in the first host node.
Optionally, the method further comprises:
determining a third master node and a third slave node in a plurality of third nodes included in the read record cache cluster by using the distributed consistency algorithm;
wherein the read record cache cluster comprises a plurality of third nodes, and the third slave nodes automatically synchronize the data in the third master node after the caching of a data read record in the third master node is completed; the third master node caches at most a maximum number of data read records corresponding to the same piece of read data, and the maximum number is positively correlated with the heat of the read data.
Optionally, the method further comprises:
setting first configuration information; the first configuration information is used for configuring that after data updating in the first main node is completed each time, first slave nodes exceeding a preset quantity proportion exist in the data storage cluster, and data in the first main node are automatically synchronized; the preset quantity proportion is the proportion of the quantity of the first slave nodes for data synchronization to the quantity of all the first slave nodes;
setting second configuration information; the second configuration information is used for configuring that all second slave nodes in the data cache cluster automatically synchronize data in the second master node after the data in the second master node is updated every time;
setting third configuration information; and the third configuration information is used for configuring that all third slave nodes in the read record cache cluster automatically synchronize the data in the third master node after the read record of the cached data in the third master node is completed each time.
Optionally, the method further comprises:
querying target data from the second master node or the second slave node;
if the target data is not inquired, sending a data inquiry request corresponding to the target data to the first main node, wherein the first main node provides a data inquiry function according to the data inquiry request, or the first main node controls a first slave node which stores the target data to provide the data inquiry function;
acquiring the target data according to the provided data query function;
and caching the target data in the second main node, wherein the second slave node in the data cache cluster automatically synchronizes the data in the second main node after the second main node finishes caching the target data.
Optionally, the method further comprises:
and setting the timestamp synchronization of each first node, each second node and each third node by using a hybrid logic clock HLC.
In the embodiment of the application, a first master node and a first slave node are determined in a data storage cluster, a second master node and a second slave node are determined in a data cache cluster, when data updating is performed, data updating is performed in the first master node firstly, then data updating is performed in the second master node according to a data updating result of the first master node, wherein the data in the first master node is automatically synchronized by the first slave node after the data updating of the first master node is completed, and the data in the second master node is automatically synchronized by the second slave node after the data updating of the second master node is completed. Therefore, by the embodiment of the application, the data can be automatically synchronized to other slave nodes after the data is updated, so that the consistency of the data stored by each data node is ensured, and the data access experience of a user is improved. In addition, the first slave node can automatically synchronize the data in the first master node, so that multiple copies of the data can exist, single-point risks of the data can be avoided, a user is prevented from reading wrong data, and data access experience of the user is improved. In addition, in the embodiment of the application, data updating operation does not need to be actively executed in each node, and the slave nodes can automatically synchronize data in the master node, so that the data updating mode is simple, and the data updating efficiency is high.
Further, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements each process of the data processing method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement or the like made within the spirit and principle of the present application shall be included in the scope of the claims of the present application.

Claims (13)

1. A data processing method applied to a data update cluster includes:
determining, by using a distributed consistency algorithm, a first master node and a first slave node among a plurality of first nodes included in a data storage cluster; wherein the data storage cluster comprises the plurality of first nodes, and the first nodes are configured to store advertisement data;
determining, by using the distributed consistency algorithm, a second master node and a second slave node among a plurality of second nodes included in a data cache cluster; wherein the data cache cluster comprises the plurality of second nodes, and the second nodes are configured to cache advertisement data;
determining, by using the distributed consistency algorithm, a third master node and a third slave node among a plurality of third nodes included in a read record cache cluster; wherein the read record cache cluster comprises the plurality of third nodes, and the third nodes are configured to cache advertisement data read records;
performing a data update in the first master node, wherein the first slave node automatically synchronizes the data in the first master node after the data update of the first master node is completed;
performing a data update in the second master node according to the data update result of the first master node, so that the data update results of the data cache cluster and the data storage cluster are the same, wherein the second slave node automatically synchronizes the data in the second master node after the data update of the second master node is completed;
after advertisement data is read from the data cache cluster or the data storage cluster, caching a corresponding advertisement data read record in the third master node, wherein the third slave node automatically synchronizes the data in the third master node after the caching of the advertisement data read record in the third master node is completed;
querying target data from the second master node or the second slave node;
wherein the performing a data update in the first master node comprises: modifying data in the first master node when it is determined that the read record cache cluster does not cache a read record, of the data to be modified, whose read time is within a specified time range, and/or when the data to be modified is cached in the second master node and has been deleted from the second master node.
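The "wherein" condition of claim 1 can be sketched as a cache-aside write guard. The sketch below is hypothetical: the function name, the dictionary-based nodes, and the 60-second window are invented for illustration and are not specified by the patent.

```python
# Hypothetical sketch of claim 1's update precondition: data in the storage
# master is modified only after either (a) no recent read record of it exists
# in the read record cache, and/or (b) the stale cached copy has been deleted
# from the cache master.
import time

def modify(key, value, storage_master, cache_master, read_records, window_s=60):
    now = time.time()
    recently_read = any(r["key"] == key and now - r["ts"] <= window_s
                        for r in read_records)
    if recently_read:
        # condition (b): delete the stale cached copy before modifying storage
        cache_master.pop(key, None)
    # condition (a) holds (no recent read record) or the cache entry is gone
    storage_master[key] = value

storage, cache = {}, {"ad-1": "old"}
records = [{"key": "ad-1", "ts": time.time()}]   # a recent read exists
modify("ad-1", "new", storage, cache, records)
assert storage["ad-1"] == "new" and "ad-1" not in cache
```

Deleting the cached copy before the storage write prevents a reader from observing the old cached value after the storage cluster has already moved on.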
2. The method of claim 1, further comprising:
before the data modification is performed in the first master node, determining that the read record cache cluster does not cache a read record, of the data to be modified, whose read time is within a specified time range;
wherein the read record cache cluster is configured to cache data read records, and the specified time range is a preset time range extending backward from the current time.
3. The method of claim 1, further comprising:
before the data modification is performed in the first master node, determining, according to the data read records cached in the read record cache cluster, that the data to be modified is cached in the second master node, and deleting the data to be modified from the second master node; wherein the read record cache cluster is configured to cache data read records, and the second slave node automatically synchronizes the data in the second master node after the data in the second master node is deleted;
wherein the performing a data update in the second master node according to the data update result of the first master node comprises:
adding the modified data to the second master node according to the data modification result in the first master node.
4. The method according to claim 2 or 3,
wherein a maximum number of data read records corresponding to the same piece of read data is cached in the third master node, and the maximum number of data read records is positively correlated with the popularity of the read data.
5. The method of claim 4, further comprising:
setting first configuration information, wherein the first configuration information is used to configure that, each time a data update in the first master node is completed, more than a preset proportion of the first slave nodes in the data storage cluster automatically synchronize the data in the first master node, the preset proportion being the ratio of the number of first slave nodes performing data synchronization to the total number of first slave nodes;
setting second configuration information, wherein the second configuration information is used to configure that all second slave nodes in the data cache cluster automatically synchronize the data in the second master node each time a data update in the second master node is completed;
setting third configuration information, wherein the third configuration information is used to configure that all third slave nodes in the read record cache cluster automatically synchronize the data in the third master node each time the caching of a data read record in the third master node is completed.
6. The method of claim 1, further comprising:
if the target data is not queried from the second master node or the second slave node, sending a data query request corresponding to the target data to the first master node, wherein the first master node provides a data query function according to the data query request, or the first master node controls a first slave node storing the target data to provide the data query function;
acquiring the target data according to the provided data query function;
and caching the target data in the second master node, wherein the second slave node in the data cache cluster automatically synchronizes the data in the second master node after the second master node finishes caching the target data.
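The read path of claim 6 is, in effect, a read-through cache. The following sketch is illustrative only; the function name and dictionary-based node stand-ins are invented here, and a real system would route the miss through the first master node's query function as the claim describes.

```python
# Hypothetical sketch of claim 6's read path: try the cache cluster first; on
# a miss, query the storage master and back-fill the cache master, after which
# the cache slaves synchronize automatically.

def query(key, cache_master, cache_slaves, storage_master):
    if key in cache_master:
        return cache_master[key]           # cache hit
    value = storage_master.get(key)        # miss: ask the storage master
    if value is not None:
        cache_master[key] = value          # back-fill the cache master
        for s in cache_slaves:             # slaves synchronize automatically
            s[key] = value
    return value

storage = {"ad-1": "creative"}
cache, slave = {}, {}
assert query("ad-1", cache, [slave], storage) == "creative"   # filled on miss
assert cache["ad-1"] == slave["ad-1"] == "creative"           # now consistent
```

Subsequent queries for the same key are then served directly from the cache cluster, which is the point of caching the target data after the first miss.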
7. The method of claim 4, further comprising:
synchronizing the timestamps of each first node, each second node, and each third node by using a hybrid logical clock (HLC).
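A hybrid logical clock pairs wall-clock time with a logical counter so that timestamps remain monotonic and causally ordered across nodes despite clock skew. The minimal sketch below is an assumption-laden illustration of the general HLC technique, not the patent's implementation; the class and method names are invented here.

```python
# Minimal hybrid logical clock sketch: timestamps are (physical, logical)
# pairs, compared lexicographically.
import time

class HLC:
    def __init__(self):
        self.pt = 0   # last physical component seen
        self.lc = 0   # logical counter

    def now(self):
        """Timestamp for a local or send event."""
        wall = int(time.time())
        if wall > self.pt:
            self.pt, self.lc = wall, 0
        else:
            self.lc += 1          # clock hasn't advanced: bump the counter
        return (self.pt, self.lc)

    def update(self, remote):
        """Merge a timestamp received from another node (receive event)."""
        wall = int(time.time())
        m = max(wall, self.pt, remote[0])
        if m == self.pt == remote[0]:
            self.lc = max(self.lc, remote[1]) + 1
        elif m == self.pt:
            self.lc += 1
        elif m == remote[0]:
            self.lc = remote[1] + 1
        else:
            self.lc = 0           # fresh physical time dominates
        self.pt = m
        return (self.pt, self.lc)

a, b = HLC(), HLC()
t1 = a.now()
t2 = b.update(t1)    # b's timestamp orders after the message it received
assert t2 > t1
```

Tuple comparison gives the ordering for free: a receive always produces a timestamp greater than the one carried by the incoming message, even if the receiver's wall clock lags behind the sender's.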
8. A data processing apparatus, applied to a data update cluster, comprising:
a first determining unit, configured to determine, by using a distributed consistency algorithm, a first master node and a first slave node from among a plurality of first nodes included in a data storage cluster; wherein the data storage cluster comprises a plurality of first nodes, the first nodes being configured to store advertisement data;
a second determining unit, configured to determine, by using the distributed consistency algorithm, a second master node and a second slave node in a plurality of second nodes included in the data cache cluster; the data cache cluster comprises a plurality of second nodes, and the second nodes are used for caching advertisement data;
a third determining unit, configured to determine, by using the distributed consistency algorithm, a third master node and a third slave node among a plurality of third nodes included in a read record cache cluster; wherein the read record cache cluster comprises the plurality of third nodes, and the third nodes are configured to cache advertisement data read records;
a first updating unit, configured to perform a data update in the first master node, wherein the first slave node automatically synchronizes the data in the first master node after the data update of the first master node is completed;
a second updating unit, configured to perform data updating in the second master node according to a data updating result of the first master node, so that data updating results of the data cache cluster and the data storage cluster are the same; after the data of the second master node is updated, the second slave node automatically synchronizes the data in the second master node;
a third updating unit, configured to cache a corresponding advertisement data read record in the third master node after advertisement data is read from the data cache cluster or the data storage cluster, wherein the third slave node automatically synchronizes the data in the third master node after the caching of the advertisement data read record in the third master node is completed;
the data query unit is used for querying target data from the second master node or the second slave node;
wherein the first updating unit is specifically configured to: modify data in the first master node when it is determined that the read record cache cluster does not cache a read record, of the data to be modified, whose read time is within a specified time range, and/or when the data to be modified is cached in the second master node and has been deleted from the second master node.
9. The apparatus of claim 8, further comprising:
a cache determining unit, configured to determine, before the data modification is performed in the first master node, that the read record cache cluster does not cache a read record, of the data to be modified, whose read time is within a specified time range;
wherein the read record cache cluster is configured to cache data read records, and the specified time range is a preset time range extending backward from the current time.
10. The apparatus of claim 8, further comprising:
a data deleting unit, configured to determine, before the data modification is performed in the first master node and according to the data read records cached in a read record cache cluster, that the data to be modified is cached in the second master node, and to delete the data to be modified from the second master node; wherein the read record cache cluster is configured to cache data read records, and the second slave node automatically synchronizes the data in the second master node after the data in the second master node is deleted;
wherein the second updating unit is specifically configured to:
add the modified data to the second master node according to the data modification result in the first master node.
11. The apparatus of claim 9 or 10,
wherein a maximum number of data read records corresponding to the same piece of read data is cached in the third master node, and the maximum number of data read records is positively correlated with the popularity of the read data.
12. The apparatus of claim 11, further comprising:
a first setting unit, configured to set first configuration information, wherein the first configuration information is used to configure that, each time a data update in the first master node is completed, more than a preset proportion of the first slave nodes in the data storage cluster automatically synchronize the data in the first master node, the preset proportion being the ratio of the number of first slave nodes performing data synchronization to the total number of first slave nodes;
a second setting unit, configured to set second configuration information, wherein the second configuration information is used to configure that all second slave nodes in the data cache cluster automatically synchronize the data in the second master node each time a data update in the second master node is completed;
a third setting unit, configured to set third configuration information, wherein the third configuration information is used to configure that all third slave nodes in the read record cache cluster automatically synchronize the data in the third master node each time the caching of a data read record in the third master node is completed.
13. The apparatus of claim 8, wherein the data query unit is further configured to:
send, if the target data is not queried from the second master node or the second slave node, a data query request corresponding to the target data to the first master node, wherein the first master node provides a data query function according to the data query request, or the first master node controls a first slave node storing the target data to provide the data query function;
acquire the target data according to the provided data query function; and
cache the target data in the second master node, wherein the second slave node in the data cache cluster automatically synchronizes the data in the second master node after the second master node finishes caching the target data.
CN201810428264.5A 2018-05-07 2018-05-07 Data processing method and device Active CN108829720B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810428264.5A CN108829720B (en) 2018-05-07 2018-05-07 Data processing method and device


Publications (2)

Publication Number Publication Date
CN108829720A CN108829720A (en) 2018-11-16
CN108829720B true CN108829720B (en) 2022-01-14

Family

ID=64148317

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810428264.5A Active CN108829720B (en) 2018-05-07 2018-05-07 Data processing method and device

Country Status (1)

Country Link
CN (1) CN108829720B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109710185B (en) * 2018-12-19 2021-12-21 麒麟合盛网络技术股份有限公司 Data processing method and device
CN110287060B (en) * 2019-06-06 2021-06-22 郑州阿帕斯科技有限公司 Data processing method and device
CN110502372A (en) * 2019-08-30 2019-11-26 中国人民财产保险股份有限公司 A kind of data processing method, device and electronic equipment
CN110544136A (en) * 2019-09-10 2019-12-06 恩亿科(北京)数据科技有限公司 Method, device, equipment and storage medium for calculating synchronous putting probability
CN113127567A (en) * 2021-04-29 2021-07-16 武汉优品楚鼎科技有限公司 Financial data computing system, method and device based on distributed storage
CN113626098B (en) * 2021-07-21 2024-05-03 长沙理工大学 Data node dynamic configuration method based on information interaction
CN116991635B (en) * 2023-09-26 2024-01-19 武汉吧哒科技股份有限公司 Data synchronization method and data synchronization device

Citations (3)

Publication number Priority date Publication date Assignee Title
CN1564973A (en) * 2002-06-27 2005-01-12 索尼株式会社 Information processing apparatus, information processing method, and information processing program
CN105955664A (en) * 2016-04-29 2016-09-21 华中科技大学 Method for reading and writing segment-based shingle translation layer (SSTL)
CN106713487A (en) * 2017-01-16 2017-05-24 腾讯科技(深圳)有限公司 Data synchronization method and device

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US9776078B2 (en) * 2012-10-02 2017-10-03 Razer (Asia-Pacific) Pte. Ltd. Application state backup and restoration across multiple devices
US9342457B2 (en) * 2014-03-11 2016-05-17 Amazon Technologies, Inc. Dynamically modifying durability properties for individual data volumes
CN105511805B (en) * 2015-11-26 2019-03-19 深圳市中博科创信息技术有限公司 The data processing method and device of cluster file system




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant