CN118093614A - Data consistency and query method, device and system for multiple Neo4j - Google Patents

Data consistency and query method, device and system for multiple Neo4j

Info

Publication number
CN118093614A
Authority
CN
China
Prior art keywords
node
data processing
data
processing system
nodes
Prior art date
Legal status
Pending
Application number
CN202410517172.XA
Other languages
Chinese (zh)
Inventor
王伟
孙凯
Current Assignee
Shenzhen Xiaoying Information Technology Co ltd
Original Assignee
Shenzhen Xiaoying Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Xiaoying Information Technology Co ltd filed Critical Shenzhen Xiaoying Information Technology Co ltd
Priority to CN202410517172.XA priority Critical patent/CN118093614A/en
Publication of CN118093614A publication Critical patent/CN118093614A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application is applicable to the technical field of data processing, and provides a method, a device and a system for data consistency and query of a plurality of Neo4j, which comprises the following steps: a first node receives a request instruction sent by a client, wherein the request instruction is a request for updating data of a graph database corresponding to the first node, and the first node is any node in the data processing system; after the first node executes the update task according to the request instruction, generating first log information corresponding to the update task, wherein the first log information comprises the data content of the update task; the first node sends the first log information to each second node, wherein the second nodes are nodes except the first node in the data processing system; and the second node executes an update task according to the first log information. The method can ensure the consistency of node data across the plurality of graph databases.

Description

Data consistency and query method, device and system for multiple Neo4j
Technical Field
The application belongs to the technical field of data processing, and particularly relates to a method, a device and a system for data consistency and query of a plurality of Neo4j.
Background
With the rapid development of industries such as social networking, electronic commerce, finance, retail and the Internet of Things, a huge and complex relationship network has been woven among things in the real world, and traditional databases often fall short when handling such complex relationships. Graph databases emerged to address this. For example, Neo4j is a native graph database whose storage back end is purpose-built and optimized for storing and managing graph-structured data; nodes that are related to each other on the graph point to each other's physical addresses in the database, so the advantages of data in graph-structured form can be fully exploited.
However, since the Neo4j graph database runs as a single node, when multiple Neo4j instances are used, the consistency of the data across the multiple Neo4j nodes cannot be guaranteed.
Disclosure of Invention
The embodiment of the application provides a data consistency and query method, device and system based on a plurality of Neo4j, which can ensure the consistency of node data of a plurality of graph databases.
In a first aspect, an embodiment of the present application provides a data processing method, including:
A first node receives a request instruction sent by a client, wherein the request instruction is a request for updating data of a graph database corresponding to the first node, and the first node is any node in the data processing system;
After the first node executes the update task according to the request instruction, generating first log information corresponding to the update task, wherein the first log information comprises the data content of the update task;
The first node sends the first log information to each second node, wherein the second nodes are nodes except the first node in the data processing system;
And the second node executes an update task according to the first log information.
In the embodiment of the application, each graph database corresponds to one node, and the first node acts as the leader node among the plurality of nodes. After the leader node receives the instruction sent by the client to update the graph database, it executes the update task according to the instruction, then generates log information for the update task (the first log information) and transmits it to the other nodes (the second nodes), each of which performs the update task based on the first log information sent by the first node. In other words, after the leader node (first node) performs the update task sent by the client, sending the log information of the update task to the other nodes ensures that all nodes perform the same update operations in the same order, thereby maintaining the consistency of the data in the system. The method can therefore ensure the consistency of node data across the plurality of graph databases.
In a possible implementation manner of the first aspect, the method further includes:
after a third node sends a voting request instruction to each fourth node, a voting result of each fourth node for voting to the third node is obtained, wherein the third node is any node in the data processing system, and the fourth node is a node except the third node in the data processing system;
in the voting result, if the number of votes from the fourth node to the third node is greater than a first threshold, determining the third node as the first node, wherein the first threshold is determined according to the number of all nodes in the data processing system;
And if the number of votes from the fourth node to the third node is not greater than the first threshold value, the third node sends a voting request instruction to each fourth node again.
In the embodiment of the application, the first node is determined by using an election algorithm, so as to ensure that a leader node (first node) can be effectively elected in the distributed system to be responsible for coordinating and managing the operation of the whole system. The election mechanism can improve the stability and reliability of the system and ensure that the system can still normally operate under the conditions of node faults or network partitions and the like.
In a possible implementation manner of the first aspect, the method further includes:
And in the process of voting from the fourth node to the third node, if the third node receives the communication information sent by the fourth node, determining the fourth node sending the communication information as the first node.
In the embodiment of the application, when the fourth node sends the communication information to the third node, the third node can immediately respond without waiting for the voting results of other nodes. This real-time response helps to quickly elect a leader node, especially in emergency situations where a quick return to normal system operation is required.
In a possible implementation manner of the first aspect, the method further includes:
when the second node receives a request instruction sent by the client, the second node sends the request instruction to the first node;
and the first node executes an update task according to the request instruction sent by the second node.
In the embodiment of the application, the first node is responsible for executing the update task, which means that all update operations are executed uniformly by the first node, avoiding data inconsistency caused by multiple nodes executing update operations simultaneously or in different orders. This centralized way of updating helps to ensure the consistency and integrity of the data.
In a possible implementation manner of the first aspect, the data processing system further includes a load balancer, and the method further includes:
The load balancer obtains the initial weight of each fifth node, and the fifth node is any node in the data processing system;
after each fifth node performs an update task, the load balancer calculates the current weight of each fifth node according to the initial weight;
and the load balancer sends a plurality of request instructions to each fifth node according to the current weight.
In the embodiment of the application, by considering the weight and load condition of the nodes, the load balancer can effectively distribute requests to each node, avoiding the performance degradation caused by some nodes being overloaded and thereby achieving load balancing.
In a possible implementation manner of the first aspect, the data processing system further includes a controller; the method further comprises the steps of:
The controller obtains the operation information of each sixth node, wherein the sixth node is any node in the data processing system;
The controller judges whether the sixth node fails according to the operation information;
and if the sixth node fails, the controller removes the service interface corresponding to the sixth node and stops the operation of the sixth node.
In the embodiment of the application, based on the acquired operation information, the controller can automatically judge whether the sixth node fails. Once a fault is found, the controller automatically executes corresponding fault handling measures, such as removing the service interface corresponding to the node and stopping the operation of the node, so as to prevent the fault node from further affecting the system.
In a possible implementation manner of the first aspect, the operation information includes a data index; the method further comprises the steps of:
The controller monitors whether the data index in each sixth node is the same or not so as to ensure that the data of each sixth node is consistent.
In an embodiment of the application, the controller monitors the data index in each sixth node, ensuring their consistency. This means that the user can acquire the same data regardless of which node they access, thereby ensuring consistency and accuracy of the data.
In a second aspect, an embodiment of the present application provides a data processing system, including:
The load balancer and the plurality of nodes are configured to implement the data processing method according to the first aspect;
the controller is used for monitoring the operation information of each node of the data processing system.
In a third aspect, embodiments of the present application provide a computer readable storage medium storing a computer program which, when executed by a processor, implements a data processing method as in any one of the first aspects above.
In a fourth aspect, an embodiment of the present application provides a computer program product for, when run on a terminal device, causing the terminal device to perform the data processing method of any one of the first aspects above.
It will be appreciated that the advantages of the second to fourth aspects may be found in the relevant description of the first aspect and are not repeated here.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a system flow diagram of a data processing method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of generating first log information according to an embodiment of the present application;
FIG. 3 is a diagram of copying first log information according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of determining a first node according to an embodiment of the present application;
FIG. 5 is a schematic block diagram of determining a first node provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of monitoring node operation information provided by an embodiment of the present application;
FIG. 7 is a system block diagram of a data processing method provided by an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in the present description and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Furthermore, the terms "first," "second," "third," and the like in the description of the present specification and in the appended claims, are used for distinguishing between descriptions and not necessarily for indicating or implying a relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise.
Typically, when we use a database to find the relationships between things, only a short-range relationship query (a relationship within two layers) is needed. When a longer-range, wider-range relationship query is required, the capabilities of a graph database are needed. With the rapid development of industries such as social networking, electronic commerce, finance, retail and the Internet of Things, a huge and complex relationship network has been woven among things in the real world, and traditional databases often fall short when handling such complex relationships. Graph databases emerged to address this. A graph database (Graph database) refers to a database that stores and queries data in the form of graph data structures.
For example, Neo4j is a native graph database whose storage back end is purpose-built and optimized for storing and managing graph-structured data; nodes that are related to each other on the graph point to each other's physical addresses in the database, so the advantages of data in graph-structured form can be fully exploited. Neo4j organizes its stored data mainly as nodes and edges. Nodes can represent entities in a knowledge graph, and edges can represent the relationships between entities; a relationship can be directional, with its two ends corresponding to a start node and an end node. In addition, one or more labels may be added to a node to represent the classification of the entity, together with a set of key-value pairs to represent additional attributes of the entity beyond its relationship attributes. A relationship may likewise carry additional attributes and labels.
However, since the open-source version of a graph database such as Neo4j is a single node, it cannot cope with large-scale, high-volume query workloads and constitutes a single point of failure. Moreover, when multiple Neo4j instances are used, the consistency of the data across the multiple Neo4j nodes cannot be guaranteed.
In order to solve the above-mentioned problems, in the embodiment of the present application, a data processing method is provided, where a leader node and a follower node are selected from a plurality of Neo4J nodes, where the leader node is responsible for interacting with a client, and the follower node is responsible for executing a command task sent by the leader node. And the leader node executes the update task according to the request command sent by the client, and generates log information for the update task after the update task is executed. And finally, the leader node sends the log information to other follower nodes, so that all the follower nodes can execute the same updating operation according to the same sequence according to the log information, and the consistency of data in the system is maintained.
Referring to fig. 1, which is a schematic flow chart of a data processing method according to an embodiment of the present application, by way of example and not limitation, the method may include the following steps:
S101, a first node receives a request instruction sent by a client, wherein the request instruction is a request for updating data of a graph database corresponding to the first node, and the first node is any node in a data processing system.
In the embodiment of the application, the first node is a leader node selected from a plurality of Neo4J nodes, the second node is equivalent to a follower node selected from a plurality of Neo4J nodes, only the leader node interacts with the client, and the follower node executes tasks according to instructions of the leader node.
The leader node is responsible for receiving request instructions from clients, which are commands for updating the graph database, and may include operations to add, delete, or modify data.
In some implementations, each Neo4j database corresponds to a node, and each node in turn corresponds to a server or executor. The leader node may act as an HTTP server listening on a particular port, and the client communicates with it by sending HTTP requests. When the client sends a request instruction, the leader node parses the request through the HTTP protocol and extracts the instruction from the request for processing. This approach is applicable to Web-based applications and can be implemented using existing HTTP server frameworks, as in the sketch below.
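As an illustration only, the following Go sketch shows how a leader node could act as an HTTP server that receives a request instruction from a client; the /update route, the port 7690 and the JSON field names are assumptions for this sketch, not part of the application.

package main

import (
    "encoding/json"
    "log"
    "net/http"
)

// RequestInstruction is a hypothetical shape for the client's request
// instruction; the real instruction format is not specified by the application.
type RequestInstruction struct {
    Cypher string            `json:"cypher"` // e.g. a write statement for the graph database
    Params map[string]string `json:"params"`
}

func main() {
    // The leader node acts as an HTTP server listening on a particular port
    // (7690 is an arbitrary choice for this sketch).
    http.HandleFunc("/update", func(w http.ResponseWriter, r *http.Request) {
        var ins RequestInstruction
        if err := json.NewDecoder(r.Body).Decode(&ins); err != nil {
            http.Error(w, "malformed request instruction", http.StatusBadRequest)
            return
        }
        // Here the leader would execute the update task and generate the
        // first log information, as described in S102 below.
        log.Printf("received update instruction: %s", ins.Cypher)
        w.WriteHeader(http.StatusAccepted)
    })
    log.Fatal(http.ListenAndServe(":7690", nil))
}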
In other implementations, asynchronous communication may occur between the leader node and the client via a message queue. The client sends the request instruction to the message queue, and the leader node listens to the message queue and obtains the request instruction from the message queue for processing. The method can realize decoupling and asynchronous processing, and improves the scalability and stability of the system.
In the method, the leader node receives the request instruction sent by the client, and can correctly copy and process the request instruction so as to maintain the consistency of the data.
S102, after the first node executes the update task according to the request instruction, generating first log information corresponding to the update task, wherein the first log information comprises data content of the update task.
In the embodiment of the application, first, after receiving a request instruction sent by a client, a leader node (first node) of the system executes a corresponding update task according to the request instruction, wherein the update task may involve modifying, adding or deleting data in the system.
After the update task is executed, the leader node generates corresponding first log information according to the update task. The first log information records details regarding performing the update task including, but not limited to, the type of update operation, the object of the operation, the time stamp of the operation, the operator, etc. Importantly, the first log information includes the data content of the update task, i.e. the change condition of the related data before and after the update, etc.
Exemplarily, referring to fig. 2, which is a schematic diagram of generating first log information according to an embodiment of the present application, as shown in fig. 2, the request instruction sent by the client to the leader node A (first node) is an instruction to write "hello <- world" into the database (flow ① in fig. 2). After receiving the instruction, the leader node A submits the received proposal message, whose message type is MsgProp, to the node state machine (flow ② in fig. 2), and the node state machine writes "hello <- world" into the graph database corresponding to the leader node A to complete the task of updating the graph database.
After the update task is executed, the leader node a first checks that the task execution is completed. This includes verifying that the task completed successfully, checking whether an exception or error occurred during the task execution, etc. For a successfully completed task, the leader node A begins building log entries (first log entries) from the completed updated task, collating the specific information, type, index, execution object, etc. of the instruction, and its specific operations written to the graph database, into the format of log entries, to generate first log information and then append the first log entry to an unpersisted, unstable node log (as in the ④ flow in FIG. 2).
In the method, the first log information records the data content of the update task, and by writing the execution result of the update task into the log, not only can the system be ensured to be restored to a consistent state after the system is in fault or the node is invalid, but also the log entries can be sent to other nodes for replication in the distributed system, so that all nodes in the cluster can execute the update task according to the log entries, and the consistency of the data is ensured.
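The content carried by the first log information can be pictured with a minimal sketch of a log entry type; the field names below are illustrative assumptions rather than the application's actual log format.

package consensus

import "time"

// LogEntry is an assumed layout for the first log information generated in S102;
// the real format is not specified by the application.
type LogEntry struct {
    Index     uint64    // position of the entry in the node log
    Term      uint64    // leader term in which the entry was created
    OpType    string    // type of update operation: create / update / delete
    Target    string    // object of the operation, e.g. a node or relationship
    Data      string    // data content of the update task (change before and after)
    Timestamp time.Time // time stamp of the operation
    Operator  string    // operator that issued the request instruction
}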
S103, the first node sends the first log information to each second node, wherein the second nodes are nodes except the first node in the data processing system.
In the embodiment of the application, after the first node generates the first log information, other nodes (second nodes) in the system are traversed, the first log information is copied, and the copied first log information is transmitted to the second nodes through a proper communication mechanism, so that the second nodes execute corresponding operations according to the first log information, and the data in the graph database corresponding to each node are consistent.
For example, referring to fig. 3, which is a schematic diagram of copying first log information provided by an embodiment of the present application, as shown in fig. 3, the first node may be leader node A and the second nodes may be follower node B and follower node C. Leader node A transmits the copied first log information to follower node B and follower node C through a network communication protocol (flow ⑤ in fig. 3). After follower node B and follower node C receive the first log information, they parse it and append it to their corresponding logs; if the append succeeds, they return a message confirming the successful append of the first log information to leader node A (flow ⑥ in fig. 3). After receiving the feedback that the logs of follower node B and follower node C were appended successfully, leader node A stores the first log information into the storage state machine (flow ⑦ in fig. 3).
In the method, by sending the log information of the first node to other nodes, it is ensured that all nodes have the same log data. This helps to ensure that all nodes in the system are able to obtain the latest log information, thereby maintaining data synchronization and consistency.
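The replication step S103 can be sketched as follows, reusing the LogEntry type from the previous sketch: the leader sends the entry to every second node and treats it as committed once a majority of the cluster (leader included) has stored it. The Transport interface is an assumption that stands in for the network communication mechanism.

package consensus

// Transport abstracts how the first log information is delivered to a second node.
type Transport interface {
    Append(nodeID string, e LogEntry) bool // true when the follower appended the entry to its log
}

// Replicate sends entry e from the leader to every follower (second node) and
// reports whether a majority of the cluster, counting the leader itself, has
// stored it; only then would the entry be applied to the storage state machine.
func Replicate(t Transport, followers []string, e LogEntry) bool {
    acks := 1 // the leader has already appended the entry to its own log
    for _, id := range followers {
        if t.Append(id, e) {
            acks++
        }
    }
    // With N nodes in total, the entry is committed once more than N/2 copies exist.
    return acks > (len(followers)+1)/2
}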
S104, the second node executes an update task according to the first log information.
In the embodiment of the application, the second node analyzes and processes the received log information to extract the update task information contained in the log information. This may involve analysis and decoding of the log information to determine the update task content and related information contained therein. Once the update tasks are extracted, the second node performs the corresponding update operations based on the tasks.
In one embodiment, step S104 includes:
the second node converts the first log information into first application data corresponding to a graph database; the second node performs an update task according to the first application data.
In the embodiment of the application, the first application data is a query statement corresponding to the graph database. The first node typically records information such as the operation history and state changes of the system in a log. When the second node needs to execute an update task, the state machine corresponding to the second node first acquires the log information of the first node (the first log information). Such log information may cover various operations in the system, such as adding, deleting, querying and modifying data, changes of state, and so on. The second node converts the acquired log information of the first node into the corresponding first application data of the graph database.
For example, for the Neo4j graph database, the corresponding query statement is a Cypher statement. The steps by which the second node executes the update task according to the first log information of the first node may be:
1. The second node analyzes the first log information of the first node after receiving the first log information of the first node. These logs may record various operations in the system, such as creation, updating, deletion of nodes, establishment of relationships, modification, change of attributes, and so forth. The log is parsed to extract the contents of these operations.
2. For each log record, the type of operation needs to be identified, such as whether a node is created, updated, or deleted, or whether a relationship is created, updated, or deleted. This helps determine the specific operations that need to be performed in the graph database. And constructing a corresponding Cypher statement according to the analyzed operation type and the related data. Cypher is a query and operation language used in graph database Neo4j, which may represent update tasks such as creation, update, deletion, etc. of nodes and relationships in the graph.
3. When the Cypher statement is constructed, the operation content extracted from the log is required to be converted into a proper Cypher command, so that the corresponding operation can be accurately executed in the graph database.
4. And the second node sends the constructed Cypher statement to the graph database for execution. The graph database carries out corresponding operations on the data according to the Cypher statement, including update tasks such as node and relation addition, deletion and the like.
The second node converts the first log information into a Cypher statement by using the corresponding state machine, writes the Cypher statement into the corresponding graph database of the second node, and feeds back the result to the leader node A.
In the method, the log information can be converted into the corresponding query statement in the graph database, so that the updating operation of the data in the graph database can be realized, and the data consistency of each graph database in the system can be maintained.
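A minimal sketch of the conversion performed by the second node's state machine, mapping the operation type recorded in a log entry to a Cypher statement; the operation names and the label/key handling are assumptions chosen for illustration, and the concrete value would be supplied separately as a Cypher parameter.

package consensus

import "fmt"

// ToCypher turns one parsed log record into a Cypher statement that the second
// node can run against its own Neo4j graph database. The concrete value is
// assumed to be passed separately as the Cypher parameter $value.
func ToCypher(opType, label, key string) (string, error) {
    switch opType {
    case "create_node":
        return fmt.Sprintf("CREATE (n:%s {%s: $value})", label, key), nil
    case "update_node":
        return fmt.Sprintf("MATCH (n:%s {%s: $value}) SET n.updated = true", label, key), nil
    case "delete_node":
        return fmt.Sprintf("MATCH (n:%s {%s: $value}) DETACH DELETE n", label, key), nil
    default:
        return "", fmt.Errorf("unknown operation type %q in log entry", opType)
    }
}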
In an embodiment, referring to fig. 4, a schematic flow chart of determining a first node according to an embodiment of the present application is shown in fig. 4, where the method further includes:
s201, after the third node sends a voting request instruction to each fourth node, a voting result of each fourth node for voting to the third node is obtained, wherein the third node is any node in the data processing system, and the fourth node is a node except the third node in the data processing system.
In the embodiment of the application, in the initial state, all nodes in the system are in an unknown state, and no node is selected as a leader node (first node). Each node periodically (typically randomly selecting a time interval) sends heartbeat signals to other nodes to maintain network communication and activity. Meanwhile, each node sets an election timeout time, and if the heartbeat signal from the leader node is not received in the time, the node considers that no leader node exists currently, enters a candidate state and initiates election.
When the election timeout of the third node (any node in the data processing system) arrives and it has not received the heartbeat signal of a leader node, the node becomes a candidate and sends a voting request to the other nodes (fourth nodes), asking them to vote to support it as the leader node. After receiving the voting request, each other node votes for the candidate if it has not already voted for another candidate, and finally whether the third node is the leader node (the first node) is determined according to the voting results of the fourth nodes.
For example, referring to fig. 5, which is a schematic block diagram of determining a first node according to an embodiment of the present application, as shown in fig. 5, there are three nodes in the system, namely node A, node B and node C, and each node has a corresponding timeout period that may be set arbitrarily; for example, the timeout periods of node A, node B and node C are 120 ms, 200 ms and 280 ms respectively. When the timeout period of node A (third node) arrives first, node A becomes a candidate, while node B (fourth node) and node C (fourth node) remain followers. Node A then initiates a vote to node B and node C and obtains the voting result.
In the method, the third node can implement the consistency protocol by requesting the voting instruction and acquiring the voting result, which helps to ensure that each node in the system makes a decision to agree on, thereby ensuring the data consistency and reliability of the system.
S202, in the voting result, if the voting number of the fourth node to the third node is larger than a first threshold, determining the third node as the first node, wherein the first threshold is determined according to the number of all nodes in the data processing system.
In the embodiment of the present application, first, a so-called first threshold value needs to be determined. The first threshold is determined based on the number of all nodes in the data processing system.
In one implementation, the first threshold may be determined based on the total number of nodes. As in the example of fig. 5, there are 3 nodes in total in the system, so the first threshold can be determined from (3+1)/2; once the number of votes exceeds this threshold, the role of the node is determined, i.e. the third node is determined to be the leader node (first node).
In another implementation, if the nodes in the system have different characteristics or weights, the first threshold may be adjusted according to the importance and reliability of the nodes. For example, the voting weight of nodes with higher performance or reliability may be increased, thereby affecting the determination of election results and roles.
In the above method, if the number of votes obtained by the third node exceeds the first threshold, the system can determine that the third node is the first node. Therefore, the accurate positioning of the node roles can be ensured, and the system can smoothly execute various tasks in the subsequent operation.
And S203, if the number of votes from the fourth nodes to the third node is not greater than the first threshold, the third node sends a voting request instruction to each fourth node again.
In the embodiment of the application, if the number of votes from the fourth node to the third node does not reach the first threshold, it means that the role of the third node cannot be determined. In this case, the third node needs to send an instruction to request voting again to each fourth node to resume the voting process.
In the method, if the voting result is lower than the first threshold value, the voting process is restarted, so that the possible voting deviation or error can be dealt with, and the fairness and accuracy of the voting process are ensured.
In one embodiment, the method further comprises:
and in the process of voting from the fourth node to the third node, if the third node receives the communication information sent by the fourth node, determining the fourth node sending the communication information as the first node.
In the practice of the present application, the communication information may also be referred to as a heartbeat signal. Heartbeat signals are typically used to represent the active state and availability of a node. A heartbeat signal is a message that a leader node periodically sends to other nodes to maintain its leadership and to notify the other nodes of its own presence. When the other nodes receive the heartbeat signal of the leader node, they know that the leader node is still in an active state, so that their own election process is not started. Therefore, if the third node receives the heartbeat signal sent by the fourth node while waiting for the vote of the fourth node, which means that the fourth node is now in the role of the leader node, the fourth node is determined to be the leader node (first node).
In the process of voting to node a, as shown in the example of fig. 5, if node a receives the heartbeat signal sent by any node B or node C, it indicates that node B or node C is in an active state at this time, and the voting process needs to be terminated at this time, so that node B or node C is determined as the leader node.
In the method, the system can more quickly determine the first node by timely identifying and responding to the communication information from other nodes, so that the role determination process is quickened.
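The election flow of S201-S203, together with the heartbeat case just described, could be sketched roughly as follows; the requestVote callback and the heartbeat channel are assumed abstractions for this sketch, not interfaces defined by the application.

package consensus

import (
    "math/rand"
    "time"
)

// RunElection is a simplified candidate loop for a third node in a cluster of
// clusterSize nodes. requestVote asks one fourth node for its vote; heartbeat
// delivers communication information from a node that already acts as leader.
func RunElection(clusterSize int, peers []string,
    requestVote func(peer string) bool, heartbeat <-chan string) (leader string, isSelf bool) {

    for {
        // Randomized election timeout before (re)starting a vote, as in S201.
        timeout := time.Duration(120+rand.Intn(160)) * time.Millisecond
        select {
        case from := <-heartbeat:
            // Another node is already active as leader: adopt it as the first node.
            return from, false
        case <-time.After(timeout):
        }

        votes := 1 // the candidate votes for itself
        for _, p := range peers {
            if requestVote(p) {
                votes++
            }
        }
        // The first threshold corresponds to a majority of all nodes in the data processing system.
        if votes > clusterSize/2 {
            return "self", true
        }
        // Otherwise send the voting request instruction again in the next round (S203).
    }
}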
In one embodiment, the method further comprises:
When the second node receives a request instruction sent by the client, the second node sends the request instruction to the first node; and the first node executes the update task according to the request instruction sent by the second node.
In an embodiment of the application, the follower node (second node) does not interact directly with the client. When a follower node receives a command from a client, it forwards the command to the current leader node (first node). The purpose of this design is to ensure that only the leader node interacts directly with the clients in the system, while other nodes achieve consistent operation and state management through the leader node.
When the second node receives a request instruction from the client, the second node forwards the instruction to the first node, typically through network communication, e.g. by transmitting the request instruction to the first node via TCP or UDP protocol. After receiving the request instruction from the second node, the first node executes the update task according to the specific task content contained in the instruction. This may involve modifications to the system state, updates to data, scheduling of tasks, etc.
In the method, the task is concentrated to the first node for execution, so that task processing logic can be unified, and the task execution in the system is ensured to be consistent. This helps to improve the stability and maintainability of the system.
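A minimal sketch of the forwarding behaviour, assuming the leader exposes the same hypothetical /update HTTP endpoint as in the earlier sketch:

package node

import (
    "bytes"
    "io"
    "log"
    "net/http"
)

// forwardToLeader relays a request instruction received by a follower (second
// node) to the current leader (first node), which then executes the update task.
func forwardToLeader(leaderAddr string, body []byte) error {
    resp, err := http.Post("http://"+leaderAddr+"/update", "application/json", bytes.NewReader(body))
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    reply, _ := io.ReadAll(resp.Body)
    log.Printf("leader replied %d: %s", resp.StatusCode, reply)
    return nil
}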
In one embodiment, the data processing system further comprises a load balancer; the method further comprises the steps of:
the load balancer obtains the initial weight of each fifth node, and the fifth node is any node in the data processing system; after each fifth node performs the update task, the load balancer calculates a second weight of each fifth node according to the initial weight; and the load balancer sends a plurality of request instructions to each fifth node according to the second weight.
In the embodiment of the application, the load balancer first acquires the initial weight of each fifth node (any node in the system). This initial weight may be pre-configured or may be dynamically adjusted based on the current state of the system. After each fifth node performs the update task, the load balancer recalculates the second weight of each node. This second weight may reflect the load condition, performance index, or other relevant information of the node, and the load balancer sends multiple request instructions to each fifth node according to the calculated weight.
The second weight is obtained by adopting a smooth weighting algorithm, which is as follows:
Suppose there are 5 servers S = {Xgraph01, Xgraph02, Xgraph03, Xgraph04, Xgraph05} with default weights W = {W1, W2, W3, W4, W5} and current weights CW = {CW1, CW2, CW3, CW4, CW5}. Two weights exist in the algorithm: the default weight represents the original weight of the server, and the current weight is the weight recalculated after each access, its initial value being the default weight. The server with the largest current weight value is maxWeightServer, the sum of all default weights is weightSum, and the server list is serverList. The algorithm can be described as follows:
finding the server maxWeightServer with the largest current weight value;
calculating the sum weightSum of {W1, W2, W3, W4, W5};
maxWeightServer.CW = maxWeightServer.CW - weightSum;
recalculating the current weight CW of {Xgraph01, Xgraph02, Xgraph03, Xgraph04, Xgraph05} with the formula XgraphN.CW = XgraphN.CW + XgraphN.W;
returning maxWeightServer.
For example, if the number of request instructions sent by the client is 20 and the weight ratio of the current 5 servers is 1:1:1:1:1, each server obtains 4 request instructions through the load balancer; after each access, the algorithm recalculates the current weight (the second weight) of every server, and as long as the weight ratio is unchanged the distribution remains even. If one server is found to be inoperable due to a failure during access, the current weight (the second weight) of each remaining server is changed according to the algorithm, and the load balancer distributes the 20 request instructions among the remaining 4 servers.
In the above method, the load balancer receives requests from clients and distributes the requests to a plurality of backend servers. The purpose of this is to ensure that all servers can fully utilize their resources, avoiding performance degradation caused by overload of certain servers.
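A sketch of the smooth weighted round-robin selection described above, following the step order given in the algorithm; the Server type and field names are assumptions for this sketch.

package balancer

// Server holds a node's default weight and its current weight; the current
// weight is recalculated after every selection, as described above.
type Server struct {
    Name          string
    Weight        int // default weight W
    CurrentWeight int // current weight CW, initialised to the default weight
}

// Pick follows the step order given above: find the server with the largest
// current weight, subtract the sum of default weights from it, then raise every
// server's current weight by its default weight, and return the chosen server.
func Pick(servers []*Server) *Server {
    if len(servers) == 0 {
        return nil
    }
    best := servers[0]
    weightSum := 0
    for _, s := range servers {
        weightSum += s.Weight
        if s.CurrentWeight > best.CurrentWeight {
            best = s
        }
    }
    best.CurrentWeight -= weightSum
    for _, s := range servers {
        s.CurrentWeight += s.Weight
    }
    return best
}

With five servers Xgraph01 to Xgraph05 of equal weight, repeated calls to Pick cycle through them in turn, which matches the even 4-requests-per-server distribution in the 20-request example above.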
In one embodiment, the data processing system further comprises a controller; the method further comprises the steps of:
The controller acquires the operation information of each sixth node, wherein the sixth node is any node in the data processing system; the controller judges whether the sixth node fails according to the operation information; and if the sixth node fails, the controller removes the service interface corresponding to the sixth node to stop the operation of the sixth node.
In the embodiment of the present application, referring to fig. 6, which is a schematic diagram of operation information of a monitoring node provided in the embodiment of the present application, as shown in fig. 6, the operation information of a sixth node (any node in the system) may include information such as a node ID, a current role of the node, a data index, a service interface, and the like in the system. This involves analysing the status of the node, for example checking if there is abnormal behaviour, error reporting or unexpected load etc. If a sixth node is determined to be faulty, the controller will take corresponding measures to cope with. In this case, the controller will remove the service interface corresponding to the node to stop the operation of the node. This involves removing the portal for the node from the load balancer or service registry, preventing further requests from being sent to it.
According to the method, a sixth node in the data processing system can be identified and handled in time when a fault occurs, so that service nodes can be dynamically plugged in and removed, guaranteeing the reliability and stability of the system.
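A minimal sketch of the controller's fault handling, assuming the operation information is collected into a NodeInfo record and that the load balancer exposes a way to remove a node's service interface; all names below are illustrative assumptions.

package controller

// NodeInfo mirrors the operation information the controller collects for a
// sixth node; the field names are assumptions for this sketch.
type NodeInfo struct {
    ID               string
    Role             string // current role of the node, e.g. leader or follower
    CommittedIndex   uint64 // data index submitted by the node
    ServiceEndpoint  string // service interface exposed through the load balancer
    HeartbeatMissing bool   // set when the node has stopped reporting operation information
}

// Registry abstracts removing a node's service interface so that no further
// request instructions are routed to it.
type Registry interface {
    Remove(endpoint string)
}

// CheckNodes removes the service interface of every node judged to have failed.
func CheckNodes(nodes []NodeInfo, reg Registry) {
    for _, n := range nodes {
        if n.HeartbeatMissing {
            reg.Remove(n.ServiceEndpoint)
        }
    }
}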
In one embodiment, the operational information includes a data index; the method further comprises the steps of:
the controller monitors whether the data index in each sixth node is the same to ensure that the data of each sixth node is consistent.
In embodiments of the present application, a data index is typically used to indicate data storage locations and access paths. The controller compares the data indexes of each sixth node to determine whether they are identical. This involves comparing the indexed content, version number, or other identifier to determine if there is a discrepancy. If the data index of a sixth node (any node in the system) is inconsistent with other nodes, the controller will take corresponding action to deal with such data inconsistency. This may include marking inconsistent nodes as error states, resynchronizing the data index, or performing other data consistency repair operations.
As in the example of fig. 6, the submitted data index of each node may be obtained, and if the data indexes of node a, node B and node C are all a, it may be determined that the data corresponding to each node is consistent, and if they are different, corresponding measures may be taken.
In the above method, by monitoring and processing the data index in each sixth node, the controller may ensure that each node in the data processing system has the same data view and consistency.
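The monitoring of the data index can be sketched as a simple comparison across nodes, reusing the NodeInfo record assumed in the previous sketch:

package controller

// IndexesConsistent reports whether every sixth node exposes the same data
// index, and returns the IDs of the nodes that diverge from the first one.
func IndexesConsistent(nodes []NodeInfo) (bool, []string) {
    if len(nodes) == 0 {
        return true, nil
    }
    var diverging []string
    ref := nodes[0].CommittedIndex
    for _, n := range nodes[1:] {
        if n.CommittedIndex != ref {
            diverging = append(diverging, n.ID)
        }
    }
    return len(diverging) == 0, diverging
}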
Referring to fig. 7, which is a system block diagram of a data processing method according to an embodiment of the present application, as shown in fig. 7, the application includes a plurality of graph databases, each graph database corresponding to a node. The data processing process is as follows:
1. When each node corresponding to the Neo4j graph database is started, selecting a leader node (a first node) from the nodes A, B, C, D, E and N;
2. The load balancer distributes the task command sent by the client to the node corresponding to each graph database;
3. If the node A is a leader node when the request command is distributed to the node A, the node A executes a data updating task according to the request command and generates first log information;
4. Node A copies the first log information and sends it to follower node B, node C, node D, node E and node N (second nodes); node B, node C, node D, node E and node N parse the first log information, convert it into Cypher statements corresponding to the Neo4j graph database, execute the data update task, and return the result of the update task to leader node A.
In the method, after the leader node (the first node) finishes the update task sent by the client, by sending the log information of the update task to other nodes (the second node), all nodes can be ensured to execute the same update operation according to the same sequence, so that the consistency of data in the system is maintained. Therefore, the method can ensure the consistency of the node data of the plurality of graph databases.
It should be noted that, in the data processing method provided by the embodiment of the present application, the number of graph database nodes may be 3, 5, 7, and so on; that is, the number of nodes should be an odd number. Setting the number of nodes to an odd number prevents an election from failing because a single node fails while a leader is being elected among the plurality of nodes, and allows problems such as node failures and network partitions to be handled better, thereby ensuring the stability, fault tolerance and reliability of the system.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
Fig. 8 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 8, the terminal device 8 of this embodiment includes: at least one processor 80 (only one shown in fig. 8), a memory 81 and a computer program 82 stored in the memory 81 and executable on the at least one processor 80, the processor 80 implementing the steps in any of the various data processing method embodiments described above when executing the computer program 82.
The terminal equipment can be computing equipment such as a desktop computer, a notebook computer, a palm computer, a cloud server and the like. The terminal device may include, but is not limited to, a processor, a memory. It will be appreciated by those skilled in the art that fig. 8 is merely an example of the terminal device 8 and is not limiting of the terminal device 8, and may include more or fewer components than shown, or may combine certain components, or different components, such as may also include input-output devices, network access devices, etc.
The processor 80 may be a central processing unit (Central Processing Unit, CPU); the processor 80 may also be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 81 may in some embodiments be an internal storage unit of the terminal device 8, such as a hard disk or a memory of the terminal device 8. In other embodiments the memory 81 may also be an external storage device of the terminal device 8, such as a plug-in hard disk provided on the terminal device 8, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card). Further, the memory 81 may also include both an internal storage unit and an external storage device of the terminal device 8. The memory 81 is used for storing an operating system, application programs, a boot loader (Boot Loader), data, and other programs, such as the program code of the computer program. The memory 81 may also be used to temporarily store data that has been output or is to be output.
Embodiments of the present application also provide a computer readable storage medium storing a computer program which, when executed by a processor, performs the steps of the respective method embodiments described above.
Embodiments of the present application provide a computer program product enabling a terminal device to carry out the steps of the method embodiments described above when the computer program product is run on the terminal device.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present application may implement all or part of the flow of the methods of the above embodiments by instructing related hardware through a computer program, which may be stored in a computer readable storage medium; when the computer program is executed by a processor, the steps of each of the method embodiments described above may be implemented. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include at least: any entity or device capable of carrying computer program code to an apparatus/terminal device, a recording medium, a computer memory, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, such as a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In some jurisdictions, in accordance with legislation and patent practice, computer readable media may not be electrical carrier signals and telecommunications signals.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts that are not described or detailed in a certain embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other manners. For example, the apparatus/terminal device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical function division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (10)

1. A data processing method, characterized in that it is applied to a data processing system, said data processing system comprising a plurality of nodes, each node corresponding to a graph database; the method comprises the following steps:
A first node receives a request instruction sent by a client, wherein the request instruction is a request for updating data of a graph database corresponding to the first node, and the first node is any node in the data processing system;
After the first node executes the update task according to the request instruction, generating first log information corresponding to the update task, wherein the first log information comprises the data content of the update task;
The first node sends the first log information to each second node, wherein the second nodes are nodes except the first node in the data processing system;
And the second node executes an update task according to the first log information.
2. The data processing method of claim 1, wherein the method further comprises:
after a third node sends a voting request instruction to each fourth node, a voting result of each fourth node for voting to the third node is obtained, wherein the third node is any node in the data processing system, and the fourth node is a node except the third node in the data processing system;
in the voting result, if the number of votes from the fourth node to the third node is greater than a first threshold, determining the third node as the first node, wherein the first threshold is determined according to the number of all nodes in the data processing system;
And if the number of votes from the fourth node to the third node is not greater than the first threshold value, the third node sends a voting request instruction to each fourth node again.
3. The data processing method of claim 2, wherein the method further comprises:
And in the process of voting from the fourth node to the third node, if the third node receives the communication information sent by the fourth node, determining the fourth node sending the communication information as the first node.
4. The data processing method of claim 1, wherein the method further comprises:
when the second node receives a request instruction sent by the client, the second node sends the request instruction to the first node;
and the first node executes an update task according to the request instruction sent by the second node.
5. The data processing method of claim 1, wherein the second node performs an update task according to the first log information, comprising:
The second node converts the first log information into first application data corresponding to the graph database;
And the second node executes an update task according to the first application data.
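For illustration only: a follower-side sketch of claims 4 and 5, assuming the second node receives the JSON log format used in the sketch after claim 1 and converts it into a Cypher statement plus parameters (the first application data) before replaying it; forward_to_leader stands in for whatever mechanism forwards client writes to the first node.

```python
# Illustrative sketch of claims 4 and 5 (follower side); the log format matches the
# earlier leader sketch and is an assumption, not a prescribed wire format.
import json

from neo4j import GraphDatabase


class SecondNode:
    """Follower node: forwards client writes and replays replicated log entries."""

    def __init__(self, bolt_uri, auth):
        self._driver = GraphDatabase.driver(bolt_uri, auth=auth)

    def handle_client_request(self, request, forward_to_leader):
        # Claim 4: a second node does not apply client writes itself; it forwards
        # the request instruction to the first node.
        return forward_to_leader(request)

    def apply_log(self, raw_log):
        # Claim 5: convert the first log information into first application data for
        # the graph database, here a Cypher statement plus its parameters.
        entry = json.loads(raw_log)
        cypher, params = entry["cypher"], entry["params"]

        # Execute the update task against the local Neo4j instance.
        with self._driver.session() as session:
            session.run(cypher, params)
```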
6. The data processing method of any of claims 1-5, wherein the data processing system further comprises a load balancer;
the method further comprises the steps of:
The load balancer obtains the initial weight of each fifth node, and the fifth node is any node in the data processing system;
after each fifth node performs an update task, the load balancer calculates a second weight of each fifth node according to the initial weight;
The load balancer sends a plurality of request instructions to each fifth node according to the second weight.
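For illustration only: one possible weighted dispatch loop for claim 6. The rule for deriving the second weight (scaling the initial weight down by a node's backlog of pending update tasks) and the send callback are assumptions.

```python
# Illustrative load-balancer sketch for claim 6; the weight-update rule is an assumption.
import random


class LoadBalancer:
    def __init__(self, initial_weights):
        # initial_weights: {node_address: weight}, one entry per fifth node
        self._initial = dict(initial_weights)
        self._weights = dict(initial_weights)

    def recalculate(self, pending_updates):
        # Second weight: scale each node's initial weight down as its backlog of
        # not-yet-applied update tasks grows.
        for node, base in self._initial.items():
            backlog = pending_updates.get(node, 0)
            self._weights[node] = base / (1 + backlog)

    def dispatch(self, requests, send):
        # Distribute request instructions in proportion to the second weights.
        nodes = list(self._weights)
        weights = [self._weights[node] for node in nodes]
        for request in requests:
            target = random.choices(nodes, weights=weights, k=1)[0]
            send(target, request)
```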
7. The data processing method of claim 6, wherein the data processing system further comprises a controller;
the method further comprises the steps of:
The controller obtains the operation information of each sixth node, wherein the sixth node is any node in the data processing system;
The controller determines, according to the operation information, whether the sixth node has failed;
and if the sixth node has failed, the controller removes the service interface corresponding to the sixth node so as to stop the operation of the sixth node.
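For illustration only: a sketch of the controller behaviour in claim 7, where probe and deregister stand in for whatever health check and service-interface removal mechanism the deployment actually provides.

```python
# Illustrative controller sketch for claim 7; probe() and deregister() are assumed hooks.
def supervise(nodes, probe, deregister):
    """Collect operation information for every sixth node and take failed ones out of service."""
    for node in nodes:
        info = probe(node)  # assumed: returns a status dict, or None on timeout
        if info is None or info.get("status") != "ok":
            # Remove the node's service interface so it no longer receives traffic.
            deregister(node)
```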
8. The data processing method of claim 7, wherein the operation information includes a data index;
the method further comprises the steps of:
The controller monitors whether the data indexes of the sixth nodes are the same, so as to ensure that the data of the sixth nodes remain consistent.
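For illustration only: a sketch of the consistency monitoring in claim 8, assuming the data index compared across nodes is something like the identifier of the last applied log entry.

```python
# Illustrative consistency check for claim 8; read_data_index() is an assumed hook.
def indexes_consistent(nodes, read_data_index):
    """True if every sixth node reports the same data index (e.g. the last applied log id)."""
    observed = {read_data_index(node) for node in nodes}
    return len(observed) <= 1
```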
9. A data processing system comprising a load balancer, a controller, and a plurality of nodes;
the load balancer and the plurality of nodes are configured to implement the data processing method according to claim 6;
the controller is used for monitoring the operation information of each node of the data processing system.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the method according to any one of claims 1 to 8.
CN202410517172.XA 2024-04-28 2024-04-28 Data consistency and query method, device and system for multiple Neo4j Pending CN118093614A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410517172.XA CN118093614A (en) 2024-04-28 2024-04-28 Data consistency and query method, device and system for multiple Neo4j

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410517172.XA CN118093614A (en) 2024-04-28 2024-04-28 Data consistency and query method, device and system for multiple Neo4j

Publications (1)

Publication Number Publication Date
CN118093614A true CN118093614A (en) 2024-05-28

Family

ID=91165645

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410517172.XA Pending CN118093614A (en) 2024-04-28 2024-04-28 Data consistency and query method, device and system for multiple Neo4j

Country Status (1)

Country Link
CN (1) CN118093614A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112559632A (en) * 2020-12-15 2021-03-26 北京百度网讯科技有限公司 Method, device, electronic equipment and medium for synchronizing state of distributed graph database
CN114328566A (en) * 2021-12-30 2022-04-12 北京金堤科技有限公司 Relationship graph updating method, device, medium, equipment and generating method
US20240061754A1 (en) * 2022-08-18 2024-02-22 Tigergraph, Inc. Management of logs and cache for a graph database

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"俞方桦_Raft+-+基于共识的分布式数据库协同算法及其在+Neo4j+集群中的实现【PPT】", pages 1 - 74, Retrieved from the Internet <URL:https://z.itpub.net/article/detail/E679E254FCCFE5C6C648290C32135893> *
巧克力加糖: "17:10 - 18:00 Yu Fanghua: Raft - A Consensus-Based Distributed Database Coordination Algorithm and Its Implementation in a Neo4j Cluster", pages 1 - 74, Retrieved from the Internet <URL:https://www.modb.pro/doc/5440> *

Similar Documents

Publication Publication Date Title
US11397709B2 (en) Automated configuration of log-coordinated storage groups
US11687555B2 (en) Conditional master election in distributed databases
US10296606B2 (en) Stateless datastore—independent transactions
US9817703B1 (en) Distributed lock management using conditional updates to a distributed key value data store
US10373247B2 (en) Lifecycle transitions in log-coordinated data stores
US9323569B2 (en) Scalable log-based transaction management
US10303795B2 (en) Read descriptors at heterogeneous storage systems
US9799017B1 (en) Cross-data-store operations in log-coordinated storage systems
US20130117226A1 (en) Method and A System for Synchronizing Data
CN111930489B (en) Task scheduling method, device, equipment and storage medium
US9910881B1 (en) Maintaining versions of control plane data for a network-based service control plane
US11409711B2 (en) Barriers for dependent operations among sharded data stores
CN110063042A (en) A kind of response method and its terminal of database failure
CN112596801B (en) Transaction processing method, device, equipment, storage medium and database
CN112559525B (en) Data checking system, method, device and server
US10248508B1 (en) Distributed data validation service
CN112711606A (en) Database access method and device, computer equipment and storage medium
JP4928480B2 (en) Job processing system and job management method
CN113448775B (en) Multi-source heterogeneous data backup method and device
CN118093614A (en) Data consistency and query method, device and system for multiple Neo4j
US20230185817A1 (en) Multi-model and clustering database system
CN113032477B (en) Long-distance data synchronization method and device based on GTID and computing equipment
US11169728B2 (en) Replication configuration for multiple heterogeneous data stores
CN114020446A (en) Cross-multi-engine routing processing method, device, equipment and storage medium
CN108769246B (en) NFS sharing maximization test method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination