CN113259470B - Data synchronization method and data synchronization system - Google Patents

Publication number
CN113259470B
Authority
CN
China
Prior art keywords
data
node
synchronization
central node
synchronized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110618006.5A
Other languages
Chinese (zh)
Other versions
CN113259470A (en
Inventor
全绍军
洪伟
廖伟健
林格
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Longse Technology Co ltd
Original Assignee
Longse Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Longse Technology Co ltd filed Critical Longse Technology Co ltd
Priority to CN202110618006.5A priority Critical patent/CN113259470B/en
Publication of CN113259470A publication Critical patent/CN113259470A/en
Application granted granted Critical
Publication of CN113259470B publication Critical patent/CN113259470B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Abstract

The invention is applicable to the technical field of the internet and provides a data synchronization method and a data synchronization system. The data synchronization method comprises the following steps: the central node receives data to be synchronized sent by the primary data node and extracts data characteristic information corresponding to the data to be synchronized, wherein the data to be synchronized is sent to the primary data node, via the secondary data node, by a user terminal connected downstream of the secondary data node; the central node performs data conflict identification based on the data characteristic information of all the data to be synchronized and determines target synchronization data; and when a preset synchronization triggering condition is met, the central node performs data synchronization on all the target synchronization data. With the method and the system, duplicate data can be prevented from being stored in each data node, the utilization rate of the storage space of the data nodes is improved, and the accuracy of data synchronization is improved.

Description

Data synchronization method and data synchronization system
Technical Field
The invention belongs to the technical field of internet, and particularly relates to a data synchronization method and a data synchronization system.
Background
With the development of science and technology, the demand for real-time data synchronization and mutual transmission among multiple devices, multiple nodes and multiple systems keeps growing. The volume of synchronized data is becoming larger and larger, and the quality requirements on synchronized data are becoming higher and higher. How to improve the accuracy of data synchronization has therefore become an indispensable technical requirement in practical applications.
In existing data synchronization technology, a central node is usually configured in the data synchronization system, and a plurality of child nodes are connected below the central node. Each child node uploads data to be synchronized to the directly connected central node, so the central node needs high throughput and many data interfaces, which greatly increases the equipment cost of the central node and places higher requirements on its network. Moreover, because the central node directly synchronizes the received data to each subordinate node, if a user terminal uploads the same data to a plurality of subordinate nodes, data conflicts may occur during synchronization, which reduces the accuracy of data synchronization.
Disclosure of Invention
In view of this, embodiments of the present invention provide a data synchronization method and a data synchronization system, so as to solve the problems of the existing data synchronization technology, in which a plurality of child nodes are connected below a central node, so that the equipment cost of the central node and the requirements on its network environment increase greatly as the number of system nodes grows, data conflicts easily occur during data synchronization, and the accuracy of data synchronization is low.
The first aspect of the embodiments of the present invention provides a method for data synchronization, which is applied to a multi-node data synchronization system, where the data synchronization system includes more than two data clusters and a central node, and each data cluster includes a primary data node and at least one secondary data node;
the data synchronization method comprises the following steps:
the central node receives data to be synchronized sent by the primary data node and extracts data characteristic information corresponding to the data to be synchronized, wherein the data to be synchronized is sent to the primary data node, via the secondary data node, by a user terminal connected downstream of the secondary data node;
the central node identifies data conflicts based on the data characteristic information of all data to be synchronized, and determines target synchronous data without data conflicts;
and when the central node meets a preset synchronization triggering condition, performing data synchronization on all the target synchronization data.
A second aspect of the embodiments of the present invention provides a data synchronization system, where the data synchronization system includes more than two data clusters and a central node, and each data cluster includes a primary data node and at least one secondary data node;
the central node is used for receiving data to be synchronized sent by the primary data node and extracting data characteristic information corresponding to the data to be synchronized, wherein the data to be synchronized is sent to the primary data node by a user terminal connected with the secondary data node through the secondary data node;
the central node is used for identifying data conflict based on the data characteristic information of all data to be synchronized and determining target synchronous data without data conflict;
and the central node is used for carrying out data synchronization on all the target synchronization data when a preset synchronization triggering condition is met.
The method and the system for data synchronization provided by the embodiment of the invention have the following beneficial effects:
The embodiment of the invention divides the data nodes in the data synchronization system into a plurality of different data clusters, and each data cluster is configured with two types of data nodes, namely a primary data node and a secondary data node. The secondary data node uploads the data to be synchronized received from a user terminal to the primary data node, and the primary data node forwards the data to be synchronized to the central node, so that the number of data nodes directly connected with the central node is greatly reduced, and the equipment requirements and network requirements on the central node are greatly reduced as well. After the central node receives the data to be synchronized uploaded by each primary data node, it performs data conflict identification on all the data to be synchronized, identifies the conflicting data, obtains the target synchronization data among which no conflict exists, and synchronizes the target synchronization data to each data node in the data synchronization system. Therefore, duplicate data can be prevented from being stored in each data node, the utilization rate of the storage space of the data nodes is improved, and the accuracy of data synchronization is improved.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without inventive effort.
Fig. 1 is a schematic structural diagram of a data synchronization system according to a first embodiment of the present invention;
fig. 2 is a flowchart of an implementation of a method for data synchronization according to a first embodiment of the present invention;
fig. 3 is a flowchart of a detailed implementation of S203 of a data synchronization method according to a second embodiment of the present invention;
fig. 4 is a flowchart of a detailed implementation of S2035 of a data synchronization method according to a third embodiment of the present invention;
fig. 5 is a flowchart of a specific implementation of a data synchronization method according to a fourth embodiment of the present invention;
fig. 6 is a flowchart of a detailed implementation of S204 of a data synchronization method according to a fifth embodiment of the present invention;
fig. 7 is a flowchart of a specific implementation of a data synchronization method according to a sixth embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the embodiment of the invention, the data nodes in the data synchronization system are divided into a plurality of different data clusters, and each data cluster is configured with two types of data nodes, namely a primary data node and a secondary data node. The secondary data node uploads the data to be synchronized received from a user terminal to the primary data node, and the primary data node forwards the data to be synchronized to the central node, so that the number of data nodes directly connected with the central node is greatly reduced, and the equipment requirements and network requirements on the central node are greatly reduced as well. After the central node receives the data to be synchronized uploaded by each primary data node, it performs data conflict identification on all the data to be synchronized, identifies the conflicting data, filters out the conflicting data to obtain the target synchronization data, and synchronizes the target synchronization data to each data node in the data synchronization system. This solves the problems of the existing data synchronization technology, in which a plurality of child nodes are connected below the central node, so that the equipment cost of the central node and the requirements on its network environment increase greatly as the number of system nodes grows, data conflicts easily occur during data synchronization, and the accuracy of data synchronization is low.
In the embodiment of the invention, the data synchronization system comprises more than two data clusters and a central node, wherein each data cluster comprises a primary data node and at least one secondary data node. The central node, the primary data node and the secondary data node include but are not limited to devices capable of performing data synchronization, such as servers, computers, smart phones, laptops, and tablets. Exemplarily, fig. 1 shows a schematic structural diagram of a data synchronization system provided by an embodiment of the present application. Referring to fig. 1, all data nodes in the data synchronization system may be pre-divided into a plurality of different data clusters 10, each data cluster includes a primary data node 11 and at least one secondary data node 12, and the data synchronization system further includes a central node 20. The data clusters 10 may be divided according to the geographical location of each data node, or according to other division rules; for example, the data nodes may be assigned to corresponding data clusters according to their device models or manufacturers.
In the following, the division of data nodes based on geographic location is taken as an example. The data synchronization system may configure a global map and divide the global map into a plurality of geographic areas, and each geographic area may be assigned at least one data cluster. Exemplarily, the data synchronization system may divide China into three geographic areas, namely a South China area, a Central China area and a North China area, and configure corresponding data clusters for these geographic areas. In this case, the data synchronization system may determine the data cluster of a data node according to the geographic area in which the data node is installed. For example, if a secondary data node is installed in Guangzhou, and Guangzhou belongs to the South China area, the secondary data node belongs to the data cluster corresponding to the South China area.
In one possible implementation, the data synchronization system may manage the cluster size of each data cluster. In this case, the data synchronization system may set a maximum number of nodes for each data cluster, and the maximum numbers of nodes of different data clusters may be the same or different. For example, the maximum number of nodes of a data cluster may be determined according to the network performance of the area where the data cluster is located: if the total network bandwidth of the area is large, or the average data transmission rate is high, the maximum number of nodes may be set to a large value; conversely, if the total network bandwidth of the area is small, or the average data transmission rate is low, that is, the network state is poor, the maximum number of nodes may be set to a small value. By configuring the maximum number of nodes, the data synchronization system can manage the data nodes of each data cluster, which avoids overloading the data cluster with data synchronization tasks and thus avoids degrading data synchronization efficiency and data cluster stability.
In a possible implementation manner, in order to improve the flexibility of networking, the data synchronization system may dynamically adjust the data nodes in the data clusters. If the number of data nodes contained in any data cluster is larger than the preset maximum number of nodes, a new data cluster can be configured for the geographic area, and the newly added data nodes exceeding the maximum number of nodes are added into the newly configured data cluster, so that the total number of data nodes of each data cluster does not exceed the maximum number of nodes. Conversely, if the combined number of data nodes of any two data clusters in the geographic area is less than the preset maximum number of nodes, the two data clusters may be merged, so that the number of data clusters is reduced and the aggregation degree of the data nodes is improved.
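By way of illustration only, the cluster-size management described above could be sketched as follows. The class `DataCluster` and the functions `add_node` and `merge_small_clusters` are hypothetical names introduced for this sketch and do not appear in the original text; the sketch assumes a single maximum number of nodes per region.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare clusters by identity, not by field values
class DataCluster:
    region: str
    max_nodes: int
    nodes: list = field(default_factory=list)


def add_node(clusters, region, node_id, max_nodes):
    """Place a node in a cluster of its region; open a new cluster if all are full."""
    for cluster in clusters:
        if cluster.region == region and len(cluster.nodes) < cluster.max_nodes:
            cluster.nodes.append(node_id)
            return cluster
    new_cluster = DataCluster(region=region, max_nodes=max_nodes, nodes=[node_id])
    clusters.append(new_cluster)
    return new_cluster


def merge_small_clusters(clusters, region, max_nodes):
    """Merge the first pair of clusters in a region whose combined size is below max_nodes."""
    regional = [c for c in clusters if c.region == region]
    for i, first in enumerate(regional):
        for second in regional[i + 1:]:
            if len(first.nodes) + len(second.nodes) < max_nodes:
                first.nodes.extend(second.nodes)
                clusters.remove(second)
                return
```

The strict "less than" comparison in the merge step follows the wording of the paragraph above; a deployment might equally choose "less than or equal to".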
In a possible implementation manner, more than two levels of data nodes may also be configured in the data cluster. That is, the secondary data nodes may not be directly connected to the user terminals but may instead be connected to corresponding third-level data nodes, and the third-level data nodes may in turn be connected to corresponding fourth-level data nodes. The networking manner is determined according to the actual synchronization requirements of the data synchronization system, and the number of cascade levels of the data cluster is not limited here.
In this embodiment, the central node may be independent of the data clusters, that is, the central node may not belong to any data cluster. In one possible implementation, the central node belongs to a data cluster in the data synchronization system. For example, if the central node is installed in Guangzhou and the data synchronization system is divided, based on geographic area, into a South China area data cluster, a Central China area data cluster and a North China area data cluster, the central node may belong to the South China area data cluster. In some embodiments, if the central node in the data synchronization system is updated, the central node before the update may be replaced by a primary data node that becomes the updated central node; in that case, the replaced central node may serve as a primary data node or a secondary data node of the data cluster to which it belongs.
Specifically, the primary data node is configured to receive data to be synchronized uploaded by the secondary data node; the data to be synchronized is uploaded by a user terminal connected downstream of the secondary data node;
the central node is used for receiving the data to be synchronized sent by the primary data node and extracting data characteristic information corresponding to the data to be synchronized;
the central node is used for identifying data conflict based on the data characteristic information of all data to be synchronized and determining target synchronous data without data conflict;
and the central node is used for carrying out data synchronization on all the target synchronization data when a preset synchronization triggering condition is met.
Optionally, the central node being configured to perform data conflict identification based on the data characteristic information of all data to be synchronized, and to determine target synchronization data among which no data conflict exists, includes:
the central node is used for calculating a first similarity between any two pieces of data to be synchronized according to the data uploading time and the associated device identifier in the data characteristic information;
the central node is configured to identify any two pieces of data to be synchronized as candidate conflict data if the first similarity between them is greater than a preset first similarity threshold;
the central node is used for determining a feature extraction algorithm associated with the data type based on the data type of the candidate conflict data;
the central node is used for importing the candidate conflict data into the feature extraction algorithm to obtain a data feature vector corresponding to the candidate conflict data;
the central node is used for calculating a second similarity between the candidate conflict data based on the data feature vector;
the central node is configured to identify candidate conflict data with the second similarity not greater than a preset second similarity threshold as target synchronization data.
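The two-stage identification listed above can be sketched as follows, for illustration only. The text does not give the first-similarity formula or the feature extraction algorithm, so this sketch assumes an exponential decay over the upload-time gap for records from the same device, and cosine similarity over a precomputed feature vector. The names `Record`, `first_similarity`, `cosine_similarity` and `select_target_data`, the thresholds, and the rule of keeping the earlier upload of a conflicting pair are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    data_id: str
    device_id: int
    upload_time: float   # seconds; the unit is an assumption
    features: tuple      # stand-in for the output of a type-specific extractor


def first_similarity(a, b, time_scale=60.0):
    """Assumed metric: 0 for different devices, else decays with the upload-time gap."""
    if a.device_id != b.device_id:
        return 0.0
    return math.exp(-abs(a.upload_time - b.upload_time) / time_scale)


def cosine_similarity(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def select_target_data(records, first_threshold=0.5, second_threshold=0.9):
    """Return the records kept as target synchronization data."""
    dropped = set()
    ordered = sorted(records, key=lambda r: r.upload_time)
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if a.data_id in dropped or b.data_id in dropped:
                continue
            if first_similarity(a, b) <= first_threshold:
                continue  # not candidate conflict data
            if cosine_similarity(a.features, b.features) > second_threshold:
                dropped.add(b.data_id)  # assumption: keep the earlier upload
    return [r for r in ordered if r.data_id not in dropped]
```

The cheap first check on upload time and device identifier limits the more expensive feature comparison to pairs that are plausible duplicates.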
Optionally, the data feature vector includes a content dimension vector and an application dimension vector; the content dimension vector is used to identify content features of the candidate conflict data; the application dimension vector is used for representing all application programs which process the candidate conflict data; the central node is configured to calculate a second similarity between the candidate collision data based on the data feature vector, and includes:
the central node is used for calculating the vector distance between the data characteristic vectors of the candidate conflict data and determining a first similarity factor based on the vector distance;
the central node is configured to determine, according to the application dimension vector, a processing order corresponding to each application program that has processed the candidate conflict data, and determine a weighting weight of each application program based on the processing orders corresponding to all the application programs;
the central node is used for calculating a second similarity factor between the candidate conflict data based on the weighted weights of all the application programs and the application dimension vector; the calculation algorithm of the second similarity factor is as follows:
[Formula image not reproduced in the text.]
wherein the symbols of the formula denote, in order: the second similarity factor; the value of the i-th application program in the application dimension vector corresponding to one candidate conflict data; the weighting weight corresponding to the i-th application program; the value of the j-th application program in the application dimension vector corresponding to the other candidate conflict data; and the weighting weight corresponding to the j-th application program; n is the total number of application programs that have processed the one candidate conflict data; m is the total number of application programs that have processed the other candidate conflict data;
the central node is configured to obtain the second similarity based on the first similarity factor and the second similarity factor.
Optionally, the central node is further configured to:
if the preset node updating triggering condition is met, acquiring node characteristic information of a primary data node corresponding to each data cluster; the node characteristic information comprises a node position, a node performance parameter and a node network parameter corresponding to the central node;
respectively determining position dimension scores corresponding to the primary data nodes based on all the node positions;
determining a data volume average value of the data synchronization system based on the data volumes of all uploaded data, and respectively determining a network dimension score corresponding to each primary data node based on the data volume average value and the node network parameters;
determining a maximum performance parameter according to a locally corresponding reference performance parameter and all the node performance parameters, and determining a performance dimension score corresponding to each primary data node based on the maximum performance parameter;
determining an evaluation index corresponding to each primary data node based on the position dimension score, the network dimension score and the performance dimension score;
if the evaluation index of any one primary data node is larger than the reference index corresponding to the central node, node updating information is sent to all data nodes in the data synchronization system; and the node updating information contains the node identification of the primary data node with the evaluation index larger than the reference index, so that the primary data node with the evaluation index larger than the reference index is used as a new central node.
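For illustration only, the comparison of evaluation indexes against the reference index of the current central node might look like the sketch below. The text does not specify how the three dimension scores are combined, so an equally weighted sum is assumed here; `NodeScores`, `evaluation_index` and `pick_new_central` are hypothetical names, and selecting the highest-scoring qualifying node is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class NodeScores:
    node_id: str
    position: float
    network: float
    performance: float


def evaluation_index(scores, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Assumed combination: weighted sum of the three dimension scores."""
    w_pos, w_net, w_perf = weights
    return (w_pos * scores.position
            + w_net * scores.network
            + w_perf * scores.performance)


def pick_new_central(candidates, reference_index):
    """Return the id of the best primary node if it beats the reference index, else None."""
    best = max(candidates, key=evaluation_index, default=None)
    if best is not None and evaluation_index(best) > reference_index:
        return best.node_id
    return None
```

When the function returns a node identifier, the central node would place it in the node updating information sent to all data nodes.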
Optionally, the central node is configured to perform data synchronization on all the target synchronization data when a preset synchronization triggering condition is met, and the data synchronization includes:
the central node is used for determining a target synchronization node for data synchronization based on a preset synchronization subscription list;
the central node is used for acquiring the synchronization information of the synchronized data corresponding to the target synchronization node;
the central node is used for determining incremental synchronization data based on the synchronization information and all the target synchronization data;
and the central node is used for sending the incremental synchronization data to the target synchronization node.
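The incremental synchronization steps above amount to a set difference per subscribed node. A minimal sketch follows, assuming that the synchronization information of each node is a set of already synchronized data identifiers; the function name `incremental_sync_plan` and the dictionary layout are assumptions made for this sketch.

```python
def incremental_sync_plan(subscriptions, synced_ids, target_data):
    """Compute, per subscribed node, the target data it has not yet received.

    subscriptions: iterable of node ids taken from the synchronization subscription list
    synced_ids:    dict mapping node id -> set of data ids already synchronized
    target_data:   dict mapping data id -> payload
    """
    plan = {}
    for node_id in subscriptions:
        already = synced_ids.get(node_id, set())
        plan[node_id] = {
            data_id: payload
            for data_id, payload in target_data.items()
            if data_id not in already
        }
    return plan
```

A node with no recorded synchronization information receives all target synchronization data, which coincides with a full synchronization.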
Optionally, the central node is further configured to:
if synchronization failure information sent by the target synchronization node is received or synchronization completion information fed back by the target synchronization node is not received within a preset effective feedback time length, increasing the count value of an abnormal synchronization counter;
and if the count value is greater than a preset abnormal threshold value, carrying out full data synchronization operation on the target synchronization node.
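As a hedged sketch of the abnormal synchronization counter described above: the class `SyncMonitor` and its method are hypothetical, a feedback timeout is treated the same as an explicit failure message, and resetting the counter once a full synchronization is scheduled is an assumption not stated in the text.

```python
class SyncMonitor:
    """Tracks abnormal synchronization results for one target synchronization node."""

    def __init__(self, abnormal_threshold):
        self.abnormal_threshold = abnormal_threshold
        self.abnormal_count = 0

    def record_result(self, completed):
        """Return the kind of synchronization to run next: 'incremental' or 'full'.

        completed is False both when failure information was received and when
        no completion feedback arrived within the effective feedback duration.
        """
        if completed:
            return "incremental"
        self.abnormal_count += 1
        if self.abnormal_count > self.abnormal_threshold:
            self.abnormal_count = 0  # assumption: reset once a full sync is scheduled
            return "full"
        return "incremental"
```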
Optionally, the secondary data node is further configured to:
if the secondary data node receives node abnormal information broadcast by a primary data node in an associated data cluster, acquiring node information of all secondary data nodes in the data cluster;
and determining a target secondary node based on all the node information, and broadcasting primary node updating information to the data cluster so as to take the target secondary node as a new primary data node in the data cluster.
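The selection of a target secondary node can be illustrated as follows. The text does not say which node information is compared, so this sketch assumes that each secondary data node is summarized by a single numeric score; the function `elect_primary` and the tie-break by node identifier are assumptions.

```python
def elect_primary(node_info):
    """Pick the secondary node with the highest score.

    node_info maps node id -> score (hypothetical summary of the node information).
    Ties are broken by node id so that every node reaches the same result.
    """
    return min(node_info, key=lambda node_id: (-node_info[node_id], node_id))
```

A deterministic tie-break matters here because the update is broadcast within the data cluster and all nodes should agree on the new primary data node.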
For convenience of explaining the data synchronization method provided in the embodiment of the present application, fig. 2 shows an interaction flowchart of the data synchronization method provided in the first embodiment of the present invention, which is detailed as follows:
In S201, the primary data node receives data to be synchronized uploaded by the secondary data node; the data to be synchronized is uploaded by a user terminal connected downstream of the secondary data node.
In this embodiment, the data cluster includes a primary data node and a secondary data node. The primary data node serves as the data aggregation node in the data cluster and can receive the data to be synchronized uploaded by each secondary data node; the secondary data node serves as a data collection node in the data cluster, can be directly connected with the user terminals, and receives the data uploaded by the user terminals.
Exemplarily, an application scenario of synchronized video monitoring is taken as an example. The data synchronization system is specifically a video synchronization monitoring system. The data synchronization system comprises a plurality of video acquisition devices (namely, the user terminals) distributed in different areas. Each video acquisition device can capture video data of its area and upload the video data to an associated data storage device. The data storage devices are the secondary data nodes in the video synchronization monitoring system and are used for storing the video data captured by one or more video acquisition devices. The data storage device may then upload the received video data to an associated database server, that is, the primary data node in the associated data cluster, and the primary data node may receive the video data sent by each data storage device in the data cluster.
In this embodiment, the secondary data node may receive data uploaded by a user terminal that is connected downstream, and the data uploaded by each user terminal may be configured with a corresponding data identifier, where the data identifier may be determined according to a device number of the user terminal and a corresponding upload time. For example, the data identifier includes 2 bytes, where one byte is used to store a device identifier of the user terminal, and another byte is used to store an upload time corresponding to the data, where the device identifier of the user terminal is a unique number for the user terminal in the entire data synchronization system, so that a user terminal can be uniquely determined in the data synchronization system based on the device identifier.
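The two-byte data identifier from the example above can be sketched with Python's standard `struct` module. How the upload time is encoded in a single byte is not stated in the text, so this sketch assumes it is a small integer slot index in the range 0 to 255; `pack_data_id` and `unpack_data_id` are hypothetical helper names.

```python
import struct


def pack_data_id(device_id, upload_slot):
    """Build a 2-byte identifier: byte 0 is the device identifier, byte 1 the upload time.

    Both values must fit in one unsigned byte (0..255); struct.pack raises
    struct.error otherwise.
    """
    return struct.pack("BB", device_id, upload_slot)


def unpack_data_id(data_id):
    """Return (device_id, upload_slot) from a 2-byte identifier."""
    return struct.unpack("BB", data_id)
```

With one byte per field, at most 256 user terminals can be distinguished, so a real deployment would presumably use wider fields.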
In this embodiment, the secondary data node may store data synchronization information, which records the data identifiers of the data that has already been uploaded. Based on the data synchronization information, the secondary data node may determine the data for which synchronization has not been completed, that is, the data that has not been uploaded to the primary data node, identify that data as data to be synchronized, and send the data to be synchronized to the primary data node.
In a possible implementation manner, the secondary data node may send the data to be synchronized to the primary data node when receiving the data to be synchronized sent by the user terminal.
In a possible implementation manner, the secondary data node may be configured with a corresponding data synchronization triggering condition, and the data to be synchronized is uploaded to the primary data node when the data synchronization triggering condition is satisfied. The data synchronization triggering condition may be a time triggering condition, that is, the data to be synchronized is sent to the primary data node when a preset uploading moment is reached. The data synchronization triggering condition may also be a data threshold: when the secondary data node detects that the total amount of data to be synchronized is greater than the preset data threshold, it sends all the data to be synchronized to the primary data node. The secondary data node may also decide whether to upload data according to the current network state: if the current network state meets a preset uploading trigger condition (for example, the available network bandwidth is greater than a preset bandwidth threshold, and/or the network transmission rate is greater than a preset rate threshold, and/or the network signal-to-noise ratio is greater than a preset ratio), the secondary data node sends the data to be synchronized to the primary data node.
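A minimal sketch of these triggering conditions is given below for illustration. The names `UploadPolicy` and `should_upload` are hypothetical, only the bandwidth criterion of the network state is modeled, and treating any one satisfied condition as sufficient is an assumption, since the text presents the conditions as alternatives.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UploadPolicy:
    upload_time: Optional[float] = None      # preset uploading moment
    data_threshold: Optional[int] = None     # preset data threshold, in bytes
    min_bandwidth: Optional[float] = None    # preset bandwidth threshold


def should_upload(policy, now, pending_bytes, available_bandwidth):
    """Return True if any configured triggering condition is satisfied."""
    if policy.upload_time is not None and now >= policy.upload_time:
        return True
    if policy.data_threshold is not None and pending_bytes > policy.data_threshold:
        return True
    if policy.min_bandwidth is not None and available_bandwidth > policy.min_bandwidth:
        return True
    return False
```

Conditions left as `None` are simply not configured, so a policy with no fields set never triggers an upload.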
In S202, the central node receives the data to be synchronized sent by the primary data node, and extracts data feature information corresponding to the data to be synchronized.
In this embodiment, the primary data node may send the received data to be synchronized, uploaded by each secondary data node, to the central node. That is, the primary data node acts as a secondary central node within its data cluster, so that the secondary data nodes in the data cluster do not need to be directly connected to the central node, which reduces the number of data interfaces required by the central node and also reduces the data transmission load on the central node. In some application scenarios, if a plurality of user terminals in the data synchronization system simultaneously generated data to be synchronized and uploaded it to the secondary data nodes, and the secondary data nodes then simultaneously sent the data to be synchronized to the central node, the data throughput in the whole data synchronization system would be large, the network load of the data synchronization system would increase greatly, and the stability of the data synchronization system would be affected. By configuring a primary data node for each data cluster, receiving the data to be synchronized of each secondary data node through the primary data node, and then forwarding the data to be synchronized to the central node, the load on the network where the central node is located can be reduced, and the stability of the data synchronization system is improved.
The primary data node may directly send the data to be synchronized to the central node, or may be configured with a corresponding data synchronization trigger condition, in which case the data to be synchronized is uploaded to the central node when the preset data synchronization trigger condition is met.
In a possible implementation manner, a corresponding node update condition may be configured in the data cluster, and the primary data node in the data cluster may be updated when the preset node update condition is satisfied. The node update condition may be an update period, in which case the primary data node of the data cluster is updated once per preset update period. In each update period, the data cluster may select, based on a preset rotation order, the secondary data node whose turn it is as the primary data node of the data cluster, and the primary data node of the previous update period becomes a secondary data node of the data cluster again, and so on.
In a possible implementation manner, the above manner of updating the primary data node may be applied to a scenario in which the performance parameters of all data nodes in the data cluster are consistent. Because the performance parameters of all the data nodes are consistent and no node is superior or inferior to another, any data node can serve as the primary data node of the data cluster while the other data nodes serve as secondary data nodes. Rotating the primary data node at the start of each update period prevents any one node from remaining in a high-write, high-read state for a long time, which extends the service life of the equipment acting as the primary data node and reduces its wear.
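The rotation of the primary data node can be sketched as a simple round-robin selection. The function name `primary_for_period` and the node identifiers are hypothetical, and the sketch assumes that the preset rotation order is the order of the node list.

```python
def primary_for_period(node_ids: list[str], period_index: int) -> str:
    """Pick the primary data node for an update period by rotating through the cluster."""
    if not node_ids:
        raise ValueError("a data cluster needs at least one data node")
    # Assumed rotation order: the order in which the nodes are listed.
    return node_ids[period_index % len(node_ids)]

cluster = ["node-a", "node-b", "node-c"]
schedule = [primary_for_period(cluster, k) for k in range(4)]
# schedule is ["node-a", "node-b", "node-c", "node-a"]
```

Every node not selected for a given period acts as a secondary data node for that period.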
In S203, the central node performs data collision identification based on the data characteristic information of all the data to be synchronized, and determines target synchronization data.
In this embodiment, after receiving the data to be synchronized sent by the primary data node of each data cluster, the central node may first perform a collision identification operation on all the data to be synchronized. Some user terminals may upload the same data to different secondary data nodes. In that case, the different secondary data nodes each send the same data to their primary data node, and the primary data node uploads it to the central node as data to be synchronized. If this identical data were then synchronized to every secondary data node, a secondary data node that already stores the data would encounter a synchronization conflict, and the other secondary data nodes would waste storage space by storing two copies of the same data. To avoid this, before performing data synchronization the central node performs data collision identification on all the data to be synchronized and filters out the abnormal data that collides, so as to obtain the target synchronization data.
In this embodiment, the central node is configured with a data feature extraction algorithm, and respectively imports each data to be synchronized into the data feature extraction algorithm, so as to determine data feature information corresponding to each data to be synchronized.
In a possible implementation manner, if the data to be synchronized includes a data identifier, the data identifier may be used as the data characteristic information of the data to be synchronized. The data identifier is determined according to the device number of the user terminal and the data upload time. In this case, the data collision identification may specifically be: identifying two or more pieces of data to be synchronized that have the same device number and whose data upload times differ by less than a preset jitter threshold as a conflict data group, retaining only one piece of data to be synchronized in the conflict data group, and deleting the other data to be synchronized in the conflict data group. For example, if the data identifier corresponding to data A to be synchronized is {device A, 8:11} and the data identifier corresponding to data B to be synchronized is {device A, 8:11}, then both pieces of data were uploaded by device A at close upload times, so the two pieces of data are identified as a conflict data group. One of them is retained, for example data A to be synchronized, and the other, namely data B to be synchronized, is deleted.
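The identifier-based collision identification can be sketched as follows. The names `Record` and `drop_conflicts` are hypothetical. The sketch assumes that upload times are expressed in seconds and that a record conflicts when its upload time is within the jitter threshold of the last retained record from the same device; the embodiment does not prescribe how a conflict data group with more than two members is delimited.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    name: str
    device: str
    upload_time: float  # seconds since midnight (assumed unit)

def drop_conflicts(records: list[Record], jitter: float) -> list[Record]:
    """Keep one record per conflict group (same device, upload times within jitter)."""
    kept: list[Record] = []
    last_kept_time: dict[str, float] = {}
    # sorted() is stable, so records with equal keys keep their input order.
    for rec in sorted(records, key=lambda r: (r.device, r.upload_time)):
        prev = last_kept_time.get(rec.device)
        if prev is not None and rec.upload_time - prev < jitter:
            continue  # conflicts with a record that is already retained
        kept.append(rec)
        last_kept_time[rec.device] = rec.upload_time
    return kept

records = [
    Record("A", "device A", 8 * 3600 + 11 * 60),
    Record("B", "device A", 8 * 3600 + 11 * 60),
    Record("C", "device B", 8 * 3600 + 11 * 60),
]
remaining = [r.name for r in drop_conflicts(records, jitter=60.0)]
# remaining is ["A", "C"]: data B is deleted, data A is retained
```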
In one possible implementation, the central node is configured with a duplicate search script. The central node may extract feature keywords from the data feature information through the duplicate search script. If the number of identical feature keywords between any two pieces of data to be synchronized is greater than a preset number threshold, for example when all feature keywords extracted from the data feature information are the same, the central node identifies the two pieces of data to be synchronized as duplicate data, deletes one of them, and retains the other.
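The keyword comparison performed by the duplicate search script can be sketched as a set intersection. The function name `is_duplicate` and the example keywords are hypothetical, and the sketch assumes the feature keywords have already been extracted into sets.

```python
def is_duplicate(keywords_a: set[str], keywords_b: set[str], count_threshold: int) -> bool:
    """Two items are duplicates when they share more than count_threshold feature keywords."""
    return len(keywords_a & keywords_b) > count_threshold

doc_1 = {"invoice", "2021", "longse", "camera"}
doc_2 = {"invoice", "2021", "longse", "router"}
```

With a threshold of 2 the two example items share three keywords and are treated as duplicates; with a threshold of 3 they are not.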
In S204, when the central node meets a preset synchronization trigger condition, performing data synchronization on all the target synchronization data.
In this embodiment, the central node may be configured with a preset synchronization trigger condition. And when the central node detects that the preset synchronization triggering condition is met, synchronizing the target synchronization data to all data nodes in the data synchronization system. The data synchronization mode may be as follows: the central node sends the target synchronous data to the primary data nodes in each data cluster and sends the target synchronous data to each secondary data node through the primary data nodes; or, the central node may generate a full-network synchronous broadcast in the data synchronization system, where the full-network synchronous broadcast carries the target synchronous data, so that the data nodes (including the primary data node and the secondary data node) that receive the full-network synchronous broadcast may store the target synchronous data locally, thereby achieving the purpose of data synchronization.
In a possible implementation manner, the data synchronization performed by the central node on all the target synchronization data specifically includes: the central node determines the data cluster that uploaded the target synchronization data, takes the clusters other than that data cluster as clusters to be synchronized, and sends the target synchronization data to the primary data node in each cluster to be synchronized, so that the primary data node sends the target synchronization data to each secondary data node in its cluster. When a secondary data node sends the target synchronization data to its primary data node, the primary data node can both forward the data to the other secondary data nodes in the data cluster and send it to the central node. The target synchronization data is therefore already synchronized within the data cluster to which it belongs, so the central node does not need to send it to the data nodes of that data cluster again, which reduces unnecessary data transmission.
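The selection of clusters to be synchronized can be sketched as follows. The function name `build_sync_plan`, the dictionary layout, and the node names are hypothetical; the sketch only shows that the cluster which uploaded the data is skipped.

```python
def build_sync_plan(clusters: dict[str, dict], origin_cluster: str) -> dict[str, list[str]]:
    """Map each primary data node that must receive the data to the secondaries it forwards to."""
    plan: dict[str, list[str]] = {}
    for name, cluster in clusters.items():
        if name == origin_cluster:
            continue  # already synchronized inside its own cluster during upload
        plan[cluster["primary"]] = list(cluster["secondaries"])
    return plan

clusters = {
    "cluster-1": {"primary": "p1", "secondaries": ["s1a", "s1b"]},
    "cluster-2": {"primary": "p2", "secondaries": ["s2a"]},
    "cluster-3": {"primary": "p3", "secondaries": []},
}
plan = build_sync_plan(clusters, origin_cluster="cluster-1")
# plan is {"p2": ["s2a"], "p3": []}
```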
In a possible implementation manner, if the data synchronization system is a real-time data synchronization system, the synchronization trigger condition is null, and when the central node acquires target synchronization data that needs to be synchronized, the central node immediately sends the target synchronization data to each data node in the data synchronization system, so as to implement data synchronization.
It can be seen from the above that the data synchronization method provided in the embodiments of the present invention divides the data nodes in a data synchronization system into a plurality of different data clusters, and each data cluster is configured with two types of data nodes, namely a primary data node and secondary data nodes. A secondary data node uploads the data to be synchronized that it receives from a user terminal to the primary data node, and the primary data node uploads the data to the central node. The number of data nodes directly connected to the central node is thus greatly reduced, which greatly lowers the device and network requirements on the central node. After receiving the data to be synchronized uploaded by each primary data node, the central node performs data collision identification on all the data to be synchronized, identifies the data that collides, and filters it out to obtain the target synchronization data, which is then synchronized to each data node in the data synchronization system. This prevents duplicate data from being stored in the data nodes, improves the storage space utilization of the data nodes, and improves the accuracy of data synchronization.
Fig. 3 shows a flowchart of a specific implementation of the method S203 for data synchronization according to the second embodiment of the present invention. Referring to fig. 3, with respect to the embodiment shown in fig. 2, in the method for data synchronization provided by this embodiment, S203 includes: S2031-S2036, which is detailed as follows:
further, the central node performs data collision recognition based on the data characteristic information of all the data to be synchronized, and determines target synchronization data, including:
in S2031, the central node calculates a first similarity between any two pieces of data to be synchronized according to the data upload time in the data feature information and the identifier of the device to which the data belongs.
In this embodiment, the central node may perform a preliminary screening on all data to be synchronized based on two dimensions of the data feature information, and select candidate collision data with a high collision probability. The two selected dimensions are the data upload time and the identifier of the device to which the data belongs, that is, the central node determines whether the pieces of data to be synchronized were uploaded by the same user terminal and whether their upload times are the same or similar. The device identifier may be a device model and a physical address of the user terminal, or a network address corresponding to the network where the user terminal is located, a port number used for uploading data, or the like.
In this embodiment, a first similarity calculation algorithm is preset in the central node, and the data upload time and the device identifier of any two pieces of data to be synchronized may be imported into the first similarity calculation algorithm to determine a first similarity between the two pieces of data to be synchronized. The first similarity calculation algorithm may specifically be:
[Formula image not reproduced in the text: first similarity calculation algorithm]
wherein the symbols of the formula denote, in order of definition: the first similarity; the data upload time of one piece of data to be synchronized; the identifier of the device to which that piece of data belongs; the data upload time of the other piece of data to be synchronized; the identifier of the device to which the other piece of data belongs; a preset time dimension weight; a preset device dimension weight; and diff(x, y), a preset difference function.
In a possible implementation manner, if the first similarity between any two pieces of data to be synchronized is less than or equal to a preset first similarity threshold, it indicates that the two pieces of data to be synchronized are not similar, that is, not collision data, and the two pieces of data to be synchronized may be identified as target synchronization data.
In S2032, if the first similarity between any two pieces of data to be synchronized is greater than a preset first similarity threshold, the central node identifies the any two pieces of data to be synchronized as candidate conflict data.
In this embodiment, when the central node detects that the first similarity between any two pieces of data to be synchronized is greater than a preset first similarity threshold, it indicates that the two pieces of data to be synchronized may be mutually conflicting data, and in order to improve accuracy of identifying conflicting data, further identification is required. Therefore, the two data to be synchronized are identified as candidate conflict data.
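The preliminary screening of S2031 and S2032 can be illustrated with a sketch. The formula of the embodiment is not reproduced in the text, so the weighted form below is only an assumption chosen for illustration: the time term is taken as 1 / (1 + |t1 - t2|) and the device term as 1 for identical identifiers and 0 otherwise. The function names, weights, and threshold are hypothetical.

```python
from itertools import combinations

def first_similarity(t1, d1, t2, d2, w_time=0.5, w_device=0.5):
    """Assumed weighted similarity over upload time and device identifier."""
    time_term = 1.0 / (1.0 + abs(t1 - t2))  # 1 for identical times, decays with the gap
    device_term = 1.0 if d1 == d2 else 0.0
    return w_time * time_term + w_device * device_term

def candidate_conflicts(items, threshold):
    """items: list of (name, upload_time, device_id). Return name pairs above the threshold."""
    pairs = []
    for (n1, t1, d1), (n2, t2, d2) in combinations(items, 2):
        if first_similarity(t1, d1, t2, d2) > threshold:
            pairs.append((n1, n2))
    return pairs

items = [("A", 100.0, "dev1"), ("B", 100.0, "dev1"), ("C", 500.0, "dev2")]
pairs = candidate_conflicts(items, threshold=0.8)
# pairs is [("A", "B")]
```

Only the pairs returned by such a screening would need the more expensive content-based comparison of the later steps.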
In S2033, the central node determines, based on the data type of the candidate collision data, a feature extraction algorithm associated with the data type.
In this embodiment, different types of data may differ in the form of the stored content and in the application programs used to edit them. The central node therefore first needs to determine the data type of the candidate conflict data in order to determine the corresponding feature extraction algorithm. For example, if the two pieces of candidate conflict data are of a text type, the stored content is characters, and the data feature vector can be determined by a corresponding keyword extraction algorithm; if the two pieces of candidate conflict data are of an audio type, the stored content is a sound track, and the data feature vector can be determined by a corresponding audio analysis algorithm. The central node may therefore store feature extraction algorithms associated with different data types and, after determining the data type of the candidate conflict data, obtain the corresponding feature extraction algorithm so as to perform the operation of S2034.
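The association between data types and feature extraction algorithms can be sketched as a lookup table. The registry `FEATURE_EXTRACTORS`, the function names, and both placeholder extractors are hypothetical; a real keyword extraction or audio analysis algorithm would replace them.

```python
from typing import Callable

def extract_text_keywords(data: bytes) -> list[str]:
    # Placeholder keyword extractor: unique lowercase whitespace-separated tokens.
    return sorted(set(data.decode("utf-8").lower().split()))

def extract_audio_summary(data: bytes) -> list[str]:
    # Placeholder for an audio analysis algorithm: reports only the payload length.
    return [f"length:{len(data)}"]

# Assumed mapping from data type to its associated feature extraction algorithm.
FEATURE_EXTRACTORS: dict[str, Callable[[bytes], list[str]]] = {
    "text": extract_text_keywords,
    "audio": extract_audio_summary,
}

def select_extractor(data_type: str) -> Callable[[bytes], list[str]]:
    """Return the feature extraction algorithm associated with a data type."""
    try:
        return FEATURE_EXTRACTORS[data_type]
    except KeyError:
        raise ValueError(f"no feature extraction algorithm for {data_type!r}") from None
```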
In S2034, the central node imports the candidate collision data into the feature extraction algorithm to obtain a data feature vector corresponding to the candidate collision data; the data feature vector comprises a content dimension vector and an application dimension vector; the content dimension vector is used to identify content features of the candidate conflict data; the application dimension vector is used to represent all applications that have processed the candidate conflict data.
In this embodiment, the central node may import the two candidate collision data into a feature extraction algorithm associated with the data types of the two candidate collision data, respectively, so as to generate a data feature vector corresponding to the candidate collision data. The data feature vector specifically includes two dimension vectors, which are a content dimension vector and an application dimension vector. The content dimension vector is specifically used for characterizing content features of the candidate conflict data, and for example, if the data type corresponding to the candidate conflict data is a text type, the content dimension vector may be generated based on text keywords extracted from the candidate conflict data; if the data type corresponding to the candidate conflict data is an image type, the content dimension vector may be a vector obtained by performing convolution feature extraction on image data in the candidate conflict data.
In this embodiment, each piece of candidate conflict data is generated by an application program on the user terminal before being uploaded, and each application program, after processing the data, may add a corresponding edit identifier to a field of the data. On this basis, the central node may identify the relevant fields in the candidate conflict data, determine all the application programs that have edited the candidate conflict data, and generate the corresponding application dimension vector according to the order in which the application programs edited the candidate conflict data and the application program identifier of each application program.
For example, if a piece of candidate conflict data is a text that has been edited by the following three applications in sequence: 1. the tablet application → 2. the Word document editing application → 3. the WPS editing application, the central node may determine all application programs that have processed the candidate conflict data by identifying the file suffix of the candidate conflict data and reading the parameters of the associated bytes, and generate the corresponding application dimension vector. For the above example, the application dimension vector may be: [ txt, word, WPS ].
In S2035, the central node calculates a second similarity between the candidate collision data based on the data feature vector.
In this embodiment, after determining the data feature vectors corresponding to the two pieces of candidate conflict data, the central node may calculate the second similarity from the data feature vectors. Since the data feature vector includes the content dimension vector and the application dimension vector, whether the two pieces of candidate conflict data are the same data can be judged from two aspects: the similarity of their content and the similarity of the application programs that have processed them. Some data may have been edited only slightly by an application program, for example to correct a punctuation mark, so that the content similarity between two pieces of data is high and it is difficult to tell from content alone whether they are the same conflict data. In that case, the two pieces of data can be distinguished by the types of the application programs that have processed them and the corresponding number of processing operations (that is, the application dimension vector). On this basis, the central node may import the data feature vectors into a preset similarity calculation model to determine the second similarity between the candidate conflict data.
In one possible implementation, the central node may calculate a vector distance between the two data feature vectors and determine the second similarity between the two pieces of candidate conflict data based on the vector distance. The vector distance is determined from two distance factors, namely a content dimension distance factor and an application dimension distance factor. The central node may calculate the distance between the content dimension vectors and the distance between the application dimension vectors to obtain the two distance factors, weight and superpose the two distance factors using the weight values associated with the content dimension and the application dimension to obtain the vector distance between the two data feature vectors, and determine the second similarity based on that vector distance.
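The weighted superposition of the two distance factors can be sketched as follows. The sketch assumes that both dimension vectors are numerically encoded and of equal length, that the Euclidean distance is used for both, and that the vector distance is mapped to a similarity as 1 / (1 + distance). The weights and function names are hypothetical, and the embodiment does not specify the mapping from distance to similarity.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def second_similarity(content_a, content_b, app_a, app_b,
                      w_content=0.7, w_app=0.3):
    """Combine content and application distance factors into one similarity."""
    content_factor = euclidean(content_a, content_b)
    app_factor = euclidean(app_a, app_b)
    vector_distance = w_content * content_factor + w_app * app_factor
    return 1.0 / (1.0 + vector_distance)  # assumed distance-to-similarity mapping
```

Identical feature vectors give a similarity of 1, and the similarity falls toward 0 as either distance factor grows.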
Further, fig. 4 shows a flowchart of a specific implementation of S2035 provided in another embodiment of the present application. Referring to fig. 4, S2035 provided in the present embodiment includes S401 to S404, which are described in detail as follows:
in S401, the central node calculates a vector distance between the content dimension vectors of the candidate collision data, and determines a first similarity factor based on the vector distance.
In this embodiment, the central node may extract the content dimension vectors from the data feature vectors and import the content dimension vectors of the candidate collision data into a preset vector distance calculation function, so as to determine the vector distance between the two content dimension vectors. The vector distance calculation function may be a Euclidean distance calculation function or a Mahalanobis distance calculation function.
In S402, the central node parses the application dimension vector, determines a processing order of each application program that has processed the candidate conflict data, and determines a weighting weight of each application program based on the processing order corresponding to all the application programs.
In this embodiment, the central node may extract the application dimension vector from the data feature vector, analyze the application dimension vector, and determine the program identifier and the corresponding processing order of each application program that has processed the candidate conflict data. As in the above example, if a piece of candidate conflict data has been processed by three applications and the corresponding application dimension vector is [ txt, word, WPS ], the central node can determine, by analyzing the application dimension vector, which application programs have processed the candidate conflict data and the processing order of each. It should be noted that the same application program may process the candidate conflict data multiple times, in which case multiple corresponding elements exist in the application dimension vector. For example, if a piece of candidate conflict data has been edited twice by the Word document application, the application dimension vector may be: [ txt, word, word, WPS ].
In this embodiment, different processing orders may configure the corresponding weighting weights. Therefore, after the processing order of each application program is determined, the weighting weight corresponding to the processing order can be acquired based on the correspondence between the preset processing order and the weight.
In S403, the central node calculates a second similarity factor between the candidate conflict data based on the weighted weights of all the applications and the application dimension vector; the calculation algorithm of the second similarity factor is as follows:
[Formula image not reproduced in the text: calculation algorithm of the second similarity factor]
wherein the symbols of the formula denote, in order of definition: the second similarity factor; the value of the i-th application program in the application dimension vector corresponding to one piece of candidate conflict data; the weighting weight corresponding to the i-th application program; the value of the j-th application program in the application dimension vector corresponding to the other piece of candidate conflict data; the weighting weight corresponding to the j-th application program; n, the total number of application programs that have processed the one piece of candidate conflict data; and m, the total number of application programs that have processed the other piece of candidate conflict data.
In this embodiment, the element value corresponding to each application program is weighted according to the processing order of that application program, and the weighted values are accumulated. If the application programs at corresponding processing orders in the two pieces of candidate conflict data are the same, the calculated value is larger, so the similarity is higher and the probability that the two pieces of candidate conflict data are the same data is greater; conversely, if the application programs at corresponding processing orders are different, the calculated value is smaller, so the similarity is lower.
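The behaviour described above can be illustrated with a sketch. The formula of the embodiment is not reproduced in the text, so the form below is an assumption: it sums the order weights of the positions at which both items were processed by the same application program and normalizes by the total weight over the longer processing history. The function name and the weights are hypothetical.

```python
def second_similarity_factor(apps_a, apps_b, order_weights):
    """Weighted share of processing positions handled by the same application."""
    longest = max(len(apps_a), len(apps_b))
    if longest == 0:
        return 1.0  # neither item was processed by any application
    if len(order_weights) < longest:
        raise ValueError("need one weight per processing position")
    total = sum(order_weights[:longest])
    if total <= 0:
        raise ValueError("order weights must sum to a positive value")
    # zip stops at the shorter history; unmatched trailing positions add nothing.
    matched = sum(w for a, b, w in zip(apps_a, apps_b, order_weights) if a == b)
    return matched / total

weights = [4, 3, 2, 1]  # assumed: earlier processing steps weigh more
factor = second_similarity_factor(
    ["txt", "word", "WPS"], ["txt", "word", "word", "WPS"], weights)
# factor is 0.7: positions 1 and 2 match (4 + 3) out of a total weight of 10
```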
In S404, the central node obtains the second similarity based on the first similarity factor and the second similarity factor.
In this embodiment, after determining the first similarity factor corresponding to the content dimension vector and the second similarity factor corresponding to the application dimension vector, the central node may determine the second similarity between the candidate collision data based on the two similarity factors. The central node may directly superpose the first similarity factor and the second similarity factor to obtain the second similarity, or may use the average of the first similarity factor and the second similarity factor as the second similarity.
In the embodiment of the application, whether the candidate conflict data are the same conflict data can be evaluated from two dimensions by respectively calculating the first similarity factor corresponding to the content dimension vector and the second similarity factor corresponding to the application dimension vector, so that the accuracy of conflict data identification is improved.
In S2036, candidate conflict data having the second similarity not greater than a preset second similarity threshold is identified as target synchronization data.
In this embodiment, if the central node detects that the second similarity between the two pieces of candidate collision data is not greater than a preset second similarity threshold, the two pieces of candidate collision data are not the same collision data, and both may be identified as target synchronization data to be synchronized to each data node in the system.
In a possible implementation manner, if the second similarity is greater than a preset second similarity threshold, the two candidate collision data are identified as collision data, one of the candidate collision data is deleted, the other candidate collision data is reserved, and the reserved candidate collision data is identified as target synchronization data.
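The two outcomes of S2036 can be summarized in a short sketch. The function name `resolve_candidates` is hypothetical, and the choice to retain the first item of a conflicting pair is an assumption, since the embodiment only requires that one of the two be retained.

```python
def resolve_candidates(pair, second_similarity, threshold):
    """Return the items of a candidate pair that become target synchronization data."""
    first, second = pair
    if second_similarity > threshold:
        return [first]      # conflict: retain one item, delete the other
    return [first, second]  # not a conflict: both items are synchronized
```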
In the embodiment of the application, the data to be synchronized is subjected to similarity calculation twice, and conflict identification is performed on the data to be synchronized from four dimensions of uploading time, the equipment to which the data to be synchronized belongs, content characteristics and an application program, so that the accuracy of conflict data identification can be improved.
Fig. 5 is a flowchart illustrating a specific implementation of a method for data synchronization according to a third embodiment of the present invention. Referring to fig. 5, with respect to the embodiment shown in fig. 2, before the central node receives the data to be synchronized sent by the primary data node and extracts the data feature information corresponding to the data to be synchronized, the method for data synchronization provided in this embodiment further includes S501-S506, which are detailed as follows:
further, before the central node receives the data to be synchronized sent by the primary data node and extracts the data characteristic information corresponding to the data to be synchronized, the method further includes:
in S501, if the central node meets a preset node update triggering condition, node feature information of the primary data node corresponding to each data cluster is obtained; the node characteristic information comprises a node position, a node performance parameter and a node network parameter corresponding to the primary data node.
In this embodiment, the central node may be changed periodically, so that a data node with better performance can be selected from the data synchronization system as the central node of the whole data synchronization system. On this basis, when the central node meets the preset node update triggering condition, it can acquire the node characteristic information corresponding to the primary data node in each data cluster. The update triggering condition may be a time condition, for example a corresponding update period, or an event triggering condition, for example the central node detecting that the current network fluctuation is greater than a fluctuation threshold, or that network throughput has remained below a preset threshold for longer than a time threshold; in either case the node update triggering condition is identified as satisfied.
In this embodiment, the node characteristic information acquired by the central node includes information of three dimensions, namely a geographical dimension, an equipment performance dimension, and a network state dimension of the network where the node is located, which correspond respectively to the node position, the node performance parameter, and the node network parameter.
In S502, the central node determines, based on the positions of all the nodes, a position dimension score corresponding to each of the primary data nodes, respectively.
In this embodiment, the central node may determine the position dimension score corresponding to each primary data node according to the node position of that primary data node. Each primary data node needs to upload data to be synchronized to the central node, so the closer the geographic position of the central node is to the network center of the whole data synchronization system, the shorter the average routing hop count from the primary data nodes. The central node can therefore calculate, from the node positions, the distance values between each primary data node and the primary data nodes of the other data clusters, and determine the position dimension score corresponding to that primary data node based on the distance values.
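The position dimension score can be sketched as follows. The sketch assumes planar coordinates for the node positions and scores a node as 1 / (1 + mean distance to the other primary data nodes); the embodiment does not state how the distance values are converted into a score. The function name and the coordinates are hypothetical.

```python
import math

def position_dimension_scores(positions):
    """positions: dict of node id -> (x, y). A smaller mean distance gives a higher score."""
    scores = {}
    for node_id, p in positions.items():
        others = [q for other_id, q in positions.items() if other_id != node_id]
        if not others:
            scores[node_id] = 1.0  # a single node is trivially central
            continue
        mean_distance = sum(math.dist(p, q) for q in others) / len(others)
        scores[node_id] = 1.0 / (1.0 + mean_distance)  # assumed scoring rule
    return scores

scores = position_dimension_scores({"a": (0, 0), "b": (1, 0), "c": (2, 0)})
# node "b" lies between the other two and receives the highest score
```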
In S503, the central node determines a data volume average value of the data synchronization system based on the data volumes of all uploaded data, and determines a network dimension score corresponding to each of the primary data nodes based on the data volume average value and the node network parameters.
In this embodiment, the central node stores all the uploaded data and calculates the data volume average value of the data synchronization system based on the data volume of each piece of uploaded data. The larger the data volume average value, the greater the network requirement on the data synchronization system; conversely, the smaller the data volume average value, the smaller the network requirement. The central node can calculate the network dimension score from the data volume average value and the node network parameter of the primary data node. The network dimension score may be determined from the ratio between the node network parameter and the data volume average value: the larger the ratio, the larger the network dimension score; conversely, the smaller the ratio, the smaller the network dimension score.
In a possible implementation manner, the central node may determine a network parameter extremum according to the data volume average value, and configure the network dimension score as a preset score value if the node network parameter is greater than the network parameter extremum. That is, once the node network parameter exceeds the network parameter extremum, the network dimension score no longer increases and the preset score value is used instead. For a data synchronization network that transmits a small amount of data, a network transmission rate above a certain value brings no further benefit, so a fixed score value can be configured.
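The capped network dimension score can be sketched as follows. The sketch assumes that the network parameter extremum is a fixed multiple of the data volume average value and that the score grows linearly with the node network parameter up to the preset score value. The function name, the factor `extremum_factor`, and the score of 100 are hypothetical.

```python
def network_dimension_score(node_network_param, data_volume_avg,
                            extremum_factor=2.0, preset_score=100.0):
    """Score grows with network capacity relative to average data volume, up to a cap."""
    if data_volume_avg <= 0:
        raise ValueError("average data volume must be positive")
    extremum = extremum_factor * data_volume_avg  # assumed derivation of the extremum
    if node_network_param > extremum:
        return preset_score  # beyond the extremum the score no longer increases
    return preset_score * node_network_param / extremum
```

With an average data volume of 100, a node network parameter of 50 gives a score of 25, and any parameter above 200 gives the preset score of 100.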
In S504, the central node determines a maximum performance parameter according to the locally corresponding reference performance parameter and all the node performance parameters, and determines a performance dimension score corresponding to each of the primary data nodes based on the maximum performance parameter.
In this embodiment, the central node may obtain its own reference performance parameter and select the maximum performance parameter from among the reference performance parameter and the node performance parameters of all the primary data nodes. The central node then determines the performance dimension score corresponding to each primary data node based on the ratio between the maximum performance parameter and the node performance parameter of that primary data node.
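By way of non-limiting illustration, the selection of the maximum performance parameter and the ratio-based scoring can be sketched in Python as follows. The function name, the `full_score` scale, and the choice to express the ratio as node parameter divided by maximum parameter (so that a stronger node scores higher) are assumptions made for this sketch.

```python
from typing import Dict, Mapping


def performance_dimension_scores(reference_param: float,
                                 node_params: Mapping[str, float],
                                 full_score: float = 100.0) -> Dict[str, float]:
    """Score each primary data node relative to the largest known performance parameter."""
    # The maximum is taken over the central node's own parameter and all node parameters.
    max_param = max([reference_param, *node_params.values()])
    if max_param <= 0:
        raise ValueError("performance parameters must be positive")
    # Assumed orientation of the ratio: node parameter over maximum parameter.
    return {node_id: full_score * param / max_param
            for node_id, param in node_params.items()}
```

For example, with a reference performance parameter of 8 and node performance parameters of 4 and 16, the maximum performance parameter is 16 and the two nodes score 25 and 100 respectively under the assumed scale.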
In S505, the central node determines an evaluation index corresponding to each of the primary data nodes based on the location dimension score, the network dimension score, and the performance dimension score.
In this embodiment, the central node may calculate the evaluation index of each primary data node based on the scores of the three dimensions. The central node may configure a corresponding weight for each dimension, compute a weighted sum of the dimension scores using these weights, and thereby obtain the evaluation index. The larger the evaluation index, the more suitable the primary data node is to serve as the central node; conversely, the smaller the evaluation index, the less suitable the primary data node is to serve as the central node.
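By way of non-limiting illustration, the weighted superposition of the three dimension scores can be sketched in Python as follows. The class name, the function name, and the example weights are assumptions made for this sketch; the embodiment leaves the weights to configuration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DimensionScores:
    location: float
    network: float
    performance: float


def evaluation_index(scores: DimensionScores,
                     weights: Tuple[float, float, float] = (0.2, 0.4, 0.4)) -> float:
    """Weighted sum of the location, network, and performance dimension scores."""
    w_location, w_network, w_performance = weights
    return (w_location * scores.location
            + w_network * scores.network
            + w_performance * scores.performance)
```

With the assumed weights, dimension scores of 50, 100, and 25 yield an evaluation index of 60.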
In a possible implementation manner, if the evaluation index of every primary data node is less than or equal to the reference index corresponding to the central node, no primary data node is more suitable to serve as the central node than the current central node. In this case, the central node of the data synchronization system is not replaced, that is, no node update information needs to be sent.
In S506, if the evaluation index of any one of the primary data nodes is greater than the reference index corresponding to the central node, the central node sends node update information to all data nodes in the data synchronization system; and the node updating information contains the node identification of the primary data node with the evaluation index larger than the reference index, so that the primary data node with the evaluation index larger than the reference index is used as a new central node.
In this embodiment, if the central node detects that the evaluation index of one of the primary data nodes is greater than its own reference index, that primary data node is more suitable to serve as the central node of the data synchronization system. In this case, the central node may take the primary data node with the largest evaluation index as the new central node, generate corresponding node update information, and send the node update information to each data node in the data synchronization system, so that the primary data node of each data cluster subsequently sends data to be synchronized to the new central node. It should be noted that the reference index of the central node may be calculated in the same manner as the evaluation index, which is not described herein again.
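By way of non-limiting illustration, the replacement decision can be sketched in Python as follows. The function name and the use of `None` to indicate that no node update information needs to be sent are assumptions made for this sketch.

```python
from typing import Mapping, Optional


def select_new_central_node(reference_index: float,
                            evaluation_indices: Mapping[str, float]) -> Optional[str]:
    """Return the identifier of the new central node, or None to keep the current one."""
    if not evaluation_indices:
        return None
    # Candidate is the primary data node with the largest evaluation index.
    best_id = max(evaluation_indices, key=lambda node_id: evaluation_indices[node_id])
    # Replace only when the candidate is strictly better than the current central node.
    if evaluation_indices[best_id] > reference_index:
        return best_id
    return None
```

When the returned identifier is not `None`, it would be carried in the node update information sent to the data nodes.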
In a possible implementation manner, the replaced central node may serve as a primary data node of the data cluster to which the central node belongs, may also serve as a primary data node of the data cluster to which the new central node belongs, and of course, may also serve as a secondary data node of the data cluster to which the central node belongs.
In the embodiment of the application, when the preset node update trigger condition is met, the central node can acquire the node characteristic information of the primary data node of each data cluster and thereby determine whether a data node more suitable to serve as the central node exists. When the central node detects that the evaluation index of a primary data node is greater than its own reference index, the central node of the data synchronization system is replaced. This ensures that the central node operates efficiently and improves the synchronization efficiency of the whole data synchronization system.
Fig. 6 shows a flowchart of a specific implementation of S204 of the data synchronization method according to the fourth embodiment of the present invention. Referring to fig. 6, with respect to the embodiment shown in fig. 2, S204 of the data synchronization method provided by this embodiment includes S2041 to S2046, which are detailed as follows:
Further, performing data synchronization on all the target synchronization data when the central node meets a preset synchronization trigger condition includes:
In S2041, the central node determines a target synchronization node for data synchronization based on a preset synchronization subscription list.
In this embodiment, the central node stores a synchronization subscription list, which records the node identifiers of the data nodes that need to perform data synchronization. By reading the synchronization subscription list, the central node can determine the target synchronization nodes that need to perform data synchronization.
In S2042, the central node acquires synchronization information of the synchronized data corresponding to the target synchronization node.
In this embodiment, the central node sends a synchronization information acquisition request to the target synchronization node, and the target synchronization node may return the synchronization information to the central node after receiving the request. The synchronization information describes the data whose synchronization has already been completed, and may be, for example, the data numbers of the data that has already been synchronized.
In S2043, the central node determines incremental synchronization data based on the synchronization information and all of the target synchronization data.
In this embodiment, the central node may determine, according to the synchronization information, which of the target synchronization data has not yet been synchronized to the target synchronization node, and use that unsynchronized target synchronization data as the incremental synchronization data.
In S2044, the central node sends the incremental synchronization data to the target synchronization node.
In this embodiment, the central node sends the incremental synchronization data to the target synchronization node to implement incremental synchronization. It should be noted that, because different data nodes are in different synchronization states, the above operations can be performed separately for each target synchronization node, so that corresponding incremental synchronization data is determined for each target synchronization node.
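By way of non-limiting illustration, S2041 to S2044 can be sketched in Python as follows. The function name, the use of integer data numbers as the synchronization information, and the in-memory mappings are assumptions made for this sketch; the transport used to fetch the synchronization information and to send the data is omitted.

```python
from typing import Dict, Iterable, Mapping, Set


def plan_incremental_sync(
    subscription_list: Iterable[str],
    synced_numbers_by_node: Mapping[str, Set[int]],
    target_sync_data: Mapping[int, bytes],
) -> Dict[str, Dict[int, bytes]]:
    """For each subscribed node, keep only the target data it has not synchronized yet."""
    plan: Dict[str, Dict[int, bytes]] = {}
    for node_id in subscription_list:
        # A node with no reported synchronization information is treated as having nothing.
        already_synced = synced_numbers_by_node.get(node_id, set())
        plan[node_id] = {
            number: payload
            for number, payload in target_sync_data.items()
            if number not in already_synced
        }
    return plan
```

Each target synchronization node thus receives its own incremental synchronization data, reflecting its own synchronization state.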
In S2045, if the central node receives synchronization failure information sent by the target synchronization node, or does not receive synchronization completion information fed back by the target synchronization node within a preset effective feedback duration, the central node increases a count value of an abnormal synchronization counter.
In this embodiment, the target synchronization node may send synchronization completion information to the central node after receiving the incremental synchronization data, so as to notify the central node that the incremental synchronization is complete. Accordingly, if the central node does not receive the synchronization completion information sent by the target synchronization node within the preset effective feedback duration, the target synchronization node may be experiencing network fluctuation or may be in an abnormal state. Further abnormality identification is then required, and the count value of the abnormal synchronization counter may be increased.
Alternatively, the target synchronization node may, instead of sending synchronization completion information, send synchronization failure information to the central node when data synchronization is not completed. In this case, the central node also increases the count value of the abnormal synchronization counter and resends the incremental synchronization data to the target synchronization node.
In a possible implementation manner, if synchronization completion information sent by the target synchronization node is received in a subsequent synchronization process, or the central node determines that the target synchronization node has completed incremental synchronization, the abnormal synchronization counter may be cleared. That is, the abnormal synchronization counter records the number of consecutive failed synchronizations of the target synchronization node.
In S2046, if the count value is greater than a preset abnormal threshold, the central node performs full data synchronization operation on the target synchronization node.
In this embodiment, if the count value of the abnormal synchronization counter is greater than the preset abnormal threshold, the target synchronization node has failed to complete the data synchronization operation multiple times. In this case, the central node performs full data synchronization on the target synchronization node, that is, all local data that needs to be synchronized is sent to the target synchronization node.
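By way of non-limiting illustration, the abnormal synchronization counter of S2045 and S2046 can be sketched in Python as follows. The class name, the outcome strings, the returned action names, and the choice to retry incremental synchronization after a timeout as well as after a failure are assumptions made for this sketch.

```python
class AbnormalSyncTracker:
    """Tracks consecutive failed synchronizations of one target synchronization node."""

    def __init__(self, abnormal_threshold: int = 3) -> None:
        self.abnormal_threshold = abnormal_threshold
        self.count = 0

    def record(self, outcome: str) -> str:
        """Record 'completed', 'failed', or 'timeout' and return the next action."""
        if outcome == "completed":
            # A successful synchronization clears the counter.
            self.count = 0
            return "none"
        if outcome not in ("failed", "timeout"):
            raise ValueError(f"unknown outcome: {outcome!r}")
        self.count += 1
        if self.count > self.abnormal_threshold:
            # Too many consecutive failures: fall back to full data synchronization.
            return "full_sync"
        return "retry_incremental"
```

With an abnormal threshold of 2, for example, the first two consecutive failures lead to a retry and the third leads to full data synchronization.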
In the embodiment of the application, on the one hand, the central node performs incremental synchronization on each target synchronization node, which improves the efficiency of data synchronization; on the other hand, when a target synchronization node fails to synchronize multiple times, full synchronization is performed, which improves the accuracy of data synchronization.
Fig. 7 is a flowchart illustrating a specific implementation of a method for data synchronization according to a fifth embodiment of the present invention. Referring to fig. 7, with respect to the embodiment described in any one of fig. 2 to 6, the method for data synchronization provided in this embodiment further includes: S701-S702, which are detailed as follows:
In S701, if the secondary data node receives node abnormal information broadcast by the primary data node in the data cluster to which it belongs, the secondary data node acquires node information of all secondary data nodes in the data cluster.
In this embodiment, an abnormal state of the primary data node can be repaired automatically within the data cluster, which improves the robustness of the data cluster. When the primary data node is in an abnormal state, it can broadcast node abnormal information within the data cluster. When a secondary data node receives the node abnormal information, it can acquire the node information of all the secondary data nodes in the data cluster. The node information may include the performance parameters, network parameters, storage occupancy, and the like of each secondary data node.
In S702, the secondary data node determines a target secondary node based on all the node information, and broadcasts primary node update information to the data cluster, so as to use the target secondary node as a new primary data node in the data cluster.
In this embodiment, the secondary data node may determine a performance score for each secondary data node according to the node information, select the secondary data node with the highest performance score as the target secondary node, and broadcast the primary node update information to the data cluster, so that the target secondary node replaces the abnormal primary data node.
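By way of non-limiting illustration, the selection of the target secondary node can be sketched in Python as follows. The function name, the field names, the assumption that each value is normalized to the range 0 to 1, the scoring weights, and the tie-break on node identifier are all assumptions made for this sketch; the embodiment only states that the performance score is derived from the node information.

```python
from typing import Mapping


def choose_target_secondary(node_info: Mapping[str, Mapping[str, float]]) -> str:
    """Return the identifier of the secondary data node with the highest performance score."""
    if not node_info:
        raise ValueError("node_info must not be empty")

    def performance_score(info: Mapping[str, float]) -> float:
        # Assumed scoring: higher performance and network are better, lower storage use is better.
        return (0.4 * info["performance"]
                + 0.4 * info["network"]
                + 0.2 * (1.0 - info["storage_occupancy"]))

    # Iterating in sorted order makes ties resolve to the smallest identifier,
    # so every secondary data node evaluating the same information picks the same node.
    return max(sorted(node_info), key=lambda node_id: performance_score(node_info[node_id]))
```

A deterministic tie-break is used here so that secondary data nodes that independently evaluate the same node information arrive at the same target secondary node.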
In the embodiment of the application, the node information corresponding to each secondary data node is acquired through the secondary data node under the condition that the primary data node is abnormal, and the target secondary node replacing the abnormal primary data node is selected, so that the stability of the data synchronization system can be improved.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (8)

1. A data synchronization method is applied to a multi-node data synchronization system and is characterized in that the data synchronization system comprises more than two data clusters and a central node, and each data cluster comprises a primary data node and at least one secondary data node;
the data synchronization method comprises the following steps:
the central node receives data to be synchronized sent by the primary data node and extracts data characteristic information corresponding to the data to be synchronized, wherein the data to be synchronized is sent to the primary data node by a user terminal connected with the secondary data node in a descending mode through the secondary data node;
the central node identifies data conflicts based on the data characteristic information of all data to be synchronized, and determines target synchronous data without data conflicts;
when the central node meets a preset synchronization triggering condition, performing data synchronization on all the target synchronization data;
the central node identifies data conflicts based on the data characteristic information of all data to be synchronized, and determines target synchronization data without data conflicts, wherein the data conflicts comprise:
the central node calculates a first similarity between any two data to be synchronized according to the data uploading time in the data characteristic information and the equipment identification of the central node;
if the first similarity between any two data to be synchronized is greater than a preset first similarity threshold, the central node identifies any two data to be synchronized as candidate conflict data;
the central node determines a feature extraction algorithm associated with the data type based on the data type of the candidate conflict data;
the central node imports the candidate conflict data into the feature extraction algorithm to obtain a data feature vector corresponding to the candidate conflict data;
the central node calculates a second similarity between the candidate collision data based on the data feature vector;
and identifying candidate conflict data with the second similarity not greater than a preset second similarity threshold as target synchronization data.
2. The method of claim 1, wherein the data feature vector comprises a content dimension vector and an application dimension vector; the content dimension vector is used to identify content features of the candidate conflict data; the application dimension vector is used for representing all application programs which process the candidate conflict data; the central node calculates a second similarity between the candidate collision data based on the data feature vectors, including:
the central node calculates vector distances between the content dimension vectors of the candidate collision data, and determines a first similarity factor based on the vector distances;
the central node analyzes the application dimension vector, determines the processing sequence of each application program which processes the candidate conflict data, and determines the weighting weight of each application program based on the processing sequence corresponding to all the application programs;
the central node calculates a second similarity factor between the candidate conflict data based on the weighted weights of all the application programs and the application dimension vector; the calculation algorithm of the second similarity factor is as follows:
[Formula image not reproduced in the source text]
wherein the symbols, which appear only as images in the source text, denote, in order: the second similarity factor; the value of the i-th application program in the application dimension vector corresponding to one candidate conflict data; the weight corresponding to the i-th application program; the value of the j-th application program in the application dimension vector corresponding to the other candidate conflict data; and the weight corresponding to the j-th application program; n is the total number of application programs that have processed the one candidate conflict data; m is the total number of application programs that have processed the other candidate conflict data;
the central node obtains the second similarity based on the first similarity factor and the second similarity factor.
3. The method of claim 1, further comprising:
if the central node meets a preset node updating triggering condition, acquiring node characteristic information of a primary data node corresponding to each data cluster; the node characteristic information comprises a node position, a node performance parameter and a node network parameter corresponding to the central node;
the central node respectively determines position dimension scores corresponding to the primary data nodes based on the positions of all the nodes;
the central node determines a data volume average value of the data synchronization system based on the data volumes of all uploaded data, and respectively determines network dimension scores corresponding to the primary data nodes based on the data volume average value and the node network parameters;
the central node determines the maximum performance parameter according to the locally corresponding reference performance parameter and all the node performance parameters, and determines the performance dimension score corresponding to each primary data node based on the maximum performance parameter;
the central node determines an evaluation index corresponding to each primary data node based on the position dimension score, the network dimension score and the performance dimension score;
if the evaluation index of any one primary data node is larger than the reference index corresponding to the central node, the central node sends node update information to all data nodes in the data synchronization system; and the node updating information contains the node identification of the primary data node with the evaluation index larger than the reference index, so that the primary data node with the evaluation index larger than the reference index is used as a new central node.
4. The method according to any one of claims 1 to 3, wherein the central node performs data synchronization on all the target synchronization data when a preset synchronization trigger condition is met, including:
the central node determines a target synchronization node for data synchronization based on a preset synchronization subscription list;
the central node acquires the synchronous information of the synchronized data corresponding to the target synchronous node;
the central node determines incremental synchronization data based on the synchronization information and all the target synchronization data;
and the central node sends the incremental synchronization data to the target synchronization node.
5. The method of claim 4, wherein after the central node sends the incremental synchronization data to the target synchronization node, further comprising:
if the central node receives synchronization failure information sent by the target synchronization node or does not receive synchronization completion information fed back by the target synchronization node within a preset effective feedback time length, increasing the count value of an abnormal synchronization counter;
and if the count value is greater than a preset abnormal threshold value, the central node performs full data synchronization operation on the target synchronization node.
6. The method according to any one of claims 1-3, further comprising:
if the secondary data node receives node abnormal information broadcast by a primary data node in an associated data cluster, acquiring node information of all secondary data nodes in the data cluster;
and the secondary data node determines a target secondary node based on all the node information and broadcasts primary node updating information to the data cluster so as to take the target secondary node as a new primary data node in the data cluster.
7. A data synchronization system is characterized by comprising more than two data clusters and a central node, wherein each data cluster comprises a primary data node and at least one secondary data node;
the central node is used for receiving data to be synchronized sent by the primary data node and extracting data characteristic information corresponding to the data to be synchronized, wherein the data to be synchronized is sent to the primary data node by a user terminal connected with the secondary data node through the secondary data node;
the central node is used for identifying data conflict based on the data characteristic information of all data to be synchronized and determining target synchronous data without data conflict;
the central node is used for carrying out data synchronization on all the target synchronization data when a preset synchronization triggering condition is met;
the central node is configured to perform data collision identification based on data characteristic information of all data to be synchronized, and determine target synchronization data without data collision therebetween, and includes:
the central node is used for calculating a first similarity between any two data to be synchronized according to the data uploading time in the data characteristic information and the equipment identification to which the data uploading time belongs;
the central node is configured to identify any two pieces of data to be synchronized as candidate collision data if the first similarity between any two pieces of data to be synchronized is greater than a preset first similarity threshold;
the central node is used for determining a feature extraction algorithm associated with the data type based on the data type of the candidate conflict data;
the central node is used for importing the candidate conflict data into the feature extraction algorithm to obtain a data feature vector corresponding to the candidate conflict data;
the central node is used for calculating a second similarity between the candidate conflict data based on the data feature vector;
the central node is configured to identify candidate conflict data with the second similarity not greater than a preset second similarity threshold as target synchronization data.
8. The system of claim 7, wherein the data feature vector comprises a content dimension vector and an application dimension vector; the content dimension vector is used to identify content features of the candidate conflict data; the application dimension vector is used for representing all application programs which process the candidate conflict data; the central node is configured to calculate a second similarity between the candidate collision data based on the data feature vector, and includes:
the central node is used for calculating the vector distance between the data characteristic vectors of the candidate conflict data and determining a first similarity factor based on the vector distance;
the central node is configured to determine, according to the application dimension vector, a processing order corresponding to each application program that has processed the candidate conflict data, and determine a weighting weight of each application program based on the processing orders corresponding to all the application programs;
the central node is used for calculating a second similarity factor between the candidate conflict data based on the weighted weights of all the application programs and the application dimension vector; the calculation algorithm of the second similarity factor is as follows:
[Formula image not reproduced in the source text]
wherein the symbols, which appear only as images in the source text, denote, in order: the second similarity factor; the value of the i-th application program in the application dimension vector corresponding to one candidate conflict data; the weight corresponding to the i-th application program; the value of the j-th application program in the application dimension vector corresponding to the other candidate conflict data; and the weight corresponding to the j-th application program; n is the total number of application programs that have processed the one candidate conflict data; m is the total number of application programs that have processed the other candidate conflict data;
the central node is configured to obtain the second similarity based on the first similarity factor and the second similarity factor.
CN202110618006.5A 2021-06-03 2021-06-03 Data synchronization method and data synchronization system Active CN113259470B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110618006.5A CN113259470B (en) 2021-06-03 2021-06-03 Data synchronization method and data synchronization system

Publications (2)

Publication Number Publication Date
CN113259470A CN113259470A (en) 2021-08-13
CN113259470B true CN113259470B (en) 2021-09-24

Family

ID=77186119

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110618006.5A Active CN113259470B (en) 2021-06-03 2021-06-03 Data synchronization method and data synchronization system

Country Status (1)

Country Link
CN (1) CN113259470B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114185984A (en) * 2022-02-15 2022-03-15 数皮科技(湖北)有限公司 Well site data conversion and synchronization method
CN115794837B (en) * 2023-02-01 2023-06-23 天翼云科技有限公司 Data table synchronization method, system, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111680106A (en) * 2020-06-17 2020-09-18 深圳前海微众银行股份有限公司 Method and device for synchronizing data of multiple application systems
CN112818064A (en) * 2021-02-25 2021-05-18 平安普惠企业管理有限公司 Multi-system data synchronization method, device, equipment and storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101005428A (en) * 2006-01-19 2007-07-25 华为技术有限公司 Realizing method for detecting and resolving data synchronous conflict
US8719375B2 (en) * 2007-03-22 2014-05-06 Microsoft Corporation Remote data access techniques for portable devices
US7363329B1 (en) * 2007-11-13 2008-04-22 International Business Machines Corporation Method for duplicate detection on web-scale data in supercomputing environments
US20100077007A1 (en) * 2008-09-18 2010-03-25 Jason White Method and System for Populating a Database With Bibliographic Data From Multiple Sources
CN105338093A (en) * 2015-11-16 2016-02-17 中国建设银行股份有限公司 Data synchronizing method and system
CN108243208A (en) * 2016-12-23 2018-07-03 深圳市优朋普乐传媒发展有限公司 A kind of method of data synchronization and device
CN110381149B (en) * 2019-07-24 2022-03-18 北京视界云天科技有限公司 Data distribution method and device and data synchronization method and device
CN111625596B (en) * 2020-05-14 2023-12-26 国网辽宁省电力有限公司 Multi-source data synchronous sharing method and system for real-time new energy consumption scheduling

Also Published As

Publication number Publication date
CN113259470A (en) 2021-08-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant