CN115221245A - Intelligent data acquisition synchronization method, system and equipment - Google Patents
- Publication number
- CN115221245A CN115221245A CN202210834391.1A CN202210834391A CN115221245A CN 115221245 A CN115221245 A CN 115221245A CN 202210834391 A CN202210834391 A CN 202210834391A CN 115221245 A CN115221245 A CN 115221245A
- Authority
- CN
- China
- Prior art keywords
- data
- sub
- partition
- result
- partitions
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06F16/278—Data partitioning, e.g. horizontal or vertical partitioning
Abstract
The present application relates to big data technology and provides an intelligent data acquisition synchronization method, system, and device. A first platform sends a data partitioning result to a data middle platform; the data middle platform divides the received data partitioning result according to a data division strategy to obtain a plurality of data sub-partitions, and performs hash verification on them based on asynchronous processing to obtain a data verification result; the data middle platform then performs a difference comparison between this data verification result and a local data verification result sent by the first platform to obtain a difference comparison result; and if the data middle platform determines that the difference comparison result is not null, it acquires the corresponding sub-partition data comparison result, determines from it the target data sub-partition among the plurality of data sub-partitions, and marks that sub-partition as a data synchronization abnormal sub-partition. The method quickly detects data consistency between the first platform and the data middle platform during synchronization, reduces communication resource consumption, and improves data timeliness several-fold.
Description
Technical Field
The present application relates to the field of big data technologies, and in particular, to an intelligent data acquisition synchronization method, system, computer device, and storage medium.
Background
At present, with the progress of enterprise digital transformation, most enterprises operate their own information systems, and a midstream supplier that takes orders within an industrial chain must connect its upstream and downstream information systems. For example, in the textile and intelligent-manufacturing industries, an enterprise needs to build its own data processing platform: when a (downstream) customer places an order, the customer system transmits the order data to the data processing platform; the data processing platform forwards the data to be synchronized to the upstream supplier system; and the upstream supplier system receives the data to be synchronized and begins service feedback and processing.
Such multi-system, multi-link data interfacing frequently runs into low data-consistency processing efficiency when the data volume is large (for example, about one million records to be synchronized per hour).
To detect data consistency during data synchronization, at least two approaches are available:
the first processing method for ensuring data consistency is to confirm, one communication at a time, whether the client has sent the data, via an API (Application Programming Interface);
the second processing method for ensuring data consistency is to copy data through the database on a timed schedule, which can handle synchronization of large data volumes.
However, the first method, which confirms record by record over an API whether the client has sent the data, typically occupies considerable network bandwidth and can even cause network congestion. For example: client system A places 1 order record, the data processing platform receives that 1 record through the API, and then sends 1 record to be synchronized to supplier system B through the API. The record is stored and sent in client system A, stored and forwarded in the data processing platform, and stored and acknowledged in supplier system B; a single record across the three systems is thus stored three times and communicated four times, which consumes substantial resources.
The second method, copying data through the database on a timed schedule, can handle large-volume synchronization, but the timeliness of upstream and downstream data is hard to guarantee; because of the time lag and the frequent updating of incremental data, both real-time performance and consistency are difficult to ensure, and the large data volume plus frequent copying may even place additional strain on the original business system's database. For example: if data is copied once every half hour and client system A places 1,000 orders within that half hour, supplier system B can only synchronize the data half an hour later; and because the synchronized data may itself be updated, supplier system B must wait another half hour for the updated data.
Disclosure of Invention
The embodiments of the present application provide an intelligent data acquisition synchronization method, system, computer device, and storage medium, aiming to solve the prior-art problems of heavy communication resource consumption, poor data timeliness, and difficulty in ensuring data consistency when synchronizing data from a client system to a supplier system.
In a first aspect, an embodiment of the present application provides an intelligent data acquisition synchronization method, applied to an intelligent data acquisition synchronization system that includes at least a first platform and a data middle platform. The method includes:
the first platform divides a plurality of data to be synchronized to obtain a data partitioning result;
when the first platform satisfies a data synchronization condition, it sends the data partitioning result to the data middle platform;
the data middle platform divides the received data partitioning result according to a preset data division strategy to obtain a plurality of data sub-partitions;
the data middle platform performs hash verification on the data sub-partitions based on asynchronous processing to obtain a data verification result;
the first platform acquires a local data verification result corresponding to the data partitioning result and sends it to the data middle platform;
the data middle platform performs a difference comparison between the data verification result and the local data verification result to obtain a difference comparison result; the difference comparison result comprises a plurality of sub-partition data comparison results, each corresponding to one of the plurality of data sub-partitions;
and if the data middle platform determines that the difference comparison result is not null, it acquires the corresponding sub-partition data comparison result as the target sub-partition data comparison result, determines from it the target data sub-partition among the plurality of data sub-partitions, and marks that sub-partition as a data synchronization abnormal sub-partition.
In a second aspect, an embodiment of the present application provides an intelligent data acquisition synchronization system comprising a first platform and a data middle platform, the system being configured to execute the intelligent data acquisition synchronization method according to the first aspect.
In a third aspect, an embodiment of the present application further provides a computer device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the intelligent data acquisition synchronization method according to the first aspect when executing the computer program.
In a fourth aspect, the present application further provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, causes the processor to execute the intelligent data acquisition synchronization method according to the first aspect.
The embodiments of the present application provide an intelligent data acquisition synchronization method, system, computer device, and storage medium. A first platform divides a plurality of data to be synchronized to obtain a data partitioning result; when the first platform satisfies a data synchronization condition, it sends the data partitioning result to a data middle platform; the data middle platform divides the received data partitioning result according to a preset data division strategy to obtain a plurality of data sub-partitions, and performs hash verification on them based on asynchronous processing to obtain a data verification result; the first platform acquires a local data verification result corresponding to the data partitioning result and sends it to the data middle platform; the data middle platform performs a difference comparison between the data verification result and the local data verification result to obtain a difference comparison result, which comprises a plurality of sub-partition data comparison results, each corresponding to one of the plurality of data sub-partitions; and if the data middle platform determines that the difference comparison result is not null, it acquires the corresponding sub-partition data comparison result as the target sub-partition data comparison result, determines from it the target data sub-partition among the plurality of data sub-partitions, and marks that sub-partition as a data synchronization abnormal sub-partition. This quickly detects data consistency between the first platform and the data middle platform during synchronization, reduces communication resource consumption, and improves data timeliness several-fold.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic view of an application scenario of an intelligent data acquisition synchronization method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of an intelligent data acquisition synchronization method according to an embodiment of the present application;
fig. 3 is a schematic block diagram of an intelligent data acquisition synchronization system provided in an embodiment of the present application;
fig. 4 is a schematic block diagram of a computer device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art from these embodiments without creative effort fall within the protection scope of the present application.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.
Referring to fig. 1 and fig. 2: fig. 1 is a schematic view of an application scenario of an intelligent data acquisition synchronization method according to an embodiment of the present application, and fig. 2 is a schematic flowchart of that method. The intelligent data acquisition synchronization method is applied to an intelligent data acquisition synchronization system that includes at least a first platform and a data middle platform. The first platform is specifically a customer ordering system (which can also be understood as a customer ordering server) or a supplier system (which can also be understood as a supplier system server). In a specific implementation, the first platform is not limited to an ordering system or a supplier system; it may be any system with data synchronization and data consistency detection requirements.
The technical scheme is described below taking the first platform as a customer ordering system. As shown in fig. 2, the method includes steps S201 to S207.
S201, the first platform divides a plurality of data to be synchronized to obtain data partitioning results.
In this embodiment, the first platform acquires a plurality of data to be synchronized and divides them based on a certain rule to obtain a data partitioning result. The data to be synchronized are divided so that they form multiple data blocks; processing the data block by block reduces the processing pressure of a large data volume.
In one embodiment, step S201 includes:
the first platform acquires data types of a plurality of data to be synchronized;
if the first platform determines that the data to be synchronized are of a real-time incremental data type, it divides the data to be synchronized according to a preset time interval to obtain a data partitioning result, where the data partitioning result comprises a plurality of data partitions;
if the first platform determines that the data to be synchronized are non-real-time full data, it divides the data to be synchronized according to a preset data dimension division strategy to obtain a data partitioning result, where the data partitioning result likewise comprises a plurality of data partitions.
In this embodiment, the plurality of data to be synchronized acquired by the first platform fall into at least two situations, the real-time incremental data type and the non-real-time full data type, so the first platform first determines the data type of the data to be synchronized. Data of the real-time incremental type are order data received within a time period and can be understood as real-time data. Data of the non-real-time full type represent all order data prior to the current system time and can be understood as historical data.
When the first platform receives the plurality of data to be synchronized and determines that they correspond to the real-time incremental data type, it can divide them according to a preset time interval (for example, 3 s; in a specific implementation the interval is not limited to 3 s and can be set to other values according to actual requirements) to obtain the data partitioning result. For example, if the data to be synchronized comprise N1 records (N1 a positive integer) and the preset time interval is 3 s, the N1 records can be divided into N1/3 groups when N1 is divisible by 3, or ⌊N1/3⌋+1 groups (rounding up) otherwise. After the division, a data partitioning result comprising N1/3 or ⌊N1/3⌋+1 data partitions is obtained.
When the first platform receives the plurality of data to be synchronized and determines that they correspond to non-real-time full data, it can divide them according to a preset data dimension division strategy, for example by the commodity model or commodity batch of each record, to obtain a data partitioning result comprising a plurality of data partitions.
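The two division modes above can be sketched as follows. This is a minimal illustration only: the record fields (`timestamp`, `product_model`) and the function names are assumptions made for the example, not identifiers from the embodiment.

```python
from collections import defaultdict

def partition_incremental(records, interval_s=3):
    """Divide real-time incremental records by a preset time interval:
    records whose timestamps fall in the same interval_s window share
    one data partition."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[int(rec["timestamp"]) // interval_s].append(rec)
    # return the partitions ordered by time window
    return [buckets[k] for k in sorted(buckets)]

def partition_full(records, dimension="product_model"):
    """Divide non-real-time full records by a data dimension,
    e.g. commodity model or commodity batch."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[rec[dimension]].append(rec)
    return [buckets[k] for k in sorted(buckets)]
```

With a 3 s interval, for instance, records stamped at 0 s, 1 s, and 4 s fall into two data partitions.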
S202, when the first platform satisfies the data synchronization condition, it sends the data partitioning result to the data middle platform.
In this embodiment, because the data partitioning result on the first platform must ultimately be delivered to the second platform, it first needs to be synchronized from the first platform to the data middle platform. To synchronize it at regular intervals, the first platform can send the data partitioning result to the data middle platform when a data synchronization condition is satisfied (e.g., once every interval T1, where T1 is a preset time period).
S203, the data middle platform divides the received data partitioning result according to a preset data division strategy to obtain a plurality of data sub-partitions.
In this embodiment, after the data middle platform receives the data partitioning result, the result needs to be divided more finely; specifically, each data partition in the data partitioning result can be divided based on a data division strategy such as the dichotomy to obtain a plurality of data sub-partitions. Each data partition is thus divided at a finer granularity, the data can be cut in an orderly way, and faulty data can be located quickly later on.
In one embodiment, step S203 includes:
the data middle platform divides the data partitions in the data partitioning result according to the dichotomy to obtain a plurality of data sub-partitions.
In this embodiment, dividing a data partition according to the dichotomy means splitting it into two packets of equal data volume. For example, suppose the data partitioning result comprises N1/3 data partitions and one of them is denoted data partition 1: the first half of the data to be synchronized in data partition 1 is placed in data sub-partition A and the second half in data sub-partition B. If the remaining system resources of the data middle platform are still sufficient to support further division, data sub-partition A and data sub-partition B can each be split again by the dichotomy, yielding four data sub-partitions (for example, data sub-partition A is divided into data sub-partitions A1 and A2, and data sub-partition B into data sub-partitions B1 and B2). Dividing each data partition in the data partitioning result at least twice by the dichotomy therefore yields a plurality of data sub-partitions and achieves orderly data cutting.
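The repeated dichotomy can be sketched as follows; splitting each partition in half `depth` times is one assumed realization of the "at least twice" division described above.

```python
def bisect_partition(partition, depth=2):
    """Split a data partition in half `depth` times (the dichotomy),
    producing 2**depth data sub-partitions."""
    parts = [partition]
    for _ in range(depth):
        halves = []
        for p in parts:
            mid = len(p) // 2
            halves.append(p[:mid])   # first half, e.g. sub-partition A1
            halves.append(p[mid:])   # second half, e.g. sub-partition A2
        parts = halves
    return parts
```

For a partition of eight records, depth 2 yields the four sub-partitions of the example (A1, A2, B1, B2), each holding two records.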
S204, the data middle platform performs hash verification on the data sub-partitions based on asynchronous processing to obtain a data verification result.
In this embodiment, after the data middle platform has produced the plurality of data sub-partitions, it can hash-check them in an asynchronous processing manner to obtain a sub-partition data verification result for each data sub-partition; together these form the data verification result.
Hash-checking a data sub-partition requires acquiring at least the hash key and the hash value of that sub-partition; the check is considered complete once both have been acquired for every data sub-partition, and the data verification result is then composed of the per-sub-partition verification results.
In one embodiment, step S204 includes:
the data middle platform acquires one of the plurality of data sub-partitions as the data sub-partition to be verified;
obtains the hash key of the data sub-partition to be verified;
acquires the number of data records, the data storage size, the data date interval, the first data ID, and the last data ID of the data sub-partition to be verified, and generates the corresponding hash value from these five features;
composes the sub-partition data verification result of the data sub-partition to be verified from its hash key and hash value;
and stores the sub-partition data verification result into a hash table so as to update the hash table.
In this embodiment, the hash verification process is described taking one data sub-partition as an example; the process for the other data sub-partitions is the same.
For example, the data middle platform acquires one data sub-partition A1 of the plurality of data sub-partitions as the data sub-partition to be verified. It first obtains the hash key of data sub-partition A1 (for example, directly using the partition name of data sub-partition A1 as the hash key), then obtains the hash value of data sub-partition A1, and finally stores the hash key and hash value of data sub-partition A1 into the hash table kept on the data middle platform, completing the hash verification of data sub-partition A1.
The data middle platform hash-checks the plurality of sub-partitions in an asynchronous processing mode, specifically using multiple threads; the checks of the other sub-partitions differ from that of data sub-partition A1 only in their start times, while the data processing steps are identical. After the hash verification of all received data sub-partitions is complete, the data verification result is composed of the per-sub-partition verification results and stored in the hash table. The hash table thus contains the hash key and hash value of every data sub-partition; deriving each hash value as above fully extracts the partition characteristics and ensures the accuracy of the data verification.
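A sketch of the per-sub-partition hash verification and its asynchronous execution. The feature set follows the embodiment (record count, storage size, date interval, first and last data IDs), while the choice of MD5, the JSON serialization for sizing, and the thread pool are implementation assumptions, not details from the embodiment.

```python
import hashlib
import json
from concurrent.futures import ThreadPoolExecutor

def sub_partition_check(name, records):
    """Produce the (hash key, hash value) pair for one data sub-partition.
    The hash key is the partition name; the hash value is derived from the
    five partition features named in the embodiment."""
    payload = json.dumps(records, sort_keys=True).encode()
    features = [
        len(records),        # number of data records
        len(payload),        # data storage size
        records[0]["date"],  # start of the data date interval
        records[-1]["date"], # end of the data date interval
        records[0]["id"],    # first data ID
        records[-1]["id"],   # last data ID
    ]
    hash_value = hashlib.md5(repr(features).encode()).hexdigest()
    return name, hash_value

def check_sub_partitions(sub_partitions):
    """Hash-check all sub-partitions asynchronously (multithreaded) and
    return the hash table {hash key: hash value}."""
    with ThreadPoolExecutor() as pool:
        return dict(pool.map(lambda item: sub_partition_check(*item),
                             sub_partitions.items()))
```

Two sub-partitions holding identical records produce identical hash values, while any change in count, size, dates, or boundary IDs changes the value, which is what the later difference comparison relies on.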
S205, the first platform obtains a local data verification result corresponding to the data partitioning result and sends it to the data middle platform.
In this embodiment, the data partitioning result is kept locally on the first platform in addition to being sent to the data middle platform. The first platform can therefore apply the same division procedure used on the data middle platform to its local copy of the data partitioning result, obtaining a plurality of local data sub-partitions. The total number of local data sub-partitions is exactly the same as the total number of data sub-partitions obtained by dividing the data partitioning result on the data middle platform. Moreover, to allow the local data verification result to be compared with the data verification result on the data middle platform, the first platform also sends the local data verification result to the data middle platform; specifically, API communication is established between the first platform and the data middle platform, and the first platform sends the local data verification result over that API connection.
If the data partitioning result was synchronized from the first platform to the data middle platform without any error, then the plurality of local data sub-partitions produced by the first platform and the plurality of data sub-partitions produced by the data middle platform under the same data division strategy are divided identically, and every local data sub-partition has a fully corresponding data sub-partition with the same data.
If an error occurred while the data partitioning result was synchronized from the first platform to the data middle platform, then, since the division into local data sub-partitions and data sub-partitions is still identical, at least one local data sub-partition will not contain exactly the same data as its corresponding data sub-partition on the data middle platform; this is data inconsistency caused by data transmission. The subsequent technical scheme of the present application quickly locates which local data sub-partition differs from its corresponding data sub-partition on the data middle platform, so that the data synchronization abnormality can be handled in time.
In one embodiment, step S205 includes:
the first platform divides the data partitioning result according to the data division strategy to obtain a plurality of local data sub-partitions;
the first platform performs hash verification on the local data sub-partitions based on asynchronous processing to obtain a local data verification result.
In this embodiment, the data partitioning result is likewise divided by the dichotomy on the first platform; the procedure is the same as the division of the data partitioning result on the data middle platform, and yields a plurality of local data sub-partitions.
Each local data sub-partition is then hash-checked based on asynchronous processing to obtain the local data verification result. For example, denote one of the local data sub-partitions as local data sub-partition a1; provided data synchronization did not fail, it contains exactly the same data as data sub-partition A1. The hash verification process is described taking local data sub-partition a1 as an example; the process for the other local data sub-partitions on the first platform is the same.
The hash key of local data sub-partition a1 is acquired first, then its hash value, and finally both are stored into the local hash table kept on the first platform, completing the hash verification of local data sub-partition a1.
The first platform hash-checks the plurality of local sub-partitions in an asynchronous processing mode, for example using multiple threads; the checks of the other local sub-partitions differ from that of local data sub-partition a1 only in their start times, while the data processing steps are identical. After the hash verification of all local data sub-partitions is complete, the local data verification result is composed of the per-sub-partition verification results and stored in the local hash table, which thus contains the hash key and hash value of every local data sub-partition.
In a specific implementation, the first platform may further synchronize the local data sub-partitions corresponding to the data partitioning result, together with the local data verification result, to a pre-library; the pre-library can be understood as a non-core database in the first platform, which effectively reduces the load on the core business library.
S206, the data middle platform performs a difference comparison between the data verification result and the local data verification result to obtain a difference comparison result; the difference comparison result comprises a plurality of sub-partition data comparison results, each corresponding to one of the plurality of data sub-partitions.
In this embodiment, after the first platform has sent the local data verification result to the data middle platform over the API, the data middle platform can compare the data verification result with the local data verification result; specifically, it compares the hash table corresponding to the data verification result against the local hash table corresponding to the local data verification result, finally obtaining the difference comparison result.
In one embodiment, step S206 includes:
the data center station acquires the data verification result of each sub-partition in the data verification result and the data verification result of each local sub-partition in the local data verification result;
performing difference comparison on each sub-partition data verification result in the data verification result and a corresponding local sub-partition data verification result to obtain a sub-partition data comparison result corresponding to each sub-partition data verification result;
and forming a difference comparison result by the sub-partition data comparison result corresponding to each sub-partition data verification result.
For example, the hash table corresponding to the data verification result is as follows:
Hash key | Hash value
Hash key C1 | Hash value D1
Hash key C2 | Hash value D2
Hash key C3 | Hash value D3
…… | ……
Hash key Cn1 | Hash value Dn1
TABLE 1
In table 1, each hash key corresponds to one data sub-partition of the plurality of data sub-partitions, and the hash value of each data sub-partition can be determined from the number of data pieces of the data sub-partition, the data storage size, the data date interval, the first data ID of the sub-partition to be verified, and the last data ID of the sub-partition to be verified. In table 1, the hash key of a data sub-partition is denoted by Ci and its hash value by Di, where i ranges from 1 to n1, and n1 is the total number of data sub-partitions obtained after the data partitioning result is divided (n1 is a positive integer). In table 1, the hash key and hash value in the same row constitute the hash key-value pair of one data sub-partition.
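A minimal sketch of how such a hash value might be generated from the five partition features named above; the concatenation format and the SHA-256 digest are assumptions for illustration, since the text does not fix a particular algorithm.

```python
import hashlib

def sub_partition_hash_value(row_count, storage_bytes, date_interval,
                             first_id, last_id):
    """Combine the five partition features (number of data pieces, storage
    size, date interval, first and last data ID) into one hash value; any
    change in any feature changes the digest."""
    features = f"{row_count}|{storage_bytes}|{date_interval}|{first_id}|{last_id}"
    return hashlib.sha256(features.encode("utf-8")).hexdigest()
```

Because the digest covers all five features, two sub-partitions agree on their hash value only when every extracted characteristic matches.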
And the local hash table corresponding to the local data verification result is shown in table 2:
Hash key | Hash value
Hash key E1 | Hash value F1
Hash key E2 | Hash value F2
Hash key E3 | Hash value F3
…… | ……
Hash key En2 | Hash value Fn2
TABLE 2
In table 2, each hash key corresponds to one local data sub-partition of the plurality of local data sub-partitions, and the hash value of each local data sub-partition can be determined from the number of data pieces of the local data sub-partition, the data storage size, the data date interval, the first data ID of the sub-partition to be verified, and the last data ID of the sub-partition to be verified. In table 2, the hash key of a local data sub-partition is denoted by Ej and its hash value by Fj, where j ranges from 1 to n2, and n2 is the total number of local data sub-partitions obtained after the data partitioning result is divided (n2 is a positive integer, and generally n2 = n1). In table 2, the hash key and hash value in the same row constitute the hash key-value pair of one local data sub-partition.
Given the data verification result shown in table 1 and the local data verification result shown in table 2, the data center station may compare the hash key-value pair of each row in the data verification result with the hash key-value pair of the corresponding row in the local data verification result to determine whether there is a difference. For example, the hash key-value pair C1-D1, composed of the hash key C1 and the hash value D1 in the first row of table 1, is compared with the hash key-value pair E1-F1, composed of the hash key E1 and the hash value F1 in the first row of table 2. If the hash value D1 in C1-D1 differs from the hash value F1 in E1-F1, it can be determined that the data sub-partition corresponding to the hash key C1 and the local data sub-partition corresponding to the hash key E1 have a data difference, that is, an error occurred during data synchronization and the two are inconsistent. If the hash value D1 is the same as the hash value F1, it can be determined that there is no data difference between the two sub-partitions, that is, no error occurred and data consistency holds in the data synchronization process.
By repeating the comparison process used for the hash key-value pair C1-D1 and the hash key-value pair E1-F1, the comparison results of the other hash key-value pairs in table 1 against the corresponding hash key-value pairs in table 2 can be obtained, and the difference comparison result is finally assembled from the per-row comparison results.
S207, if the data center determines that the difference comparison result is not a null value, acquiring a corresponding sub-partition data comparison result as a target sub-partition data comparison result, and determining target data sub-partitions in the plurality of data sub-partitions according to the target sub-partition data comparison result and using the target data sub-partitions as data synchronization abnormal sub-partitions.
In this embodiment, if the data center station determines that the difference comparison result is not a null value, it indicates that at least one row of hash key-value pairs in table 1 differs from the hash key-value pairs in the corresponding row of table 2, that is, at least one data sub-partition encountered an error while being synchronized from the first platform to the data center station. Because the difference comparison result comprises a plurality of sub-partition data comparison results, the sub-partition data comparison results that are not null are determined as target sub-partition data comparison results, the data sub-partition corresponding to each target sub-partition data comparison result is determined as a target data sub-partition, and finally the obtained target data sub-partitions serve as the data synchronization abnormal sub-partitions.
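The row-by-row comparison and the extraction of the abnormal sub-partitions can be sketched as follows. For simplicity the two hash tables are assumed to index corresponding sub-partitions by the same key (the text pairs them by row position), and the function names are illustrative.

```python
def diff_compare(hash_table, local_hash_table):
    """Compare the center-station hash table with the first platform's local
    hash table entry by entry: a sub-partition whose hash values differ gets
    a non-null comparison result (the mismatching value pair), matching ones
    get None."""
    return {key: ((hash_table[key], local_hash_table.get(key))
                  if hash_table[key] != local_hash_table.get(key) else None)
            for key in hash_table}

def abnormal_sub_partitions(diff_result):
    """Determine the target data sub-partitions: every entry whose comparison
    result is not null marks a data synchronization abnormal sub-partition."""
    return [key for key, result in diff_result.items() if result is not None]
```

If every comparison result is null, no synchronization error occurred; otherwise the non-null entries name exactly the sub-partitions needing repair.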
In an embodiment, step S207 is followed by:
the first platform acquires the plurality of local data sub-partitions from a pre-library;
the pre-library determines a current target data sub-partition from the plurality of local data sub-partitions based on the data synchronization exception sub-partition;
and comparing the current target data sub-partition with the data synchronization abnormal sub-partition based on a dichotomy, and determining a specific data difference area.
In this embodiment, because the pre-library of the first platform also stores the plurality of local data sub-partitions and the local data verification result corresponding to the data partitioning result, determining the data synchronization abnormal sub-partition in the data center station only establishes that a data synchronization exception occurred in that data sub-partition; it remains unclear which section of data within the sub-partition is actually affected. To determine the specific data difference area more precisely, the first platform first obtains the plurality of local data sub-partitions from the pre-library, which are identical to the plurality of local data sub-partitions in the first platform. Then, because the data synchronization abnormal sub-partition has been determined in the data center station, the current target data sub-partition corresponding to it can be determined among the plurality of local data sub-partitions in the pre-library; that is, the current target data sub-partition is the sub-partition, among the plurality of local data sub-partitions, in which the data synchronization abnormality occurred.
And finally, comparing the current target data sub-partition with the data synchronization abnormal sub-partition based on a dichotomy, and determining a data difference specific area. For example, the data synchronization exception sub-partition is marked as a data synchronization exception sub-partition G1 and can be divided into two smaller sub-partitions G11 and G12 based on a dichotomy; the current target data sub-partition is marked as a current target data sub-partition H1, and can be divided into two smaller sub-partitions H11 and H12 based on dichotomy (the dichotomy division rule adopted by the current target data sub-partition is completely the same as the dichotomy division rule adopted by the data synchronization abnormal sub-partition). At this time, the sub-partition of G11 may obtain the hash key and the hash value, the sub-partition of G12 may obtain the hash key and the hash value, the sub-partition of H11 may obtain the hash key and the hash value, and the sub-partition of H12 may obtain the hash key and the hash value. And comparing the hash value of the sub-partition of G11 with the hash value of the sub-partition of H11, and comparing the hash value of the sub-partition of G12 with the hash value of the sub-partition of H12 to judge whether the data of the sub-partition of G11 and the data of the sub-partition of H11 are consistent or not, and judging whether the data of the sub-partition of G12 and the data of the sub-partition of H12 are consistent or not. If the hash value of the sub-partition of G11 is the same as the hash value of the sub-partition of H11, it indicates that the data of the two sub-partitions are consistent, and if the hash value of the sub-partition of G11 is not the same as the hash value of the sub-partition of H11, it indicates that the data of the two sub-partitions are not consistent. 
If the hash value of the sub-partition of G12 is the same as the hash value of the sub-partition of H12, the data of the two sub-partitions are consistent; if the hash values differ, the data are inconsistent. In this way the data sub-partitions are gradually narrowed and the inconsistent data is quickly located; once located, only the correct, consistent data needs to be resynchronized from the pre-library rather than the whole data sub-partition, which improves processing efficiency when a data exception occurs. Combining the pre-library with the dichotomy to handle abnormal data both reduces the number of accesses to the original service library and increases the data processing speed.
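The dichotomy-based localization described above can be sketched as a binary search over the row ranges: both sides are split with the same rule, hash values of the corresponding halves are compared, and only the disagreeing half is recursed into. The helper names and the MD5 digest are illustrative assumptions, and for simplicity the sketch assumes a single differing region.

```python
import hashlib

def _digest(rows):
    """Hash value of a row range (stand-in for the sub-partition hash)."""
    return hashlib.md5(repr(rows).encode("utf-8")).hexdigest()

def locate_difference(abnormal_rows, local_rows):
    """Repeatedly bisect the data synchronization abnormal sub-partition and
    the current target data sub-partition with the same split rule, keeping
    only the half whose hash values disagree, until the mismatch is narrowed
    to a single row position. Returns that row index."""
    lo, hi = 0, len(local_rows)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if _digest(abnormal_rows[lo:mid]) != _digest(local_rows[lo:mid]):
            hi = mid   # difference lies in the first half
        else:
            lo = mid   # first halves agree, so it lies in the second half
    return lo
```

Each round halves the search range, so a differing region in a sub-partition of n rows is found after about log2(n) hash comparisons, matching the G11/H11 and G12/H12 comparison in the text.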
The present application further describes a technical solution with the first platform as a supplier system. Similarly, the method for performing data consistency detection after the supplier system synchronizes the data into the data center station can completely refer to the process that the client ordering system performs grouping division and asynchronous hash check on a plurality of data to be synchronized, then synchronizes the data to the data center station, and then performs comparison check.
When the supplier system groups the data to obtain the data partition result, the data are partitioned according to the supplier names, namely, each supplier name corresponds to one data sub-partition. Based on the partition, data of different suppliers can be stored in different base tables (namely data sub-partitions), and the data among different suppliers can be stored separately, so that the data is safely isolated, the data storage capacity of the suppliers is reduced, and the processing speed is improved.
The method realizes rapid detection of data consistency between the first platform and the data center station during data synchronization, reduces communication resource consumption, and greatly improves the real-time performance of the data.
The embodiment of the application also provides an intelligent data acquisition synchronization system, which is used for executing any embodiment of the intelligent data acquisition synchronization method. Specifically, referring to fig. 3, fig. 3 is a schematic block diagram of an intelligent data acquisition synchronization system 100 according to an embodiment of the present application.
As shown in fig. 3, the intelligent data acquisition synchronization system 100 includes a first platform 110 and a data center 120.
The first platform is used for dividing a plurality of data to be synchronized to obtain data partitioning results.
In this embodiment, a first platform acquires a plurality of data to be synchronized, and the acquired data to be synchronized are divided in the first platform based on a certain rule to obtain a data partitioning result. The reason for dividing the data to be synchronized is to divide the data into a plurality of data blocks, and the processing pressure of large data volume can be reduced after the data blocks are processed in a blocking mode.
In an embodiment, the dividing the plurality of data to be synchronized to obtain the data partitioning result includes:
the first platform acquires data types of a plurality of data to be synchronized;
if the first platform determines that the data to be synchronized are of a real-time incremental data type, dividing the data to be synchronized according to a preset time interval to obtain a data partitioning result; wherein the data partition result comprises a plurality of data partitions;
if the first platform determines that the data to be synchronized are non-real-time full data, dividing the data to be synchronized according to a preset data dimension division strategy to obtain a data partitioning result; wherein the data partition result comprises a plurality of data partitions.
In this embodiment, since the plurality of data to be synchronized acquired in the first platform fall into at least two situations, one being a real-time incremental data type and the other being non-real-time full data, the first platform may first determine the data types of the plurality of data to be synchronized. When the data to be synchronized is of the real-time incremental data type, it is order data received within a time period and can be understood as real-time data. When the data to be synchronized is of the non-real-time full data type, it represents all order data prior to the current system time and can be understood as historical data.
When the first platform receives a plurality of data to be synchronized and determines that they correspond to the real-time incremental data type, the data to be synchronized can be divided according to a preset time interval (for example, a time interval of 3s; specific implementations are not limited to 3s and may use other interval values according to actual requirements) to obtain a data partitioning result. For example, if the plurality of data to be synchronized comprise N1 pieces of data (N1 is a positive integer) and the preset time interval is 3s, the N1 pieces of data may be divided into N1/3 groups or [N1/3]+1 groups (if N1 is divisible by 3, N1/3 is a positive integer; if not, the count is rounded up to [N1/3]+1). After the plurality of data to be synchronized are divided, a data partitioning result comprising N1/3 or [N1/3]+1 data partitions is obtained.
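The real-time incremental branch can be sketched as bucketing timestamped records by the preset interval; the `(timestamp, payload)` record shape and the helper name are illustrative assumptions.

```python
def partition_by_time_interval(records, interval_seconds=3):
    """Group (timestamp, payload) records into one data partition per
    interval_seconds window; the 3-second default mirrors the example in
    the text, and other interval values are equally valid."""
    buckets = {}
    for ts, payload in records:
        buckets.setdefault(int(ts // interval_seconds), []).append(payload)
    # return the partitions in time order
    return [buckets[b] for b in sorted(buckets)]
```

Records whose timestamps fall in the same window land in the same data partition, so the partition count grows with the covered time span rather than with any single dimension of the data.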
When the first platform receives a plurality of data to be synchronized and determines that the data to be synchronized corresponds to non-real-time full data, the data to be synchronized can be divided according to a preset data dimension division strategy, for example, according to the commodity model or the commodity batch of each data to be synchronized, so as to obtain a data partitioning result, and the data partitioning result comprises a plurality of data partitions.
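The non-real-time full-data branch amounts to a group-by over a chosen data dimension such as commodity model or commodity batch; the dictionary-based helper and the `model` field name below are illustrative assumptions.

```python
def partition_by_dimension(records, dimension):
    """Group records (dicts) by a preset data dimension; each distinct
    dimension value yields one data partition, per the non-real-time
    full-data division strategy."""
    partitions = {}
    for record in records:
        partitions.setdefault(record[dimension], []).append(record)
    return partitions
```

The same helper covers the later supplier-system variant by passing a supplier-name field as the dimension, so each supplier name corresponds to one partition.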
And the first platform is also used for sending the data partitioning result to the data center station when the data synchronization condition is met.
In this embodiment, since the data grouping result in the data of the first platform is finally required to be sent to the second platform, the data grouping result needs to be synchronized from the first platform to the data center platform. In order to synchronize the data grouping result from the first platform to the data center more regularly, the first platform may send the data partitioning result to the data center when a data synchronization condition is satisfied (e.g., every T1 interval, where T1 is a preset time period value).
And the data center station is used for dividing the received data partitioning result according to a preset data partitioning strategy to obtain a plurality of data sub-partitions.
In this embodiment, after the data partitioning result is received by the data center station, the data partitioning result needs to be divided into finer granularity, and specifically, each data partition in the data partitioning result may be divided based on a data division policy such as bisection method to obtain a plurality of data sub-partitions. Therefore, each data partition in the data partition result is divided into finer granularity, data can be cut in order, and the data can be positioned quickly in the follow-up process.
In an embodiment, the dividing the received data partitioning result according to a preset data partitioning policy to obtain a plurality of data sub-partitions includes:
and the data center station divides the data subareas in the data subarea result according to a dichotomy to obtain a plurality of data subareas.
In the present embodiment, dividing a data partition according to the dichotomy amounts to splitting the data into two packets of equal data amount. For example, the data partitioning result includes N1/3 data partitions, one of which is denoted data partition 1; the first half of the data to be synchronized in data partition 1 is placed in data sub-partition A, and the second half in data sub-partition B. If the remaining system resources in the data center station are still sufficient to support further division of data sub-partition A and data sub-partition B, each may be divided again based on the dichotomy to obtain four data sub-partitions (for example, data sub-partition A is divided into data sub-partitions A1 and A2, and data sub-partition B into data sub-partitions B1 and B2). By dividing each data partition in the data partitioning result at least twice based on the dichotomy, a plurality of data sub-partitions are obtained and the data is cut in an orderly manner.
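The repeated halving can be sketched as follows; the function name and the depth parameter (how many rounds of bisection to apply, chosen here to match the two-level A1/A2/B1/B2 example) are illustrative assumptions.

```python
def bisect_partition(rows, depth):
    """Split one data partition into 2**depth data sub-partitions by
    repeatedly halving every current piece (first half / second half),
    mirroring the division of partition 1 into A, B and then A1, A2, B1, B2."""
    partitions = [rows]
    for _ in range(depth):
        halved = []
        for part in partitions:
            mid = len(part) // 2
            halved.append(part[:mid])   # first half of the data
            halved.append(part[mid:])   # second half of the data
        partitions = halved
    return partitions
```

Applying the same split rule on both the data center station side and the first platform side is what later lets corresponding sub-partitions be compared pairwise.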
And the data center station is also used for carrying out Hash check on the data sub-partitions based on asynchronous processing to obtain a data check result.
In this embodiment, after receiving a plurality of data sub-partitions in the data center, the data sub-partitions may be subjected to hash check in an asynchronous processing manner to obtain sub-partition data check results respectively corresponding to each data sub-partition, so as to form a data check result.
When the hash check is performed on the data of each data sub-partition, at least the hash key and the hash value of that data sub-partition must be acquired; the hash check is considered complete once both have been obtained for every data sub-partition, and the data check result is finally composed of the sub-partition data check results corresponding to each data sub-partition.
In an embodiment, the performing hash check on the plurality of data sub-partitions based on asynchronous processing to obtain a data check result includes:
the data center station acquires one data sub-partition in the data sub-partitions and takes the data sub-partition as a data sub-partition to be checked;
obtaining a hash key of the data sub-partition to be verified;
acquiring the number of data pieces of the sub-partition to be verified, the data storage size, the data date interval, the first data ID of the sub-partition to be verified and the last data ID of the sub-partition to be verified, and correspondingly generating a hash value corresponding to the sub-partition to be verified according to the number of data pieces of the sub-partition to be verified, the data storage size, the data date interval, the first data ID of the sub-partition to be verified and the last data ID of the sub-partition to be verified;
forming a sub-partition data verification result of the sub-partition of the data to be verified by the hash key of the sub-partition of the data to be verified and the hash value corresponding to the sub-partition of the data to be verified;
and storing the sub-partition data verification result to a hash table so as to update the hash table.
In this embodiment, the hash check process is described by taking one of the data sub-partitions as an example, and the hash check processes of other data sub-partitions are the same as those of the following example.
For example, the data center station obtains one data sub-partition A1 of the multiple data sub-partitions and takes it as the data sub-partition to be verified. The hash key of the data sub-partition A1 is obtained first (for example, the partition name of the data sub-partition A1 is used directly as the hash key), then the hash value of the data sub-partition A1 is obtained, and finally the hash key and hash value of the data sub-partition A1 are stored in the hash table kept in the data center station, which completes the hash check process for the data sub-partition A1.
The hash check of the plurality of sub-partitions is performed in the data center station in an asynchronous processing mode, specifically with multiple threads; the checks of the other sub-partitions differ from that of the data sub-partition A1 only in their start time, while the data processing steps of the hash check are identical. After the hash check of all the received data sub-partitions is completed in the data center station, the data check result is composed of the sub-partition data check results corresponding to each data sub-partition and is stored in the hash table. The hash table therefore contains the hash key and hash value of each data sub-partition; deriving the hash value of each data sub-partition in this way fully extracts the partition characteristics and ensures the accuracy of the data verification.
The first platform is further configured to obtain a local data verification result corresponding to the data partitioning result, and send the local data verification result to the data center.
In this embodiment, the data partitioning result local to the first platform may be stored locally to the first platform, in addition to being sent to the data middlebox. Therefore, in the first platform, the same processing procedure for dividing the data partition results in the data middle platform can be referred, and the same data division is performed on the data partition results by the first platform, so that a plurality of local data sub-partitions are obtained. The total number of the sub-partitions corresponding to the plurality of local data sub-partitions is completely the same as the total number of the data sub-partitions obtained by dividing the data partition result in the data center station. Moreover, in order to facilitate comparison between the local data verification result and the data verification result in the data center station, the first platform may also send the local data verification result to the data center station. Specifically, API communication is established between the first platform and the data center station, and then the first platform sends the local data verification result to the data center station based on the API communication.
If no error occurs when the data partitioning result is synchronized from the first platform to the data center station, then after the first platform performs the same data division on the data partitioning result to obtain the plurality of local data sub-partitions, those local data sub-partitions are exactly the same division as the plurality of data sub-partitions obtained in the data center station according to the data division strategy, and each local data sub-partition has a fully corresponding data sub-partition with identical data.
If an error occurs when the data partitioning result is synchronized from the first platform to the data center station, then because the data division of the plurality of local data sub-partitions in the first platform and the plurality of data sub-partitions in the data center station is still exactly the same, at least one local data sub-partition will not have exactly the same data as its corresponding data sub-partition in the data center station, which is precisely the data inconsistency caused by data transmission. The subsequent technical scheme of the application quickly locates the specific data difference between a local data sub-partition and the corresponding data sub-partition in the data center station, so that the data synchronization abnormality can be handled in time.
In an embodiment, the obtaining the local data verification result corresponding to the data partitioning result includes:
the first platform divides the partition result according to the data division strategy to obtain a plurality of local data sub-partitions;
and the first platform carries out Hash verification on the local data sub-partitions based on asynchronous processing to obtain a local data verification result.
In this embodiment, the first platform likewise divides the data partitioning result by the binary method, following the same processing procedure used when dividing the data partitioning result in the data center station, so that the first platform obtains a plurality of local data sub-partitions from the same division.
And then, carrying out hash check on each data sub-partition based on asynchronous processing to obtain a local data check result. For example, one local data sub-partition of the local data sub-partitions is denoted as a local data sub-partition A1, which is a completely same data sub-partition as the data sub-partition A1 on the premise that data synchronization does not fail, and the hash check process is described by taking the local data sub-partition A1 as an example, and the hash check process of other local data sub-partitions in the first platform is the same as the hash check process of the following example.
The hash key of the local data sub-partition A1 is acquired first, then the hash value of the local data sub-partition A1 is acquired, and finally both are stored in a local hash table kept in the first platform, which completes the hash verification process for the local data sub-partition A1.
The hash check of the plurality of local sub-partitions in the first platform is performed in an asynchronous processing mode, for example with multiple threads; the checks of the other local sub-partitions differ from that of the local data sub-partition A1 only in their start time, while the data processing steps of the hash check are identical. After the hash check of all the local data sub-partitions is completed in the first platform, the local data check result is composed of the local sub-partition data check results corresponding to each local data sub-partition and is stored in the local hash table. The local hash table therefore contains the hash key and hash value of each local data sub-partition.
In a specific implementation, the first platform may further synchronize the local data sub-partitions corresponding to the data partitioning result, together with the local data verification result, to a pre-library; the pre-library can be understood as a non-core database in the first platform, which effectively reduces the burden on the core service library.
The data center is further configured to perform difference comparison on the data verification result and the local data verification result to obtain a difference comparison result; the difference comparison result comprises a plurality of sub-partition data comparison results, and each sub-partition data comparison result corresponds to one of the plurality of data sub-partitions.
In this embodiment, after the first platform sends the local data verification result to the data center by means of API communication, the data center may perform difference comparison between the data verification result and the local data verification result, specifically, compare differences between the hash table corresponding to the data verification result and the local hash table corresponding to the local data verification result, and finally obtain a difference comparison result.
In an embodiment, the differentially comparing the data verification result with the local data verification result, and obtaining a difference comparison result includes:
the data center station acquires the data verification result of each sub-partition in the data verification result and the data verification result of each local sub-partition in the local data verification result;
performing difference comparison on each sub-partition data verification result in the data verification result and a corresponding local sub-partition data verification result to obtain a sub-partition data comparison result corresponding to each sub-partition data verification result;
and forming a difference comparison result by the sub-partition data comparison result corresponding to each sub-partition data verification result.
For example, the hash table corresponding to the data verification result is as shown in table 1 above, and the local hash table corresponding to the local data verification result is as shown in table 2 above.
Given the data verification result shown in table 1 and the local data verification result shown in table 2, the data center station may compare the hash key-value pair of each row in the data verification result with the hash key-value pair of the corresponding row in the local data verification result to determine whether there is a difference. For example, the hash key-value pair C1-D1, composed of the hash key C1 and the hash value D1 in the first row of table 1, is compared with the hash key-value pair E1-F1, composed of the hash key E1 and the hash value F1 in the first row of table 2. If the hash value D1 in C1-D1 differs from the hash value F1 in E1-F1, it can be determined that the data sub-partition corresponding to the hash key C1 and the local data sub-partition corresponding to the hash key E1 have a data difference, that is, an error occurred during data synchronization and the two are inconsistent. If the hash value D1 is the same as the hash value F1, it can be determined that there is no data difference between the two sub-partitions, that is, no error occurred and data consistency holds in the data synchronization process.
By analogy with the comparison between the hash key-value pair C1-D1 and the hash key-value pair E1-F1, the comparison results between the other hash key-value pairs in table 1 and the corresponding hash key-value pairs in table 2 can be obtained, and the difference comparison result is finally formed from the per-row comparison results.
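The row-by-row comparison described above can be sketched as follows. This is an illustrative reconstruction only: the table layout, hash keys, and hash values are hypothetical stand-ins for tables 1 and 2, not the patent's concrete data.

```python
# Illustrative sketch of the row-by-row hash comparison between the data
# verification result (table 1) and the local data verification result
# (table 2). All names and values are hypothetical placeholders.

def compare_check_results(remote_table, local_table):
    """Compare each (hash_key, hash_value) row of the remote table with the
    corresponding row of the local table; a None entry means the rows match."""
    diff_result = []
    for (remote_key, remote_val), (local_key, local_val) in zip(remote_table, local_table):
        if remote_val != local_val:
            # Hash mismatch: this sub-partition was not synchronized correctly.
            diff_result.append((remote_key, local_key))
        else:
            diff_result.append(None)  # consistent, no difference
    return diff_result

table1 = [("C1", "D1"), ("C2", "D2"), ("C3", "D3")]  # data center station side
table2 = [("E1", "D1"), ("E2", "XX"), ("E3", "D3")]  # first platform (local) side
print(compare_check_results(table1, table2))  # → [None, ('C2', 'E2'), None]
```

Only the second row yields a non-null comparison result, so only that sub-partition is flagged as differing.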
The data center station is further configured to, if the difference comparison result is not a null value, acquire the corresponding sub-partition data comparison results as target sub-partition data comparison results, determine target data sub-partitions in the plurality of data sub-partitions according to the target sub-partition data comparison results, and use the target data sub-partitions as data synchronization abnormal sub-partitions.
In this embodiment, if the data center station determines that the difference comparison result is not a null value, at least one row of hash key-value pairs in table 1 differs from the corresponding row in table 2; that is, at least one data sub-partition encountered an error while being synchronized from the first platform to the data center station. The non-null sub-partition data comparison results are determined as target sub-partition data comparison results, the data sub-partition corresponding to each target sub-partition data comparison result is determined as a target data sub-partition, and the resulting target data sub-partitions are taken as data synchronization abnormal sub-partitions.
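Selecting the abnormal sub-partitions from a non-null difference comparison result can be sketched as follows; a minimal illustration in which the sub-partition names and the list representation of the difference comparison result are assumptions, not the patent's concrete structures.

```python
def find_abnormal_subpartitions(diff_result, sub_partitions):
    """Treat each non-null entry of the difference comparison result as a
    target sub-partition data comparison result and collect the data
    sub-partition it corresponds to as a data synchronization abnormal one."""
    return [sub_partitions[i]
            for i, row_diff in enumerate(diff_result)
            if row_diff is not None]

sub_partitions = ["sub_0", "sub_1", "sub_2"]
diff_result = [None, ("C2", "E2"), None]  # only row 2 showed a hash mismatch
print(find_abnormal_subpartitions(diff_result, sub_partitions))  # → ['sub_1']
```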
In an embodiment, the first platform is further configured to:
the first platform acquires the plurality of local data sub-partitions from a pre-library;
the pre-library determines a current target data sub-partition from the plurality of local data sub-partitions based on the data synchronization exception sub-partition;
and comparing the current target data sub-partition with the data synchronization abnormal sub-partition based on a dichotomy, and determining a specific data difference area.
In this embodiment, the pre-library of the first platform also stores the plurality of local data sub-partitions and the local data check results corresponding to the data partition result. After the data synchronization abnormal sub-partition is determined at the data center station, it is only known that a synchronization error occurred in that data sub-partition; it is not yet clear which specific data inside the sub-partition is affected. In order to determine the specific data difference area more accurately, the first platform acquires the plurality of local data sub-partitions from the pre-library, which are identical to the plurality of local data sub-partitions in the first platform. Then, because the data synchronization abnormal sub-partition has been determined at the data center station, the current target data sub-partition corresponding to it can be determined among the plurality of local data sub-partitions in the pre-library; that is, the current target data sub-partition is the sub-partition among the plurality of local data sub-partitions in which the data synchronization abnormality occurred.
Finally, the current target data sub-partition is compared with the data synchronization abnormal sub-partition based on a dichotomy to determine the specific data difference area. For example, the data synchronization abnormal sub-partition, denoted G1, can be divided into two smaller sub-partitions G11 and G12 based on the dichotomy; the current target data sub-partition, denoted H1, can likewise be divided into two smaller sub-partitions H11 and H12 (the dichotomy division rule applied to the current target data sub-partition is exactly the same as the one applied to the data synchronization abnormal sub-partition). A hash key and a hash value are then obtained for each of the sub-partitions G11, G12, H11 and H12. The hash value of G11 is compared with the hash value of H11, and the hash value of G12 is compared with the hash value of H12, to determine whether the data of G11 and H11 are consistent and whether the data of G12 and H12 are consistent. If the hash value of G11 is the same as the hash value of H11, the data of the two sub-partitions are consistent; if the hash values differ, the data are inconsistent.
Likewise, if the hash value of G12 is the same as the hash value of H12, the data of the two sub-partitions are consistent; otherwise they are inconsistent. The data sub-partitions are thus gradually reduced and the inconsistent data is quickly located. After the location is finished, only the correct, consistent data needs to be resynchronized from the pre-library, rather than the entire data sub-partition, which improves processing efficiency when a data abnormality occurs. Moreover, combining the pre-library with the dichotomy to process abnormal data reduces the access volume to the original service library on the one hand, and speeds up data processing on the other.
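The dichotomy-based narrowing described above can be sketched as follows. This is an illustrative reconstruction: the `partition_hash` helper and the representation of a sub-partition as a list of records are assumptions, not the patent's concrete implementation (the patent derives the hash value from piece count, storage size, date interval, and first/last data IDs).

```python
import hashlib

def partition_hash(records):
    """Hash a sub-partition from its summary features. Here the record count
    and the serialized records stand in for the patent's metadata fields."""
    digest = hashlib.sha256(repr((len(records), records)).encode())
    return digest.hexdigest()

def locate_difference(local, remote):
    """Binary-narrow to the smallest region where local and remote differ.
    Both sides must be split by the identical dichotomy division rule."""
    if partition_hash(local) == partition_hash(remote):
        return None                    # hashes match: consistent, stop here
    if len(local) <= 1:
        return (local, remote)         # smallest inconsistent region found
    mid = len(local) // 2              # identical split point on both sides
    left = locate_difference(local[:mid], remote[:mid])
    if left is not None:
        return left
    return locate_difference(local[mid:], remote[mid:])

h1 = [1, 2, 3, 4]                      # current target data sub-partition (pre-library)
g1 = [1, 2, 9, 4]                      # data synchronization abnormal sub-partition
print(locate_difference(h1, g1))       # → ([3], [9])
```

Only the single differing record region is returned, so only that region, rather than the whole sub-partition, needs to be resynchronized from the pre-library.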
The system realizes rapid detection of data consistency between the first platform and the data center station during data synchronization, reduces communication resource consumption, and multiplies the real-time performance of the data.
The intelligent data acquisition synchronization system described above may be implemented in the form of a computer program that may be run on a computer device as shown in fig. 4.
Referring to fig. 4, fig. 4 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 500 may be a server or a server cluster. The server may be an independent server, or may be a cloud server that provides basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), and a big data and artificial intelligence platform.
Referring to fig. 4, the computer device 500 includes a processor 502, memory, and a network interface 505 connected by a system bus 501, where the memory may include a storage medium 503 and an internal memory 504.
The storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032, when executed, can cause the processor 502 to perform an intelligent data acquisition synchronization method.
The processor 502 is used to provide computing and control capabilities that support the operation of the overall computer device 500.
The internal memory 504 provides an environment for the operation of the computer program 5032 in the storage medium 503, and when the computer program 5032 is executed by the processor 502, the processor 502 can be enabled to execute the intelligent data acquisition synchronization method.
The network interface 505 is used for network communication, such as providing transmission of data information. Those skilled in the art will appreciate that the configuration shown in fig. 4 is a block diagram of only a portion of the configuration associated with the present application and does not constitute a limitation of the computer device 500 to which the present application may be applied, and that a particular computer device 500 may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
The processor 502 is configured to run the computer program 5032 stored in the memory to implement the intelligent data acquisition synchronization method disclosed in the embodiment of the present application.
Those skilled in the art will appreciate that the embodiment of a computer device illustrated in fig. 4 does not constitute a limitation on the specific construction of the computer device, and that in other embodiments a computer device may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components. For example, in some embodiments, the computer device may only include a memory and a processor, and in such embodiments, the structures and functions of the memory and the processor are consistent with those of the embodiment shown in fig. 4, which are not described herein again.
It should be understood that in the embodiment of the present application, the processor 502 may be a Central Processing Unit (CPU), and the processor 502 may also be another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
In another embodiment of the present application, a computer-readable storage medium is provided. The computer-readable storage medium may be a nonvolatile computer-readable storage medium or a volatile computer-readable storage medium. The computer readable storage medium stores a computer program, wherein the computer program, when executed by a processor, implements the intelligent data acquisition synchronization method disclosed in embodiments of the present application.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described devices, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. Those of ordinary skill in the art will appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the components and steps of the various examples have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, device and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the units is only a logical division, and there may be other divisions when the actual implementation is performed, or units having the same function may be grouped into one unit, for example, a plurality of units or components may be combined or may be integrated into another device, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electric, mechanical or other form of connection.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiments of the present application.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in a storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a backend server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think of various equivalent modifications or substitutions within the technical scope of the present application, and these modifications or substitutions should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. An intelligent data acquisition synchronization method is applied to an intelligent data acquisition synchronization system, and is characterized in that the intelligent data acquisition synchronization system at least comprises a first platform and a data center platform, and the intelligent data acquisition synchronization method comprises the following steps:
the first platform divides a plurality of data to be synchronized to obtain data partitioning results;
when the first platform meets the data synchronization condition, the data partitioning result is sent to the data intermediate station;
the data center station divides the received data partition result according to a preset data partition strategy to obtain a plurality of data sub-partitions;
the data center station carries out Hash verification on the data sub-partitions based on asynchronous processing to obtain a data verification result;
the first platform acquires a local data verification result corresponding to the data partition result and sends the local data verification result to the data center station;
the data center station performs difference comparison on the data verification result and the local data verification result to obtain a difference comparison result; the difference comparison result comprises a plurality of sub-partition data comparison results, and each sub-partition data comparison result corresponds to one data sub-partition in the plurality of data sub-partitions;
and if the data center determines that the difference comparison result is not a null value, acquiring a corresponding sub-partition data comparison result as a target sub-partition data comparison result, determining target data sub-partitions in the plurality of data sub-partitions according to the target sub-partition data comparison result, and using the target data sub-partitions as data synchronization abnormal sub-partitions.
2. The intelligent data acquisition synchronization method of claim 1, wherein the first platform divides a plurality of data to be synchronized to obtain data partitioning results, comprising:
the first platform acquires data types of a plurality of data to be synchronized;
if the first platform determines that the data to be synchronized are of a real-time incremental data type, dividing the data to be synchronized according to a preset time interval to obtain a data partitioning result; wherein the data partition result comprises a plurality of data partitions;
if the first platform determines that the data to be synchronized are non-real-time full data, dividing the data to be synchronized according to a preset data dimension division strategy to obtain a data partitioning result; wherein the data partition result comprises a plurality of data partitions.
3. The intelligent data acquisition synchronization method of claim 2, wherein the data middlebox divides the received data partition results according to a preset data division strategy to obtain a plurality of data sub-partitions, and the method comprises:
and the data center station divides the data subareas in the data subarea result according to a dichotomy to obtain a plurality of data subareas.
4. The intelligent data acquisition synchronization method of claim 1, wherein the data center station performs hash check on the plurality of data sub-partitions based on asynchronous processing to obtain a data check result, and the method comprises:
the data center station acquires one data sub-partition in the data sub-partitions and takes the data sub-partition as a data sub-partition to be checked;
obtaining a hash key of the data sub-partition to be verified;
acquiring the number of data pieces of the sub-partition to be verified, the data storage size, the data date interval, the first data ID of the sub-partition to be verified and the last data ID of the sub-partition to be verified, and correspondingly generating a hash value corresponding to the sub-partition to be verified according to the number of data pieces of the sub-partition to be verified, the data storage size, the data date interval, the first data ID of the sub-partition to be verified and the last data ID of the sub-partition to be verified;
forming a sub-partition data verification result of the sub-partition of the data to be verified by the hash key of the sub-partition of the data to be verified and the hash value corresponding to the sub-partition of the data to be verified;
and storing the sub-partition data verification result to a hash table so as to update the hash table.
5. The intelligent data acquisition synchronization method of claim 1, wherein the obtaining, by the first platform, the local data verification result corresponding to the data partitioning result comprises:
the first platform divides the partition result according to the data division strategy to obtain a plurality of local data sub-partitions;
and the first platform carries out Hash verification on the local data sub-partitions based on asynchronous processing to obtain a local data verification result.
6. The intelligent data acquisition synchronization method of claim 4, wherein the step of the data center station performing difference comparison on the data verification result and the local data verification result to obtain a difference comparison result comprises:
the data center station acquires the data verification result of each sub-partition in the data verification result and the data verification result of each local sub-partition in the local data verification result;
performing difference comparison on each sub-partition data verification result in the data verification result and a corresponding local sub-partition data verification result to obtain a sub-partition data comparison result corresponding to each sub-partition data verification result;
and forming a difference comparison result by the sub-partition data comparison result corresponding to each sub-partition data verification result.
7. The intelligent data acquisition synchronization method according to claim 1, wherein after the data center station determines that the difference comparison result is not a null value, acquires the corresponding sub-partition data comparison result as a target sub-partition data comparison result, determines target data sub-partitions in the plurality of data sub-partitions according to the target sub-partition data comparison result, and uses the target data sub-partitions as data synchronization abnormal sub-partitions, the method further comprises:
the first platform acquires the plurality of local data sub-partitions from a pre-library;
the pre-library determines a current target data sub-partition from the plurality of local data sub-partitions based on the data synchronization exception sub-partition;
and comparing the current target data sub-partition with the data synchronization abnormal sub-partition based on a dichotomy, and determining a specific data difference area.
8. An intelligent data acquisition synchronization system, comprising a first platform and a data center, wherein the intelligent data acquisition synchronization system is used for executing the intelligent data acquisition synchronization method according to any one of claims 1 to 7.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the intelligent data acquisition synchronization method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to carry out the intelligent data acquisition synchronization method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210834391.1A CN115221245B (en) | 2022-07-14 | 2022-07-14 | Intelligent data acquisition synchronization method, system and equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115221245A true CN115221245A (en) | 2022-10-21 |
CN115221245B CN115221245B (en) | 2023-07-14 |
Family
ID=83612212
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210834391.1A Active CN115221245B (en) | 2022-07-14 | 2022-07-14 | Intelligent data acquisition synchronization method, system and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115221245B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107807982A (en) * | 2017-10-27 | 2018-03-16 | 中国农业银行股份有限公司 | A kind of consistency desired result method and device of heterogeneous database |
CN110069939A (en) * | 2019-03-12 | 2019-07-30 | 平安科技(深圳)有限公司 | Encryption data consistency desired result method, apparatus, computer equipment and storage medium |
CN110377454A (en) * | 2019-06-17 | 2019-10-25 | 中国平安人寿保险股份有限公司 | Data verification method, device, computer equipment and storage medium |
CN110647531A (en) * | 2019-08-15 | 2020-01-03 | 中国平安财产保险股份有限公司 | Data synchronization method, device, equipment and computer readable storage medium |
CN111666340A (en) * | 2020-05-27 | 2020-09-15 | 中国平安财产保险股份有限公司 | Synchronous data proofreading method and device based on big data and computer equipment |
CN112199430A (en) * | 2020-10-15 | 2021-01-08 | 苏州龙盈软件开发有限公司 | Business data processing system and method based on data middling station |
CN114020813A (en) * | 2021-11-16 | 2022-02-08 | 平安银行股份有限公司 | Data comparison method, device and equipment based on Hash algorithm and storage medium |
CN114422531A (en) * | 2022-03-11 | 2022-04-29 | 深圳市金政软件技术有限公司 | Data synchronization method, system, device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN115221245B (en) | 2023-07-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107295080B (en) | Data storage method applied to distributed server cluster and server | |
US11736371B2 (en) | Heartbeat information sending method and apparatus, and heartbeat sending node | |
US10372504B2 (en) | Global usage tracking and quota enforcement in a distributed computing system | |
EP2948875B1 (en) | Method and system for using a recursive event listener on a node in hierarchical data structure | |
US9367261B2 (en) | Computer system, data management method and data management program | |
WO2014101424A1 (en) | Method and system for synchronizing distributed database | |
CN112199427A (en) | Data processing method and system | |
CN105069152B (en) | data processing method and device | |
CN111552701A (en) | Method for determining data consistency in distributed cluster and distributed data system | |
Volz et al. | Supporting strong reliability for distributed complex event processing systems | |
CN113193947B (en) | Method, apparatus, medium, and program product for implementing distributed global ordering | |
CN109241182B (en) | Big data real-time synchronization method and device, computer equipment and storage medium | |
CN114218193A (en) | Data migration method and device, computer equipment and readable storage medium | |
CN115221245A (en) | Intelligent data acquisition synchronization method, system and equipment | |
CN111404737B (en) | Disaster recovery processing method and related device | |
AU2019371362B2 (en) | Methods, devices and systems for non-disruptive upgrades to a distributed coordination engine in a distributed computing environment | |
CN115426356A (en) | Distributed timed task lock update control execution method and device | |
CN114416874A (en) | Synchronous verification method, device, equipment and storage medium of database | |
CN117992501B (en) | Database cluster brain crack prevention method and device, electronic equipment and storage medium | |
CN115145715A (en) | Distributed transaction processing method, system and related equipment | |
CN115756884A (en) | Cross-center real-time flow calculation method and system | |
CN113568710B (en) | High availability realization method, device and equipment for virtual machine | |
US12124423B2 (en) | Optimizing the operation of a microservice cluster | |
CN107153594A (en) | The HA components of distributed data base system select main method and its system | |
CN115714736A (en) | Block link point connection monitoring method and device and computer equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||