CN113468187B - Multi-party data integration method and device, computer equipment and storage medium

Info

Publication number: CN113468187B
Application number: CN202111025298.8A
Authority: CN
Inventors: 潘玉婷; 姚兴泉
Original assignee: Taiping Financial Technology Services Shanghai Co Ltd Shenzhen Branch
Current assignee: Taiping Financial Technology Services Shanghai Co Ltd Shenzhen Branch
Priority date: 2021-09-02
Filing date: 2021-09-02
Publication date: 2021-11-23
Anticipated expiration: 2041-09-02
Also published as: CN113468187A

Abstract

The application relates to a multi-party data integration method, a multi-party data integration device, computer equipment and a storage medium. The method comprises the following steps: acquiring multi-party data; identifying an update time and a storage location of the multi-party data; determining a partition index number and a partition number according to the storage position; generating an initial identifier of each record in each partition according to the partition number and the partition index number and an arithmetic progression; and generating a serial number of each record in each partition according to the updating time and the initial identification. By adopting the method, the accuracy of the integrated data can be ensured.

Description

Multi-party data integration method and device, computer equipment and storage medium

Technical Field

The present application relates to the field of data processing technologies, and in particular, to a method and an apparatus for integrating data from multiple parties, a computer device, and a storage medium.

Background

In enterprise informatization, once an enterprise develops to a certain stage, multiple business departments appear. Each business department has its own data, and the data of different departments are often stored and defined separately. As a result, the data of each department cannot interact (or can interact only with great difficulty) with other data within the enterprise and exists as isolated islands, forming so-called "data islands".

In the conventional technology, records belonging to the same user are identified across the whole domain using only a single piece of user information in the data, such as a cookie or a piece of personal information such as an identification number, and the records identified as the same user are given a unique ID.

However, this approach cannot identify customer data in which part of the information is missing, and cannot exclude data in which part of the information is untrue, so the integrated data contains errors.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a method, an apparatus, a computer device and a storage medium for integrating data from multiple parties, which can ensure the accuracy of the integrated data.

A method of multi-party data integration, the method comprising:

acquiring multi-party data;

identifying an update time and a storage location of the multi-party data;

determining a partition index number and a partition number according to the storage position;

generating an initial identifier of each record in each partition according to the partition number and the partition index number and an arithmetic progression;

and generating a serial number of each record in each partition according to the updating time and the initial identification.

In one embodiment, after the obtaining the multi-party data, the method further includes:

acquiring at least one preset field;

comparing the field values of the preset fields in the multi-party data to obtain records with the same field values of the preset fields;

and merging the records with the same field value of the preset fields.

In one embodiment, after the acquiring the multi-party data, the method further includes:

performing a field check on each field in the multi-party data, and deleting the records that fail the check.

In one embodiment, the method further comprises:

acquiring a newly added record and the updating time and the storage position of the newly added record;

determining an initial identifier of a serial number of the last record of the corresponding partition according to the storage position;

calculating to obtain an initial identifier of the newly added record according to the initial identifier of the serial number of the last record;

and calculating to obtain the serial number of the newly added record according to the updating time and the initial identifier of the newly added record.

In one embodiment, the method further comprises:

acquiring an update record;

determining a corresponding original record according to the main key corresponding to the updated record;

and acquiring the updating time and the storage position of the updating record, generating an updating serial number, and replacing the original serial number of the original record by the updating serial number.

In one embodiment, after the generating a serial number of each record in each partition according to the update time and the initial identifier, the method further includes:

matching the records by at least one rule;

and acquiring the minimum sequence number corresponding to the successfully matched record as a new sequence number of the successfully matched record.

In one embodiment, the obtaining a minimum sequence number corresponding to a successfully matched record as a new sequence number of the successfully matched record includes:

acquiring a record to be processed with a sequence number changed after the current rule is executed;

aggregating the records to be processed according to the sequence number obtained after the last rule execution is completed to obtain a target association relation;

matching the serial number after the last rule in the target association relation is executed with the current serial number after aggregation;

if the matching is successful, updating the target association relationship according to the serial number after the last rule execution is completed and the current serial number after aggregation, and continuing to match the serial number after the last rule execution is completed in the updated target association relationship with the current serial number after aggregation until the serial number after the last rule execution is completed and the current serial number after aggregation do not exist in the target association relationship;

and processing the serial numbers of all records after the current rule is executed through the updated target association relation.

In one embodiment, the aggregating the to-be-processed records according to the sequence number obtained after the previous rule is executed to obtain the target association relationship includes:

acquiring records with the same sequence number obtained after the last rule is executed, and acquiring the minimum value of the sequence number after the current rule corresponding to the acquired records is executed;

and aggregating the acquired records, wherein the aggregated sequence number is the minimum value.

In one embodiment, the processing, through the updated target association relationship, the sequence numbers of the records after the current rule is executed includes:

matching the serial number of each record after the current rule is executed with the serial number obtained after the last rule in the target association relation is executed;

and when the matching is successful, acquiring an aggregated current serial number corresponding to the serial number obtained after the execution of the successfully matched last rule in the target association relation is completed, and updating the serial number of the successfully matched record after the execution of the current rule through the aggregated current serial number.

A multiparty data consolidation device, the device comprising:

the data acquisition module is used for acquiring multi-party data;

an identification module for identifying an update time and a storage location of the multi-party data;

the determining module is used for determining the partition index number and the partition number according to the storage position;

the initial identification generation module is used for generating an initial identification of each record in each partition according to the partition number and the partition index number and an arithmetic progression;

and the serial number generation module is used for generating the serial number of each record in each partition according to the updating time and the initial identification.

A computer device comprising a memory storing a computer program and a processor implementing the steps of the method described above when executing the computer program.

A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.

According to the above multi-party data integration method and device, computer equipment and storage medium, even if duplicate data exists in the multi-party data, each duplicate record receives its own independent serial number, which avoids data confusion. This largely solves the incompleteness and unreliability that arise when data is linked using a single piece of information, effectively improves the quality of data linkage, and alleviates the problems of erroneous and incomplete data entry. It also avoids the drawbacks of generating unique serial numbers with a third-party component, greatly improves operating efficiency, and provides a reasonable solution for generating incremental serial numbers.

Drawings

FIG. 1 is a diagram of an application environment for a method for multi-party data integration in one embodiment;

FIG. 2 is a flow diagram illustrating a method for multi-party data integration, according to one embodiment;

FIG. 3 is a diagram that illustrates a serial number of each record, in one embodiment;

FIG. 4 is a schematic representation of a rule-processed version of an embodiment;

FIG. 5 is a diagram illustrating processing of rule two in one embodiment;

FIG. 6 is a flow diagram illustrating sequence number revision in one embodiment;

FIG. 7 is a diagram illustrating a change in sequence number in one embodiment;

FIG. 8 is a progression diagram of sequence number changes in one embodiment;

FIG. 9 is a diagram of aggregated sequence numbers in one embodiment;

FIG. 10 is a diagram of a modified serial number in one embodiment;

FIG. 11 is a schematic flow chart diagram illustrating a method for multi-party data integration in another embodiment;

FIG. 12 is a block diagram of a multi-party data integration apparatus in one embodiment;

FIG. 13 is a diagram illustrating an internal structure of a computer device according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

The multi-party data integration method provided by the application can be applied to the application environment shown in fig. 1. The database 102 communicates with the server 104 through a network. The server 104 obtains multi-party data from the database 102 and identifies the update time and storage location of the multi-party data; determines the partition index number and the partition number according to the storage location; generates an initial identifier of each record in each partition according to the partition number and the partition index number and an arithmetic progression; and generates the serial number of each record in each partition according to the update time and the initial identifier. In this way, even if duplicate data exists in the multi-party data, each duplicate record receives its own independent serial number, which avoids data confusion. This largely solves the incompleteness and unreliability that arise when data is linked using a single piece of information, effectively improves the quality of data linkage, and alleviates the problems of erroneous and incomplete data entry. It also avoids the drawbacks of generating unique serial numbers with a third-party component, greatly improves operating efficiency, and provides a reasonable solution for generating incremental serial numbers.

The server 104 may be implemented as a stand-alone server or a server cluster composed of a plurality of servers.

In one embodiment, as shown in fig. 2, a multi-party data integration method is provided, which is exemplified by the application of the method to the server in fig. 1, and includes the following steps:

S202: Multi-party data is acquired.

Specifically, the multi-party data refers to data from different sources, such as data from different departments and different systems. The server acquires the multi-party data from the database and extracts it into one table. Taking customer information as an example, the server extracts the customer information (name, identification number, mobile phone number, bank card number, WeChat ID, contract number, table number, and table primary key) from different tables of different systems and integrates it into the same table. The table structure fields are as follows:

S204: The update time and storage location of the multi-party data are identified.

Specifically, the update time refers to the time at which the multi-party data was stored, and the storage location refers to the business unit or system that the multi-party data comes from. Data of different business units or systems is stored in different partitions of the data table, and each partition has its own index number. The storage location is thus obtained as the index number of the partition holding the corresponding multi-party data.

S206: The partition index number and the partition number are determined according to the storage location.

Specifically, the partition index number is the identifier of a partition, for example 1, 2, or 3. The partition number is the total number of partitions; it is used as the common difference of the arithmetic progression to ensure that the subsequently generated serial numbers are unique.

S208: An initial identifier of each record in each partition is generated according to the partition number and the partition index number and an arithmetic progression.

S210: A serial number of each record in each partition is generated according to the update time and the initial identifier.

Specifically, a globally unique initial serial number is assigned to each record of the multi-party data. The initial serial number has the structure: time prefix-location suffix. The time prefix is the date on which the multi-party data is processed, in yyyymmdd format, such as 20210101; it may also be a timestamp. In a distributed environment, data is stored in different partitions, each of which has a unique index number (1, 2, 3, ...), and each piece of data within a partition also has a unique index (1, 2, 3, ...) within that partition.

The suffix is generated as follows: the unique ID value of the first element in each partition is the index number of the partition, and the unique ID value of the n-th element in each partition is (unique ID value of the previous element) + (total number of partitions of the RDD).

For example, assuming that the total number of partitions is 2, the ID of the first element of the first partition is 0 and the ID of the first element of the second partition is 1. In the first partition, the second element ID is 0+2=2 and the third element ID is 2+2=4. In the second partition, the second element ID is 1+2=3 and the third element ID is 3+2=5.
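The numbering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `initial_ids` and `serial_numbers` are hypothetical, partitions are indexed from 0 as in the worked example, and the time prefix is passed in as a string. The same progression (k, n+k, 2n+k, ... for partition k of n) is what Apache Spark documents for `RDD.zipWithUniqueId`; whether the application relies on that call is not stated.

```python
def initial_ids(partition_index, num_partitions, record_count):
    """Initial identifiers of one partition.

    Arithmetic progression: the first term is the partition index and the
    common difference is the total number of partitions.
    """
    return [partition_index + n * num_partitions for n in range(record_count)]


def serial_numbers(time_prefix, partition_index, num_partitions, record_count):
    """Serial numbers in the form "time prefix-location suffix"."""
    ids = initial_ids(partition_index, num_partitions, record_count)
    return [f"{time_prefix}-{i}" for i in ids]


# Worked example from the text: 2 partitions, 3 records each.
first_partition = initial_ids(0, 2, 3)    # 0, 2, 4
second_partition = initial_ids(1, 2, 3)   # 1, 3, 5
print(first_partition, second_partition)
print(serial_numbers("20210101", 1, 2, 3))
```

Because the identifiers of different partitions never collide, each partition can number its own records without coordinating with the others.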

Specifically, fig. 3 is a schematic diagram of the serial number of each record in one embodiment. It should be noted that the number of records is much larger in practical applications; this embodiment uses only 7 pieces of data as an example. In a distributed environment, the suffix of the unique serial number of each record is formed from the partition index number and the index of the record within its partition. Because each partition can compute its suffixes independently, the records can be processed concurrently in a distributed environment with high efficiency.

In this embodiment, even if duplicate data exists in the multi-party data, each duplicate record receives its own independent serial number, which avoids data confusion. This largely solves the incompleteness and unreliability that arise when data is linked using a single piece of information, effectively improves the quality of data linkage, and alleviates the problems of erroneous and unreliable data entry. It also avoids the drawbacks of generating unique serial numbers with a third-party component, greatly improves operating efficiency, and provides a reasonable solution for generating incremental serial numbers.

In one embodiment, after the multi-party data is acquired, the method further includes: acquiring at least one preset field; comparing the field values of preset fields in the multi-party data to obtain records with the same field values of the preset fields; and merging the records with the same field value of the preset field.

In this embodiment, the amount of data to be processed is reduced by exploiting records whose information content is identical. For example, when the name, identification number, mobile phone number, bank card number and WeChat ID of several records are all the same, the records are merged into one record and their other data is aggregated into one field. This effectively reduces the amount of data to be processed and significantly improves data processing efficiency, and it also reduces the data volume and time needed when globally unique serial numbers are subsequently assigned.

It should be noted that in this embodiment the preset fields are the name, identification number, mobile phone number, bank card number and WeChat ID. In other embodiments, the server may set the preset fields as needed and merge records by those fields to reduce the amount of data to be processed. The preset fields may be the fields that the user needs to pay attention to.
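A minimal sketch of this merge step is given below, assuming each record is a Python dict. The column names in `PRESET_FIELDS`, the `table` and `pk` keys, and the `sources` field used to hold the aggregated remaining data are all hypothetical names chosen for the illustration.

```python
# Hypothetical column names for the preset fields named in the text.
PRESET_FIELDS = ("name", "id_no", "phone", "bank_card", "wechat_id")


def merge_records(records, preset_fields=PRESET_FIELDS):
    """Merge records whose preset fields are all equal.

    The remaining data of the merged records (here: source table and primary
    key) is aggregated into a single list-valued field named "sources".
    """
    merged = {}
    for rec in records:
        key = tuple(rec.get(f) for f in preset_fields)
        if key not in merged:
            merged[key] = {f: rec.get(f) for f in preset_fields}
            merged[key]["sources"] = []
        merged[key]["sources"].append((rec["table"], rec["pk"]))
    return list(merged.values())


rows = [
    {"name": "A", "id_no": "1", "phone": "p", "bank_card": "b", "wechat_id": "w",
     "table": "t1", "pk": 10},
    {"name": "A", "id_no": "1", "phone": "p", "bank_card": "b", "wechat_id": "w",
     "table": "t2", "pk": 77},
    {"name": "B", "id_no": "2", "phone": "q", "bank_card": None, "wechat_id": None,
     "table": "t1", "pk": 11},
]
for row in merge_records(rows):
    print(row)
```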

In one embodiment, after the multi-party data is acquired, the method further includes: and carrying out field check on each field in the multi-party data so as to delete the record failed in the check.

Specifically, in this embodiment, in order to improve data quality, the corresponding fields may be checked, where the check rules may be preset by the user. For example, the name, identification number, mobile phone number and bank card number may be cleaned: the identification number field is checked against the identification number coding and check rules, and false data that does not comply with the rules is set to blank; digits are removed from the name.

In the embodiment, the data is cleaned in advance, so that the quality of the data is ensured.
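The cleaning described above might look like the following sketch. It assumes that the identification number is the 18-character resident identity number whose last character is a MOD 11-2 check character (weights and check characters as defined in GB 11643-1999); the text does not spell out the rule, so this choice, the field names `id_no` and `name`, and the function names are assumptions.

```python
import re

# Position weights and check characters of the 18-character ID number.
WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
CHECK_CODES = "10X98765432"


def id_number_is_valid(id_no):
    """Return True if id_no has 17 digits plus a correct check character."""
    if not re.fullmatch(r"\d{17}[\dXx]", id_no or ""):
        return False
    total = sum(int(d) * w for d, w in zip(id_no[:17], WEIGHTS))
    return CHECK_CODES[total % 11] == id_no[17].upper()


def clean_record(rec):
    """Blank an ID number that fails the check and strip digits from the name."""
    cleaned = dict(rec)
    if not id_number_is_valid(cleaned.get("id_no")):
        cleaned["id_no"] = None
    if cleaned.get("name"):
        cleaned["name"] = re.sub(r"\d", "", cleaned["name"])
    return cleaned


print(clean_record({"name": "Zhang3", "id_no": "123"}))
```

Here a failed check blanks the field, as in the description; deleting the whole record, as in the method summary, would be a one-line change.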

In one embodiment, the multi-party data integration method further includes: acquiring a newly added record and the updating time and the storage position of the newly added record; determining an initial identifier of a serial number of the last record corresponding to the partition according to the storage position; calculating to obtain an initial identifier of the newly added record according to the initial identifier of the serial number of the last record; and calculating to obtain the serial number of the newly added record according to the updating time and the initial identifier of the newly added record.

Specifically, in this embodiment, when newly added records are processed, the server keeps the serial numbers of already processed data unchanged by always converging to the minimum serial number. Each record has a unique initial serial number, and when rule matching succeeds, the minimum of all initial serial numbers of the same user is taken as the new serial number of the data. Because the unique serial number of newly added data is always larger than the serial numbers of old data, newly added data that matches old data obtains the serial number of the old data, so the serial numbers of processed data do not change. That is, for newly added data, the serial number of the newly added record is calculated from the update time and the initial identifier of the newly added record, and the record is then merged with the old data.
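As a rough sketch, the serial number of a newly added record could be derived from the last record of its partition as follows. The function name is hypothetical, and the step size is assumed to be the total number of partitions, continuing the arithmetic progression used for the initial identifiers.

```python
def serial_for_new_record(last_serial, num_partitions, update_time):
    """Serial number for a record appended to a partition.

    last_serial:    serial number of the current last record of the partition
    num_partitions: total number of partitions (common difference)
    update_time:    time prefix of the new record, e.g. "20210102"
    """
    last_initial_id = int(last_serial.rsplit("-", 1)[1])
    new_initial_id = last_initial_id + num_partitions
    return f"{update_time}-{new_initial_id}"


# The last record of the second partition in the earlier example is 20210101-5.
print(serial_for_new_record("20210101-5", 2, "20210102"))
```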

In one embodiment, the multi-party data integration method further includes: acquiring an update record; determining a corresponding original record according to a main key corresponding to the updated record; and acquiring the updating time and the storage position of the updating record, generating an updating serial number, and replacing the original serial number of the original record by the updating serial number.

Specifically, in this embodiment, data updates are handled through the primary key of the data. In practice, data is not constant and may be updated, so the problem that an earlier determination becomes incorrect after an update must be solved. Records are matched using the primary key; when a match is found, the server determines whether the data has been updated. If it has, the previously determined serial number is replaced with a new initial serial number and the serial number is determined again. That is, after the updated data is matched to the original data, the serial number of the updated data is calculated and serial number merging is then performed; the specific merging operation is described below.
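The update handling can be illustrated with the sketch below, assuming the integrated table is held as a dict keyed by primary key. The names `apply_update`, `pk` and `serial` are hypothetical; re-running the matching rules after the reset is outside the sketch.

```python
def apply_update(table, update, new_serial):
    """Apply an update record to the original record with the same primary key.

    table:      dict mapping primary key -> record dict
    update:     dict with "pk" and the updated field values
    new_serial: freshly generated initial serial number for the updated record
    Returns True if the record changed and its serial number was reset.
    """
    original = table[update["pk"]]
    changed = any(original.get(field) != value for field, value in update.items())
    if changed:
        original.update(update)
        # The previously determined serial number is no longer trusted.
        original["serial"] = new_serial
    return changed


table = {"t1:7": {"pk": "t1:7", "phone": "P1", "serial": "20210101-0"}}
print(apply_update(table, {"pk": "t1:7", "phone": "P2"}, "20210105-9"), table)
```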

In one embodiment, after generating the sequence number of each record in each partition according to the update time and the initial identifier, the method includes: matching the records by at least one rule; and acquiring the minimum sequence number corresponding to the successfully matched record as a new sequence number of the successfully matched record.

Specifically, the rules are used to merge records that represent the same data, and each rule may be preset by the user. Taking customer information as an example again, the rules may include: first two digits of name and identification number; name and mobile phone number; identification number and mobile phone number; name and bank card number; name and WeChat ID; mobile phone number and bank card number; and mobile phone number and WeChat ID.

The server may perform rule matching and serial number determination in the order of the rules, and selects the minimum serial number as the new serial number after determining that records are the same. For example, assume that there are two serial number determination rules. Rule 1: data with the same name and the same identity card number is identified as data of the same user. Rule 2: data with the same identity card number and the same mobile phone number is identified as data of the same user. Rule matching is carried out in a specified order, for example rule 1 first and then rule 2; when a rule matches, the minimum of all initial serial numbers determined to belong to the same user is taken as the new serial number of the data.

Specifically, as shown in fig. 4 and fig. 5, the data shown in fig. 3 is still used as the initial data. Rule 1 matching is performed first, and for all data with the same name and identity card number the minimum serial number is taken as the new serial number. Referring to fig. 4, the records with initial serial numbers 20210101-1 and 20210101-4 have the same name and identity card number, so the minimum of (20210101-1, 20210101-4), namely 20210101-1, is used as the new serial number of both records. The records with initial serial numbers 20210101-3 and 20210101-6 are handled in the same way. The other data does not match rule 1, so its serial number remains the initial serial number.

The server then executes rule 2: for all data with the same identification number and mobile phone number, the minimum of the initial serial numbers is taken as the new serial number. Referring to fig. 5, the records with initial IDs (20210101-0, 20210101-3, 20210101-4) have the same identification number and mobile phone number, as do the records with initial IDs (20210101-2, 20210101-5); the minimum serial numbers, 20210101-0 and 20210101-2, are taken as their respective new serial numbers.
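The two rule passes can be sketched as follows. The figures are not reproduced here, so the field values in `records` are invented solely to reproduce the groupings described above (rule 1 groups {1, 4} and {3, 6}; rule 2 groups {0, 3, 4} and {2, 5}); the function and key names are likewise hypothetical.

```python
from collections import defaultdict


def seq_key(seq):
    """Order serial numbers by (time prefix, numeric suffix), not as strings."""
    prefix, suffix = seq.rsplit("-", 1)
    return (prefix, int(suffix))


def apply_rule(records, current, fields):
    """Serial number of every record after one rule has been executed.

    records: dicts holding an "init" serial number and the attribute fields
    current: dict mapping initial serial number -> serial number before the rule
    fields:  attributes that must all be present and equal for a match
    """
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec.get(f) for f in fields)
        if all(key):  # a record missing any rule field cannot match
            groups[key].append(rec["init"])
    updated = dict(current)
    for members in groups.values():
        if len(members) > 1:
            smallest = min(members, key=seq_key)  # minimum initial serial number
            for init in members:
                updated[init] = smallest
    return updated


def s(n):
    return f"20210101-{n}"


# Invented attribute values; only the resulting groupings follow the text.
records = [
    {"init": s(0), "name": None, "id_no": "A", "phone": "P1"},
    {"init": s(1), "name": "X", "id_no": "A", "phone": None},
    {"init": s(2), "name": None, "id_no": "B", "phone": "P2"},
    {"init": s(3), "name": "Y", "id_no": "A", "phone": "P1"},
    {"init": s(4), "name": "X", "id_no": "A", "phone": "P1"},
    {"init": s(5), "name": "Z", "id_no": "B", "phone": "P2"},
    {"init": s(6), "name": "Y", "id_no": "A", "phone": None},
]
initial = {r["init"]: r["init"] for r in records}
after_rule1 = apply_rule(records, initial, ("name", "id_no"))
after_rule2 = apply_rule(records, after_rule1, ("id_no", "phone"))
print(after_rule2)
```

In this sketch the records with initial serial numbers 20210101-1 and 20210101-6 still carry their rule 1 results after rule 2, even though other members of their rule 1 groups have moved to 20210101-0.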

In the above embodiment, multiple pieces of information are used as the identification rule: records are identified as belonging to the same user only when the specified pieces of information in them are the same. For example, if a rule requires that the identity card number and the mobile phone number be the same, two records that satisfy this rule are determined to belong to the same user and are marked with the same unique serial number. Multiple determination rules are used, and records are determined to be the same user as long as any one of the rules is satisfied. For example, suppose there are two rules, rule 1: identity card number + mobile phone number, and rule 2: name + identity card number, and three records 01 (Zhang San, identity card number 1, mobile phone number 1), 02 (identity card number 1, mobile phone number 1) and 03 (Zhang San, identity card number 1). Records 01 and 02 satisfy rule 1 and records 01 and 03 satisfy rule 2, so the three records are determined to be the same user and are marked with the same unique serial number.

However, the result of this stage shows that the record with initial serial number 20210101-1 does not obtain the correct serial number, 20210101-0, but instead retains its original initial serial number. Matching with multiple rules therefore produces inconsistent serial numbers, and this must be solved by serial number correction. Therefore, in one embodiment, obtaining the minimum sequence number corresponding to the successfully matched record as the new sequence number of the successfully matched record includes:

S602: The records to be processed, whose sequence numbers changed after the current rule is executed, are acquired.

Specifically, a record to be processed whose serial number has changed is a record whose serial number after the current rule is executed differs from its serial number before. Continuing the above example, fig. 7 shows the records whose serial numbers changed after rule 2.

The theoretically expected result is that data satisfying any rule is determined to be the same user and identified with the same serial number. After the two rules are applied, however, part of the data satisfying rule 1 does not show the expected serial number change. In the above example, the records with original IDs 20210101-3 and 20210101-6 are determined to be the same user by rule 1, but only the records with original IDs 20210101-0, 20210101-3 and 20210101-4 are determined to be the same user by rule 2. The serial numbers therefore need to be corrected. By syllogism, if A equals B and B equals C, then A also equals C.

The goal is that all data satisfying the rules is determined to be the same person, that is, all the initial serial numbers involved should be attributed to the same serial number: the latest serial numbers of the records with initial serial numbers 20210101-6 and 20210101-1 should be 20210101-0.

Fig. 8 shows the serial number change process, that is, how the latest serial number corresponding to each initial serial number evolves. The initial serial numbers of all this data finally point to the same serial number, consistent with the expected target, so the target is the root node of a unidirectional connected graph. The algorithm for finding the root node of each node is detailed below.

S604: The records to be processed are aggregated according to the sequence number obtained after the last rule is executed, to obtain the target association relation.

Specifically, the aggregation is performed according to the sequence numbers obtained after the last rule execution is completed. In one embodiment, aggregating the to-be-processed records according to the sequence number obtained after the execution of the previous rule is completed to obtain the target association relationship includes: acquiring records with the same sequence number obtained after the last rule is executed, and acquiring the minimum value of the sequence number after the current rule corresponding to the acquired records is executed; and aggregating the acquired records, wherein the aggregated sequence number is the minimum value. That is, for records whose sequence numbers after the last rule are the same, the corresponding minimum current sequence number is taken, because the sequence number selection rule is to take the minimum, and duplicates are then removed. Through this step, the data is processed into a one-to-one relationship.

In practical application, this may be implemented by calling code. The input is the serial number obtained after the last rule is executed and the current serial number. The code aggregates by the serial number obtained after the last rule is executed and takes the minimum current serial number in each aggregated group, so that a one-to-one relationship is output, namely each serial number obtained after the last rule is executed together with the corresponding minimum. Fig. 9 shows the records obtained after aggregation.
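The aggregation might be written as the sketch below. The function names are hypothetical; the first three input pairs follow the running example (previous serial number, current serial number of each changed record), and the pairs for 20210101-9 are invented only to show that the minimum is taken numerically.

```python
def seq_key(seq):
    """Order serial numbers by (time prefix, numeric suffix), not as strings."""
    prefix, suffix = seq.rsplit("-", 1)
    return (prefix, int(suffix))


def aggregate(pairs):
    """Group (previous, current) pairs by previous serial number.

    For each previous serial number, keep the minimum current serial number,
    giving a one-to-one relation previous -> aggregated current.
    """
    relation = {}
    for previous, current in pairs:
        if previous not in relation or seq_key(current) < seq_key(relation[previous]):
            relation[previous] = current
    return relation


changed = [
    ("20210101-3", "20210101-0"),
    ("20210101-1", "20210101-0"),
    ("20210101-5", "20210101-2"),
    ("20210101-9", "20210101-10"),  # invented pair
    ("20210101-9", "20210101-4"),   # invented pair
]
print(aggregate(changed))
```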

S606: The serial numbers after the last rule is executed in the target association relation are matched against the aggregated current serial numbers.

S608: If the matching is successful, the target association relation is updated according to the serial number after the last rule execution is completed and the aggregated current serial number, and the serial numbers after the last rule execution in the updated target association relation continue to be matched against the aggregated current serial numbers, until the target association relation no longer contains a serial number after the last rule execution that matches an aggregated current serial number.

Specifically, the records obtained after aggregation are treated as a table, and the server judges whether any value in the table's aggregated current sequence number field equals a value in its previous-rule sequence number field. If so, aggregation needs to continue, so that each finally obtained record points to the root node. In practical application, this may be implemented by calling code. The input is two tables. The code processing is as follows: when a matching row exists, the previous-rule sequence number is taken from one table and the aggregated current sequence number from the matching row of the other table; otherwise, the previous-rule sequence number and the aggregated current sequence number are both taken from the same row. The output is a table with the structure (previous-rule sequence number, aggregated current sequence number). After the processing is finished, the server judges whether the output table can be explored further, that is, whether the output table still contains a value in the aggregated current sequence number field that equals a value in the previous-rule sequence number field. If so, the above steps are repeated on the output table; if not, the next step is carried out. The corrected sequence number lets each piece of data point to the root node, as shown in fig. 9. In this embodiment the corrected sequence numbers are the same as those in fig. 9, because the previous-rule sequence numbers in this example are the same.
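The repeated matching of S606 and S608 can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of the embodiment: the relation is held as a dictionary from previous-rule sequence number to aggregated current sequence number, the name `resolve_to_roots` is hypothetical, and chains are assumed to be acyclic (which holds when each aggregated value is a minimum and therefore smaller than its key).

```python
def resolve_to_roots(relation):
    """Repeat the match-and-replace step until every entry points to a root.

    relation: dict previous_seq -> aggregated current_seq (assumed layout).
    An entry is replaced whenever its aggregated value is itself a key of the
    relation, which corresponds to a successful match in S606.
    """
    resolved = dict(relation)
    changed = True
    while changed:
        changed = False
        for previous_seq, current_seq in resolved.items():
            following = resolved.get(current_seq)
            # A match exists: current_seq is also a previous-rule sequence number.
            if following is not None and following != current_seq:
                resolved[previous_seq] = following
                changed = True
    return resolved


# Hypothetical chain 9 -> 7 -> 5 -> 2; all entries should end at root 2.
roots = resolve_to_roots({5: 2, 7: 5, 9: 7})
```

The loop stops when no aggregated value is found among the keys, which mirrors the stopping condition of S608.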

S610: and processing the serial numbers of all records after the current rule is executed through the updated target association relation.

In one embodiment, processing the sequence numbers of the records after the current rule is executed through the updated target association relationship includes: matching the serial number of each record after the current rule is executed with the serial number obtained after the last rule in the target association relation is executed; and when the matching is successful, acquiring an aggregated current serial number corresponding to the serial number obtained after the execution of the successfully matched last rule in the target association relation is completed, and updating the serial number of the successfully matched record after the execution of the current rule through the aggregated current serial number.

In this embodiment, the sequence numbers of all records after execution of the current rule are corrected using the mapping relationship obtained by the above processing. The result table of rule 2 (referred to as the t2 table) is corrected by associating the sequence number of t2 with the previous-rule sequence number of the mapping table: where the value of the sequence number field of t2 equals the value of the previous-rule sequence number field of the mapping table, the corresponding aggregated current sequence number of the mapping table replaces the sequence number in the t2 table. The corresponding data is shown in fig. 10.
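The correction of the t2 table can be sketched as a lookup against the mapping. This is only an illustrative sketch: the record layout (dictionaries with a `seq` key), the sample field `name`, and the function name `correct_sequence_numbers` are assumptions introduced here.

```python
def correct_sequence_numbers(t2_records, relation):
    """Replace a record's sequence number when it appears in the mapping.

    t2_records: list of dicts, each with a 'seq' key (assumed layout).
    relation: dict previous_seq -> aggregated current_seq, already resolved
              so that every value is a root.
    Records whose sequence number is not in the mapping are left unchanged.
    """
    corrected = []
    for record in t2_records:
        new_seq = relation.get(record["seq"], record["seq"])
        corrected.append({**record, "seq": new_seq})
    return corrected


# Hypothetical data: records 5, 7 and 9 are all redirected to root 2.
t2 = [{"name": "a", "seq": 5}, {"name": "b", "seq": 9}, {"name": "c", "seq": 4}]
fixed = correct_sequence_numbers(t2, {5: 2, 7: 2, 9: 2})
```

The function returns new dictionaries rather than modifying its input, so the uncorrected t2 table remains available for comparison.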

In this embodiment, the idea of a unidirectional connected graph is used to solve the problem of inconsistent sequence numbers caused by sequence number changes during matching. When multiple rules are used, matching is performed rule by rule in sequence. Because the rules rely on different information, data originally marked with one sequence number under rule one may be changed to another sequence number by a match under rule two. The tracks of these sequence number changes in effect form a unidirectional connected graph, and marking all data connected in the graph with the same sequence number resolves the inconsistency.
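The graph view can also be illustrated with a standard disjoint-set (union-find) structure. This is a substituted, well-known technique shown only to make the graph idea concrete; it is not the table-based procedure of the embodiment. The function name `label_components` and the edge representation are assumptions.

```python
def label_components(edges):
    """Label every sequence number with the smallest number it is connected to.

    edges: iterable of (old_seq, new_seq) pairs, one per sequence number change.
    Returns a dict seq -> smallest sequence number in its connected component.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # The smaller root stays the root, so the label is the minimum.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    return {x: find(x) for x in list(parent)}


# Hypothetical change tracks: 5 -> 3 -> 2 form one component, 9 -> 7 another.
labels = label_components([(5, 3), (3, 2), (9, 7)])
```

Keeping the smaller root on every union matches the selection rule of the embodiment, in which the minimum sequence number is retained.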

Specifically, as shown in fig. 11, the multi-party data integration method of the present application first processes the multi-party data, including the verification and merging described above, and then performs rule matching. If the sequence number of a record changes during matching, the sequence number is corrected; after the correction is completed, the next rule is obtained and processed, until all rules have been traversed and the multi-party data integration is complete.

It should be understood that, although the steps in the flowcharts of fig. 2 and 11 are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 2 and 11 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In one embodiment, as shown in fig. 12, there is provided a multi-party data integration apparatus including: a data obtaining module 1201, an identification module 1202, a determining module 1203, an initial identifier generating module 1204, and a serial number generating module 1205, where:

a data obtaining module 1201, configured to obtain multi-party data;

an identification module 1202 for identifying an update time and a storage location of the multi-party data;

a determining module 1203, configured to determine a partition index number and a partition number according to the storage location;

an initial identifier generating module 1204, configured to generate an initial identifier for each record in each partition according to the number of partitions and the partition index number and according to the arithmetic progression;

a serial number generating module 1205, configured to generate a serial number of each record in each partition according to the update time and the initial identifier.
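The work of modules 1204 and 1205 can be sketched as follows. The exact formulas are not restated in this passage, so the sketch rests on loudly stated assumptions: the arithmetic progression is assumed to start at the partition index number with a common difference equal to the number of partitions, and the serial number is assumed to be the update time (formatted as YYYYMMDDHHMMSS) followed by a zero-padded initial identifier. The names `initial_ids`, `sequence_number`, and `id_width` are hypothetical.

```python
from datetime import datetime


def initial_ids(partition_index, num_partitions, record_count):
    """Assumed progression: first term = partition index number,
    common difference = number of partitions."""
    return [partition_index + k * num_partitions for k in range(record_count)]


def sequence_number(update_time, initial_id, id_width=10):
    """Assumed layout: update time as YYYYMMDDHHMMSS, then the
    initial identifier zero-padded to id_width digits."""
    return f"{update_time:%Y%m%d%H%M%S}{initial_id:0{id_width}d}"


# Hypothetical example with 3 partitions: identifiers never collide
# across partitions because each partition uses a different remainder.
ids_p0 = initial_ids(0, 3, 3)
ids_p1 = initial_ids(1, 3, 4)
seq = sequence_number(datetime(2021, 9, 2, 8, 30, 0), ids_p1[2])
```

Under these assumptions each partition can number its records independently, without coordinating with the other partitions.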

In one embodiment, the multi-party data integration apparatus further includes:

the field acquisition module is used for acquiring at least one preset field;

the comparison module is used for comparing the field values of the preset fields in the multi-party data to obtain records with the same field values of the preset fields;

and the merging module is used for merging the records with the same field value of the preset field.
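The comparison and merging modules can be sketched as follows. How conflicting values are combined is not specified in this passage, so the sketch assumes that the first record seen for a key is kept and that its empty fields are filled from later records with the same key. The function name `merge_duplicates` and the sample field names are hypothetical.

```python
def merge_duplicates(records, preset_fields):
    """Merge records whose values for all preset fields are identical.

    Assumed merge policy: keep the first record for each key and fill its
    empty (None or "") fields from later records sharing the same key.
    """
    merged = {}
    for record in records:
        key = tuple(record.get(field) for field in preset_fields)
        if key not in merged:
            merged[key] = dict(record)
            continue
        target = merged[key]
        for field, value in record.items():
            if target.get(field) in (None, ""):
                target[field] = value
    return list(merged.values())


# Hypothetical multi-party data: two parties hold the same person.
records = [
    {"name": "A", "id_no": "1", "phone": None},
    {"name": "A", "id_no": "1", "phone": "555"},
    {"name": "B", "id_no": "2", "phone": "777"},
]
result = merge_duplicates(records, ["name", "id_no"])
```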

In one embodiment, the multi-party data integration apparatus further includes:

and the checking module is used for carrying out field checking on each field in the multi-party data so as to delete the record failed in checking.

In one embodiment, the multi-party data integration apparatus further includes:

the newly added record acquisition module is used for acquiring the newly added record and the updating time and the storage position of the newly added record;

the initial identifier generating module 1204 is further configured to determine, according to the storage location, the initial identifier of the serial number of the last record in the corresponding partition, and to calculate the initial identifier of the newly added record from the initial identifier of the serial number of the last record;

the serial number generating module 1205 is further configured to calculate a serial number of the newly added record according to the update time and the initial identifier of the newly added record.
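The handling of a newly added record can be sketched as follows. The sketch assumes, consistently with an arithmetic progression whose common difference is the number of partitions, that the new initial identifier is the last initial identifier of the partition plus the number of partitions. The function name `allocate_initial_id` and the dictionary of last identifiers are hypothetical.

```python
def allocate_initial_id(partition_index, last_ids, num_partitions):
    """Continue the partition's progression by one step.

    last_ids: dict partition_index -> initial identifier of the last record
              in that partition (assumed bookkeeping structure).
    Assumed rule: new identifier = last identifier + number of partitions.
    """
    new_id = last_ids[partition_index] + num_partitions
    last_ids[partition_index] = new_id  # remember it for the next new record
    return new_id


# Hypothetical state for 3 partitions.
last_ids = {0: 6, 1: 10, 2: 8}
first_new = allocate_initial_id(1, last_ids, 3)
second_new = allocate_initial_id(1, last_ids, 3)
```

The serial number of the newly added record would then be formed from its update time and this initial identifier in the same way as for existing records.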

In one embodiment, the multi-party data integration apparatus further includes:

the updating record obtaining module is used for obtaining the updating record;

the original record acquisition module is used for determining a corresponding original record according to the main key corresponding to the updated record;

and the first updating module is used for acquiring the updating time and the storage position of the updated record, generating an updated serial number and replacing the original serial number of the original record by the updated serial number.

In one embodiment, the multi-party data integration apparatus further includes:

the matching module is used for matching the records through at least one rule;

and the second updating module is used for acquiring the minimum sequence number corresponding to the successfully matched record as a new sequence number of the successfully matched record.
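The matching module and the second updating module can be sketched as follows. This is a minimal illustration: a rule is assumed to be a list of field names whose values must all be present and equal for two records to match, and the names `apply_rule`, `seq`, and `id_no` are hypothetical.

```python
def apply_rule(records, key_fields):
    """Give every group of matching records the group's minimum sequence number.

    records: list of dicts, each with a 'seq' key (assumed layout).
    key_fields: the fields compared by this rule (assumed rule form).
    Records with a missing value in any key field are never matched.
    """
    smallest = {}
    for record in records:
        key = tuple(record.get(field) for field in key_fields)
        if None in key:
            continue
        smallest[key] = min(smallest.get(key, record["seq"]), record["seq"])

    updated = []
    for record in records:
        key = tuple(record.get(field) for field in key_fields)
        updated.append({**record, "seq": smallest.get(key, record["seq"])})
    return updated


# Hypothetical rule: records with the same id_no are the same customer.
records = [
    {"seq": 3, "id_no": "A"},
    {"seq": 1, "id_no": "A"},
    {"seq": 2, "id_no": None},
]
matched = apply_rule(records, ["id_no"])
```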

In one embodiment, the second updating module includes:

the record to be processed obtaining unit is used for obtaining the record to be processed with the sequence number changed after the current rule is executed;

the aggregation unit is used for aggregating the records to be processed according to the serial number obtained after the execution of the previous rule is completed to obtain a target association relation;

the matching unit is used for matching the serial number after the execution of the last rule in the target association relation with the current serial number after aggregation;

the updating unit is used for updating the target association relationship according to the serial number after the execution of the last rule is completed and the current serial number after aggregation if the matching is successful, and continuously matching the serial number after the execution of the last rule in the updated target association relationship with the current serial number after aggregation until the serial number after the execution of the last rule after matching and the current serial number after aggregation do not exist in the target association relationship;

and the serial number processing unit is used for processing the serial numbers of all records after the current rule is executed through the updated target association relation.

In one embodiment, the aggregation unit includes:

the data acquisition subunit is used for acquiring records with the same serial number obtained after the execution of the last rule is finished and acquiring the minimum value of the serial number after the execution of the current rule corresponding to the acquired records is finished;

and the aggregation subunit is used for aggregating the acquired records, and the aggregated sequence number is the minimum value.

In one embodiment, the serial number processing unit includes:

the matching subunit is used for matching the serial number of each record after the current rule is executed with the serial number obtained after the last rule in the target association relation is executed;

and the updating subunit is used for acquiring the aggregated current serial number corresponding to the serial number obtained after the execution of the last rule successfully matched in the target association relation is completed when the matching is successful, and updating the serial number of the record after the execution of the current rule successfully matched through the aggregated current serial number.

For specific limitations of the multi-party data integration apparatus, reference may be made to the above limitations of the multi-party data integration method, which are not repeated here. The modules in the multi-party data integration apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to the modules.

In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 13. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is for storing multi-party data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a multi-party data integration method.

Those skilled in the art will appreciate that the architecture shown in fig. 13 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a different arrangement of components.

In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program: acquiring multi-party data; identifying an update time and a storage location of the multi-party data; determining a partition index number and a partition number according to the storage position; generating an initial identifier of each record in each partition according to the partition number and the partition index number and the arithmetic progression; and generating a serial number of each record in each partition according to the updating time and the initial identification.

In one embodiment, after the obtaining of the multi-party data, the processor, when executing the computer program, further implements the following steps: acquiring at least one preset field; comparing the field values of the preset fields in the multi-party data to obtain records with the same field values of the preset fields; and merging the records with the same field values of the preset fields.

In one embodiment, after the obtaining of the multi-party data, the processor, when executing the computer program, further implements the following step: performing a field check on each field in the multi-party data so as to delete the records that fail the check.

In one embodiment, the processor, when executing the computer program, further performs the steps of: acquiring a newly added record and the updating time and the storage position of the newly added record; determining an initial identifier of a serial number of the last record corresponding to the partition according to the storage position; calculating to obtain an initial identifier of the newly added record according to the initial identifier of the serial number of the last record; and calculating to obtain the serial number of the newly added record according to the updating time and the initial identifier of the newly added record.

In one embodiment, the processor, when executing the computer program, further performs the steps of: acquiring an update record; determining a corresponding original record according to a main key corresponding to the updated record; and acquiring the updating time and the storage position of the updating record, generating an updating serial number, and replacing the original serial number of the original record by the updating serial number.

In one embodiment, after generating the sequence number of each record in each partition according to the update time and the initial identifier, the processor, when executing the computer program, further implements the following steps: matching the records by at least one rule; and acquiring the minimum sequence number corresponding to the successfully matched records as the new sequence number of the successfully matched records.

In one embodiment, the obtaining of the minimum sequence number corresponding to the successfully matched record, which is implemented when the processor executes the computer program, as the new sequence number of the successfully matched record includes: acquiring a record to be processed with a sequence number changed after the current rule is executed; aggregating the records to be processed according to the serial number obtained after the last rule is executed to obtain a target association relation; matching the serial number after the last rule in the target association relation is executed with the current serial number after aggregation; if the matching is successful, updating the target association relation according to the serial number after the last rule execution is completed and the current serial number after aggregation, and continuing to match the serial number after the last rule execution is completed in the updated target association relation with the current serial number after aggregation until the serial number after the last rule execution is completed and the current serial number after aggregation do not exist in the target association relation; and processing the serial numbers of all records after the current rule is executed through the updated target association relation.

In one embodiment, the aggregating, according to a sequence number obtained after the completion of the previous rule execution, a to-be-processed record to obtain a target association relationship, which is implemented when a processor executes a computer program, includes: acquiring records with the same sequence number obtained after the last rule is executed, and acquiring the minimum value of the sequence number after the current rule corresponding to the acquired records is executed; and aggregating the acquired records, wherein the aggregated sequence number is the minimum value.

In one embodiment, the processing of the sequence numbers of the records after the current rule is executed by the updated target association relationship, which is implemented when the processor executes the computer program, includes: matching the serial number of each record after the current rule is executed with the serial number obtained after the last rule in the target association relation is executed; and when the matching is successful, acquiring an aggregated current serial number corresponding to the serial number obtained after the execution of the successfully matched last rule in the target association relation is completed, and updating the serial number of the successfully matched record after the execution of the current rule through the aggregated current serial number.

In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of: acquiring multi-party data; identifying an update time and a storage location of the multi-party data; determining a partition index number and a partition number according to the storage position; generating an initial identifier of each record in each partition according to the partition number and the partition index number and the arithmetic progression; and generating a serial number of each record in each partition according to the updating time and the initial identification.

In one embodiment, the computer program, when executed by the processor, further comprises, after obtaining the multi-party data: acquiring at least one preset field; comparing the field values of preset fields in the multi-party data to obtain records with the same field values of the preset fields; and merging the records with the same field value of the preset field.

In one embodiment, the computer program, when executed by the processor, further comprises, after obtaining the multi-party data: and carrying out field check on each field in the multi-party data so as to delete the record failed in the check.

In one embodiment, the computer program when executed by the processor further performs the steps of: acquiring a newly added record and the updating time and the storage position of the newly added record; determining an initial identifier of a serial number of the last record corresponding to the partition according to the storage position; calculating to obtain an initial identifier of the newly added record according to the initial identifier of the serial number of the last record; and calculating to obtain the serial number of the newly added record according to the updating time and the initial identifier of the newly added record.

In one embodiment, the computer program when executed by the processor further performs the steps of: acquiring an update record; determining a corresponding original record according to a main key corresponding to the updated record; and acquiring the updating time and the storage position of the updating record, generating an updating serial number, and replacing the original serial number of the original record by the updating serial number.

In one embodiment, the computer program, when executed by the processor, after generating a sequence number for each record within each partition based on the update time and the initial identification, comprises: matching the records by at least one rule; and acquiring the minimum sequence number corresponding to the successfully matched record as a new sequence number of the successfully matched record.

In one embodiment, the obtaining of the minimum sequence number corresponding to the successfully matched record, as a new sequence number of the successfully matched record, when the computer program is executed by the processor, includes: acquiring a record to be processed with a sequence number changed after the current rule is executed; aggregating the records to be processed according to the serial number obtained after the last rule is executed to obtain a target association relation; matching the serial number after the last rule in the target association relation is executed with the current serial number after aggregation; if the matching is successful, updating the target association relation according to the serial number after the last rule execution is completed and the current serial number after aggregation, and continuing to match the serial number after the last rule execution is completed in the updated target association relation with the current serial number after aggregation until the serial number after the last rule execution is completed and the current serial number after aggregation do not exist in the target association relation; and processing the serial numbers of all records after the current rule is executed through the updated target association relation.

In one embodiment, the aggregating, when the computer program is executed by the processor, the to-be-processed records according to the sequence number obtained after the execution of the last rule is completed to obtain the target association relationship includes: acquiring records with the same sequence number obtained after the last rule is executed, and acquiring the minimum value of the sequence number after the current rule corresponding to the acquired records is executed; and aggregating the acquired records, wherein the aggregated sequence number is the minimum value.

In one embodiment, the processing of the serial numbers of the records after the current rule is executed by the updated target association relationship, which is implemented when the computer program is executed by the processor, includes: matching the serial number of each record after the current rule is executed with the serial number obtained after the last rule in the target association relation is executed; and when the matching is successful, acquiring an aggregated current serial number corresponding to the serial number obtained after the execution of the successfully matched last rule in the target association relation is completed, and updating the serial number of the successfully matched record after the execution of the current rule through the aggregated current serial number.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.

The technical features of the above embodiments can be arbitrarily combined. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered to be within the scope of the present specification.

The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A method for multi-party data integration, the method comprising:

acquiring multi-party data, and extracting the multi-party data into a table;

identifying an update time and a storage location of the multi-party data;

2. The method of claim 1, wherein after the obtaining of the multi-party data, further comprising:

acquiring at least one preset field;

and merging the records with the same field value of the preset fields.

3. The method of claim 2, wherein after the obtaining of the multi-party data, further comprising:

4. A method according to any one of claims 1 to 3, characterized in that the method further comprises:

5. A method according to any one of claims 1 to 3, characterized in that the method further comprises:

acquiring an update record;

6. The method according to any one of claims 1 to 3, wherein the generating a sequence number of each record in each partition according to the update time and the initial identifier comprises:

matching the records by at least one rule;

7. The method according to claim 6, wherein the obtaining a minimum sequence number corresponding to the successfully matched record as a new sequence number of the successfully matched record comprises:

8. The method according to claim 7, wherein aggregating the to-be-processed records according to the sequence number obtained after the previous rule execution is completed to obtain a target association relationship comprises:

9. The method according to claim 7, wherein the processing the serial numbers of the records after the current rule is executed through the updated target association relationship includes:

10. A multiparty data consolidation apparatus, the apparatus comprising:

the data acquisition module is used for acquiring multi-party data and extracting the multi-party data into a table;

11. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor realizes the steps of the method of any one of claims 1 to 9 when executing the computer program.

12. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 9.