Disclosure of Invention
To address the difficulty in the prior art of integrating data while ensuring that the data is not leaked, the invention provides an audit and price-review heterogeneous data online integration method and system based on multi-party computation.
The technical scheme of the invention is as follows.
The audit and price-review heterogeneous data online integration method based on multi-party computation comprises the following steps:
S1: each node collects data to be audited according to a preset task and stores the data to be audited in the node's own heterogeneous database;
S2: when a data integration requirement arises, the corresponding node generates a data integration instruction and sends it to a designated node; the node receiving the data integration instruction acts as a participating node, and the remaining nodes are idle nodes;
S3: the participating node encrypts the target data according to the data integration instruction, writes the designated portion of the encryption result into the corresponding vacant position in the data integration instruction, and forwards the instruction to a designated idle node, which thereby becomes a participating node; step S3 is repeated until writing of the data integration instruction is complete, yielding a spliced encryption result;
S4: the fully written data integration instruction is broadcast to all participating nodes; each participating node judges whether the spliced encryption result is consistent with its own encryption result and marks the inconsistent positions;
S5: if the spliced encryption result has no inconsistent position, the target data of all participating nodes is consistent and the target data is integrated; if inconsistent positions exist, the participating nodes whose target data is consistent are selected according to the marking results and their target data is integrated, and the remaining participating nodes are marked as abnormal nodes.
The invention uses multi-party computation to integrate data among nodes by means of a data integration instruction. Because target data is never uploaded directly, no node can see the target data of other nodes. While the data integration instruction is being passed along, the vacant positions are filled in piece by piece, and from the partially spliced encryption result written by other nodes a node cannot accurately judge whether its target data is consistent with theirs, so the target data is not leaked during the process. Finally, after the fully written data integration instruction is broadcast, each node judges whether the spliced encryption result is consistent with its own encryption result, and abnormal nodes can be identified by publicly tallying the inconsistent positions marked by all nodes. Because the outputs of normal nodes are generally consistent with one another, the judgment is not affected even if an abnormal node is dishonest; erroneous data is thereby excluded, the data is confirmed and checked, and data integration is completed.
Preferably, the data integration instruction includes a target data description, a data integration pool and an encryption protocol, wherein the encryption protocol includes an encryption algorithm and the numbers of the nodes involved, the data integration pool includes vacant positions corresponding to the node numbers, and the number of vacant positions equals the number of node numbers. The data integration pool is a reserved virtual storage space that is pre-divided into a plurality of numbered vacant positions.
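The structure of the data integration instruction described above can be sketched as a small data structure. This is only an illustrative sketch: the class and field names (`EncryptionProtocol`, `DataIntegrationInstruction`, `target_description`, `pool`) and the use of `None` to represent a vacant position are assumptions made for the example and are not specified in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class EncryptionProtocol:
    algorithm: str           # assumed identifier such as "sha256"
    node_numbers: List[int]  # numbers of the nodes involved


@dataclass
class DataIntegrationInstruction:
    target_description: str       # describes which target data to look up
    protocol: EncryptionProtocol
    # Data integration pool: one numbered vacant position per node number.
    # None means the position has not been written yet (assumed encoding).
    pool: Dict[int, Optional[str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        numbers = self.protocol.node_numbers
        if len(set(numbers)) != len(numbers):
            raise ValueError("node numbers must be unique")
        if not self.pool:
            self.pool = {n: None for n in numbers}
        # The number of vacant positions must match the number of node numbers.
        if set(self.pool) != set(numbers):
            raise ValueError("pool positions must correspond to node numbers")
```

Keying the pool by node number is one way of realizing the one-to-one correspondence; any other preset correspondence would serve equally well.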
Preferably, the step in which the participating node encrypts the target data according to the data integration instruction and writes the designated portion of the encryption result into the corresponding vacant position in the data integration instruction includes:
the participating node searches its own heterogeneous database for the target data according to the target data description in the data integration instruction;
the target data is encrypted according to the encryption algorithm in the encryption protocol to obtain an encryption result;
the corresponding vacant position in the data integration pool is selected according to the number of the participating node, and the content at the corresponding position of the encryption result is extracted and written into the vacant position.
Although the encryption result is an irregular string of data, it is long, so the writing operation is completed by filling a segment of it into the vacant position determined by the node number. The node numbers and the numbers of the vacant positions may be in one-to-one correspondence or may follow some other preset rule; no limitation is imposed here, provided that a predetermined correspondence exists. For example, for a node numbered k, the k-th bit of the encryption result is filled into the vacant position numbered k; alternatively, for a node numbered k, the 10k-th to 20k-th bits of the encryption result are filled into the vacant positions numbered 10k to 20k.
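The write operation of step S3 can be illustrated as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: SHA-256 is chosen as the hash algorithm, node numbers start at 0, and a node numbered k writes hexadecimal characters k·W to (k+1)·W−1 of its digest into the vacant position numbered k, with W = 8. The names `encrypt`, `write_fragment` and `FRAGMENT_WIDTH` are hypothetical.

```python
import hashlib
from typing import Dict, Optional

FRAGMENT_WIDTH = 8  # hex characters per vacant position (assumed value)


def encrypt(target_data: str) -> str:
    """Hash the target data; SHA-256 is an assumed choice of hash algorithm."""
    return hashlib.sha256(target_data.encode("utf-8")).hexdigest()


def write_fragment(pool: Dict[int, Optional[str]], node_number: int,
                   target_data: str) -> None:
    """Write this node's fragment of its own digest into its vacant position."""
    if node_number not in pool:
        raise KeyError("no vacant position reserved for this node number")
    if pool[node_number] is not None:
        raise ValueError("position has already been written")
    start = node_number * FRAGMENT_WIDTH
    fragment = encrypt(target_data)[start:start + FRAGMENT_WIDTH]
    if len(fragment) != FRAGMENT_WIDTH:
        raise ValueError("node number too large for the digest length")
    pool[node_number] = fragment
```

Only a short segment of each digest ever enters the pool, so a node that receives a partially written instruction sees fragments of other nodes' digests rather than the digests themselves.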
Preferably, repeating step S3 until writing of the data integration instruction is complete, to obtain the spliced encryption result, includes:
when all nodes corresponding to the node numbers in the encryption protocol have become participating nodes and every vacant position of the data integration pool has been written, writing of the data integration instruction is complete, and the contents of all positions in the data integration pool are spliced in order to obtain the spliced encryption result.
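The completion condition and the splicing step can be sketched as below. The sketch assumes the pool is represented as a mapping from position number to written content, with `None` marking a position that is still vacant; the function names are hypothetical.

```python
from typing import Dict, Optional


def writing_complete(pool: Dict[int, Optional[str]]) -> bool:
    """True once every vacant position in the pool has been written."""
    return all(fragment is not None for fragment in pool.values())


def splice(pool: Dict[int, Optional[str]]) -> str:
    """Concatenate the contents of all positions in ascending position order."""
    if not writing_complete(pool):
        raise ValueError("the data integration instruction is not fully written")
    return "".join(pool[number] for number in sorted(pool))
```

Because splicing depends only on position numbers, the order in which nodes happened to write does not matter, which is consistent with the stated aim of avoiding strict time synchronization.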
Preferably, the process of step S4 includes:
the fully written data integration instruction is broadcast to all participating nodes; each participating node reads the spliced encryption result and compares it with its own encryption result, and the parts that do not match are the inconsistent positions.
Preferably, the inconsistent positions are marked as follows: each participating node outputs a marking result of the form a{n}, where a is the node number and {n} is the position set output by that node, each element of which is the number of an inconsistent position found by that node's comparison.
For example, if the node numbered 5 finds that the 20th to 30th bits of the spliced encryption result are inconsistent with its own encryption result, the marking result output by that node is 5{20, 21, …}.
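The comparison in step S4 and the a{n} marking format can be sketched as follows. The sketch assumes that positions are counted from 1, that the two results are compared character by character, and that only the span covered by the spliced result is compared; `mark_inconsistent` and `format_mark` are hypothetical names.

```python
from typing import FrozenSet, Tuple


def mark_inconsistent(node_number: int, spliced: str,
                      own_result: str) -> Tuple[int, FrozenSet[int]]:
    """Return (a, {n}): the node number and its set of inconsistent positions."""
    positions = frozenset(
        i
        for i, (s, o) in enumerate(zip(spliced, own_result), start=1)
        if s != o
    )
    return node_number, positions


def format_mark(mark: Tuple[int, FrozenSet[int]]) -> str:
    """Render a marking result in the a{n} notation used in the text."""
    node_number, positions = mark
    return str(node_number) + "{" + ", ".join(str(p) for p in sorted(positions)) + "}"
```

For instance, `format_mark(mark_inconsistent(5, "abcdXXgh", "abcdefgh"))` yields `5{5, 6}`, since only the fifth and sixth characters differ.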
Preferably, in step S5, if inconsistent positions exist, selecting the participating nodes whose target data is consistent according to the marking results and integrating their target data, with the remaining participating nodes marked as abnormal nodes, includes:
the marking results are grouped by the content of their position sets, marking results whose position sets are identical being placed in the same group; the group containing the most marking results is taken as the qualified group; the node numbers of the marking results in the qualified group are extracted, the participating nodes with those node numbers are deemed to have passed the data check and their target data is integrated, and the remaining participating nodes are marked as abnormal nodes.
That is, marking results whose {n} contents are identical are placed in one group. In practice the encryption results of erroneous information from abnormal nodes vary, whereas the encryption results of correct information are identical, so the group containing the most marking results is taken as the qualified group, and the nodes in all other groups are treated as abnormal nodes.
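The grouping rule can be sketched as below. The sketch assumes each marking result is given as a pair of node number and position set; the function name `select_qualified` is hypothetical. The text does not say how a tie between equally large groups is resolved, so the sketch simply keeps the group encountered first, which is an assumption.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple


def select_qualified(
    marks: List[Tuple[int, FrozenSet[int]]]
) -> Tuple[List[int], List[int]]:
    """Group marking results by position set; the largest group is qualified."""
    groups: Dict[FrozenSet[int], List[int]] = defaultdict(list)
    for node_number, positions in marks:
        groups[positions].append(node_number)
    # max() keeps the first group among equally large ones (assumed tie rule).
    qualified = sorted(max(groups.values(), key=len))
    abnormal = sorted(n for n, _ in marks if n not in qualified)
    return qualified, abnormal
```

With marking results 0{3}, 1{3}, 2{1, 2} and 3{3}, the nodes numbered 0, 1 and 3 form the qualified group and the node numbered 2 is marked abnormal.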
Preferably, the encryption algorithm is a hash algorithm. Hash algorithms are a mature technology and can meet the requirements of this scheme.
The invention also provides an audit and price-review heterogeneous data online integration system based on multi-party computation, which consists of a plurality of nodes, wherein the nodes in the system jointly execute the above audit and price-review heterogeneous data online integration method based on multi-party computation.
The substantial effects of the invention include the following. By writing into the data integration instruction in sequence, the encryption results can be spliced together, which avoids strict time-synchronization requirements during operation. By then broadcasting the data integration instruction, the inconsistent positions can be determined, which yields the nodes that pass data integration and the abnormal nodes, so that data integration can be carried out without leaking target data and nodes with problematic data can be excluded. Moreover, the outputs of the other nodes are the decisive factor, so the judgment is not affected even if an abnormal node is dishonest; erroneous data can thus be excluded, the data confirmed and checked, and data integration completed.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions will be clearly and completely described below with reference to the embodiments, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
It should be understood that, in various embodiments of the present invention, the sequence numbers of the processes do not mean the execution sequence, and the execution sequence of the processes should be determined by the functions and the internal logic of the processes, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
It should be understood that in the present application, "comprising" and "having" and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that, in the present invention, "a plurality" means two or more. "And/or" merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. "Comprises A, B and C" and "comprises A, B, C" mean that all three of A, B and C are comprised; "comprises A, B or C" means that one of A, B and C is comprised; "comprises A, B and/or C" means that any one, any two, or all three of A, B and C are comprised.
The technical solution of the present invention will be described in detail with reference to specific examples. Embodiments may be combined with each other and some details of the same or similar concepts or processes may not be repeated in some embodiments.
Audit work involves many documents, work orders and bills. Information with the same content is distributed across different types of information carriers, and the extraction of content from those carriers may be carried out by different nodes, so the information needs to be summarized and integrated during the audit and price-review process to confirm whether it contains errors. The present embodiment is mainly directed to information checking in data integration.
The embodiment is as follows:
The audit and price-review heterogeneous data online integration system based on multi-party computation comprises a plurality of nodes, and the nodes in the system jointly execute the audit and price-review heterogeneous data online integration method based on multi-party computation. The method is shown in FIG. 1 and comprises the following steps:
S1: Each node collects data to be audited according to a preset task and stores the data to be audited in the node's own heterogeneous database.
The data sources collected by a node may be files, work orders, bills, other existing databases and the like; the collected data to be audited may be purchase information, purchase prices, engineering information and the like; and the nodes may be the computers at the various collection points. None of these is limited here. The collected data to be audited is stored in the corresponding heterogeneous database according to its data type.
S2: when a data integration requirement is generated, the corresponding node generates a data integration instruction and sends the data integration instruction to the designated node, the node receiving the data integration instruction is used as a participating node, and the rest nodes are idle nodes.
The data integration instruction includes a target data description, a data integration pool and an encryption protocol, wherein the encryption protocol includes an encryption algorithm and the numbers of the nodes involved, the data integration pool includes vacant positions corresponding to the node numbers, and the number of vacant positions equals the number of node numbers. The data integration pool is a reserved virtual storage space that is pre-divided into a plurality of numbered vacant positions.
In this embodiment, the encryption algorithm is a hash algorithm. The technology of the hash algorithm is mature, the requirements of the scheme can be met, and the obtained encryption result is a hash value.
S3: The participating node encrypts the target data according to the data integration instruction, writes the designated portion of the encryption result into the corresponding vacant position in the data integration instruction, and forwards the instruction to a designated idle node, which thereby becomes a participating node; step S3 is repeated until writing of the data integration instruction is complete, yielding a spliced encryption result.
This specifically includes the following:
the participating node searches its own heterogeneous database for the target data according to the target data description in the data integration instruction;
the target data is encrypted according to the encryption algorithm in the encryption protocol to obtain an encryption result;
the corresponding vacant position in the data integration pool is selected according to the number of the participating node, and the content at the corresponding position of the encryption result is extracted and written into the vacant position;
when all nodes corresponding to the node numbers in the encryption protocol have become participating nodes and every vacant position of the data integration pool has been written, writing of the data integration instruction is complete, and the contents of all positions in the data integration pool are spliced in order to obtain the spliced encryption result.
Although the encryption result is an irregular string of data, it is long, so the writing operation is completed by filling a segment of it into the vacant position determined by the node number. The node numbers and the numbers of the vacant positions may be in one-to-one correspondence or may follow some other preset rule; no limitation is imposed here, provided that a predetermined correspondence exists. For example, for a node numbered k, the k-th bit of the encryption result is filled into the vacant position numbered k; alternatively, for a node numbered k, the 10k-th to 20k-th bits of the encryption result are filled into the vacant positions numbered 10k to 20k.
S4: The fully written data integration instruction is broadcast to all participating nodes; each participating node judges whether the spliced encryption result is consistent with its own encryption result and marks the inconsistent positions.
The specific process is as follows:
the fully written data integration instruction is broadcast to all participating nodes; each participating node reads the spliced encryption result and compares it with its own encryption result, and the parts that do not match are the inconsistent positions.
The inconsistent positions are marked as follows: each participating node outputs a marking result of the form a{n}, where a is the node number and {n} is the position set output by that node, each element of which is the number of an inconsistent position found by that node's comparison.
For example, if the node numbered 5 finds that the 20th to 30th bits of the spliced encryption result are inconsistent with its own encryption result, the marking result output by that node is 5{20, 21, …}.
S5: if the spliced encryption result does not have an inconsistent position, the target data of each participating node is consistent, and the target data is integrated; and if the inconsistent positions exist, selecting the participating nodes with consistent target data according to the marking result and integrating the target data, wherein the rest participating nodes are marked as abnormal nodes.
In step S5, if inconsistent positions exist, selecting the participating nodes whose target data is consistent according to the marking results and integrating their target data, with the remaining participating nodes marked as abnormal nodes, includes:
the marking results are grouped by the content of their position sets, marking results whose position sets are identical being placed in the same group; the group containing the most marking results is taken as the qualified group; the node numbers of the marking results in the qualified group are extracted, the participating nodes with those node numbers are deemed to have passed the data check and their target data is integrated, and the remaining participating nodes are marked as abnormal nodes.
That is, marking results whose {n} contents are identical are placed in one group. In practice the encryption results of erroneous information from abnormal nodes vary, whereas the encryption results of correct information are identical, so the group containing the most marking results is taken as the qualified group, and the nodes in all other groups are treated as abnormal nodes.
This embodiment uses multi-party computation to integrate data among nodes by means of a data integration instruction. Because target data is never uploaded directly, no node can see the target data of other nodes. While the data integration instruction is being passed along, the vacant positions are filled in piece by piece, and from the partially spliced encryption result written by other nodes a node cannot accurately judge whether its target data is consistent with theirs, so the target data is not leaked during the process. Finally, after the fully written data integration instruction is broadcast, each node judges whether the spliced encryption result is consistent with its own encryption result, and abnormal nodes can be identified by publicly tallying the inconsistent positions marked by all nodes. Because the outputs of normal nodes are generally consistent with one another, the judgment is not affected even if an abnormal node is dishonest; erroneous data is thereby excluded, the data is confirmed and checked, and data integration is completed.
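As a toy illustration of how steps S3 to S5 of this embodiment fit together, the following sketch simulates one round in a single process. It is not the claimed implementation and makes several assumptions: SHA-256 is the hash algorithm, nodes are numbered 0 to n−1, the node numbered k contributes hexadecimal characters 8k to 8k+7 of its own digest, positions are counted from 1, and message passing between nodes is replaced by ordinary dictionary operations. The names `digest`, `run_round` and `WIDTH` are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple

WIDTH = 8  # hex characters written per node (assumed value)


def digest(data: str) -> str:
    """Hash the target data; SHA-256 is an assumed choice of hash algorithm."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def run_round(node_data: Dict[int, str]) -> Tuple[List[int], List[int]]:
    """Simulate S3 to S5; returns (qualified node numbers, abnormal node numbers)."""
    numbers = sorted(node_data)
    if numbers != list(range(len(numbers))) or len(numbers) * WIDTH > 64:
        raise ValueError("this sketch assumes node numbers 0..n-1 with n <= 8")
    own = {k: digest(node_data[k]) for k in numbers}

    # S3: each node writes only its own fragment into its numbered position,
    # and the positions are then spliced in order.
    pool = {k: own[k][k * WIDTH:(k + 1) * WIDTH] for k in numbers}
    spliced = "".join(pool[k] for k in numbers)

    # S4: each node compares the spliced result with its own digest and
    # outputs the set of inconsistent positions.
    groups = defaultdict(list)
    for k in numbers:
        positions = frozenset(
            i for i, (s, o) in enumerate(zip(spliced, own[k]), start=1) if s != o
        )
        groups[positions].append(k)

    # S5: the largest group of identical position sets is the qualified group.
    qualified = max(groups.values(), key=len)
    abnormal = [k for k in numbers if k not in qualified]
    return qualified, abnormal
```

Calling `run_round({0: "price=100", 1: "price=100", 2: "price=999", 3: "price=100"})` illustrates the idea: the three nodes holding the same value output the same position set, which lies inside the segment written by node 2, while node 2 outputs a different set covering the segments written by the other nodes, so node 2 is marked abnormal.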
Through the description of the above embodiments, those skilled in the art will understand that, for convenience and simplicity of description, only the division of the above functional modules is used as an example, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of a specific device is divided into different functional modules to complete all or part of the above described functions.
In the embodiments provided in this application, it should be understood that the disclosed structures and methods may be implemented in other ways. For example, the structural embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another structure, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, structures or units, and may be electrical, mechanical or of another form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may be one physical unit or a plurality of physical units, may be located in one place, or may be distributed to a plurality of different places. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above is only a description of specific embodiments of the present application, but the protection scope of the present application is not limited thereto; any change or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.