CN106936572B

CN106936572B - Safe data matching method and system

Info

Publication number: CN106936572B
Application number: CN201710212664.8A
Authority: CN
Inventors: 白硕; 刘乐; 李中伟; 涂惠燕; 陈涛; 张文海
Original assignee: Lianshi Shanghai Information Technology Co ltd; Shanghai Lishen Information Technology Co ltd
Current assignee: Chuyuan (Shanghai) Information Technology Co.,Ltd.; SHANGHAI LISHEN INFORMATION TECHNOLOGY Co.,Ltd.
Priority date: 2017-04-01
Filing date: 2017-04-01
Publication date: 2020-10-27
Anticipated expiration: 2037-04-01
Also published as: CN106936572A

Abstract

The invention relates to a secure data matching method and system. The method runs directly between two parties, A and B, that need to exchange data. Each party first numbers its original data and applies a hash operation to it, and then each party generates a key. If the values obtained after A and B apply two crossed rounds of modular exponentiation to the data to be matched are equal, the corresponding original data are judged to be equal, and each party finally retrieves the matched original data through its own numbering. The method needs no third party: A and B complete the matching directly using the key negotiated between them, and the data security of both parties is preserved throughout the matching process. After matching, each party obtains only the successfully matched intersection data and learns nothing about data outside the intersection, so the method has the data security properties required in practical applications.

Description

Safe data matching method and system

Technical Field

The invention relates to the technical field of computer encryption, in particular to a secure data matching method and a secure data matching system.

Background

In practice, two websites often need to support login or run joint promotions on the basis of commonly registered accounts (such as email addresses and mobile phone numbers). This requires both parties to find the intersection of their registered accounts, and in doing so neither party may hand its full data set to the other. In the common approach, both parties submit their full data to a trusted third party, which computes the intersection and returns it to both parties. This creates several problems:

1. Using the services of a trusted third party incurs a large economic cost.

2. The full data of both parties are submitted to a third party, which carries a risk of leakage.

3. The third party may use both parties' data for analysis and for other purposes.

Disclosure of Invention

The invention aims to overcome the defects of the prior art by providing a secure data matching method and system.

To achieve this purpose, the invention adopts the following technical scheme:

A secure data matching method runs directly between two parties A and B that need to exchange data. Parties A and B first number their original data to be exchanged and apply a hash operation to it, and then each generates a key. If the values obtained after A and B apply two crossed rounds of modular exponentiation to the hashed data are equal, the original data corresponding to those values are judged to be equal, and the matched original data are finally obtained through the respective numbers.

Further, the method comprises the following detailed steps:

Step one, numbering and hashing the original data: parties A and B each number the original data they own one by one, and then each hashes its numbered original data one by one. A and B use the same hash algorithm. The original data are the data to be matched, and the data obtained by the hash operation are the hash data;

Step two, generating keys: A and B negotiate two 1024-bit prime numbers p and q and calculate N = p × q. A and B then independently select 1024-bit integers d1 and d2, respectively, as keys, such that d1 and d2 are each coprime to (p-1)(q-1), and each party stores its own key;

Step three, cross operation: A and B each perform a first round of modular exponentiation using the hash data obtained in step one and their own key, and send the result to the other party. A and B each then perform a second round of modular exponentiation using the first-round result received from the other party and their own key, and hash the result of the second round. A and B each store the resulting hash locally and send it to the other party;

Step four, data comparison: A and B each compare the local data obtained from the hash operation in step three with the data received from the other party. If two values are equal, the original data to be matched corresponding to those values are equal;

Step five: A and B each retrieve from their own data sets the original data found equal in step four.

Further, the hash algorithm in step one is SHA-3.

Further, p and q in step two are two distinct large prime numbers.

Further, the cross operation in step three is divided into four sub-steps:

Step 5.1: A and B each perform a first round of modular exponentiation: h^d ≡ c1 (mod N), where ^ denotes exponentiation, h denotes the hash data obtained in step one, d denotes the key held by A or B respectively (d1 or d2 from step two), and N is the value p × q calculated in step two;

Step 5.2: A and B each send the data c1 calculated in the first round of modular exponentiation to the other party;

Step 5.3: after receiving the data c1 sent by the other party, A and B each perform a second round of modular exponentiation: c1^d ≡ c2 (mod N), where d denotes the key held by A or B respectively (d1 or d2 from step two), and N is the value p × q calculated in step two;

Step 5.4: A and B each hash the value c2 calculated in the second round of modular exponentiation, store the result locally, and send it to the other party.
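The reason the crossed operation makes equal records produce equal values can be written out in a few lines. The derivation below is only an illustrative sketch using the symbols above; h_A and h_B are names introduced here to distinguish the hash data of the two parties.

```latex
\begin{aligned}
\text{B's record, processed by B then A:}\quad
  & \left(h_B^{\,d_2} \bmod N\right)^{d_1} \equiv h_B^{\,d_1 d_2} \pmod{N} \\
\text{A's record, processed by A then B:}\quad
  & \left(h_A^{\,d_1} \bmod N\right)^{d_2} \equiv h_A^{\,d_1 d_2} \pmod{N} \\
h_A = h_B \;\Longrightarrow\;
  & h_A^{\,d_1 d_2} \equiv h_B^{\,d_1 d_2} \pmod{N}
\end{aligned}
```

Because d1 and d2 are each coprime to (p-1)(q-1) and p and q are distinct primes, the map x ↦ x^(d1·d2) mod N is a permutation of the residues modulo N, so two different hash values below N cannot produce the same second-round result.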

A secure data matching system comprises a participant A and a participant B that perform data matching. Participant A comprises a data input unit for inputting participant A's data to be matched, the data to be matched being database data or a CSV file;

a data numbering unit for numbering A's user data held in the data input unit one by one;

a data hash unit for hashing A's numbered user data to obtain A's hash data;

a key negotiation unit for negotiating the large prime numbers shared by A and B and independently selecting an operation key;

a modular exponentiation local unit for performing the first round of modular exponentiation on A's hash data and transmitting the result over a network to the other party's modular exponentiation remote unit;

a modular exponentiation remote unit for performing the second round of modular exponentiation on the data received from the other party's modular exponentiation local unit, hashing the result of the second round, and storing the hash result locally and sending it to the other party;

a result processing unit for matching the local data with the data received from the other party's modular exponentiation remote unit to obtain the intersection data;

a data number restoration unit for finding, in the party's own user data, the original data corresponding to the numbers carried by the intersection data;

and a data output unit for outputting the original data found by the data number restoration unit. The structure of participant B is the same as that of participant A.

A secure data matching system, characterized in that it comprises a participant A and a participant B that perform data matching. Participant A comprises a data input unit for inputting participant A's data to be matched, the data to be matched being database data or a CSV file;

a hardware token for negotiating the large prime numbers shared by A and B and independently selecting an operation key; then performing a first round of modular exponentiation on A's hash data and transmitting the result over a network to the other party's hardware token; receiving the other party's first-round result from the other party's hardware token over the network; and finally performing a second round of modular exponentiation on the data received from the other party's hardware token, hashing the result of the second round, and storing the hash result locally and sending it to the other party;

and a result processing unit for matching the local data with the data received from the other party's hardware token to obtain the intersection data.

The invention has the following beneficial effects: the data matching method needs no third party, and A and B complete the matching directly through the key negotiated between them; the data security of both parties is preserved during the matching process; after matching, each party obtains only the successfully matched intersection data and learns nothing about data outside the intersection; and the method is strongly resistant to special-number attacks.

Drawings

Fig. 1 is a schematic structural diagram of a data matching system according to the present invention.

Fig. 2 is a second schematic structural diagram of the data matching system of the present invention.

Detailed Description

In order that the objects, aspects and advantages of the invention will become more apparent, the invention will be described by way of example only, and in connection with the accompanying drawings. It is to be understood that such description is merely illustrative and not intended to limit the scope of the present invention. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present invention.

First embodiment: the invention is described with reference to Fig. 1. The secure data matching method runs directly between parties A and B that need to exchange data, and includes the following steps:

First, participant A numbers each piece of data in its data set T1, and participant B numbers each piece of data in its data set T2;

Let data set T1 = {t11, t12, t13, …}, where t11, t12, t13, … are the data in T1, and let M1 = {m1 | m1 is the unique number of t, t ∈ T1}; each piece of data t1 in data set T1 owned by A corresponds to a unique number m1;

Let data set T2 = {t21, t22, t23, …}, where t21, t22, t23, … are the data in T2, and let M2 = {m2 | m2 is the unique number of t, t ∈ T2}; each piece of data t2 in data set T2 owned by B corresponds to a unique number m2.

Then A hashes each numbered piece of data in data set T1 to obtain data set H1 = {h | h = hash(t), t ∈ T1}, and B hashes each numbered piece of data in data set T2 to obtain data set H2 = {h | h = hash(t), t ∈ T2}. The hash algorithm may be SHA-3.

A and B then jointly negotiate to generate the keys, as follows: A and B negotiate two distinct 1024-bit large prime numbers p and q and let N = p × q. A and B each select an integer, d1 and d2 respectively, that is coprime to (p-1) × (q-1); d1 and d2 are also 1024 bits long. A stores d1 and B stores d2;
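The key selection described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the helper name `choose_key` is hypothetical, and two small Mersenne primes stand in for the negotiated 1024-bit primes so that the example runs instantly.

```python
import math
import secrets

def choose_key(p: int, q: int, bits: int) -> int:
    """Pick a random integer d with exactly `bits` bits and gcd(d, (p-1)(q-1)) == 1."""
    r = (p - 1) * (q - 1)
    while True:
        # Set the top bit so d has the requested length, and the low bit so d is odd
        # (r is even, so an even d could never be coprime to it).
        d = secrets.randbits(bits) | (1 << (bits - 1)) | 1
        if math.gcd(d, r) == 1:
            return d

# Toy primes (assumed for illustration only); the text calls for 1024-bit primes.
p, q = 2**31 - 1, 2**61 - 1
N = p * q
d1 = choose_key(p, q, bits=32)   # kept by A
d2 = choose_key(p, q, bits=32)   # kept by B
```

With real parameters the same function would be called with `bits=1024` and 1024-bit primes; only the sizes change.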

Then A and B perform two rounds of crossed modular exponentiation, as follows. A performs the modular exponentiation h1^d1 ≡ c1 (mod N) on the data in data set H1 using N and d1 to obtain c1, and transmits the pairs (m1, c1) to B, where m1 is the number of the corresponding data in data set T1;

B performs the modular exponentiation h2^d2 ≡ c2 (mod N) on the data in data set H2 using N and d2 to obtain c2, and transmits the pairs (m2, c2) to A, where m2 is the number of the corresponding data in data set T2;

Step 2.3: A performs the modular exponentiation c2^d1 ≡ f1 (mod N) on the received pairs (m2, c2) to obtain f1, which carries the number m2, and hashes f1 to obtain g1, which likewise carries the number m2; A saves (m2, g1) and transmits it to B;

B performs the modular exponentiation c1^d2 ≡ f2 (mod N) on the received pairs (m1, c1) to obtain f2, which carries the number m1, and hashes f2 to obtain g2, which likewise carries the number m1; B saves (m1, g2) and transmits it to A;

At this point A holds both (m2, g1) and (m1, g2). A computes the intersection of the two by checking whether any g2 in (m1, g2) equals some g1. If so, the data t1 corresponding to the number m1 carried by that g2 is data matched between A and B. B likewise holds (m2, g1) and (m1, g2), computes their intersection, and checks whether any g1 in (m2, g1) equals some g2. If so, the data t2 corresponding to the number m2 carried by that g1 is data matched between A and B.
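The whole exchange of this embodiment can be traced in a short Python sketch. It is an illustration under assumptions rather than a reference implementation: the primes and keys are toy values, the hash is taken to be SHA3-256, hashes are reduced modulo N only because the toy N is smaller than a hash value, and hashing f via its decimal string is a choice made here. All function and variable names are hypothetical.

```python
import hashlib
import math

# Toy parameters (assumed); real use calls for two distinct 1024-bit primes.
p, q = 2**31 - 1, 2**61 - 1
N = p * q
r = (p - 1) * (q - 1)
d1, d2 = 65537, 257                      # A's key and B's key
assert math.gcd(d1, r) == 1 and math.gcd(d2, r) == 1

def h(record: str) -> int:
    """SHA3-256 of a record as an integer, reduced mod N because the toy N is small."""
    return int.from_bytes(hashlib.sha3_256(record.encode("utf-8")).digest(), "big") % N

def g(f: int) -> str:
    """Final hash applied to a second-round result f."""
    return hashlib.sha3_256(str(f).encode("ascii")).hexdigest()

def first_round(records, d):
    """Number the records from 1 and return pairs (m, c) with c = h^d mod N."""
    return [(m, pow(h(t), d, N)) for m, t in enumerate(records, start=1)]

def second_round(pairs, d):
    """Raise each received c to the local key, hash it, and keep the sender's number."""
    return [(m, g(pow(c, d, N))) for m, c in pairs]

T1 = ["alice@example.com", "bob@example.com", "carol@example.com"]    # A's data
T2 = ["dave@example.com", "carol@example.com", "alice@example.com"]   # B's data

pairs_c1 = first_round(T1, d1)           # A -> B: (m1, c1)
pairs_c2 = first_round(T2, d2)           # B -> A: (m2, c2)
pairs_g1 = second_round(pairs_c2, d1)    # computed by A: (m2, g1)
pairs_g2 = second_round(pairs_c1, d2)    # computed by B: (m1, g2)

# A keeps the numbers m1 whose g2 also occurs among the g1 values.
g1_values = {gv for _, gv in pairs_g1}
matched_by_a = [T1[m1 - 1] for m1, gv in pairs_g2 if gv in g1_values]

# B keeps the numbers m2 whose g1 also occurs among the g2 values.
g2_values = {gv for _, gv in pairs_g2}
matched_by_b = [T2[m2 - 1] for m2, gv in pairs_g1 if gv in g2_values]
```

Each side recovers only its own copies of the shared records, indexed by its own numbering; the records of the other party that did not match appear only as hash values of exponentiated data.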

The secure data matching system comprises participants A and B that perform data matching, where the structure of participant B is the same as that of participant A. Participants A and B each comprise a data input unit, a data numbering unit, a data hash unit, a key negotiation unit, a modular exponentiation local unit, a modular exponentiation remote unit, a result processing unit, a data number restoration unit and a data output unit.

The data input unit is used for inputting the data to be matched of A and B; the data to be matched is database data or a CSV file.

Specifically, data input unit 11 and data input unit 21 read user data from the data sets of participant A and participant B, respectively. The input may be a database or a CSV file, and the input user data are converted into individual records according to a fixed encoding method.

For example, if the data to be matched consist of a user's email address and mobile phone number, the fields are joined by string concatenation: the two original fields user@user.com and 13988889999 are combined into the single record user@user.com-13988889999. Participant A and participant B use the same encoding method.

After this encoding, each participant's user data become single data records t. Assume that the data set generated by participant A from data input unit 11 is T1 = {t11, t12, t13, …} and the data set generated by participant B from data input unit 21 is T2 = {t21, t22, t23, …}.
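The encoding step can be illustrated with a small Python helper. The function name `encode_record` is hypothetical, and the case and whitespace normalisation is an assumption added here (the text only requires that both participants use the same encoding); without identical rules on both sides, equal accounts would hash to different values.

```python
def encode_record(email: str, phone: str) -> str:
    """Join the fields of one user record into a single string.

    Lower-casing and stripping whitespace are assumptions of this sketch;
    both participants must apply exactly the same rules.
    """
    return f"{email.strip().lower()}-{phone.strip()}"

record = encode_record("user@user.com", "13988889999")
```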

The data numbering unit numbers the user data of A and B held in the data input unit item by item.

Specifically, take a piece of data t11 from participant A's data set T1 and number it, say with the number m11. The number m11 may be produced by any encoding method; it uniquely identifies the position of t11 in T1 and must not reveal the content of t11. The simplest numbering is a sequence of increasing integers in data input order: the first piece of data is numbered 1, the second 2, and so on. After the data numbering unit has processed a piece of data from the data input unit, it becomes a (number, data) pair. The data of participant A and participant B are numbered independently, and the two numberings are unrelated.

Suppose that participant A, through data numbering unit 12, obtains the data set M1 = {(m11, t11), (m12, t12), (m13, t13), …}, where m represents a number and t represents data, and that participant B, through data numbering unit 22, obtains the data set M2 = {(m21, t21), (m22, t22), (m23, t23), …}, where m represents a number and t represents data.

Suppose that hashing a piece of data t11 from participant A's data set M1 with sha3 gives h11, that is, h11 = sha3(t11), h12 = sha3(t12), and so on, so that the pairs become (m11, h11), (m12, h12), …. The result data set of participant A after data hash unit 13 is thus H1 = {(m11, h11), (m12, h12), (m13, h13), …}, and the result data set of participant B after data hash unit 23 is H2 = {(m21, h21), (m22, h22), (m23, h23), …}.
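A minimal Python sketch of the numbering and hashing units follows. It assumes the simplest numbering mentioned in the text (increasing integers from 1) and takes SHA3-256 as the concrete SHA-3 variant, which the text does not specify; the function names are hypothetical.

```python
import hashlib

def number_records(records):
    """Assign increasing integer numbers, starting at 1, in input order."""
    return list(enumerate(records, start=1))

def hash_numbered(numbered):
    """Replace each record t by its SHA3-256 hash while keeping its number m."""
    return [(m, hashlib.sha3_256(t.encode("utf-8")).hexdigest()) for m, t in numbered]

# M1 holds (number, record) pairs; H1 holds (number, hash) pairs.
M1 = number_records(["user@user.com-13988889999", "other@user.com-13900001111"])
H1 = hash_numbered(M1)
```

The number travels with the hash through every later stage, which is what lets a participant map a matched value back to its own record without revealing the record itself.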

The key negotiation unit is used for negotiating the large prime numbers shared by A and B, with each participant then independently obtaining its own key.

Specifically, suppose participant A and participant B each choose an English phrase, say a1 and a2, and send it to the other over the network. After receiving the other's phrase, each concatenates the two to obtain a1a2 and uses the concatenated phrase as the seed of the generation algorithm. Since participant A and participant B use the same algorithm, the same phrase yields the same prime numbers p and q. In this example p and q are both 1024-bit primes. Once p and q have been negotiated, participant A calculates the large number N = p × q and the large number r = (p-1) × (q-1), which are used to produce the key used below. Participant A selects a 1024-bit integer d1 such that d1 and r are coprime.

Participant B chooses d2 by the same algorithm. As a result, the two participants independently select integers d1 and d2 (both 1024 bits), each coprime to r, and each securely stores its own key together with the large integer N.

Modular exponentiation local unit 15 takes the data pair (m11, h11) from the data set H1 = {(m11, h11), (m12, h12), (m13, h13), …} and computes h11^d1 ≡ c11 (mod N) for h11, giving c11. The data pairs in H1 are processed by modular exponentiation in turn, giving c12, c13, …, so that the data set generated by participant A through the modular exponentiation local unit is L1 = {(m11, c11), (m12, c12), (m13, c13), …}. The resulting data set L1 is then sent over the network S2 to modular exponentiation remote unit 26 of participant B, where it awaits further processing.

By the same process, modular exponentiation local unit 25 of participant B performs modular exponentiation in turn on the data in the data set H2 = {(m21, h21), (m22, h22), (m23, h23), …} to obtain a new data set L2 = {(m21, c21), (m22, c22), (m23, c23), …}. The data set L2 is then sent over the network S3 to modular exponentiation remote unit 16 of participant A, where it awaits further processing.

Modular exponentiation remote unit 16 takes the data pair (m21, c21) from L2 and computes c21^d1 ≡ f21 (mod N) for c21, giving f21. f21 is then hashed again with the sha3 hash function, giving g21 = sha3(f21). The data in L2 are processed in turn, so that modular exponentiation remote unit 16 of participant A obtains a new data set G2 = {(m21, g21), (m22, g22), (m23, g23), …}. The data set G2 is then sent over the network S4 to result processing unit 27 of participant B, and a copy is also kept via the local path S6.

By the same process, modular exponentiation remote unit 26 of participant B performs one modular exponentiation and one sha3 hash operation on each element of the data set L1 = {(m11, c11), (m12, c12), (m13, c13), …} to obtain a new data set G1 = {(m11, g11), (m12, g12), (m13, g13), …}, which is then sent over the network S5 to result processing unit 17 of participant A, while a copy is also kept via the local path S7.

Result processing unit 17 of participant A uses the data set G1 = {(m11, g11), (m12, g12), (m13, g13), …} received over the network S5 and the data set G2 = {(m21, g21), (m22, g22), (m23, g23), …} stored locally via S6, where m denotes a number and g denotes data. Result processing unit 27 of participant B uses the data set G2 = {(m21, g21), (m22, g22), (m23, g23), …} received over the network S4 and the data set G1 = {(m11, g11), (m12, g12), (m13, g13), …} stored locally via S7. Participant A and participant B now hold the same data sets G1 and G2.

Result processing unit 17 of participant A extracts the data component of G1, giving G1' = {g11, g12, g13, …}, and likewise extracts the data component of G2, giving G2' = {g21, g22, g23, …}. Intersecting G1' and G2' then gives a new data set G = {g1, g2, g3, …} whose elements appear in both G1' and G2', that is, the intersection.

By the same process, result processing unit 27 of participant B operates on G1 and G2 and obtains the same data set G = {g1, g2, g3, …}.

After data number restoration unit 18 obtains the data intersection G = {g1, g2, g3, …}, it looks up g1, g2, g3, … in the data set G1 to obtain the corresponding data numbers m1, m2, m3, …. Since the numbers in G1 were generated by data numbering unit 12, the originally encoded data set T = {t1, t2, t3, …} can be found by number from the data set M1 = {(m11, t11), (m12, t12), (m13, t13), …} of the second step.

By the same process, after data number restoration unit 28 obtains the data intersection G, the same originally encoded data set T = {t1, t2, t3, …} can be obtained.

The data output unit is used for outputting the original data found by the data number restoration unit.

After obtaining the data intersection T = {t1, t2, t3, …}, data output unit 19 and data output unit 29 reverse the encoding applied by data input units 11 and 21 to the elements of T and output the original matched user data; the final result may be written to a database or a file.
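The result processing and number restoration steps amount to a set intersection followed by a lookup by number. The Python sketch below shows this from participant A's side; the function name is hypothetical and short placeholder strings stand in for real SHA-3 outputs.

```python
def intersect_and_restore(G1, G2, M1):
    """Return A's original records whose final hash appears in both G1 and G2.

    G1: pairs (m1, g) carrying A's numbers.
    G2: pairs (m2, g) carrying B's numbers.
    M1: pairs (m1, t) produced by A's data numbering unit.
    """
    common = {g for _, g in G1} & {g for _, g in G2}   # the intersection G
    numbers = [m for m, g in G1 if g in common]        # numbers carried by matches in G1
    by_number = dict(M1)
    return [by_number[m] for m in numbers]

# Placeholder values for illustration only.
G1 = [(1, "aa"), (2, "bb"), (3, "cc")]
G2 = [(1, "dd"), (2, "cc"), (3, "aa")]
M1 = [(1, "t11"), (2, "t12"), (3, "t13")]
restored = intersect_and_restore(G1, G2, M1)
```

Participant B would run the same logic with G2 in the role of the numbered set and its own data set M2 as the lookup table.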

Second embodiment: referring to Fig. 2, the secure data matching system of the invention comprises participants A and B that perform data matching, where the structure of participant B is the same as that of participant A. Participant A comprises a data input unit, a data numbering unit, a data hash unit, a hardware token, a result processing unit, a data number restoration unit and a data output unit. The functions of the data input unit, the data numbering unit, the data hash unit, the result processing unit, the data number restoration unit and the data output unit are the same as in the first embodiment and are not described again.

The hardware token replaces the key negotiation unit, the modular exponentiation local unit and the modular exponentiation remote unit of the first embodiment. The hardware token can speed up the computation and hide the key, which improves the security of the system. The hardware token negotiates the operation key shared by A and B, performs the first round of modular exponentiation on A's hash data, and transmits the result over the network to the other party's hardware token. It receives the other party's first-round result from the other party's hardware token, performs the second round of modular exponentiation on that data, and hashes the result of the second round; A stores the hash result locally and sends it to the other party.

In this embodiment, participant A and participant B must use tokens from the same batch when negotiating the key. Tokens in the same batch are initialized with the same random phrase, and a key d and a large integer N are generated inside the token. Subsequent modules can use the key built into the token. The token also contains an arithmetic unit: a large integer h can be input, the token computes h^d ≡ f (mod N), and the large integer f is output.
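The token's interface, a key that stays inside and a single modular exponentiation operation exposed to the outside, can be mimicked in software. The class below is only a stand-in written for illustration: a real token keeps d in secure hardware and generates it internally from the batch phrase, whereas here d is passed to the constructor and the values are toy numbers.

```python
class HardwareToken:
    """Software stand-in for a token: callers can use the key d but not read it."""

    def __init__(self, d: int, n: int) -> None:
        self.__d = d      # name-mangled attribute; real tokens keep this in hardware
        self.n = n

    def modexp(self, h: int) -> int:
        """Return f such that h^d ≡ f (mod N)."""
        return pow(h, self.__d, self.n)

# Toy values (assumed): N = 3 * 11 and d = 7, with gcd(7, 2 * 10) = 1.
token = HardwareToken(d=7, n=33)
f = token.modexp(2)
```

The surrounding units call `modexp` for both rounds, so the rest of the system never needs to handle the key directly.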

Finally, it should be noted that the above examples are intended only to illustrate the technical solution of the invention and not to limit it. Although the invention has been described in detail with reference to preferred embodiments, those skilled in the art will appreciate that modifications to the specific embodiments or equivalent substitutions for some of the technical features may be made without departing from the spirit of the invention, and all such changes are intended to be covered by the scope of the appended claims.

Claims

1. A secure data matching method, characterized in that the method runs directly between two parties A and B that need to exchange data; parties A and B first number their original data to be exchanged and apply a hash operation to it, and then each generates a key; if the values obtained after A and B apply two crossed rounds of modular exponentiation to the hashed data are equal, the original data corresponding to those values are judged to be equal; and the matched original data are finally obtained through the respective numbers;

the method comprises the following detailed steps:

the cross operation is divided into four sub-steps:

step 5.3: after receiving the data c1 sent by the other party, A and B each perform a second round of modular exponentiation: c1^d ≡ c2 (mod N), where d denotes the key held by A or B respectively, i.e. d1 or d2 in the second step, and N is the value p × q calculated in the second step;

step 5.4: A and B each hash the value c2 calculated in the second round of modular exponentiation, store the result locally, and send it to the other party;

in the fifth step, A and B each obtain from their respective data sets, by number, the equal original data found in the fourth step.

2. The secure data matching method of claim 1, wherein the hash algorithm in the first step is SHA-3.

3. The secure data matching method of claim 1, wherein p and q in the second step are two distinct large prime numbers.