CN114817943A - Data matching method, device, equipment and medium - Google Patents


Info

Publication number
CN114817943A
Authority
CN
China
Prior art keywords
vector
encrypted
data
target
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210191650.3A
Other languages
Chinese (zh)
Inventor
刘红宝
高鹏飞
郑建宾
佘萧寒
邱震尧
周雍恺
程栋
赵庆杭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Unionpay Co Ltd
Original Assignee
China Unionpay Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Unionpay Co Ltd
Priority to CN202210191650.3A
Publication of CN114817943A
Priority to PCT/CN2022/112616 (WO2023159888A1)
Priority to TW111135467A (TWI835300B)
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/008 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/08 Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/0861 Generation of secret information including derivation or calculation of cryptographic keys or passwords
    • H04L9/0863 Generation of secret information including derivation or calculation of cryptographic keys or passwords involving passwords or one-time passwords

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Bioethics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Radar Systems Or Details Thereof (AREA)

Abstract

The application discloses a data matching method, apparatus, device, and medium. First data and second data are each input into a pre-trained vector conversion model to obtain a corresponding first vector and second vector. An encrypted distance between the first vector and the second vector, computed under a first target public key, is obtained; the target distance between the first vector and the second vector is determined from the encrypted distance and a first target private key; and whether the two data items match is determined based on the target distance. Fuzzy matching is therefore possible even when the two data items are not identical, which widens the usage scenarios. A first target public-private key pair is used for homomorphic encryption and decryption during fuzzy matching, which ensures the security of the matching process. Throughout the process, neither data item leaves its own device as raw data, so fuzzy matching is achieved without exporting the raw data and the matching process remains secure.

Description

Data matching method, device, equipment and medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a data matching method, apparatus, device, and medium.
Background
Current privacy-preserving computation technology is mainly applied to secure intersection (private set intersection) and federated learning. Secure intersection identifies the intersection of two parties' data, for example the common users of organization A and organization B. It is also the first step of vertical federated learning: secure intersection is first performed on key identifiers such as mobile phone numbers, ID card numbers, and business license numbers, after which subsequent steps such as joint modeling are carried out.
In the related art, common secure intersection algorithms used to identify the intersection of two parties' data, or to match two data items, include algorithms based on the RSA encryption algorithm and algorithms based on the Oblivious Transfer (OT) protocol. However, current secure intersection algorithms can only match successfully when the data of both parties are completely identical, that is, when both the data types and the characters contained in the data are exactly the same. In actual business there are many scenarios in which data that are not identical still need to be matched, so prior-art secure intersection algorithms greatly limit the usage scenarios and the service range of matching.
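To make the exact-match limitation concrete, the sketch below intersects two hypothetical record lists by comparing SHA-256 digests. It is not an RSA- or OT-based protocol and provides no privacy on its own (low-entropy identifiers can be brute-forced from their hashes); it only mimics the exact-equality semantics that such protocols share, so a single punctuation difference already prevents a match. The records and function names are illustrative assumptions.

```python
import hashlib
from typing import List


def digest(record: str) -> str:
    """Hash a record: identical strings give identical digests, anything else differs."""
    return hashlib.sha256(record.encode("utf-8")).hexdigest()


def exact_intersection(party_a: List[str], party_b: List[str]) -> List[str]:
    """Return party A's records whose digest also appears among party B's digests."""
    b_digests = {digest(r) for r in party_b}
    return [r for r in party_a if digest(r) in b_digests]


# Hypothetical records: the addresses differ only by a comma, the phone numbers are equal.
party_a = ["Pudong New Area, Shanghai", "13800000000"]
party_b = ["Pudong New Area Shanghai", "13800000000"]
print(exact_intersection(party_a, party_b))  # ['13800000000']
```

The near-duplicate address is dropped even though a human would consider it the same record, which is the gap the fuzzy matching method targets.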
Disclosure of Invention
The application provides a data matching method, apparatus, device, and medium to solve the following problem: prior-art secure intersection algorithms can only perform secure intersection when the data of the two parties are completely identical, which limits the usage scenarios and the service range of data matching.
The application provides a data matching method applied to a first device, comprising the following steps:
inputting first data to be matched into a pre-trained vector conversion model to obtain a first vector corresponding to the first data;
homomorphically encrypting the first vector with a first target public key generated by the first device itself to generate a first encrypted vector, and sending the first target public key to a second device;
obtaining an encrypted distance between the first vector and a second vector, determined based on the first encrypted vector and a second encrypted vector, wherein the second encrypted vector is obtained by homomorphically encrypting the second vector with the first target public key, and the second vector is obtained by inputting second data into a pre-trained vector conversion model in the second device;
determining a target distance between the first vector and the second vector based on the encrypted distance and a first target private key corresponding to the first target public key, and determining whether the first data and the second data match based on the target distance and a preset first distance threshold.
The application provides a data matching method applied to a second device, comprising the following steps:
inputting second data to be matched into a pre-trained vector conversion model to obtain a second vector corresponding to the second data;
receiving a first target public key sent by a first device, and homomorphically encrypting the second vector with the first target public key to generate a second encrypted vector;
acquiring a target distance between a first vector and the second vector determined based on a first encrypted vector and the second encrypted vector, wherein the first encrypted vector is obtained by encrypting the first vector with the first target public key, and the first vector is obtained by inputting first data into a pre-trained vector conversion model in the first device;
and determining whether the first data and the second data match based on the target distance and a preset first distance threshold.
The present application also provides a data matching apparatus, comprising:
a first acquisition module, configured to input first data to be matched into a pre-trained vector conversion model to obtain a first vector corresponding to the first data;
a first processing module, configured to homomorphically encrypt the first vector with a first target public key generated by the apparatus itself to generate a first encrypted vector, and to send the first target public key to a second device;
the first acquisition module being further configured to obtain an encrypted distance between the first vector and a second vector, determined based on the first encrypted vector and a second encrypted vector, wherein the second encrypted vector is obtained by homomorphically encrypting the second vector with the first target public key, and the second vector is obtained by inputting second data into a pre-trained vector conversion model in the second device;
and a first determining module, configured to determine a target distance between the first vector and the second vector based on the encrypted distance and a first target private key corresponding to the first target public key, and to determine whether the first data and the second data match based on the target distance and a preset first distance threshold.
The present application also provides a data matching apparatus, comprising:
a second acquisition module, configured to input second data to be matched into a pre-trained vector conversion model to obtain a second vector corresponding to the second data;
a second processing module, configured to receive a first target public key sent by a first device and to homomorphically encrypt the second vector with the first target public key to generate a second encrypted vector;
the second acquisition module being further configured to obtain a target distance between a first vector and the second vector determined based on a first encrypted vector and the second encrypted vector, wherein the first encrypted vector is obtained by encrypting the first vector with the first target public key, and the first vector is obtained by inputting first data into a pre-trained vector conversion model in the first device;
and a second determining module, configured to determine whether the first data and the second data match based on the target distance and a preset first distance threshold.
The present application further provides an electronic device comprising a processor, the processor being configured to implement the steps of any one of the data matching methods described above when executing a computer program stored in a memory.
The present application also provides a computer-readable storage medium storing a computer program executable by a terminal; when the program runs on the terminal, it causes the terminal to perform the steps of any one of the data matching methods described above.
In the application, first data to be matched is input into a pre-trained vector conversion model to obtain a first vector corresponding to the first data; the first vector is homomorphically encrypted with a first target public key generated by the first device to produce a first encrypted vector, and the first target public key is sent to a second device. An encrypted distance between the first vector and a second vector, determined based on the first encrypted vector and a second encrypted vector, is then obtained, where the second encrypted vector is obtained by homomorphically encrypting the second vector with the first target public key, and the second vector is obtained by inputting second data into a pre-trained vector conversion model in the second device. The target distance between the first vector and the second vector is determined from the encrypted distance and the first target private key corresponding to the first target public key, and whether the first data and the second data match is determined based on the target distance and a preset first distance threshold.
In the embodiments of the application, the first data and the second data to be matched are each input into a pre-trained vector conversion model to obtain a first vector and a second vector. An encrypted distance between the two vectors is obtained from the first encrypted vector and the second encrypted vector, the target distance is recovered with the first target private key generated by the first device, and whether the first data and the second data match is determined from the target distance and a preset first distance threshold. Fuzzy matching is therefore possible even when the first data and the second data are not completely identical, which widens the usage scenarios. Because the first target public key and the first target private key are used for homomorphic encryption and decryption during fuzzy matching, secure intersection is achieved and the matching process is secure. Throughout the process, the first data and the second data never leave the first device and the second device, respectively, in raw form, so fuzzy matching is achieved without the raw data leaving its local database, which further ensures the security of the matching process.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a data matching process provided in an embodiment of the present application;
FIG. 2a is a schematic diagram illustrating a display of a target sub-distance according to some embodiments of the present application;
FIG. 2b is a schematic diagram illustrating a display of a target sub-distance matrix according to some embodiments of the present application;
FIG. 3a is a schematic diagram illustrating another display of a target sub-distance according to some embodiments of the present application;
FIG. 3b is a schematic diagram illustrating another display of a target sub-distance matrix according to some embodiments of the present application;
FIG. 4 is a schematic process diagram of a data matching method according to an embodiment of the present application;
FIG. 5a is a schematic diagram of a process for obtaining a vector corresponding to text-type data according to some embodiments of the present application;
FIG. 5b is a schematic diagram of a process for obtaining a vector corresponding to digital-type data according to some embodiments of the present application;
FIG. 6 is a schematic diagram of the overall process of fuzzy matching between two parties according to some embodiments of the present application;
FIG. 7 is a schematic diagram of a specific process of fuzzy matching between two parties according to some embodiments of the present application;
FIG. 8 is a schematic diagram of a data matching apparatus according to some embodiments of the present application;
FIG. 9 is a schematic diagram of a data matching apparatus according to some embodiments of the present application;
FIG. 10 is a schematic structural diagram of an electronic device according to some embodiments of the present application;
FIG. 11 is a schematic structural diagram of an electronic device according to some embodiments of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on these embodiments without creative effort fall within the protection scope of the present application.
To allow matching when the two parties' data are not identical, and thereby widen the service range of data matching, embodiments of the present application provide a data matching method, apparatus, device, and medium.
In the application, the first device inputs the first data to be matched into a pre-trained vector conversion model to obtain a first vector, homomorphically encrypts the first vector with a first target public key it generated itself to produce a first encrypted vector, and sends the first target public key to the second device. The second device inputs the second data into its own pre-trained vector conversion model to obtain a second vector and homomorphically encrypts it with the first target public key to produce a second encrypted vector. The first device then obtains the encrypted distance between the first vector and the second vector, determined from the two encrypted vectors, decrypts it with the first target private key corresponding to the first target public key to obtain the target distance, and determines whether the first data and the second data match based on the target distance and a preset first distance threshold.
Example 1:
FIG. 1 is a schematic diagram of a data matching process provided in an embodiment of the present application. The process includes the following steps:
S101: Input first data to be matched into a pre-trained vector conversion model to obtain a first vector corresponding to the first data.
The data matching method provided in this embodiment is applied to a first device, which may be a smart terminal, a PC, a server, or another device.
To allow fuzzy matching even when the data of the two devices are not identical, in this embodiment a pre-trained vector conversion model is deployed in the first device. The model is used to obtain the vector corresponding to the data to be matched, and it outputs vectors of the same dimension for different data.
To obtain the first vector corresponding to the first data to be matched, the first data is input into the pre-trained vector conversion model, which outputs the first vector. Each component of the first vector is a number; in other words, the pre-trained vector conversion model quantizes the first data.
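The key property stated above is that every input, whatever its length, maps to a numeric vector of the same dimension. The sketch below illustrates only that property with a hypothetical stand-in for the pre-trained model: character bigrams are hashed into a fixed number of buckets. It is not a trained word or sentence vector model and captures no semantics; `DIM` and `toy_vectorize` are assumed names.

```python
import zlib
from typing import List

DIM = 8  # hypothetical fixed output dimension


def toy_vectorize(text: str, dim: int = DIM) -> List[float]:
    """Hypothetical stand-in for a pre-trained model: hash character bigrams into `dim` buckets.

    Every input, whatever its length, yields exactly `dim` numeric components.
    """
    vec = [0.0] * dim
    padded = f"^{text}$"  # boundary markers so even one-character strings produce bigrams
    for i in range(len(padded) - 1):
        bucket = zlib.crc32(padded[i:i + 2].encode("utf-8")) % dim
        vec[bucket] += 1.0
    return vec


short = toy_vectorize("Shanghai")
long_ = toy_vectorize("Pudong New Area, Shanghai")
print(len(short), len(long_))  # 8 8
```

`zlib.crc32` is used instead of Python's built-in `hash` because string hashing is randomized per process; both devices must derive identical vectors from identical inputs.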
S102: Homomorphically encrypt the first vector with a first target public key generated by the first device itself to generate a first encrypted vector, and send the first target public key to a second device.
In this embodiment, to improve security, the first device generates a first target public-private key pair consisting of a first target public key and a first target private key, and encrypts the first vector with the first target public key to generate the first encrypted vector. The first target key pair may be symmetric or asymmetric and can be chosen as required.
Generating the first target public-private key pair is prior art and is not described in detail here.
Because the second data to be matched with the first data is held by the second device, and to make it possible to determine the target distance between the first vector (corresponding to the first data) and the second vector (corresponding to the second data), the first device also sends the first target public key to the second device, so that the second device can homomorphically encrypt the second vector with the first target public key to generate the second encrypted vector. Specifically, each component of the first vector and each component of the second vector is homomorphically encrypted with the first target public key, yielding the first encrypted vector and the second encrypted vector.
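The patent does not name the homomorphic encryption scheme. As a sketch under that gap, the code below assumes textbook Paillier (an asymmetric, additively homomorphic scheme) with generator g = n + 1, and shows key generation on the first device followed by component-wise encryption of both vectors under the first target public key. The primes are tiny Mersenne primes chosen only so the example runs instantly; they are insecure. The integer vectors are hypothetical, and scaling real-valued components to integers is an assumption, since Paillier encrypts integers modulo n.

```python
import math
import random
from typing import List, Tuple

# Toy parameters: two Mersenne primes. Far too small and too structured for real use.
P, Q = 2**31 - 1, 2**61 - 1


def keygen(p: int = P, q: int = Q) -> Tuple[int, Tuple[int, int, int]]:
    """Return (public key n, private key (lam, mu, n)) for Paillier with g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+)
    return n, (lam, mu, n)


def encrypt(n: int, m: int) -> int:
    """Enc(m) = (1 + m*n) * r^n mod n^2, with fresh randomness r per call."""
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (1 + (m % n) * n) * pow(r, n, n2) % n2


def decrypt(sk: Tuple[int, int, int], c: int) -> int:
    lam, mu, n = sk
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m  # map large residues back to negative integers


def encrypt_vector(n: int, vec: List[int]) -> List[int]:
    """Encrypt each component separately, as the embodiment describes."""
    return [encrypt(n, v) for v in vec]


def decrypt_vector(sk: Tuple[int, int, int], enc: List[int]) -> List[int]:
    return [decrypt(sk, c) for c in enc]


pk, sk = keygen()               # first device: first target key pair
x = [100, 200, 150]             # hypothetical first vector, already scaled to integers
enc_x = encrypt_vector(pk, x)   # first encrypted vector
y = [100, 250, 100]             # hypothetical second vector on the second device
enc_y = encrypt_vector(pk, y)   # second device encrypts with the received public key
print(decrypt_vector(sk, enc_x), decrypt_vector(sk, enc_y))  # [100, 200, 150] [100, 250, 100]
```

Only the holder of `sk` (the first device) can decrypt. Multiplying two ciphertexts modulo n^2 adds their plaintexts, which is the property a distance computation on encrypted data can build on.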
S103: Obtain the encrypted distance between the first vector and a second vector, determined based on the first encrypted vector and a second encrypted vector, wherein the second encrypted vector is obtained by homomorphically encrypting the second vector with the first target public key, and the second vector is obtained by inputting second data into a pre-trained vector conversion model in the second device.
To implement fuzzy matching of the first data and the second data, a pre-trained vector conversion model is also deployed in the second device to obtain the second vector corresponding to the second data to be matched: the second data is input into the model, which outputs the second vector. The second device then homomorphically encrypts the second vector with the first target public key received from the first device to obtain the second encrypted vector.
Because both vectors are encrypted with the first target public key generated by the first device, one way to determine the target distance is for the first device to receive the second encrypted vector from the second device, decrypt it with the first target private key of the key pair it generated itself to recover the second vector, and then determine the target distance from the first vector and the second vector.
To improve security, this embodiment may instead first obtain the encrypted distance between the first vector and the second vector, determined based on the first encrypted vector and the second encrypted vector. The encrypted distance is not a plaintext numerical value but an encrypted expression from which the target distance between the first vector and the second vector can be obtained only after decryption. It may be computed by the first device, or computed by the second device and then sent to the first device.
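The patent does not say how the encrypted distance is computed. The sketch below assumes textbook Paillier and squared Euclidean distance, computed on the second device and returned to the first device. Because Paillier only supports adding plaintexts and multiplying them by known constants, this variant needs the second device to use its own vector y as plaintext constants (and to encrypt the sum of y_i^2 under the first target public key). Computing the distance purely from two encrypted vectors, as S103 describes, would require a scheme with ciphertext-by-ciphertext multiplication, such as BFV or CKKS. The toy primes, `SCALE`, the vectors, and the message format are all assumptions.

```python
import math
import random
from typing import List, Tuple

P, Q = 2**31 - 1, 2**61 - 1  # toy Mersenne primes, insecure
SCALE = 100                  # hypothetical fixed-point scale for real-valued components


def keygen(p: int = P, q: int = Q) -> Tuple[int, Tuple[int, int, int]]:
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    return n, (lam, pow(lam, -1, n), n)


def encrypt(n: int, m: int) -> int:
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (1 + (m % n) * n) * pow(r, n, n2) % n2


def decrypt(sk: Tuple[int, int, int], c: int) -> int:
    lam, mu, n = sk
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m


def to_fixed(vec: List[float]) -> List[int]:
    return [round(v * SCALE) for v in vec]


def first_device_message(n: int, x: List[int]) -> Tuple[List[int], int]:
    """First device sends Enc(x_i) for every component plus Enc(sum of x_i^2)."""
    return [encrypt(n, xi) for xi in x], encrypt(n, sum(xi * xi for xi in x))


def second_device_distance(n: int, enc_x: List[int], enc_x_sq: int, y: List[int]) -> int:
    """Return Enc(sum x^2 - 2 * sum x*y + sum y^2), i.e. Enc(||x - y||^2).

    Multiplying ciphertexts adds plaintexts; raising a ciphertext to k multiplies by k.
    """
    n2 = n * n
    acc = enc_x_sq * encrypt(n, sum(yi * yi for yi in y)) % n2
    for cx, yi in zip(enc_x, y):
        acc = acc * pow(cx, (-2 * yi) % n, n2) % n2
    return acc


pk, sk = keygen()
x = to_fixed([1.0, 2.0, 1.5])  # hypothetical first vector
y = to_fixed([1.0, 2.5, 1.0])  # hypothetical second vector
enc_x, enc_x_sq = first_device_message(pk, x)
enc_distance = second_device_distance(pk, enc_x, enc_x_sq, y)
target_sq = decrypt(sk, enc_distance) / SCALE**2  # only the first device holds sk
print(target_sq)  # 0.5
```

In this variant the first device decrypts only the aggregated distance, never the second vector itself.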
S104: Determine the target distance between the first vector and the second vector based on the encrypted distance and the first target private key corresponding to the first target public key, and determine whether the first data and the second data match based on the target distance and a preset first distance threshold.
In this embodiment, to determine whether the first data and the second data match, the encrypted distance is decrypted once it has been determined, yielding the target distance between the first vector and the second vector. Because the first encrypted vector and the second encrypted vector are both generated with the first target public key of the first device, the encrypted distance can be decrypted with the first target private key of the first target public-private key pair generated by the first device.
The target distance is then compared with the preset first distance threshold, and whether the first data and the second data match is determined from the comparison result: the smaller the target distance, the more closely the first data and the second data match.
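Once decrypted, the decision is a plain threshold test. The patent fixes neither the distance metric nor whether equality counts as a match; the sketch assumes Euclidean distance and treats "distance at or below the threshold" as a match. The function names and threshold values are hypothetical.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def is_match(target_distance: float, first_distance_threshold: float) -> bool:
    """Smaller distance means a closer match; at or below the threshold counts as a match."""
    return target_distance <= first_distance_threshold


def is_match_squared(target_sq: float, first_distance_threshold: float) -> bool:
    """Same test on a squared distance, avoiding a square root (threshold assumed >= 0)."""
    return target_sq <= first_distance_threshold ** 2


d = euclidean([0.0, 0.0], [3.0, 4.0])
print(d, is_match(d, 6.0), is_match(d, 4.0))  # 5.0 True False
```

Because squaring is monotonic on non-negative numbers, comparing a squared distance with the squared threshold gives the same decision, which is convenient when the encrypted computation naturally yields a squared distance.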
In this embodiment, therefore, the first data and the second data can be fuzzily matched even when they are not completely identical, which widens the usage scenarios. The first target public key and the first target private key provide homomorphic encryption and decryption during fuzzy matching, so secure intersection is achieved and the matching process is secure. Because neither the first data nor the second data leaves the first device or the second device in raw form, fuzzy matching is achieved without the raw data leaving its local database, which further ensures the security of the matching process.
Example 2:
To determine the first vector corresponding to the first data, on the basis of the foregoing embodiment, inputting the first data to be matched into a pre-trained vector conversion model to obtain the first vector includes:
determining a first target data type of the first data to be matched;
determining a pre-trained first target vector conversion model for the first data according to the first target data type and a pre-stored correspondence between data types and pre-trained vector conversion models;
and inputting the first data into the pre-trained first target vector conversion model to obtain the first vector corresponding to the first data.
In this embodiment, the first data to be matched may be text data, such as a name, gender, or address, or digital data, such as an ID card number, a bank card number, or a reference card number. First data of different data types uses different pre-trained vector conversion models to obtain the corresponding first vector.
Specifically, the first device may store a correspondence between data types and pre-trained vector conversion models and, according to the first target data type of the acquired first data, use the corresponding pre-trained model to obtain the first vector; this corresponding model is the pre-trained first target vector conversion model.
To accurately determine the model that converts the first data into the first vector, in this embodiment, if the first target data type is a text type, the pre-trained first target vector conversion model is a word vector model or a sentence vector model; if the first target data type is a digital type, it is a one-hot encoding model.
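The pre-stored correspondence between data types and models can be pictured as a lookup table keyed by the detected type. In the sketch below, the type-detection rule (ASCII digits only means digital), the type names, and the text model are assumptions: the text entry is a hypothetical bigram-hashing stand-in, not a trained word or sentence vector model. The digital entry follows the one-hot rule in which digit k sets the k-th component counting from the right.

```python
import re
import zlib
from typing import Callable, Dict, List

Vector = List[float]


def one_hot(data: str) -> Vector:
    """Each digit k contributes ten components with a 1.0 at index 9 - k."""
    vec: Vector = []
    for ch in data:
        vec.extend(1.0 if i == 9 - int(ch) else 0.0 for i in range(10))
    return vec


def text_model(data: str, dim: int = 8) -> Vector:
    """Hypothetical stand-in for a pre-trained word/sentence vector model."""
    vec = [0.0] * dim
    padded = f"^{data}$"
    for i in range(len(padded) - 1):
        vec[zlib.crc32(padded[i:i + 2].encode("utf-8")) % dim] += 1.0
    return vec


# Pre-stored correspondence between data types and vector conversion models.
MODELS: Dict[str, Callable[[str], Vector]] = {"digital": one_hot, "text": text_model}


def detect_type(data: str) -> str:
    """Assumed rule: ASCII digits only means digital; anything else is text."""
    return "digital" if re.fullmatch(r"[0-9]+", data) else "text"


def to_vector(data: str) -> Vector:
    """Pick the first target vector conversion model by data type, then apply it."""
    return MODELS[detect_type(data)](data)


print(detect_type("6222"), len(to_vector("6222")))          # digital 40
print(detect_type("Shanghai"), len(to_vector("Shanghai")))  # text 8
```

A regular expression is used rather than `str.isdigit()`, which also accepts non-ASCII digit characters such as superscripts that `int()` may not convert as expected.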
Specifically, if the first data is text data, that is, the first target data type of the first data is a text type, determining a pre-trained first target vector conversion model corresponding to the first target data type according to a correspondence between a pre-stored data type and the pre-trained vector conversion model, where the first target vector conversion model is a word vector model or a sentence vector model, and obtaining a first vector corresponding to the first data based on the pre-trained word vector model or sentence vector model; if the first data is digital data, namely the first target data type of the first data is a digital type, determining a pre-trained first target vector conversion model corresponding to the first target data type according to the corresponding relation between the pre-stored data type and the pre-trained vector conversion model, wherein the first target vector conversion model is a pre-trained One-Hot coding model, and acquiring a first vector corresponding to the first data based on the pre-trained One-Hot coding model.
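The type-based model selection described above can be sketched as a lookup table from data type to converter. This is a minimal sketch under stated assumptions: `infer_data_type`, `toy_text_model`, `one_hot_digits`, and `MODEL_REGISTRY` are hypothetical names, the rule "all digits means digital type" is an assumption, and the text "model" is a deterministic character-bucketing stand-in rather than a trained word or sentence vector model.

```python
from typing import Callable, Dict, List, Union

Vector = List[Union[float, str]]

def toy_text_model(text: str, dim: int = 5) -> List[float]:
    # Hypothetical stand-in for a pre-trained word/sentence vector model:
    # buckets character codes into `dim` slots and counts them.
    vec = [0.0] * dim
    for ch in text:
        vec[ord(ch) % dim] += 1.0
    return vec

def one_hot_digits(number: str) -> List[str]:
    # One 10-bit code per digit: digit d sets the bit d places from the right.
    return ["".join("1" if pos == 9 - int(d) else "0" for pos in range(10))
            for d in number]

# Pre-stored correspondence between data types and conversion models.
MODEL_REGISTRY: Dict[str, Callable[[str], Vector]] = {
    "text": toy_text_model,
    "digital": one_hot_digits,
}

def infer_data_type(data: str) -> str:
    # Assumption: all-digit strings are "digital", everything else is "text".
    return "digital" if data.isdigit() else "text"

def to_first_vector(data: str) -> Vector:
    # Look up the target model for the data type, then convert.
    model = MODEL_REGISTRY[infer_data_type(data)]
    return model(data)
```

Keeping the correspondence in a dictionary means a new data type only needs a new registry entry; the dispatch code does not change.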
For illustration, assume that the pre-trained vector conversion model is a word vector model whose output vectors have dimension 5. If the first data is the text "Shanghai Pudong New Area Sunny Day Canteen", inputting it into the pre-trained word vector model yields the first vector (1.0, 2.0, 1.5, 2.0, 3.5).
If the pre-trained vector conversion model is the one-hot encoding model, a one-hot code may be assigned to each digit in advance. For the digits 0 to 9, for example, the one-hot code of 0 is 0000000001, of 1 is 0000000010, of 2 is 0000000100, of 3 is 0000001000, of 4 is 0000010000, of 5 is 0000100000, of 6 is 0001000000, of 7 is 0010000000, of 8 is 0100000000, and of 9 is 1000000000. When digital data is input into the one-hot encoding model, each first component of the output first vector is the one-hot code of the corresponding digit in the first data.
For example, if the first data is the digital data "12345", inputting "12345" into the pre-trained one-hot encoding model yields the first vector (0000000010, 0000000100, 0000001000, 0000010000, 0000100000).
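The digit encoding above can be checked with a short sketch. The function names are hypothetical; the bit layout follows the table in the text, where digit d sets the bit d places from the right of a 10-bit string, so decoding simply reads that position back.

```python
from typing import List

def digit_code(d: int) -> str:
    # 10-bit one-hot string with the 1 placed d positions from the right.
    if not 0 <= d <= 9:
        raise ValueError("digit must be in 0..9")
    return "0" * (9 - d) + "1" + "0" * d

def one_hot_encode(number: str) -> List[str]:
    # Each component of the output vector is the code of one input digit.
    return [digit_code(int(ch)) for ch in number]

def one_hot_decode(codes: List[str]) -> str:
    # Inverse mapping: the position of the 1, counted from the right, is the digit.
    return "".join(str(9 - code.index("1")) for code in codes)

print(one_hot_encode("12345"))
# ['0000000010', '0000000100', '0000001000', '0000010000', '0000100000']
```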
To train the vector conversion model, each piece of training data may be labeled in advance with a label vector. Each piece of data and its label vector are input into the original vector conversion model, the parameters of the model are adjusted according to the difference between the predicted vector it outputs and the label vector, and the model is deemed trained once a convergence condition is met.
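The training loop above (labeled data, compare prediction with label, stop on convergence) can be illustrated with a deliberately tiny model. This is a sketch under assumptions the text does not fix: the model is a single linear layer over given input features, trained by plain gradient descent on squared error, and the convergence condition is a total epoch loss below `tol`. The architecture, loss, and optimizer of the real model are unspecified.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], Sequence[float]]  # (input features, label vector)

def predict(weights: List[List[float]], feats: Sequence[float]) -> List[float]:
    # Linear model: one weight row per output component.
    return [sum(w * f for w, f in zip(row, feats)) for row in weights]

def train(samples: List[Sample], out_dim: int, lr: float = 0.1,
          tol: float = 1e-8, max_epochs: int = 10_000) -> Tuple[List[List[float]], int]:
    in_dim = len(samples[0][0])
    weights = [[0.0] * in_dim for _ in range(out_dim)]
    for epoch in range(1, max_epochs + 1):
        loss = 0.0
        for feats, label in samples:
            err = [p - t for p, t in zip(predict(weights, feats), label)]
            loss += sum(e * e for e in err)
            # Gradient of the squared error w.r.t. weight (i, j) is 2 * e_i * f_j.
            for i, e in enumerate(err):
                for j, f in enumerate(feats):
                    weights[i][j] -= lr * 2 * e * f
        if loss < tol:  # convergence condition
            return weights, epoch
    return weights, max_epochs

# Two labeled samples whose label vectors reuse values from the text's examples.
samples = [([1.0, 0.0], [1.0, 2.0, 1.5, 2.0, 3.5]),
           ([0.0, 1.0], [3.0, 4.0, 2.5, 2.5, 1.5])]
w, epochs = train(samples, out_dim=5)
```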
In this embodiment of the present application, fuzzy matching can be performed whether the first data and the second data are digital data or text data, which further broadens the application scenarios.
Example 3:
To determine the distance between the encrypted first vector and the encrypted second vector, on the basis of the foregoing embodiments, in this embodiment of the present application, obtaining the distance between the encrypted first vector and the encrypted second vector determined based on the first encrypted vector and the second encrypted vector includes:
receiving the second encrypted vector sent by the second device, where the second encrypted vector is obtained by the second device homomorphically encrypting the second vector based on the first target public key; and
determining the distance between the encrypted first vector and the encrypted second vector based on the first encrypted vector and the second encrypted vector.
In this embodiment, the encrypted distance between the first vector and the second vector, determined from the first encrypted vector and the second encrypted vector, may be computed by the first device itself, or computed by the second device and sent to the first device.
If the first device computes this distance, it needs the second encrypted vector from the second device. Specifically, after the first device sends the first target public key to the second device, the second device receives the first target public key, homomorphically encrypts the second vector based on it to obtain the second encrypted vector, and sends the second encrypted vector to the first device. The second vector is obtained by inputting the second data to be matched into a pre-trained vector conversion model in the second device.
The first device receives the second encrypted vector sent by the second device and locally determines the distance between the encrypted first vector and the encrypted second vector based on the received second encrypted vector and the first encrypted vector that it generated itself.
If each component of the first vector and each component of the second vector are homomorphically encrypted based on the first target public key to obtain the first encrypted vector and the second encrypted vector, then, in one possible implementation, the first device determines the distance between the encrypted first vector and the encrypted second vector from the first encrypted vector, the second encrypted vector, and the Euclidean distance formula. Specifically, the distance is determined according to
$$E_{pka}(d) = \sqrt{\sum_{i=1}^{N} \left(E_{pka}(x_i) - E_{pka}(y_i)\right)^2}$$
where E_pka(x_i) is the i-th component of the first encrypted vector, E_pka(y_i) is the i-th component of the second encrypted vector, E_pka(d) is the distance between the encrypted first vector and the encrypted second vector, and N is the number of components in the first encrypted vector or the second encrypted vector. The two encrypted vectors contain the same number of components, that is, they have equal length.
In another possible implementation, the first device may instead determine the distance between the encrypted first vector and the encrypted second vector from the first encrypted vector and the second encrypted vector using a cosine distance formula or a Hamming distance formula.
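For reference, the three distance formulas named above are shown here on plaintext vectors; these are the values an encrypted computation is meant to reproduce after decryption. The function names are hypothetical, and the Hamming distance simply counts differing components, which suits one-hot codes or digit strings.

```python
import math
from typing import Sequence

def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    if len(x) != len(y):
        raise ValueError("vectors must have equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine_distance(x: Sequence[float], y: Sequence[float]) -> float:
    # 1 - cos(angle); undefined (division by zero) for an all-zero vector.
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / norm

def hamming(x: Sequence, y: Sequence) -> int:
    if len(x) != len(y):
        raise ValueError("vectors must have equal length")
    return sum(a != b for a, b in zip(x, y))
```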
It should be noted that, because every component of the first encrypted vector and the second encrypted vector is homomorphically encrypted with the first target public key, the distance so determined is not the actual plaintext value but an encrypted expression.
In this embodiment of the present application, the second data never leaves the second device in its original form, so fuzzy matching is achieved without exporting the raw data, which further secures the matching process.
Example 4:
To determine the distance between the encrypted first vector and the encrypted second vector, on the basis of the foregoing embodiments, in this embodiment, sending the first target public key to the second device includes:
sending the first encrypted vector and the first target public key to the second device;
and obtaining the distance between the encrypted first vector and the encrypted second vector determined based on the first encrypted vector and the second encrypted vector includes:
receiving, from the second device, the distance between the encrypted first vector and the encrypted second vector determined based on the first encrypted vector and the second encrypted vector, where the second encrypted vector is obtained by the second device homomorphically encrypting the second vector based on the first target public key.
In this embodiment of the application, the distance between the encrypted first vector and the encrypted second vector obtained by the first device may also be determined by the second device and sent to the first device.
Specifically, so that the second device can compute this distance, the first device sends the first encrypted vector together with the generated first target public key. After receiving the first target public key and the first encrypted vector, the second device homomorphically encrypts the second vector based on the first target public key to generate the second encrypted vector, determines the distance between the encrypted first vector and the encrypted second vector based on the second encrypted vector and the received first encrypted vector, and sends that distance to the first device, which thereby obtains it.
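The flow above, in which the second device computes the encrypted distance and the first device later decrypts it, can be sketched with textbook Paillier encryption. This is an illustration under explicit assumptions, not the scheme claimed here. Paillier is only additively homomorphic, so it cannot multiply two ciphertexts as the Euclidean formula over two encrypted vectors would require. The sketch therefore computes the squared Euclidean distance from the first device's encrypted components and encrypted squares (in the spirit of Example 9) together with the second device's plaintext components. Components are assumed to be integers (real values would need scaling first). The key uses tiny fixed Mersenne primes and all function names are hypothetical, so nothing here is secure. Requires Python 3.9 or later for `math.lcm`.

```python
import math
import secrets
from typing import List, Tuple

PublicKey = int                    # n
PrivateKey = Tuple[int, int, int]  # (n, lambda, mu)

def keygen(p: int = 2**31 - 1, q: int = 2**61 - 1) -> Tuple[PublicKey, PrivateKey]:
    # Toy key from known Mersenne primes: far too small and structured for real use.
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (n, lam, mu)

def encrypt(n: PublicKey, m: int) -> int:
    n2 = n * n
    r = secrets.randbelow(n - 1) + 1
    # g^m = (1 + n)^m = 1 + m*n (mod n^2); negative m is encoded modulo n.
    return (1 + (m % n) * n) * pow(r, n, n2) % n2

def decrypt(sk: PrivateKey, c: int) -> int:
    n, lam, mu = sk
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m  # map back to a signed integer

def first_device_encrypt(pk: PublicKey, x: List[int]) -> List[Tuple[int, int]]:
    # Sends E(x_i) together with E(x_i^2) for every first component.
    return [(encrypt(pk, xi), encrypt(pk, xi * xi)) for xi in x]

def second_device_distance(pk: PublicKey, enc_x: List[Tuple[int, int]], y: List[int]) -> int:
    # Homomorphically accumulates sum(x_i^2 - 2*x_i*y_i + y_i^2) = ||x - y||^2.
    n2 = pk * pk
    acc = encrypt(pk, 0)
    for (c_x, c_x2), yi in zip(enc_x, y):
        acc = acc * c_x2 % n2                          # + x_i^2
        acc = acc * pow(c_x, (-2 * yi) % pk, n2) % n2  # - 2 * x_i * y_i
        acc = acc * encrypt(pk, yi * yi) % n2          # + y_i^2
    return acc

pk, sk = keygen()
enc_x = first_device_encrypt(pk, [1, 2, 4, 5, 3])           # first device
enc_d = second_device_distance(pk, enc_x, [1, 3, 4, 4, 3])  # second device
print(decrypt(sk, enc_d))  # squared distance: 2
```

The second device only ever sees ciphertexts under the first device's key, and the first device only learns the decrypted aggregate, which mirrors the data-flow of this embodiment.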
If each component of the first vector and each component of the second vector are homomorphically encrypted based on the first target public key to obtain the first encrypted vector and the second encrypted vector, then, in one possible implementation, the second device determines the distance between the encrypted first vector and the encrypted second vector from the first encrypted vector, the second encrypted vector, and the Euclidean distance formula. Specifically, the distance is determined according to
$$E_{pka}(d) = \sqrt{\sum_{i=1}^{N} \left(E_{pka}(x_i) - E_{pka}(y_i)\right)^2}$$
where E_pka(x_i) is the i-th component of the first encrypted vector, E_pka(y_i) is the i-th component of the second encrypted vector, E_pka(d) is the distance between the encrypted first vector and the encrypted second vector, and N is the number of components in the first encrypted vector or the second encrypted vector. The two encrypted vectors contain the same number of components, that is, they have equal length.
In another possible implementation, the second device may instead determine the distance between the encrypted first vector and the encrypted second vector from the first encrypted vector and the second encrypted vector using a cosine distance formula or a Hamming distance formula.
It should be noted that, because every component of the first encrypted vector and the second encrypted vector is homomorphically encrypted with the first target public key, the distance so determined is not the actual plaintext value but an encrypted expression.
In this embodiment of the present application, the first data never leaves the first device in its original form, so fuzzy matching is achieved without exporting the raw data, which further secures the matching process.
Example 5:
To enable the second device to determine whether the first data and the second data match, on the basis of the foregoing embodiments, the method further includes:
receiving a third encrypted vector sent by the second device and a second target public key generated by the second device, where the third encrypted vector is obtained by the second device homomorphically encrypting the second vector with the second target public key;
homomorphically encrypting the first vector based on the second target public key to generate a fourth encrypted vector; and
determining the distance between the encrypted second vector and the encrypted first vector based on the third encrypted vector and the fourth encrypted vector, and sending that distance to the second device, so that the second device decrypts it with the second target private key corresponding to the second target public key to determine the target distance between the second vector and the first vector, and determines whether the first data and the second data match according to that target distance and a preset first distance threshold.
To enable both the first device and the second device to determine whether the first data and the second data match, in this embodiment of the present application, the second device also needs the target distance between the first vector and the second vector; that is, the first device needs to synchronize this result with the second device. The second device may determine the target distance itself, or the first device may determine it and send it to the second device.
If the second device determines the target distance itself, the first device may receive the second target public key and a third encrypted vector from the second device, where the second target public key is generated by the second device and the third encrypted vector is obtained by the second device homomorphically encrypting the second vector with the second target public key. After receiving the second target public key, the first device homomorphically encrypts the first vector based on it to generate a fourth encrypted vector. The first device then determines the distance between the encrypted first vector and the encrypted second vector based on the third encrypted vector received from the second device and the fourth encrypted vector it generated, and sends this distance to the second device. The second device decrypts the distance with the second target private key corresponding to the second target public key, determines the target distance between the second vector and the first vector, and determines whether the first data and the second data match according to that target distance and a preset first distance threshold.
The following is described with reference to a specific example:
The first device inputs the first data to be matched into the pre-trained vector conversion model deployed in the first device to obtain the first vector corresponding to the first data, and the second device inputs the second data to be matched into the pre-trained vector conversion model deployed in the second device to obtain the second vector corresponding to the second data. Suppose the first vector corresponding to the first data U1 is (x1, x2, x3, …, xm) and the second vector corresponding to the second data U2 is (y1, y2, y3, …, ym).
The first device generates a first target public/private key pair A (pka1, ska1), where pka1 is the first target public key and ska1 is the first target private key. It homomorphically encrypts the first vector based on the first target public key, so that the first encrypted vector corresponding to (x1, x2, x3, …, xm) is (E_pka1(x1), E_pka1(x2), E_pka1(x3), …, E_pka1(xm)), and sends the first encrypted vector and the first target public key to the second device.
After receiving the first target public key and the first encrypted vector sent by the first device, the second device homomorphically encrypts the second vector based on the first target public key to obtain the second encrypted vector (E_pka1(y1), E_pka1(y2), E_pka1(y3), …, E_pka1(ym)). It then determines the distance between the encrypted first vector and the encrypted second vector from the second encrypted vector and the received first encrypted vector, and sends the distance to the first device.
The second device also generates a second target public/private key pair B (pka2, ska2), where pka2 is the second target public key and ska2 is the second target private key. It homomorphically encrypts the second vector based on the second target public key, so that the third encrypted vector corresponding to (y1, y2, y3, …, ym) is (E_pka2(y1), E_pka2(y2), E_pka2(y3), …, E_pka2(ym)), and sends the second target public key and the third encrypted vector to the first device.
After receiving the second target public key and the third encrypted vector sent by the second device, the first device homomorphically encrypts the first vector based on the second target public key to obtain the fourth encrypted vector (E_pka2(x1), E_pka2(x2), E_pka2(x3), …, E_pka2(xm)). The first device then determines the distance between the encrypted first vector and the encrypted second vector based on the fourth encrypted vector and the third encrypted vector, and sends it to the second device.
After receiving the encrypted distance from the second device, the first device decrypts it with the first target private key corresponding to the first target public key it generated, determines the target distance between the first vector and the second vector, and determines whether the first data and the second data match according to that target distance and the preset first distance threshold.
After receiving the encrypted distance from the first device, the second device decrypts it with the second target private key corresponding to the second target public key it generated, determines the target distance between the first vector and the second vector, and determines whether the first data and the second data match according to that target distance and the preset first distance threshold.
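The two-way example above can be sketched by running one key-owner/peer exchange in each direction, with each device holding its own key pair. This is a hypothetical sketch under the same assumptions as any Paillier illustration of this scheme: Paillier is additively homomorphic only, so the peer combines the owner's encrypted components and encrypted squares with its own plaintext components instead of encrypting them, which differs from the text, where both vectors are encrypted. The primes are small known Mersenne primes (insecure), components are integers, and "match" is assumed to mean squared distance not exceeding the squared threshold. Requires Python 3.9 or later.

```python
import math
import secrets
from typing import List, Tuple

def keygen(p: int, q: int) -> Tuple[int, Tuple[int, int, int]]:
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    return n, (n, lam, pow(lam, -1, n))  # toy Paillier with generator g = n + 1

def encrypt(n: int, m: int) -> int:
    n2 = n * n
    r = secrets.randbelow(n - 1) + 1
    return (1 + (m % n) * n) * pow(r, n, n2) % n2

def decrypt(sk: Tuple[int, int, int], c: int) -> int:
    n, lam, mu = sk
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m

def peer_distance(n: int, enc: List[Tuple[int, int]], own: List[int]) -> int:
    # Run by the device that does NOT hold the private key: E(||a - b||^2).
    n2 = n * n
    acc = encrypt(n, 0)
    for (c, c_sq), v in zip(enc, own):
        acc = acc * c_sq % n2
        acc = acc * pow(c, (-2 * v) % n, n2) % n2
        acc = acc * encrypt(n, v * v) % n2
    return acc

def one_direction(owner_vec: List[int], peer_vec: List[int], p: int, q: int) -> int:
    pk, sk = keygen(p, q)                                            # owner's key pair
    enc = [(encrypt(pk, v), encrypt(pk, v * v)) for v in owner_vec]  # sent with pk
    enc_d = peer_distance(pk, enc, peer_vec)                         # peer computes
    return decrypt(sk, enc_d)                                        # owner decrypts

def is_match(squared_distance: int, threshold: float) -> bool:
    # Assumed rule: match when the distance does not exceed the threshold.
    return squared_distance <= threshold ** 2

x, y = [1, 2, 4, 5, 3], [1, 3, 4, 4, 3]
d_first = one_direction(x, y, 2**31 - 1, 2**61 - 1)   # key pair A (pka1, ska1)
d_second = one_direction(y, x, 2**61 - 1, 2**89 - 1)  # key pair B (pka2, ska2)
print(d_first, d_second, is_match(d_first, 1.5), is_match(d_second, 1.5))
# 2 2 True True
```

Because each direction is decrypted by a different private key, neither device has to trust the other's decryption result to reach its own decision.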
Example 6:
To enable the second device to determine whether the first data and the second data match, on the basis of the foregoing embodiments, in this embodiment of the present application, after the target distance between the first vector and the second vector is determined, the method further includes:
sending the target distance between the first vector and the second vector to the second device, so that the second device determines whether the first data and the second data match based on the target distance and a preset first distance threshold.
In this embodiment of the application, the second device likewise needs the target distance between the first vector and the second vector to decide whether the first data and the second data match; here, the first device obtains the target distance and sends it to the second device. After receiving it, the second device determines whether the first data and the second data match based on the target distance and the preset first distance threshold.
The following is a specific example:
The first device inputs the first data to be matched into the pre-trained vector conversion model deployed in the first device to obtain the first vector corresponding to the first data, and the second device inputs the second data to be matched into the pre-trained vector conversion model deployed in the second device to obtain the second vector corresponding to the second data. Suppose the first vector corresponding to the first data U1 is (x1, x2, x3, …, xm) and the second vector corresponding to the second data U2 is (y1, y2, y3, …, ym).
The first device generates a first target public/private key pair A (pka1, ska1), where pka1 is the first target public key and ska1 is the first target private key. It homomorphically encrypts the first vector based on the first target public key, so that the first encrypted vector corresponding to (x1, x2, x3, …, xm) is (E_pka1(x1), E_pka1(x2), E_pka1(x3), …, E_pka1(xm)), and sends the first encrypted vector and the first target public key to the second device.
After receiving the first target public key and the first encrypted vector sent by the first device, the second device homomorphically encrypts the second vector based on the first target public key to obtain the second encrypted vector (E_pka1(y1), E_pka1(y2), E_pka1(y3), …, E_pka1(ym)). It then determines the distance between the encrypted first vector and the encrypted second vector from the second encrypted vector and the received first encrypted vector, and sends the distance to the first device.
After receiving the encrypted distance, the first device decrypts it with the first target private key corresponding to the first target public key it generated and determines the target distance between the first vector and the second vector. The first device may then determine whether the first data and the second data match based on the target distance and the preset first distance threshold, and sends the target distance to the second device, which determines whether the first data and the second data match according to the same target distance and the preset first distance threshold.
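In this variant only the first device decrypts, and the plaintext target distance is shared so both sides apply the same threshold. A minimal sketch, assuming that "match" means the distance does not exceed the preset first distance threshold and that both devices hold the same threshold; the `Device` class and the helper names are hypothetical. The square-root helper is only relevant if the decrypted value is a squared Euclidean distance.

```python
import math
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Device:
    name: str
    threshold: float  # preset first distance threshold (same value on both sides)

    def decide(self, target_distance: float) -> bool:
        # Assumption: "match" means the distance does not exceed the threshold.
        return target_distance <= self.threshold

def target_from_squared(squared_distance: int) -> float:
    # Only needed if the decrypted value is a squared Euclidean distance.
    return math.sqrt(squared_distance)

def share_and_decide(first: Device, second: Device,
                     target_distance: float) -> Tuple[bool, bool]:
    first_result = first.decide(target_distance)
    # The first device then sends the plaintext target distance to the second.
    second_result = second.decide(target_distance)
    return first_result, second_result

first, second = Device("first device", 1.5), Device("second device", 1.5)
print(share_and_decide(first, second, target_from_squared(2)))  # (True, True)
```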
Example 7:
To determine the target distance between the first vector and the second vector, on the basis of the foregoing embodiments, in this embodiment of the present application, determining the target distance based on the encrypted distance between the first vector and the second vector and the first target private key corresponding to the first target public key includes:
decrypting the distance between the encrypted first vector and the encrypted second vector with the first target private key corresponding to the first target public key generated by the first device, to determine the target distance between the first vector and the second vector.
In this embodiment of the present application, the encrypted distance obtained by the first device is determined from the first encrypted vector and the second encrypted vector, both of which are produced with the first target public key generated by the first device. Therefore, after obtaining the encrypted distance, the first device decrypts it with the corresponding first target private key to determine the target distance between the first vector and the second vector.
Example 8:
To determine the first vector, on the basis of the foregoing embodiments, in an embodiment of the present application, inputting the first data to be matched into a pre-trained vector conversion model to obtain the first vector corresponding to the first data includes:
for each piece of first sub-data in the first data, inputting the first sub-data into the pre-trained vector conversion model to obtain the corresponding first sub-vector, where the first sub-vector of each piece of first sub-data has a first preset length; and
splicing the first sub-vectors of all the first sub-data to obtain the first vector corresponding to the first data.
In this embodiment of the present application, a piece of first data may include one piece of first sub-data or several. For example, the first data may consist of the single piece of first sub-data "Shanghai Pudong New Area Sunny Day Canteen", or of three pieces of first sub-data: "Shanghai Pudong New Area Sunny Day Canteen", "Shanghai Tiantian Restaurant", and "Gaoke Road Yang Guofu Malatang".
To determine the first vector corresponding to the first data, each piece of first sub-data is input into the pre-trained vector conversion model to obtain its first sub-vector. The pieces of first sub-data may contain texts, numbers, or characters of different lengths, but every first sub-vector has the first preset length, which may be set as required, for example to 3, 4, or 6.
For illustration, assume the pre-trained vector conversion model is a word vector model whose output vectors have dimension 5, and the first data includes three pieces of first sub-data, all text: "Shanghai Pudong New Area Sunny Day Canteen", "Shanghai Tiantian Restaurant", and "Gaoke Road Yang Guofu Malatang". Inputting them into the pre-trained word vector model yields the first sub-vectors (1.0, 2.0, 1.5, 2.0, 3.5), (3.0, 4.0, 2.5, 2.5, 1.5), and (4.5, 5.5, 7.5, 1.5, 0.5), respectively.
If instead the first data includes three pieces of first sub-data that are all digital, namely "12345", "11111", and "22233", inputting them into the pre-trained one-hot encoding model yields the first sub-vectors (0000000010, 0000000100, 0000001000, 0000010000, 0000100000), (0000000010, 0000000010, 0000000010, 0000000010, 0000000010), and (0000000100, 0000000100, 0000000100, 0000001000, 0000001000), respectively.
To determine the first vector corresponding to the first data, in this embodiment of the application, after the first sub-vector of each piece of first sub-data is obtained, the first sub-vectors are spliced and the splicing result is taken as the first vector corresponding to the first data.
For example, suppose the first data includes the three pieces of first sub-data "Shanghai Pudong New Area Sunny Day Canteen", "Shanghai Tiantian Restaurant", and "Gaoke Road Yang Guofu Malatang", whose first sub-vectors are (1.0, 2.0, 1.5), (3.0, 4.0, 2.5), and (4.5, 5.5, 7.5), respectively. The pieces of first sub-data may be placed in a random order; if the resulting order is "Shanghai Pudong New Area Sunny Day Canteen", "Gaoke Road Yang Guofu Malatang", "Shanghai Tiantian Restaurant", splicing their first sub-vectors in that order gives the first vector (1.0, 2.0, 1.5, 4.5, 5.5, 7.5, 3.0, 4.0, 2.5).
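The ordering and splicing step above can be sketched as follows. The function names are hypothetical. The text does not say how the second device learns the order; presumably both sides must splice in the same order for component-wise distances to be comparable, so the shared `seed` used here is purely an assumption.

```python
import random
from typing import List, Optional, Sequence

def random_order(count: int, seed: Optional[int] = None) -> List[int]:
    # Hypothetical: a shared seed lets both devices reproduce the same order.
    order = list(range(count))
    random.Random(seed).shuffle(order)
    return order

def splice(sub_vectors: Sequence[Sequence[float]], order: Sequence[int],
           first_preset_length: int) -> List[float]:
    if sorted(order) != list(range(len(sub_vectors))):
        raise ValueError("order must be a permutation of the sub-data indices")
    result: List[float] = []
    for idx in order:
        sub = sub_vectors[idx]
        if len(sub) != first_preset_length:
            raise ValueError("every sub-vector must have the first preset length")
        result.extend(sub)  # concatenate in the chosen order
    return result

subs = [[1.0, 2.0, 1.5],   # Shanghai Pudong New Area Sunny Day Canteen
        [3.0, 4.0, 2.5],   # Shanghai Tiantian Restaurant
        [4.5, 5.5, 7.5]]   # Gaoke Road Yang Guofu Malatang
print(splice(subs, [0, 2, 1], first_preset_length=3))
# [1.0, 2.0, 1.5, 4.5, 5.5, 7.5, 3.0, 4.0, 2.5]
```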
To achieve fuzzy matching of the first data and the second data, on the basis of the foregoing embodiments, in this embodiment of the application, the first vector and the second vector both have a second preset length.
In this embodiment of the present application, to achieve fuzzy matching between the first data and the second data, the first vector obtained from the first data and the second vector obtained from the second data must have the same length, namely the second preset length. The second preset length is an integer multiple of, and hence not less than, the first preset length; if the first data contains only one piece of first sub-data, the first preset length equals the second preset length.
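The length constraints above can be expressed as a small validation step run before any encryption. This is a hypothetical helper; the text states the constraints but not how a device enforces them, and it says nothing about padding, so the sketch only checks and rejects.

```python
from typing import Sequence

def check_vector_lengths(first_vector: Sequence, second_vector: Sequence,
                         first_preset_length: int, second_preset_length: int) -> None:
    # The second preset length must be a whole number of first-length sub-vectors.
    if second_preset_length < first_preset_length or \
            second_preset_length % first_preset_length != 0:
        raise ValueError("second preset length must be a multiple of the first")
    for name, vec in (("first", first_vector), ("second", second_vector)):
        if len(vec) != second_preset_length:
            raise ValueError(f"{name} vector has length {len(vec)}, "
                             f"expected {second_preset_length}")

# Three sub-vectors of length 3 on each side: both vectors have length 9.
check_vector_lengths([0.0] * 9, [1.0] * 9, first_preset_length=3, second_preset_length=9)
```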
Because the first vector corresponding to the first data and the second vector corresponding to the second data both have the second preset length, fuzzy matching can be achieved even when the first data differs from the second data, which broadens the usage scenarios.
Example 9:
To determine the first encrypted vector, on the basis of the foregoing embodiments, in an embodiment of the present application, homomorphically encrypting the first vector with the self-generated first target public key to generate the first encrypted vector includes:
for each first component in the first vector, determining the corresponding first square component;
inserting the first square component of each first component into the first vector according to a preset insertion rule, and taking the resulting vector as the updated first vector; and
homomorphically encrypting each first component and each first square component of the first vector based on the first target public key to generate the first encrypted vector.
To generate the first encrypted vector, the first vector could be homomorphically encrypted directly with the first target public key. However, so that the distance between the encrypted first vector and the encrypted second vector can later be computed from the first encrypted vector and the second encrypted vector without decrypting either of them, in this embodiment of the present application the first square component corresponding to each first component in the first vector is determined first.
For example, if the first vector is (1, 2, 4, 5, 3), the first square components corresponding to the first components 1, 2, 4, 5, and 3 are 1, 4, 16, 25, and 9, respectively.
In this embodiment, after the first square component of each first component in the first vector is determined, the first square components are inserted into the first vector according to a preset insertion rule, and the resulting vector is taken as the updated first vector. The insertion rule is flexible: the first square component of a first component may be inserted anywhere in the first vector, for example immediately before that first component, at some position after it, or in sequence after the first components, as long as the first device and the second device can both identify which entries of each vector are first components and which are first square components.
For example, if the first vector is (1, 2, 4, 5, 3), one possible placement of the first square components gives the updated first vector (1, 9, 2, 16, 4, 4, 5, 25, 1, 3); any such placement works provided both devices apply the same rule.
To make it easy to determine the distance between the encrypted first vector and the encrypted second vector from the updated first vector, in this embodiment of the present application the preset insertion rule places the first square component of each first component in the position immediately after that first component.
For example, if the first vector is (1, 2, 4, 5, 3), inserting each first square component immediately after its first component gives the updated first vector (1, 1, 2, 4, 4, 16, 5, 25, 3, 9).
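Under the adjacent insertion rule, building the updated first vector and later telling components from squares are simple index operations. A minimal Python sketch on plaintext values before encryption; the function names `insert_squares` and `split_components` are hypothetical:

```python
def insert_squares(vector):
    """Adjacent insertion rule: each component is followed by its square."""
    updated = []
    for component in vector:
        updated.extend((component, component * component))
    return updated


def split_components(updated):
    """Invert the rule: even positions hold components, odd positions squares."""
    if len(updated) % 2:
        raise ValueError("an updated vector must have even length")
    return updated[0::2], updated[1::2]


updated = insert_squares([1, 2, 4, 5, 3])
print(updated)                    # [1, 1, 2, 4, 4, 16, 5, 25, 3, 9]
print(split_components(updated))  # ([1, 2, 4, 5, 3], [1, 4, 16, 25, 9])
```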
After the updated first vector is determined, each first component and each first square component in it is homomorphically encrypted with the first target public key to generate the first encrypted vector.
For example, if the updated first vector is (2, 4, 3, 9), the first encrypted vector is (E_pka(2), E_pka(4), E_pka(3), E_pka(9)), where E_pka(x) denotes the result of homomorphically encrypting x with the first target public key pka. Here E_pka(2) and E_pka(3) are the encryptions of the first components 2 and 3, and E_pka(4) and E_pka(9) are the encryptions of the corresponding first square components 4 and 9.
To determine the distance between the encrypted first vector and the encrypted second vector, on the basis of the foregoing embodiments, in this embodiment of the present application, obtaining the distance between the encrypted first vector and the encrypted second vector determined based on the first encrypted vector and the second encrypted vector includes:
obtaining each group of first encrypted component and first encrypted square component from the components of the first encrypted vector according to the preset insertion rule, and obtaining each group of second encrypted component and second encrypted square component from the components of the second encrypted vector according to the preset insertion rule;
determining, according to the preset insertion rule, each corresponding group of first encrypted component, first encrypted square component, second encrypted component, and second encrypted square component;
determining an encrypted sub-distance from each such group; and
determining the distance between the encrypted first vector and the encrypted second vector as the sum of the encrypted sub-distances.
In this embodiment of the present application, to determine the distance between the encrypted first vector and the encrypted second vector, the first device obtains the groups of first encrypted component and first encrypted square component from the first encrypted vector according to the preset insertion rule. Each group is obtained by homomorphically encrypting one component and the square of that component; that is, in each group, the first encrypted square component is the encryption of the square of the value whose encryption is the group's first encrypted component.
Specifically, if the preset insertion rule places the first square component of each first component immediately after that first component, the groups are formed by starting at the first entry of the first encrypted vector and taking each entry together with the entry immediately following it as one group, proceeding pairwise until every group of first encrypted component and first encrypted square component has been determined.
For example, the first encrypted vector (E_pka(2), E_pka(4), E_pka(3), E_pka(9)) yields two groups of first encrypted component and first encrypted square component. The first group is E_pka(2), E_pka(4), in which E_pka(2) is the first encrypted component and E_pka(4) is the first encrypted square component; the second group is E_pka(3), E_pka(9), in which E_pka(3) is the first encrypted component and E_pka(9) is the first encrypted square component.
In this embodiment of the present application, after obtaining the second encrypted vector sent by the second device, the first device likewise obtains each group of second encrypted component and second encrypted square component from the second encrypted vector according to the preset insertion rule. The same preset insertion rule is used to build the updated first vector and the updated second vector, and the grouping procedure is the same as for the first encrypted vector, so it is not repeated here.
To determine the distance between the encrypted first vector and the encrypted second vector, in this embodiment of the present application each encrypted sub-distance is determined from one group of first encrypted component, first encrypted square component, second encrypted component, and second encrypted square component. Specifically, for each group, a target sum of the first encrypted square component and the second encrypted square component is determined first; then a target product of the first encrypted component, the second encrypted component, and a preset value (2 in this embodiment) is determined; finally, the target difference between the target sum and the target product is taken as the encrypted sub-distance of that group. For plaintext components a and b this corresponds to a² + b² - 2ab = (a - b)², the squared difference of the two components.
After the encrypted sub-distance of every group is determined, the distance between the encrypted first vector and the encrypted second vector is obtained as the sum of all the encrypted sub-distances.
For example, suppose the first vector corresponding to the first data is (1, 5) and the second vector corresponding to the second data is (2, 3). Inserting the first square component of each first component into the first vector gives the updated first vector (1, 1, 5, 25), so the first encrypted vector is (E_pka(1), E_pka(1), E_pka(5), E_pka(25)). Inserting the second square component of each second component into the second vector gives the updated second vector (2, 4, 3, 9), so the second encrypted vector is (E_pka(2), E_pka(4), E_pka(3), E_pka(9)). The distance between the encrypted first vector and the encrypted second vector is then [E_pka(1) + E_pka(4) - 2E_pka(1)·E_pka(2)] + [E_pka(25) + E_pka(9) - 2E_pka(5)·E_pka(3)]. After decryption this corresponds to (1 + 4 - 4) + (25 + 9 - 30) = 5, the squared Euclidean distance between (1, 5) and (2, 3).
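The grouping and sub-distance arithmetic can be checked on plaintext values. The sketch below, with hypothetical function names, mirrors the steps above without any encryption: it splits two updated vectors built with the adjacent insertion rule into (component, square) groups and sums a² + b² - 2ab over the groups. In the encrypted protocol the same arithmetic would be carried out on ciphertexts.

```python
def groups(updated):
    """Split an updated vector (adjacent insertion rule) into (component, square) pairs."""
    if len(updated) % 2:
        raise ValueError("an updated vector must have even length")
    return [(updated[i], updated[i + 1]) for i in range(0, len(updated), 2)]


def grouped_distance(updated_first, updated_second):
    """Sum of per-group sub-distances a^2 + b^2 - 2ab, i.e. the squared distance."""
    first_groups, second_groups = groups(updated_first), groups(updated_second)
    if len(first_groups) != len(second_groups):
        raise ValueError("both vectors must have the same preset length")
    total = 0
    for (a, a_sq), (b, b_sq) in zip(first_groups, second_groups):
        total += a_sq + b_sq - 2 * a * b  # target sum minus target product
    return total


print(groups([2, 4, 3, 9]))                                 # [(2, 4), (3, 9)]
print(grouped_distance([1, 1, 5, 25], [2, 4, 3, 9]))        # 5
```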
Because the first data to be matched may include several pieces of first sub-data and the second data to be matched may include several pieces of second sub-data, the distance between the encrypted first sub-vector of each piece of first sub-data and the encrypted second sub-vector of each piece of second sub-data can also be determined, which in turn shows whether each piece of first sub-data matches each piece of second sub-data. These encrypted sub-distances form an encrypted distance matrix; after it is decrypted with the target private key, the target sub-distance matrix is obtained, in which each element is the target sub-distance between the corresponding first sub-vector and second sub-vector.
After the target sub-distance between each first sub-vector and each second sub-vector is determined, the target distance between the first vector and the second vector can be determined from the sum of the target sub-distances, and whether each piece of first sub-data matches each piece of second sub-data can also be determined.
Fig. 2a is a schematic diagram of target sub-distances provided in some embodiments of the present application, Fig. 2b is a schematic diagram of a target sub-distance matrix provided in some embodiments of the present application, Fig. 3a is a schematic diagram of another set of target sub-distances provided in some embodiments of the present application, and Fig. 3b is a schematic diagram of another target sub-distance matrix provided in some embodiments of the present application. They are described below with reference to Figs. 2a, 2b, 3a, and 3b.
D(x, y) denotes a target sub-distance, where x denotes a first sub-vector and y denotes a second sub-vector. Suppose the first data contains three pieces of first sub-data, whose first sub-vectors are denoted A1, A2, and A3, and the second data contains three pieces of second sub-data, whose second sub-vectors are denoted B1, B2, and B3. In the example of Fig. 2a, the target sub-distances are D(A1, B1) = 1, D(A1, B2) = 3.64, D(A1, B3) = 7.66, D(A2, B1) = 3.9, D(A2, B2) = 0, D(A2, B3) = 5.7, D(A3, B1) = 8.35, D(A3, B2) = 5.16, and D(A3, B3) = 8.18; the corresponding target sub-distance matrix is shown in Fig. 2b.
In a second example, the first data again contains three pieces of first sub-data with first sub-vectors A1, A2, and A3, and the second data contains three pieces of second sub-data with second sub-vectors B1, B2, and B3. The target sub-distances are D(A1, B1) = 2.82, D(A1, B2) = 1, D(A1, B3) = 3.16, D(A2, B1) = 3, D(A2, B2) = 1.73, D(A2, B3) = 3, D(A3, B1) = 0, D(A3, B2) = 2.83, and D(A3, B3) = 3; the corresponding target sub-distance matrix is shown in Fig. 3b.
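A plaintext sketch of building a target sub-distance matrix and reading matching pairs from it. Assumptions, none of them taken from the application: the target sub-distance is the ordinary Euclidean distance between sub-vectors (the square root of the summed squared differences), a pair counts as matching when its sub-distance is below a threshold, the sub-vectors are made-up toy values, and the function names are hypothetical. In the protocol the matrix entries would come from decrypting the encrypted distance matrix rather than from plaintext vectors.

```python
import math


def sub_distance_matrix(first_sub_vectors, second_sub_vectors):
    """Row i, column j: Euclidean distance between the i-th first and j-th second sub-vector."""
    return [
        [math.dist(a, b) for b in second_sub_vectors]
        for a in first_sub_vectors
    ]


def matching_pairs(matrix, threshold):
    """Index pairs (i, j) whose sub-distance is below the threshold."""
    return [
        (i, j)
        for i, row in enumerate(matrix)
        for j, d in enumerate(row)
        if d < threshold
    ]


A = [[0.0, 0.0], [3.0, 4.0]]  # toy first sub-vectors
B = [[0.0, 0.0], [6.0, 8.0]]  # toy second sub-vectors
m = sub_distance_matrix(A, B)
print(m)                       # [[0.0, 10.0], [5.0, 5.0]]
print(matching_pairs(m, 1.0))  # [(0, 0)]
```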
Example 10:
To determine whether the first data and the second data match, on the basis of the foregoing embodiments, in this embodiment of the application, determining whether the first data and the second data match based on the target distance and a preset first distance threshold includes:
determining whether the target distance is smaller than the preset first distance threshold;
if so, determining that the first data matches the second data;
otherwise, determining that the first data does not match the second data.
To determine whether the first data and the second data match, in this embodiment of the application the target distance is compared with the preset first distance threshold: if the target distance is smaller than the preset first distance threshold, the first data and the second data are determined to match; otherwise, they are determined not to match. The preset first distance threshold may be set as required, for example to 1 or 1.5. The smaller the target distance, the more closely the first vector and the second vector match.
To determine whether the second data and the first data match exactly, on the basis of the foregoing embodiments, in this embodiment of the present application, after it is determined that the first data and the second data match, the method further includes:
determining whether the target distance is equal to a preset second distance threshold, and if so, determining that the first data is identical to the second data.
In this embodiment, if the target distance equals the preset second distance threshold, the first data is identical to the second data, that is, the two match exactly. The preset second distance threshold is smaller than the preset first distance threshold and is equal to 0.
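The two threshold tests of this example can be sketched as one small decision function. This is a hedged illustration: the name `match_result`, the default threshold values (1.0 and 0.0, taken from the examples above), and the `tolerance` parameter for floating-point equality are assumptions rather than details from the application.

```python
def match_result(target_distance, first_threshold=1.0, second_threshold=0.0, tolerance=0.0):
    """Classify a decrypted target distance against the two preset thresholds.

    `tolerance` is a hypothetical allowance for floating-point error in the
    equality test against the second threshold (0.0 means exact equality).
    """
    if not second_threshold < first_threshold:
        raise ValueError("the second threshold must be smaller than the first")
    if target_distance >= first_threshold:
        return "no match"
    if abs(target_distance - second_threshold) <= tolerance:
        return "identical"
    return "match"


for d in (0.0, 0.4, 1.0, 1.73):
    print(d, match_result(d))
# 0.0 identical
# 0.4 match
# 1.0 no match
# 1.73 no match
```

Checking the first threshold before the second mirrors the order in the text: exact identity is only tested after a match has been established.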
Example 11:
To allow matching when the data of the two parties are not identical, and thereby widen the range of services to which data matching applies, embodiments of the present application provide a data matching method, apparatus, device, and medium.
Fig. 4 is a schematic diagram of the process of a data matching method according to an embodiment of the present application; the process includes the following steps:
S401: Input the second data to be matched into a pre-trained vector conversion model to obtain a second vector corresponding to the second data.
The data matching method provided in this embodiment of the application is applied to the second device, which may be an intelligent terminal, a PC, a server, or a similar device, and which is distinct from the first device in this application.
In this embodiment of the present application, to ensure that fuzzy matching is possible even when the data of the two parties are not identical, a pre-trained vector conversion model is deployed in the second device to obtain the vector corresponding to the data to be matched; for different data, the vectors output by the pre-trained vector conversion model have the same dimension.
To obtain the second vector corresponding to the second data to be matched, the second data is input into the pre-trained vector conversion model, which outputs the second vector corresponding to the second data.
In this embodiment of the present application, the first data and the second data are generally of the same type, for example both text data or both numeric data.
S402: Receive a first target public key sent by a first device, and homomorphically encrypt the second vector with the first target public key to generate a second encrypted vector.
In this embodiment of the application, after receiving the first target public key sent by the first device, the second device homomorphically encrypts the second vector with the first target public key to generate the second encrypted vector.
S403: Obtain a target distance between a first vector and the second vector determined based on a first encrypted vector and the second encrypted vector, where the first encrypted vector is obtained by encrypting the first vector with the first target public key, and the first vector is obtained by inputting first data into a pre-trained vector conversion model in the first device.
In this embodiment of the present application, so that the second device can also determine whether the first data and the second data match, the second device obtains the target distance between the first vector and the second vector determined based on the first encrypted vector and the second encrypted vector. The target distance may be determined by the first device and then sent to the second device, or it may be determined by the second device.
The first encrypted vector is obtained by the first device encrypting the first vector with the first target public key it generated, and the first vector is obtained by inputting the first data into the pre-trained vector conversion model in the first device.
S404: Determine whether the first data and the second data match based on the target distance and a preset first distance threshold.
To determine whether the first data and the second data match, in this embodiment of the present application the target distance is compared with the preset first distance threshold, and the comparison result determines whether the first data and the second data match.
In this embodiment of the application, the first data and the second data to be matched are each input into a pre-trained vector conversion model to obtain the first vector corresponding to the first data and the second vector corresponding to the second data. The first encrypted vector obtained by encrypting the first vector and the second encrypted vector obtained by encrypting the second vector are used to determine the encrypted distance between the first vector and the second vector; the target distance between the first vector and the second vector is then obtained from this encrypted distance using the first target private key generated by the first device, and whether the first data and the second data match is determined from the target distance and the preset first distance threshold. Fuzzy matching is therefore possible even when the first data and the second data are not completely identical, which widens the range of usage scenarios. Because the first target public key and the first target private key are used for homomorphic encryption and decryption, respectively, during fuzzy matching, a secure intersection is achieved and the security of the matching process is ensured. Throughout the matching process, the first data and the second data never leave the first device and the second device in their original form, so fuzzy matching is achieved without the original data leaving their databases, which further ensures the security of the matching process.
Example 12:
To determine the second vector corresponding to the second data, on the basis of the foregoing embodiments, in this embodiment of the application, inputting the acquired second data to be matched into a pre-trained vector conversion model to obtain the second vector corresponding to the second data includes:
determining a second target data type of the second data to be matched;
determining the pre-trained second target vector conversion model corresponding to the second data according to the second target data type and a pre-stored correspondence between data types and pre-trained vector conversion models; and
inputting the second data into the pre-trained second target vector conversion model to obtain the second vector corresponding to the second data.
In this embodiment of the present application, the second data to be matched may be text data, such as a name, gender, or address, or numeric data, such as an identity card number, a bank card number, or a reference card number. Second data of different data types require different pre-trained vector conversion models to obtain the corresponding second vectors.
Specifically, the second device may store the correspondence between data types and pre-trained vector conversion models. According to the second target data type of the obtained second data to be matched, the corresponding pre-trained vector conversion model, namely the pre-trained second target vector conversion model, is used to obtain the second vector corresponding to the second data.
Fig. 5a is a schematic diagram of the process of obtaining the vector corresponding to text data according to some embodiments of the present application, and Fig. 5b is a schematic diagram of the process of obtaining the vector corresponding to numeric data according to some embodiments of the present application. They are described below with reference to Figs. 5a and 5b.
If the data to be matched (which may be the first data or the second data) is text data, the pre-trained vector conversion model is a word vector model: the text data is input into the pre-trained word vector model, and the word vector model outputs the vector corresponding to the text data.
If the data to be matched (which may be the first data or the second data) is numeric data, the pre-trained vector conversion model is a One-Hot coding model: the numeric data is input into the pre-trained One-Hot coding model, and the One-Hot coding model outputs the vector corresponding to the numeric data.
To accurately determine the model that converts the second data into the second vector, on the basis of the foregoing embodiments, if the second target data type is a text type, the corresponding pre-trained second target vector conversion model is a word vector model or a sentence vector model; if the second target data type is a numeric type, the corresponding pre-trained second target vector conversion model is a One-Hot coding model.
In this embodiment of the application, if the second data is text data, the pre-trained vector conversion model deployed in the second device to obtain the corresponding second vector may be a word vector model or a sentence vector model; if the second data is numeric data, it may be a One-Hot coding model.
Specifically, if the second data is text data, the pre-trained second target vector conversion model corresponding to it is determined to be a word vector model from the pre-stored correspondence between data types and pre-trained vector conversion models, and the second vector is obtained with the pre-trained word vector model. If the second data is numeric data, the pre-trained second target vector conversion model is determined from the same correspondence to be a One-Hot coding model, and the second vector is obtained with the pre-trained One-Hot coding model.
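A minimal sketch of the type-based model lookup for numeric data. Assumptions: the One-Hot coding model is taken to encode each decimal digit as a 10-dimensional one-hot block and concatenate the blocks (one plausible reading, not confirmed by the application); `MODEL_REGISTRY`, `to_vector`, and the other names are hypothetical; and the text entry, which would map to a trained word or sentence vector model, is deliberately left out.

```python
DIGITS = "0123456789"


def one_hot_digits(number):
    """Encode each decimal digit as a 10-dimensional one-hot block and concatenate."""
    vector = []
    for ch in number:
        if ch not in DIGITS:
            raise ValueError(f"not a decimal digit: {ch!r}")
        block = [0] * len(DIGITS)
        block[DIGITS.index(ch)] = 1
        vector.extend(block)
    return vector


# Hypothetical stand-in for the pre-stored correspondence between data types
# and pre-trained vector conversion models (text models omitted).
MODEL_REGISTRY = {"numeric": one_hot_digits}


def to_vector(data, data_type, registry=MODEL_REGISTRY):
    """Look up the model registered for `data_type` and apply it to `data`."""
    try:
        model = registry[data_type]
    except KeyError:
        raise ValueError(f"no model registered for data type {data_type!r}") from None
    return model(data)


def squared_distance(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


a = to_vector("6222", "numeric")
b = to_vector("6223", "numeric")
print(squared_distance(a, b))  # 2: one differing digit contributes 1 + 1
```

With this per-digit encoding, two digit strings of equal length have a squared distance of exactly twice the number of positions where their digits differ, so numbers that differ by a single mistyped digit stay close, which is what makes threshold-based fuzzy matching meaningful for numeric data.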
When a vector conversion model is trained, each piece of data can be labeled in advance with a corresponding label vector. Each piece of data and its label vector are input into the original vector conversion model, the parameters of the original vector conversion model are adjusted according to the predicted vector it outputs and the corresponding label vector, and the vector conversion model is deemed trained once a convergence condition is met.
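The training procedure described above is a standard supervised loop. As a toy illustration only, the sketch below substitutes a hypothetical linear map from numeric input features to label vectors for the word vector model. It fits the map by gradient descent on the mean squared error between predicted and label vectors and stops when the loss stops improving, which is one possible convergence condition. The function name, learning rate, tolerance, and data are all assumptions; production word or sentence vector models are far larger.

```python
def train_linear_converter(inputs, label_vectors, lr=0.1, tol=1e-12, max_steps=10_000):
    """Fit weights W so that W @ x approximates each label vector (mean squared error)."""
    in_dim, out_dim = len(inputs[0]), len(label_vectors[0])
    weights = [[0.0] * in_dim for _ in range(out_dim)]
    n = len(inputs)
    prev_loss = loss = float("inf")
    for _ in range(max_steps):
        grad = [[0.0] * in_dim for _ in range(out_dim)]
        loss = 0.0
        for x, label in zip(inputs, label_vectors):
            for r, row in enumerate(weights):
                # Error between predicted component and label component.
                err = sum(w * xi for w, xi in zip(row, x)) - label[r]
                loss += err * err / n
                for c in range(in_dim):
                    grad[r][c] += 2.0 * err * x[c] / n
        if prev_loss - loss < tol:  # convergence condition: loss stopped improving
            break
        prev_loss = loss
        for r in range(out_dim):
            for c in range(in_dim):
                weights[r][c] -= lr * grad[r][c]
    return weights, loss


inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
labels = [[1.0, 3.0], [2.0, -1.0], [3.0, 2.0]]  # generated by W = [[1, 2], [3, -1]]
weights, loss = train_linear_converter(inputs, labels)
print([[round(w, 3) for w in row] for row in weights])  # [[1.0, 2.0], [3.0, -1.0]]
```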
Example 13:
To obtain the target distance between the first vector and the second vector determined based on the first encrypted vector and the second encrypted vector, on the basis of the foregoing embodiments, receiving the first target public key sent by the first device includes:
receiving the first target public key and the first encrypted vector sent by the first device, where the first encrypted vector is obtained by homomorphically encrypting the first vector with the first target public key;
and obtaining the target distance between the first vector and the second vector determined based on the first encrypted vector and the second encrypted vector includes:
determining the encrypted distance between the first vector and the second vector from the first encrypted vector and the second encrypted vector;
sending the encrypted distance between the first vector and the second vector to the first device; and
receiving the target distance sent by the first device, where the target distance is obtained by the first device decrypting the encrypted distance between the first vector and the second vector with the first target private key corresponding to the first target public key it generated.
In this embodiment, the target distance between the first vector and the second vector obtained by the second device may be determined by the first device and sent to the second device, or it may be determined by the second device.
Because the first encrypted vector and the second encrypted vector are generated by homomorphically encrypting the first vector and the second vector with the first target public key generated by the first device, the encrypted distance between them must be decrypted with the first target private key generated by the first device. If the second device were to determine the target distance itself, it would need to receive the first target private key from the first device; any vulnerability or attack during that transfer would compromise the information, so this approach offers lower security.
To improve security, in this embodiment of the application the target distance is determined by the first device and sent to the second device. Specifically, together with the first target public key, the second device also receives from the first device the first encrypted vector, which the first device obtained by homomorphically encrypting the first vector with the first target public key.
After receiving the first encrypted vector and the first target public key from the first device, the second device first homomorphically encrypts the second vector with the first target public key to determine the second encrypted vector, and then determines the encrypted distance between the first vector and the second vector from the first encrypted vector and the second encrypted vector. Since both encrypted vectors are generated with the first target public key of the first target public-private key pair generated by the first device, the second device sends the encrypted distance to the first device. The first device decrypts it with the first target private key of that key pair to obtain the target distance and sends the target distance to the second device, which receives it.
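The message flow of this embodiment can be simulated end to end. The sketch below is an assumption-laden illustration, not the applicant's implementation. It swaps in the Paillier cryptosystem with toy-sized primes (insecure; real deployments use far larger keys). Paillier is only additively homomorphic: it supports adding plaintexts under encryption and multiplying a ciphertext by a plaintext constant, but not multiplying two ciphertexts. The cross term 2ab is therefore computed with the second device's own plaintext component b as the constant, while b² is encrypted under the first target public key. Components are assumed to be integers (real-valued vectors would need fixed-point scaling), and `FirstDevice`, `encrypt`, and `encrypted_distance` are hypothetical names.

```python
import math
import random

# Toy primes (the 10,000th and 100,000th primes): insecure, illustration only.
P, Q = 104729, 1299709


def encrypt(n, m):
    """Paillier encryption with generator g = n + 1 under public key n."""
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    # (n + 1)^m mod n^2 equals 1 + m*n mod n^2.
    return (1 + (m % n) * n) * pow(r, n, n2) % n2


class FirstDevice:
    """Generates the key pair, encrypts its updated vector, decrypts distances."""

    def __init__(self, first_vector):
        self.n = P * Q
        self.n2 = self.n * self.n
        self.lam = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
        self.mu = pow(self.lam, -1, self.n)  # modular inverse (Python 3.8+)
        self.first_vector = first_vector

    def public_key(self):
        return self.n

    def first_encrypted_vector(self):
        # Adjacent insertion rule: E(a_i) followed by E(a_i^2).
        out = []
        for a in self.first_vector:
            out += [encrypt(self.n, a), encrypt(self.n, a * a)]
        return out

    def decrypt(self, c):
        m = (pow(c, self.lam, self.n2) - 1) // self.n * self.mu % self.n
        return m - self.n if m > self.n // 2 else m  # map back to signed integers


def encrypted_distance(n, first_encrypted, second_vector):
    """Second device: E(sum of (a_i - b_i)^2) under the first device's key."""
    n2 = n * n
    acc = encrypt(n, 0)
    for i, b in enumerate(second_vector):
        enc_a, enc_a_sq = first_encrypted[2 * i], first_encrypted[2 * i + 1]
        enc_b_sq = encrypt(n, b * b)              # second device encrypts b^2
        enc_cross = pow(enc_a, (-2 * b) % n, n2)  # E(-2ab) via plaintext scalar
        # Multiplying ciphertexts adds plaintexts: a^2 + b^2 - 2ab per group.
        acc = acc * enc_a_sq * enc_b_sq * enc_cross % n2
    return acc


first = FirstDevice([1, 5])
pk = first.public_key()  # sent to the second device with the encrypted vector
enc_dist = encrypted_distance(pk, first.first_encrypted_vector(), [2, 3])
target_distance = first.decrypt(enc_dist)  # decrypted on the first device
print(target_distance)  # 5
```

Only the first device ever holds the private values `lam` and `mu`; the second device sees ciphertexts and the public modulus, and the first device sees only the aggregated distance, which mirrors the security argument given above.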
In the embodiment of the application, the first data does not leave the first device in the form of the original data, so that the fuzzy matching can be realized without the original data being exported, and the safety of the matching process is further ensured.
Example 14:
On the basis of the foregoing embodiments, obtaining the target distance between the first vector and the second vector determined based on the first encrypted vector and the second encrypted vector includes:
sending the second encrypted vector to the first device, so that the first device determines the encrypted distance between the first vector and the second vector based on the second encrypted vector and the first encrypted vector, the latter being obtained by the first device homomorphically encrypting the first vector with the first target public key it generated;
receiving the target distance between the first vector and the second vector sent by the first device, wherein the target distance is obtained by the first device decrypting the encrypted distance between the first vector and the second vector with the first target private key corresponding to the first target public key it generated.
In this embodiment, to obtain the target distance between the first vector and the second vector, the first device, instead of sending both the first target public key and the first encrypted vector to the second device so that the second device determines the encrypted distance, may send only the first target public key to the second device.
After receiving the first target public key from the first device, the second device homomorphically encrypts the second vector with it to generate the second encrypted vector and sends the second encrypted vector to the first device. The first device then determines the encrypted distance between the first vector and the second vector from the second encrypted vector and the first encrypted vector, decrypts it with the first target private key of the first target public-private key pair generated by the first device, and sends the resulting target distance to the second device. On receiving the target distance between the first vector and the second vector, the second device can determine whether the first data and the second data match.
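A minimal Python sketch of this variant follows. It assumes the Paillier cryptosystem (not named in the text) with tiny, insecure fixed primes, integer-scaled components, and hypothetical function names. The second device places each square immediately after its component before encrypting, and the first device, which holds the first vector in plaintext, uses each first component as the scalar for the cross term, since Paillier cannot multiply two ciphertexts.

```python
import math
import random

# Toy Paillier (additively homomorphic); the fixed small primes are insecure.
def keygen(p=999983, q=1000003):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    return n, (n, lam, pow(lam, -1, n))   # public key, private key

def encrypt(n, m):
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(n + 1, m % n, n2) * pow(r, n, n2) % n2

def decrypt(sk, c):
    n, lam, mu = sk
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    return m if m <= n // 2 else m - n

def second_device_encrypt(pk, y):
    """Second device: insert each square after its component, then encrypt every entry."""
    interleaved = [v for yi in y for v in (yi, yi * yi)]
    return [encrypt(pk, v) for v in interleaved]

def first_device_target_distance(pk, sk, x, enc_y):
    """First device: accumulate E(x_i^2 + y_i^2 - 2 x_i y_i) over all groups, then decrypt."""
    n2 = pk * pk
    acc = encrypt(pk, 0)
    for i, xi in enumerate(x):
        cy, cy2 = enc_y[2 * i], enc_y[2 * i + 1]   # (E(y_i), E(y_i^2)) group
        acc = acc * encrypt(pk, xi * xi) * cy2 * pow(cy, (-2 * xi) % pk, n2) % n2
    return decrypt(sk, acc)

pk, sk = keygen()                       # first device; only pk is sent out
x = [1, 2, 4, 5, 3]                     # first vector, held by the first device
y = [2, 2, 3, 5, 1]                     # second vector, held by the second device
enc_y = second_device_encrypt(pk, y)    # sent to the first device
target = first_device_target_distance(pk, sk, x, enc_y)   # sent back to the second device
print(target)
```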
Fig. 6 is a schematic diagram of the overall process of fuzzy matching between two pieces of data according to some embodiments of the present application; the process is described below with reference to Fig. 6.
The first data and the second data to be matched are input into the pre-trained vector conversion model, which outputs a first vector corresponding to the first data and a second vector corresponding to the second data. The first vector and the second vector are each homomorphically encrypted with the first target public key to obtain the first encrypted vector and the second encrypted vector, from which the encrypted distance between the first vector and the second vector is determined. This encrypted distance is decrypted with the first target private key to obtain the target distance between the first vector and the second vector, and the target distance is compared with the preset first distance threshold to determine the matching result.
Fig. 7 is a schematic diagram of a specific process of fuzzy matching between two pieces of data according to some embodiments of the present application; the process is described below with reference to Fig. 7.
The first device inputs the first data to be matched into the pre-trained vector conversion model deployed on the first device to obtain the first vector corresponding to the first data, and the second device inputs the second data to be matched into the pre-trained vector conversion model deployed on the second device to obtain the second vector corresponding to the second data. As shown in Fig. 7, the first data includes four pieces of first sub-data, U1, U2, U3 and U4, whose first sub-vectors are (x11, x12, x13, …, x1m), (x21, x22, x23, …, x2m), (x31, x32, x33, …, x3m) and (x41, x42, x43, …, x4m), respectively. The first vector corresponding to the first data is (x11, x12, x13, …, x1m, x21, x22, x23, …, x2m, x31, x32, x33, …, x3m, x41, x42, x43, …, x4m). The second data includes four pieces of second sub-data, U5, U6, U7 and U8, whose second sub-vectors are (y11, y12, y13, …, y1m), (y21, y22, y23, …, y2m), (y31, y32, y33, …, y3m) and (y41, y42, y43, …, y4m), respectively, and the second vector corresponding to the second data is (y11, y12, y13, …, y1m, y21, y22, y23, …, y2m, y31, y32, y33, …, y3m, y41, y42, y43, …, y4m).
The first device generates a first target public-private key pair A (pka, ska), where pka is the first target public key and ska is the first target private key; this key pair is a homomorphic-encryption key pair. The first device homomorphically encrypts the first vector with the first target public key to generate the first encrypted vector: for the first vector (x11, x12, x13, …, x1m, x21, x22, x23, …, x2m, x31, x32, x33, …, x3m, x41, x42, x43, …, x4m), the first encrypted vector is (E_pka(x11), E_pka(x12), E_pka(x13), …, E_pka(x1m), E_pka(x21), E_pka(x22), E_pka(x23), …, E_pka(x2m), E_pka(x31), E_pka(x32), E_pka(x33), …, E_pka(x3m), E_pka(x41), E_pka(x42), E_pka(x43), …, E_pka(x4m)), where E_pka(·) denotes homomorphic encryption with pka. Each encrypted first sub-vector of length m comprises m/2 groups of first encrypted components and first encrypted square components.
The first device sends the first target public key and the first encrypted vector to the second device. After receiving them, the second device homomorphically encrypts the second vector with the first target public key to generate the second encrypted vector: for the second vector (y11, y12, y13, …, y1m, y21, y22, y23, …, y2m, y31, y32, y33, …, y3m, y41, y42, y43, …, y4m), the second encrypted vector is (E_pka(y11), E_pka(y12), E_pka(y13), …, E_pka(y1m), E_pka(y21), E_pka(y22), E_pka(y23), …, E_pka(y2m), E_pka(y31), E_pka(y32), E_pka(y33), …, E_pka(y3m), E_pka(y41), E_pka(y42), E_pka(y43), …, E_pka(y4m)). Each encrypted second sub-vector of length m comprises m/2 groups of second encrypted components and second encrypted square components.
For each encrypted first sub-vector and each encrypted second sub-vector, the second device determines the encrypted distance between the two sub-vectors, assembles these encrypted distances into an encrypted distance matrix, and sends the encrypted distance matrix to the first device.
After receiving the encrypted distance matrix, the first device decrypts it with the first target private key it generated, that is, it decrypts the encrypted distance between each first sub-vector and each second sub-vector, to obtain the target sub-distance matrix, in which each element is the corresponding target sub-distance. Based on this matrix, the first device determines whether each piece of first sub-data matches each piece of second sub-data. The first device also sends the target sub-distance matrix to the second device, which likewise determines, based on the matrix, whether each piece of first sub-data matches each piece of second sub-data.
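The sub-vector distance matrix can be sketched in Python as follows. This assumes the Paillier cryptosystem (not named in the text) with tiny, insecure fixed primes, integer-valued components, two illustrative sub-vectors per side instead of four, and a hypothetical first distance threshold of 5; the function names are also hypothetical. The second device uses its plaintext components as scalars for the cross term because Paillier cannot multiply two ciphertexts.

```python
import math
import random

# Toy Paillier (additively homomorphic); the fixed small primes are insecure.
def keygen(p=999983, q=1000003):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    return n, (n, lam, pow(lam, -1, n))

def encrypt(n, m):
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(n + 1, m % n, n2) * pow(r, n, n2) % n2

def decrypt(sk, c):
    n, lam, mu = sk
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    return m if m <= n // 2 else m - n

def encrypt_sub_vectors(pk, subs):
    """First device: (E(v), E(v^2)) for every component of every first sub-vector."""
    return [[(encrypt(pk, v), encrypt(pk, v * v)) for v in sub] for sub in subs]

def encrypted_distance_matrix(pk, enc_subs, second_subs):
    """Second device: encrypted squared distance for every (first, second) sub-vector pair."""
    n2 = pk * pk
    matrix = []
    for enc_sub in enc_subs:
        row = []
        for sub in second_subs:
            acc = encrypt(pk, 0)
            for (cx, cx2), yi in zip(enc_sub, sub):
                acc = acc * cx2 * encrypt(pk, yi * yi) * pow(cx, (-2 * yi) % pk, n2) % n2
            row.append(acc)
        matrix.append(row)
    return matrix

pk, sk = keygen()
first_subs = [[1, 2], [5, 5]]     # e.g. U1, U2 (illustrative values)
second_subs = [[1, 2], [4, 6]]    # e.g. U5, U6 (illustrative values)
enc_matrix = encrypted_distance_matrix(pk, encrypt_sub_vectors(pk, first_subs), second_subs)
target_matrix = [[decrypt(sk, c) for c in row] for row in enc_matrix]   # first device
threshold = 5                                                           # hypothetical threshold
matches = [[d < threshold for d in row] for row in target_matrix]
print(target_matrix, matches)
```

Row i, column j of the decrypted matrix is the squared distance between first sub-vector i and second sub-vector j, so each piece of first sub-data can be matched against each piece of second sub-data independently.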
Example 15:
On the basis of the foregoing embodiments, obtaining the target distance between the first vector and the second vector determined based on the first encrypted vector and the second encrypted vector includes:
homomorphically encrypting the second vector by using a second target public key generated by the second device to obtain a third encrypted vector, and sending the second target public key and the third encrypted vector to the first device;
receiving the encrypted distance between the first vector and the second vector sent by the first device, wherein the encrypted distance is determined by the first device from the third encrypted vector and a fourth encrypted vector, and the fourth encrypted vector is generated by homomorphically encrypting the first vector with the second target public key;
determining the target distance between the first vector and the second vector based on the encrypted distance and a second target private key corresponding to the second target public key, and determining whether the first data and the second data match based on the target distance and the preset first distance threshold.
To enable both the first device and the second device to determine whether the first data and the second data match, in this embodiment of the application the second device may also obtain the target distance between the first vector and the second vector, by decrypting the encrypted distance it receives from the first device.
Specifically, the second device may homomorphically encrypt the second vector with a second target public key it generated to obtain a third encrypted vector, and send the third encrypted vector and the second target public key to the first device. After receiving them, the first device homomorphically encrypts the first vector with the second target public key to obtain a fourth encrypted vector, determines the encrypted distance between the first vector and the second vector from the third encrypted vector and the fourth encrypted vector, and sends this encrypted distance to the second device. The second device then decrypts the encrypted distance with the second target private key corresponding to its second target public key, obtains the target distance between the first vector and the second vector, and determines whether the first data and the second data match according to the target distance and the preset first distance threshold.
The process of determining whether the first data and the second data are matched by the second device according to the target distance between the first vector and the second vector and the preset first distance threshold is the same as the process of determining whether the first data and the second data are matched by the first device according to the target distance between the first vector and the second vector and the preset first distance threshold, which is not described herein again.
In this embodiment of the application, the second data never leaves the second device in its original form, so fuzzy matching is achieved without exporting the original data, which further ensures the security of the matching process.
Example 16:
On the basis of the foregoing embodiments, inputting the second data to be matched into the pre-trained vector conversion model to obtain the second vector corresponding to the second data includes:
for each piece of second sub-data in the second data, inputting the second sub-data into the pre-trained vector conversion model to obtain the corresponding second sub-vector, wherein the second sub-vector corresponding to each piece of second sub-data has a first preset length;
and concatenating the second sub-vectors corresponding to the pieces of second sub-data to obtain the second vector corresponding to the second data.
In this embodiment of the application, a piece of second data may include one piece of second sub-data or several. For example, the second data may include the single piece of second sub-data "Sunny Canteen, Pudong New Area, Shanghai", or it may include three pieces of second sub-data: "Sunny Canteen, Pudong New Area, Shanghai", "Everyday Restaurant, Shanghai" and "Yang Guofu Spicy Hotpot, Gaoku Road".
To determine the second vector corresponding to the second data, each piece of second sub-data in the second data is input into the pre-trained vector conversion model to obtain its second sub-vector. The pieces of second sub-data may contain texts, numbers or characters of different lengths, but every second sub-vector has the first preset length, which may be set as required, for example to 3, 4 or 6.
For example, when the second data includes the three pieces of second sub-data "Sunny Canteen, Pudong New Area, Shanghai", "Everyday Restaurant, Shanghai" and "Yang Guofu Spicy Hotpot, Gaoku Road", each is input into the pre-trained word vector model. The model outputs the second sub-vector (2.0, 3.0, 2.5, 1.0, 1.5) for "Sunny Canteen, Pudong New Area, Shanghai", (2.3, 4.4, 3.5, 4.5, 2.5) for "Everyday Restaurant, Shanghai", and (2.5, 2.7, 8.3, 4.5, 1.5) for "Yang Guofu Spicy Hotpot, Gaoku Road".
If the second data includes three pieces of second sub-data that are all digital data, namely "54321", "00001" and "33322", each is input into the pre-trained word vector model: the second sub-vector output for "54321" is (0000100000, 0000010000, 0000001000, 0000000100, 0000000010), the one output for "00001" is (0000000001, 0000000001, 0000000001, 0000000001, 0000000010), and the one output for "33322" is (0000001000, 0000001000, 0000001000, 0000000100, 0000000100).
To determine the second vector corresponding to the second data, in this embodiment of the application, after the second sub-vector of each piece of second sub-data is obtained, the second sub-vectors are concatenated and the concatenation result is taken as the second vector corresponding to the second data.
For example, suppose the second data includes the three pieces of second sub-data "Sunny Canteen, Pudong New Area, Shanghai", "Everyday Restaurant, Shanghai" and "Yang Guofu Spicy Hotpot, Gaoku Road", whose second sub-vectors are (1.0, 2.0, 1.5), (3.0, 4.0, 2.5) and (4.5, 5.5, 7.5), respectively. If the pieces of second sub-data are randomly ordered and the ordering result is "Sunny Canteen, Pudong New Area, Shanghai", "Yang Guofu Spicy Hotpot, Gaoku Road", "Everyday Restaurant, Shanghai", then concatenating the three second sub-vectors in this order yields the second vector (1.0, 2.0, 1.5, 4.5, 5.5, 7.5, 3.0, 4.0, 2.5).
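The digit vectors above appear to follow a one-hot pattern in which digit d places a "1" at the (10 − d)-th of ten positions, counting from the left. This rule is inferred from the "54321" example and is not stated in the text; the sketch below, with hypothetical function names, only illustrates that inferred encoding and is not the pre-trained word vector model itself.

```python
def encode_digit(d: int) -> str:
    """Ten-character one-hot string: digit d puts a '1' at index 9 - d (0-based)."""
    if not 0 <= d <= 9:
        raise ValueError("expected a single decimal digit")
    bits = ["0"] * 10
    bits[9 - d] = "1"
    return "".join(bits)

def encode_digit_string(s: str) -> tuple:
    """One encoded component per character of a digit string."""
    return tuple(encode_digit(int(ch)) for ch in s)

print(encode_digit_string("54321"))
```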
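The concatenation step, including the optional random ordering of sub-data, can be sketched as below. The function name is hypothetical, and the sub-vector values are the illustrative ones from the example above.

```python
import random

def concatenate_sub_vectors(sub_vectors, order=None):
    """Concatenate sub-vectors in the given order (default: original order)."""
    if order is None:
        order = range(len(sub_vectors))
    vector = []
    for index in order:
        vector.extend(sub_vectors[index])
    return vector

subs = [
    [1.0, 2.0, 1.5],   # Sunny Canteen, Pudong New Area, Shanghai
    [3.0, 4.0, 2.5],   # Everyday Restaurant, Shanghai
    [4.5, 5.5, 7.5],   # Yang Guofu Spicy Hotpot, Gaoku Road
]
random_order = random.sample(range(len(subs)), len(subs))   # a random ordering
print(concatenate_sub_vectors(subs, random_order))
print(concatenate_sub_vectors(subs, [0, 2, 1]))             # the ordering used in the example
```

Whatever the ordering, the result has length (number of sub-data) × (first preset length), here 3 × 3 = 9.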
In order to implement fuzzy matching of the first data and the second data, on the basis of the above embodiments, the lengths of the first vector and the second vector are both the second preset length.
In this embodiment of the present application, in order to implement fuzzy matching between first data and second data, lengths of a first vector corresponding to the obtained first data and a second vector corresponding to the obtained second data must be the same and both have a second preset length, where the second preset length is not less than the first preset length, the second preset length is an integer multiple of the first preset length, and if only one second sub-data is included in the second data, the first preset length is equal to the second preset length.
Because the first vector corresponding to the first data and the second vector corresponding to the second data both have the second preset length, fuzzy matching can be performed even when the first data and the second data differ (for example, in text length), which broadens the range of applicable scenarios.
Example 17:
To determine the second encrypted vector, on the basis of the foregoing embodiments, homomorphically encrypting the second vector with the first target public key to generate the second encrypted vector includes:
for each second component in the second vector, determining the second square component corresponding to that second component;
inserting the second square component corresponding to each second component into the second vector according to a preset insertion rule, and updating the second vector to the vector obtained after the insertion;
and homomorphically encrypting each second component and each second square component in the second vector with the first target public key to generate the second encrypted vector.
To generate the second encrypted vector, in this embodiment of the application the second vector could be homomorphically encrypted directly with the first target public key. However, so that the encrypted distance between the first vector and the second vector can later be determined from the first encrypted vector and the second encrypted vector without decrypting either of them, in this embodiment of the application the second square component corresponding to each second component in the second vector is determined first.
For example, if the second vector is (1, 2, 4, 5, 3), the second square component corresponding to the second component of 1 in the second vector is 1, the second square component corresponding to the second component of 2 in the second vector is 4, the second square component corresponding to the second component of 4 in the second vector is 16, the second square component corresponding to the second component of 5 in the second vector is 25, and the second square component corresponding to the second component of 3 in the second vector is 9.
In this embodiment, after the second square component corresponding to each second component in the second vector is determined, each second square component may be inserted into the second vector according to the preset insertion rule, and the second vector is updated to the vector obtained after the insertion. Specifically, the second square component corresponding to a second component may be inserted anywhere in the second vector, for example immediately before or after that second component, or all second square components may be appended in sequence after the second components, as long as the first device and the second device can both identify which entries of each vector are components and which are square components.
For example, if the second vector is (1, 2, 4, 5, 3), after the second square component of each second component in the second vector is determined, the second square component is inserted into the second vector, and the obtained updated second vector is (1, 9, 2, 16, 4, 4, 5, 25, 1, 3).
To make it easier to determine the encrypted distance later from the second vector with the inserted second square components, in this embodiment of the application the preset insertion rule may specifically place the second square component corresponding to each second component in the position immediately after, and adjacent to, that second component; the vector obtained after the insertion is taken as the updated second vector.
For example, if the second vector is (1, 2, 4, 5, 3), after the second square component of each second component in the second vector is determined, the second square component is inserted into the second vector, and the obtained updated second vector is (1, 1, 2, 4, 4, 16, 5, 25, 3, 9).
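The adjacent-insertion rule can be sketched in a few lines of Python; the function name is hypothetical, and the example input is the second vector from the text.

```python
def insert_squares(vector):
    """Place each component's square immediately after that component."""
    updated = []
    for component in vector:
        updated.extend((component, component * component))
    return updated

print(insert_squares([1, 2, 4, 5, 3]))   # [1, 1, 2, 4, 4, 16, 5, 25, 3, 9]
```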
After determining the updated second vector, in order to determine the second encrypted vector, in this embodiment of the application, each second component and each second square component in the second vector may be homomorphically encrypted based on the first target public key, so as to generate a second encrypted vector.
For example, if the updated second vector is (2, 4, 3, 9), the second encrypted vector is (E_pka(2), E_pka(4), E_pka(3), E_pka(9)), where E_pka(2) and E_pka(3) are the results of homomorphically encrypting the second components 2 and 3 with the first target public key, and E_pka(4) and E_pka(9) are the results of homomorphically encrypting their second square components 4 and 9 with the first target public key.
Example 18:
On the basis of the foregoing embodiments, determining the encrypted distance between the first vector and the second vector according to the first encrypted vector and the second encrypted vector includes:
obtaining each group of second encrypted component and second encrypted square component from the second encrypted vector according to the preset insertion rule, and obtaining each group of first encrypted component and first encrypted square component from the first encrypted vector according to the preset insertion rule;
determining, according to the preset insertion rule, the corresponding groups of first encrypted component, first encrypted square component, second encrypted component and second encrypted square component;
determining each encrypted sub-distance from each such group of first encrypted component, first encrypted square component, second encrypted component and second encrypted square component;
and determining the encrypted distance between the first vector and the second vector as the sum of the encrypted sub-distances.
In this embodiment of the application, to determine the encrypted distance between the first vector and the second vector, the second device may obtain each group of second encrypted component and second encrypted square component from the second encrypted vector according to the preset insertion rule, where each group is obtained by homomorphically encrypting one component and the square of that same component. That is, within each group, the second encrypted square component is the homomorphic encryption of the square of the plaintext underlying the second encrypted component.
Specifically, suppose the preset insertion rule places the second square component corresponding to each second component in the position immediately after, and adjacent to, that second component. The groups are then determined by starting from the first entry of the second encrypted vector, taking the next ungrouped entry together with the entry immediately following it as one group, and continuing in order until every second encrypted component and second encrypted square component has been assigned to a group.
For example, if the second encrypted vector is (E_pka(2), E_pka(4), E_pka(3), E_pka(9)), two groups of second encrypted component and second encrypted square component are obtained: the first group is E_pka(2), E_pka(4), and the second group is E_pka(3), E_pka(9). In the first group, E_pka(2) is the second encrypted component and E_pka(4) is the second encrypted square component; in the second group, E_pka(3) is the second encrypted component and E_pka(9) is the second encrypted square component.
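The grouping step under the adjacent-insertion rule can be sketched as follows. The function name is hypothetical, and the strings are placeholders standing in for ciphertexts.

```python
def group_pairs(encrypted_vector):
    """Split a vector laid out as (c1, c1_sq, c2, c2_sq, ...) into (component, square) groups."""
    if len(encrypted_vector) % 2 != 0:
        raise ValueError("expected an even number of entries")
    return [
        (encrypted_vector[i], encrypted_vector[i + 1])
        for i in range(0, len(encrypted_vector), 2)
    ]

print(group_pairs(["E_pka(2)", "E_pka(4)", "E_pka(3)", "E_pka(9)"]))
```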
In this embodiment of the application, after obtaining the first encryption vector sent by the first device, the second device may also obtain each group of first encryption components and first encryption square components according to a preset insertion rule and each first component in the first encryption vector, where the preset insertion rule corresponding to obtaining the second vector is the same as the preset insertion rule corresponding to obtaining the first vector, and a process of obtaining each group of second encryption components and second encryption square components is the same as a process of obtaining each group of first encryption components and first encryption square components, which is not described herein again.
To determine the encrypted distance between the first vector and the second vector, in this embodiment of the application each encrypted sub-distance may be determined from each group of first encrypted component, first encrypted square component, second encrypted component and second encrypted square component. Specifically, for each group, a target sum of the first encrypted square component and the second encrypted square component is determined, together with a target product of the first encrypted component, the second encrypted component and a preset value, which in this embodiment of the application is 2; the target difference between the target sum and the target product is then taken as the encrypted sub-distance of that group.
After the encrypted sub-distance of each group is determined, the encrypted distance between the first vector and the second vector is determined as the sum of the encrypted sub-distances.
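The per-group computation rests on expanding a squared difference. Writing x_i and y_i for the plaintext first and second components of group i, and ⊕, ⊖, ⊗ for the corresponding operations carried out on ciphertexts (notation introduced here for illustration, not used in the text), the sub-distance and the total distance are:

$$
d_i = x_i^2 + y_i^2 - 2\,x_i y_i = (x_i - y_i)^2, \qquad D = \sum_i d_i = \lVert \mathbf{x} - \mathbf{y} \rVert_2^2
$$

$$
E_{pka}(d_i) = \bigl(E_{pka}(x_i^2) \oplus E_{pka}(y_i^2)\bigr) \ominus \bigl(2 \otimes E_{pka}(x_i) \otimes E_{pka}(y_i)\bigr)
$$

So the decrypted distance is the squared Euclidean distance between the first vector and the second vector. Multiplying two ciphertexts, as in the last term, requires a scheme that supports ciphertext-ciphertext multiplication; with a purely additive scheme such as Paillier, the party holding one of the two components in plaintext can use it as a scalar instead.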
Example 19:
On the basis of the foregoing embodiments, determining whether the first data and the second data match based on the target distance and the preset first distance threshold includes:
determining whether the target distance is smaller than a preset first distance threshold value;
if yes, determining that the first data is matched with the second data;
otherwise, it is determined that the first data does not match the second data.
To determine whether the first data and the second data match, in this embodiment of the application the target distance is compared with the preset first distance threshold: if the target distance is smaller than the preset first distance threshold, the first data and the second data are determined to match; otherwise, they are determined not to match. The preset first distance threshold may be set as required, for example to 1 or 1.5. The smaller the target distance, the more closely the first vector and the second vector match.
On the basis of the foregoing embodiments, after determining that the first data matches the second data, the method further includes:
and determining whether the target distance is equal to a preset second distance threshold, and if so, determining that the first data is the same as the second data.
In this embodiment, if the target distance is equal to a preset second distance threshold, it indicates that the first data is the same as the second data, that is, the first data and the second data are completely matched, where the preset second distance threshold is smaller than the preset first distance threshold, and the preset second distance threshold is equal to 0.
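The two-threshold decision can be sketched as a small function. The function name and return labels are hypothetical, 1.5 is one of the example first thresholds mentioned in the text, and the exact equality test assumes decryption returns the distance exactly (for example, integer-scaled distances); an approximate scheme might instead need a small tolerance.

```python
def matching_result(target_distance, first_threshold, second_threshold=0.0):
    """Classify a pair of data by its target distance using the two preset thresholds."""
    if target_distance >= first_threshold:
        return "no match"        # not smaller than the first distance threshold
    if target_distance == second_threshold:
        return "identical"       # matched, and the distance equals the second threshold (0)
    return "match"

print(matching_result(0.0, 1.5), matching_result(1.0, 1.5), matching_result(1.5, 1.5))
```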
Example 20:
Fig. 8 is a schematic structural diagram of a data matching apparatus according to some embodiments of the present application. The apparatus includes:
a first obtaining module 801, configured to input first data to be matched into a pre-trained vector conversion model to obtain a first vector corresponding to the first data;
a first processing module 802, configured to homomorphically encrypt the first vector with a first target public key that it generates itself, to generate a first encrypted vector, and to send the first target public key to a second device;
the first obtaining module 801 is further configured to obtain the encrypted distance between the first vector and a second vector, determined based on the first encrypted vector and a second encrypted vector, where the second encrypted vector is obtained by homomorphically encrypting the second vector with the first target public key, and the second vector is obtained by inputting second data into a pre-trained vector conversion model in the second device;
a first determining module 803, configured to determine a target distance between the first vector and the second vector based on the encrypted distance between the first vector and the second vector and a first target private key corresponding to the first target public key, and determine whether the first data and the second data match based on the target distance and a preset first distance threshold.
In a possible implementation manner, the first obtaining module 801 is specifically configured to: determine a first target data type corresponding to the first data; determine, according to the first target data type and a pre-stored correspondence between data types and pre-trained vector conversion models, a pre-trained first target vector conversion model corresponding to the first data; and input the first data into the pre-trained first target vector conversion model to obtain the first vector corresponding to the first data.
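The type-based model selection described above amounts to a lookup in a pre-stored table. The following Python sketch assumes a simple typing rule and uses placeholder functions in place of real pre-trained models; `detect_data_type`, `MODEL_REGISTRY` and both placeholder models are hypothetical names, not part of the application.

```python
from typing import Callable, Dict, List

Vector = List[float]


def detect_data_type(data: str) -> str:
    """Hypothetical typing rule: decimal-digit strings are 'number', else 'text'."""
    return "number" if data.isdecimal() else "text"


def placeholder_text_model(data: str) -> Vector:
    """Stand-in for a pre-trained word or sentence vector model (hypothetical)."""
    return [float(len(data)), float(sum(map(ord, data)) % 97)]


def placeholder_number_model(data: str) -> Vector:
    """Stand-in for a number model: one component per digit (hypothetical)."""
    return [float(ch) for ch in data]


# Pre-stored correspondence between data types and pre-trained models.
MODEL_REGISTRY: Dict[str, Callable[[str], Vector]] = {
    "text": placeholder_text_model,
    "number": placeholder_number_model,
}


def to_vector(data: str) -> Vector:
    """Determine the data type, look up its model, and run the model."""
    data_type = detect_data_type(data)
    model = MODEL_REGISTRY.get(data_type)
    if model is None:
        raise ValueError(f"no pre-trained model registered for {data_type!r}")
    return model(data)
```

Keeping the correspondence in a table means a new data type only needs a new registry entry; both devices must hold the same table so that equal data types map to the same model.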
In a possible implementation manner, the first processing module 802 is specifically configured to send the first encrypted vector and the first target public key to the second device;
the first obtaining module 801 is specifically configured to receive, from the second device, the encrypted distance between the first vector and the second vector determined based on the first encrypted vector and the second encrypted vector, where the second encrypted vector is obtained by the second device homomorphically encrypting the second vector with the first target public key.
In a possible implementation manner, the first processing module 802 is further configured to: receive a third encrypted vector sent by the second device and a second target public key generated by the second device, where the third encrypted vector is obtained by the second device homomorphically encrypting the second vector with the second target public key; homomorphically encrypt the first vector with the second target public key to generate a fourth encrypted vector; determine the encrypted distance between the second vector and the first vector based on the third encrypted vector and the fourth encrypted vector; and send the encrypted distance to the second device, so that the second device decrypts it with a second target private key corresponding to the second target public key to obtain the target distance between the second vector and the first vector, and determines whether the first data and the second data match based on that target distance and a preset first distance threshold.
In a possible implementation manner, the first processing module 802 is further configured to send the target distance between the first vector and the second vector to the second device, so that the second device determines whether the first data and the second data match based on the target distance and a preset first distance threshold.
In a possible implementation manner, the first obtaining module 801 is specifically configured to decrypt the encrypted distance between the first vector and the second vector by using a first target private key corresponding to the first target public key generated by the first device itself, and determine the target distance between the first vector and the second vector.
In a possible implementation manner, the first obtaining module 801 is specifically configured to: for each piece of first sub-data in the first data, input the first sub-data into a pre-trained vector conversion model to obtain a first sub-vector corresponding to the first sub-data, where the first sub-vector of each piece of first sub-data has a first preset length; and splice (concatenate) the first sub-vectors of all the first sub-data to obtain the first vector corresponding to the first data.
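Splicing fixed-length sub-vectors can be sketched as below. The sketch assumes that both devices use the same model and the same sub-data order, so that component i of the first vector lines up with component i of the second vector; `splice_sub_vectors` and `toy_model` are hypothetical names.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def splice_sub_vectors(sub_data: Sequence[str],
                       model: Callable[[str], Vector],
                       preset_length: int) -> Vector:
    """Convert each piece of sub-data and concatenate the fixed-length sub-vectors."""
    spliced: Vector = []
    for item in sub_data:
        sub_vector = model(item)
        if len(sub_vector) != preset_length:
            raise ValueError(
                f"sub-vector for {item!r} has length {len(sub_vector)}, "
                f"expected {preset_length}")
        spliced.extend(sub_vector)
    return spliced


def toy_model(item: str) -> Vector:
    """Hypothetical stand-in for a pre-trained model with output length 4."""
    return [float(len(item)), float(item.count("a")),
            float(item[:1].isupper()), 1.0]


record = ["Alice", "Shanghai"]  # e.g. a name field and a city field
first_vector = splice_sub_vectors(record, toy_model, preset_length=4)
# first_vector == [5.0, 0.0, 1.0, 1.0, 8.0, 2.0, 1.0, 1.0]
```

The length check enforces the first preset length; without it, a short sub-vector would shift every later component and silently misalign the two vectors.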
In a possible implementation manner, the first processing module 802 is specifically configured to: determine, for each first component in the first vector, a first square component corresponding to the first component; insert the first square component of each first component into the first vector according to a preset insertion rule, and take the resulting vector as the updated first vector; and homomorphically encrypt each first component and each first square component of the first vector with the first target public key to generate the first encrypted vector.
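The application does not state the preset insertion rule. Assuming, for illustration only, that each square component is placed immediately after its component, the update and the later regrouping look like this. A plausible reason for carrying the squares is the identity (x - y)^2 = x^2 - 2xy + y^2, which lets a squared Euclidean sub-distance be assembled from the components and their squares; the plaintext check at the end illustrates that reading and is not the application's formula.

```python
from typing import List, Sequence, Tuple


def insert_square_components(vector: Sequence[float]) -> List[float]:
    """Assumed insertion rule: place x_i ** 2 immediately after x_i."""
    updated: List[float] = []
    for component in vector:
        updated.extend([component, component * component])
    return updated


def split_pairs(updated: Sequence[float]) -> List[Tuple[float, float]]:
    """Inverse of the assumed rule: recover (component, square component) groups."""
    if len(updated) % 2:
        raise ValueError("updated vector must have even length")
    return [(updated[i], updated[i + 1]) for i in range(0, len(updated), 2)]


def squared_distance_from_pairs(first: Sequence[Tuple[float, float]],
                                second: Sequence[Tuple[float, float]]) -> float:
    """Plaintext check of (x - y)^2 = x^2 - 2xy + y^2, summed over groups."""
    if len(first) != len(second):
        raise ValueError("vectors are not aligned")
    return sum(x_sq - 2 * x * y + y_sq
               for (x, x_sq), (y, y_sq) in zip(first, second))


u = insert_square_components([1.0, -2.0, 0.5])   # [1.0, 1.0, -2.0, 4.0, 0.5, 0.25]
v = insert_square_components([0.0, -2.0, 1.5])
d = squared_distance_from_pairs(split_pairs(u), split_pairs(v))  # 2.0
```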
In a possible implementation manner, the first obtaining module 801 is specifically configured to: obtain each group of a first encrypted component and a first encrypted square component from the first encrypted vector according to the preset insertion rule; obtain each group of a second encrypted component and a second encrypted square component from the second encrypted vector according to the preset insertion rule; determine, according to the preset insertion rule, each corresponding group of a first encrypted component, a first encrypted square component, a second encrypted component and a second encrypted square component; determine an encrypted sub-distance from each such group; and determine the encrypted distance between the first vector and the second vector as the sum of the sub-distances.
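One way to realize the encrypted sub-distance computation is with the Paillier cryptosystem, an additively homomorphic scheme swapped in here for illustration; the application does not name its scheme. Paillier can add two ciphertexts and multiply a ciphertext by a plaintext integer, but cannot multiply two ciphertexts, so in this sketch the cross term -2ab uses the computing device's own plaintext component b rather than its encrypted component. The sketch also assumes squared Euclidean distance, fixed-point encoding with a scale of 100, and the interleaved insertion rule. The tiny fixed primes keep it short and make it insecure, and for brevity one object holds both keys, whereas in the described flow the second device would hold only the public modulus n.

```python
import math
import secrets
from typing import List, Sequence


class ToyPaillier:
    """Textbook Paillier with tiny fixed primes. Illustration only, NOT secure."""

    def __init__(self, p: int = 999_983, q: int = 1_000_003) -> None:
        self.n = p * q                       # public key
        self.n2 = self.n * self.n
        self.g = self.n + 1
        self.lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # private key
        self.mu = pow(self.lam, -1, self.n)  # valid because g = n + 1

    def encrypt(self, m: int) -> int:
        r = secrets.randbelow(self.n - 1) + 1
        while math.gcd(r, self.n) != 1:
            r = secrets.randbelow(self.n - 1) + 1
        return pow(self.g, m % self.n, self.n2) * pow(r, self.n, self.n2) % self.n2

    def decrypt(self, c: int) -> int:
        m = (pow(c, self.lam, self.n2) - 1) // self.n * self.mu % self.n
        return m - self.n if m > self.n // 2 else m  # back to a signed integer

    def add(self, c1: int, c2: int) -> int:
        """Enc(a) (+) Enc(b) = Enc(a + b)."""
        return c1 * c2 % self.n2

    def mul_plain(self, c: int, k: int) -> int:
        """k (x) Enc(a) = Enc(k * a) for a plaintext integer k."""
        return pow(c, k % self.n, self.n2)


def encode(vector: Sequence[float], scale: int) -> List[int]:
    """Fixed-point encoding (assumed): Paillier works on integers mod n."""
    return [round(x * scale) for x in vector]


def encrypt_with_squares(he: ToyPaillier, ints: Sequence[int]) -> List[int]:
    """First device: [Enc(a_1), Enc(a_1^2), Enc(a_2), Enc(a_2^2), ...]."""
    out: List[int] = []
    for a in ints:
        out += [he.encrypt(a), he.encrypt(a * a)]
    return out


def encrypted_squared_distance(he: ToyPaillier, first_encrypted: Sequence[int],
                               second_ints: Sequence[int]) -> int:
    """Second device (needs only the public key): Enc(sum_i (a_i - b_i)^2)."""
    if len(first_encrypted) != 2 * len(second_ints):
        raise ValueError("vectors are not aligned")
    total = he.encrypt(0)
    for i, b in enumerate(second_ints):
        enc_a, enc_a_sq = first_encrypted[2 * i], first_encrypted[2 * i + 1]
        enc_b_sq = he.encrypt(b * b)
        # Sub-distance a^2 + b^2 - 2ab: ciphertext additions plus one plaintext product.
        sub = he.add(he.add(enc_a_sq, enc_b_sq), he.mul_plain(enc_a, -2 * b))
        total = he.add(total, sub)
    return total


he = ToyPaillier()                   # key pair generated by the first device
scale = 100
x = [0.5, 1.0, -2.0]                 # first vector
y = [0.5, 0.0, -1.5]                 # second vector
enc_d = encrypted_squared_distance(he, encrypt_with_squares(he, encode(x, scale)),
                                   encode(y, scale))
target_distance = he.decrypt(enc_d) / scale ** 2   # only the first device decrypts
```

Here the first device sends its encrypted components and squares, the second device returns only the encrypted sum, and the first device decrypts it to obtain 0^2 + 1^2 + (-0.5)^2 = 1.25. A production system would use a vetted library and key sizes of at least 2048 bits.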
In a possible implementation manner, the first determining module 803 is specifically configured to determine whether the target distance is smaller than the preset first distance threshold; if so, determine that the first data matches the second data; otherwise, determine that the first data does not match the second data.
In a possible implementation manner, the first determining module 803 is further configured to determine whether the target distance is equal to a preset second distance threshold, and if so, determine that the first data is the same as the second data.
Example 21:
Fig. 9 is a schematic structural diagram of a data matching apparatus according to some embodiments of the present application. The apparatus includes:
a second obtaining module 901, configured to input second data to be matched into a pre-trained vector conversion model to obtain a second vector corresponding to the second data;
a second processing module 902, configured to receive a first target public key sent by a first device, and perform homomorphic encryption on the second vector by using the first target public key to generate a second encrypted vector;
the second obtaining module 901 is further configured to obtain a target distance between a first vector and the second vector determined based on a first encrypted vector and the second encrypted vector, where the first encrypted vector is obtained by homomorphically encrypting the first vector with the first target public key, and the first vector is obtained by inputting first data into a pre-trained vector conversion model in the first device;
a second determining module 903, configured to determine whether the first data and the second data match based on the target distance and a preset first distance threshold.
In a possible implementation manner, the second obtaining module 901 is specifically configured to: determine a second target data type corresponding to the second data; determine, according to the second target data type and a pre-stored correspondence between data types and pre-trained vector conversion models, a pre-trained second target vector conversion model corresponding to the second data; and input the second data into the pre-trained second target vector conversion model to obtain the second vector corresponding to the second data.
In a possible implementation manner, the second processing module 902 is specifically configured to receive the first target public key and the first encrypted vector sent by the first device, where the first encrypted vector is obtained by homomorphically encrypting the first vector with the first target public key;
the second obtaining module 901 is specifically configured to: determine the encrypted distance between the first vector and the second vector according to the first encrypted vector and the second encrypted vector; send the encrypted distance to the first device; and receive the target distance sent by the first device, where the target distance is obtained by the first device decrypting the encrypted distance with a first target private key corresponding to the first target public key generated by the first device.
In a possible implementation manner, the second obtaining module 901 is specifically configured to: send the second encrypted vector to the first device, so that the first device determines the encrypted distance between the first vector and the second vector based on the second encrypted vector and the first encrypted vector; and receive the target distance between the first vector and the second vector sent by the first device, where the target distance is obtained by the first device decrypting the encrypted distance with a first target private key corresponding to the first target public key generated by the first device.
In a possible implementation manner, the second obtaining module 901 is specifically configured to: homomorphically encrypt the second vector with a second target public key that it generates itself to obtain a third encrypted vector, and send the second target public key and the third encrypted vector to the first device; receive, from the first device, the encrypted distance between the second vector and the first vector determined based on the third encrypted vector and a fourth encrypted vector, where the fourth encrypted vector is generated by homomorphically encrypting the first vector with the second target public key; and determine the target distance between the first vector and the second vector based on the encrypted distance and a second target private key corresponding to the second target public key, and determine whether the first data and the second data match based on the target distance and a preset first distance threshold.
In a possible implementation manner, the second obtaining module 901 is specifically configured to: for each piece of second sub-data in the second data, input the second sub-data into a pre-trained vector conversion model to obtain a second sub-vector corresponding to the second sub-data, where the second sub-vector of each piece of second sub-data has the first preset length; and splice the second sub-vectors of all the second sub-data to obtain the second vector corresponding to the second data.
In a possible implementation manner, the second processing module 902 is specifically configured to: determine, for each second component in the second vector, a second square component corresponding to the second component; insert the second square component of each second component into the second vector according to the preset insertion rule, and take the resulting vector as the updated second vector; and homomorphically encrypt each second component and each second square component of the second vector with the first target public key to generate the second encrypted vector.
In a possible implementation manner, the second obtaining module 901 is specifically configured to: obtain each group of a second encrypted component and a second encrypted square component from the second encrypted vector according to the preset insertion rule; obtain each group of a first encrypted component and a first encrypted square component from the first encrypted vector according to the preset insertion rule; determine, according to the preset insertion rule, each corresponding group of a first encrypted component, a first encrypted square component, a second encrypted component and a second encrypted square component; determine an encrypted sub-distance from each such group; and determine the encrypted distance between the first vector and the second vector as the sum of the sub-distances.
In a possible implementation manner, the second determining module 903 is specifically configured to determine whether the target distance is smaller than the preset first distance threshold; if so, determine that the first data matches the second data; otherwise, determine that the first data does not match the second data.
In a possible implementation manner, the second determining module 903 is further configured to determine whether the target distance is equal to a preset second distance threshold, and if so, determine that the first data is the same as the second data.
Example 22:
On the basis of the foregoing embodiments, some embodiments of the present application further provide an electronic device which, as shown in Fig. 10, includes a processor 1001, a communication interface 1002, a memory 1003 and a communication bus 1004, where the processor 1001, the communication interface 1002 and the memory 1003 communicate with one another through the communication bus 1004.
The memory 1003 has stored therein a computer program which, when executed by the processor 1001, causes the processor 1001 to perform the steps of:
inputting first data to be matched into a pre-trained vector conversion model to obtain a first vector corresponding to the first data;
homomorphically encrypting the first vector with a first target public key generated by the device itself to generate a first encrypted vector, and sending the first target public key to a second device;
obtaining the encrypted distance between the first vector and a second vector determined based on the first encrypted vector and a second encrypted vector, where the second encrypted vector is obtained by homomorphically encrypting the second vector with the first target public key, and the second vector is obtained by inputting second data into a pre-trained vector conversion model in the second device;
determining the target distance between the first vector and the second vector based on the encrypted distance and a first target private key corresponding to the first target public key, and determining whether the first data and the second data match based on the target distance and a preset first distance threshold.
Further, the processor 1001 is further configured to: determine a first target data type corresponding to the first data; determine, according to the first target data type and a pre-stored correspondence between data types and pre-trained vector conversion models, a pre-trained first target vector conversion model corresponding to the first data; and input the first data into the pre-trained first target vector conversion model to obtain the first vector corresponding to the first data.
Further, the processor 1001 is further configured to: receive the second encrypted vector sent by the second device, where the second encrypted vector is obtained by the second device homomorphically encrypting the second vector with the first target public key; and determine the encrypted distance between the first vector and the second vector based on the first encrypted vector and the second encrypted vector.
Further, the processor 1001 is further configured to: send the first encrypted vector and the first target public key to the second device; and receive, from the second device, the encrypted distance between the first vector and the second vector determined based on the first encrypted vector and the second encrypted vector, where the second encrypted vector is obtained by the second device homomorphically encrypting the second vector with the first target public key.
Further, the processor 1001 is further configured to: receive a third encrypted vector sent by the second device and a second target public key generated by the second device, where the third encrypted vector is obtained by the second device homomorphically encrypting the second vector with the second target public key; homomorphically encrypt the first vector with the second target public key to generate a fourth encrypted vector; determine the encrypted distance between the second vector and the first vector based on the third encrypted vector and the fourth encrypted vector; and send the encrypted distance to the second device, so that the second device decrypts it with a second target private key corresponding to the second target public key to obtain the target distance between the second vector and the first vector, and determines whether the first data and the second data match based on that target distance and a preset first distance threshold.
Further, the processor 1001 is further configured to send a target distance between the first vector and the second vector to the second device, so that the second device determines whether the first data and the second data match based on the target distance and a preset first distance threshold.
Further, the processor 1001 is further configured to decrypt the encrypted distance between the first vector and the second vector by using a first target private key corresponding to the first target public key generated by the first device itself, and determine a target distance between the first vector and the second vector.
Further, the processor 1001 is further configured to: for each piece of first sub-data in the first data, input the first sub-data into a pre-trained vector conversion model to obtain a first sub-vector corresponding to the first sub-data, where the first sub-vector of each piece of first sub-data has a first preset length; and splice the first sub-vectors of all the first sub-data to obtain the first vector corresponding to the first data.
Further, the processor 1001 is further configured to: determine, for each first component in the first vector, a first square component corresponding to the first component; insert the first square component of each first component into the first vector according to a preset insertion rule, and take the resulting vector as the updated first vector; and homomorphically encrypt each first component and each first square component of the first vector with the first target public key to generate the first encrypted vector.
Further, the processor 1001 is further configured to: obtain each group of a first encrypted component and a first encrypted square component from the first encrypted vector according to the preset insertion rule; obtain each group of a second encrypted component and a second encrypted square component from the second encrypted vector according to the preset insertion rule; determine, according to the preset insertion rule, each corresponding group of a first encrypted component, a first encrypted square component, a second encrypted component and a second encrypted square component; determine an encrypted sub-distance from each such group; and determine the encrypted distance between the first vector and the second vector as the sum of the sub-distances.
Further, the processor 1001 is further configured to determine whether the target distance is smaller than the preset first distance threshold; if so, determine that the first data matches the second data; otherwise, determine that the first data does not match the second data.
Further, the processor 1001 is further configured to determine whether the target distance is equal to a preset second distance threshold, and if so, determine that the first data is the same as the second data.
The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is used in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface 1002 is used for communication between the electronic apparatus and other apparatuses.
The memory may include a random access memory (RAM) or a non-volatile memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, such as a central processing unit (CPU) or a network processor (NP), or may be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, discrete gate or transistor logic, or a discrete hardware component.
Example 23:
On the basis of the foregoing embodiments, some embodiments of the present application further provide an electronic device which, as shown in Fig. 11, includes a processor 1101, a communication interface 1102, a memory 1103 and a communication bus 1104, where the processor 1101, the communication interface 1102 and the memory 1103 communicate with one another through the communication bus 1104.
The memory 1103 has stored therein a computer program that, when executed by the processor 1101, causes the processor 1101 to perform the steps of:
inputting second data to be matched into a pre-trained vector conversion model to obtain a second vector corresponding to the second data;
receiving a first target public key sent by a first device, and homomorphically encrypting the second vector with the first target public key to generate a second encrypted vector;
obtaining a target distance between a first vector and the second vector determined based on a first encrypted vector and the second encrypted vector, where the first encrypted vector is obtained by homomorphically encrypting the first vector with the first target public key, and the first vector is obtained by inputting first data into a pre-trained vector conversion model in the first device;
and determining whether the first data and the second data are matched or not based on the target distance and a preset first distance threshold.
Further, the processor 1101 is further configured to: determine a second target data type corresponding to the second data; determine, according to the second target data type and a pre-stored correspondence between data types and pre-trained vector conversion models, a pre-trained second target vector conversion model corresponding to the second data; and input the second data into the pre-trained second target vector conversion model to obtain the second vector corresponding to the second data.
Further, the processor 1101 is further configured to: receive the first target public key and the first encrypted vector sent by the first device, where the first encrypted vector is obtained by homomorphically encrypting the first vector with the first target public key; determine the encrypted distance between the first vector and the second vector according to the first encrypted vector and the second encrypted vector; send the encrypted distance to the first device; and receive the target distance sent by the first device, where the target distance is obtained by the first device decrypting the encrypted distance with a first target private key corresponding to the first target public key generated by the first device.
Further, the processor 1101 is further configured to: send the second encrypted vector to the first device, so that the first device determines the encrypted distance between the first vector and the second vector based on the second encrypted vector and the first encrypted vector; and receive the target distance between the first vector and the second vector sent by the first device, where the target distance is obtained by the first device decrypting the encrypted distance with a first target private key corresponding to the first target public key generated by the first device.
Further, the processor 1101 is further configured to: homomorphically encrypt the second vector with a second target public key generated by the processor 1101 to obtain a third encrypted vector, and send the second target public key and the third encrypted vector to the first device; receive, from the first device, the encrypted distance between the second vector and the first vector determined based on the third encrypted vector and a fourth encrypted vector, where the fourth encrypted vector is generated by homomorphically encrypting the first vector with the second target public key; and determine the target distance between the first vector and the second vector based on the encrypted distance and a second target private key corresponding to the second target public key, and determine whether the first data and the second data match based on the target distance and a preset first distance threshold.
Further, the processor 1101 is further configured to: for each piece of second sub-data in the second data, input the second sub-data into a pre-trained vector conversion model to obtain a second sub-vector corresponding to the second sub-data, where the second sub-vector of each piece of second sub-data has the first preset length; and splice the second sub-vectors of all the second sub-data to obtain the second vector corresponding to the second data.
Further, the processor 1101 is further configured to: determine, for each second component in the second vector, a second square component corresponding to the second component; insert the second square component of each second component into the second vector according to the preset insertion rule, and take the resulting vector as the updated second vector; and homomorphically encrypt each second component and each second square component of the second vector with the first target public key to generate the second encrypted vector.
Further, the processor 1101 is further configured to: obtain each group of a second encrypted component and a second encrypted square component from the second encrypted vector according to the preset insertion rule; obtain each group of a first encrypted component and a first encrypted square component from the first encrypted vector according to the preset insertion rule; determine, according to the preset insertion rule, each corresponding group of a first encrypted component, a first encrypted square component, a second encrypted component and a second encrypted square component; determine an encrypted sub-distance from each such group; and determine the encrypted distance between the first vector and the second vector as the sum of the sub-distances.
Further, the processor 1101 is further configured to determine whether the target distance is smaller than the preset first distance threshold; if so, determine that the first data matches the second data; otherwise, determine that the first data does not match the second data.
Further, the processor 1101 is further configured to determine whether the target distance is equal to a preset second distance threshold, and if so, determine that the first data is the same as the second data.
The communication bus mentioned for the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is used in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface 1102 is used for communication between the electronic apparatus and other apparatuses.
The memory may include a random access memory (RAM) or a non-volatile memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, such as a central processing unit (CPU) or a network processor (NP), or may be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, discrete gate or transistor logic, or a discrete hardware component.
Example 24:
on the basis of the foregoing embodiments, an embodiment of the present application further provides a computer-readable storage medium, in which a computer program executable by an electronic device is stored, and when the program is run on the electronic device, the electronic device is caused to perform the following steps:
inputting first data to be matched into a pre-trained vector conversion model to obtain a first vector corresponding to the first data;
homomorphically encrypting the first vector with a first target public key generated by the first device itself to generate a first encrypted vector, and sending the first target public key to a second device;
obtaining an encrypted distance between the first vector and a second vector, determined based on the first encrypted vector and a second encrypted vector, wherein the second encrypted vector is obtained by homomorphically encrypting the second vector with the first target public key, and the second vector is obtained by inputting second data into a pre-trained vector conversion model in the second device;
determining a target distance between the first vector and the second vector based on the encrypted distance and a first target private key corresponding to the first target public key, and determining whether the first data and the second data match based on the target distance and a preset first distance threshold.
Further, inputting the first data to be matched into the pre-trained vector conversion model to obtain the first vector corresponding to the first data includes:
determining a first target data type corresponding to the first data;
determining a pre-trained first target vector conversion model corresponding to the first data according to the first target data type and a pre-stored correspondence between data types and pre-trained vector conversion models;
and inputting the first data into the pre-trained first target vector conversion model to obtain the first vector corresponding to the first data.
Further, the first target data type is a text type or a numeric type.
Further, if the first target data type is the text type, the corresponding pre-trained first target vector conversion model is a word vector model or a sentence vector model; if the first target data type is the numeric type, the corresponding pre-trained first target vector conversion model is a one-hot encoding model.
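The type-based selection of a conversion model can be sketched as a lookup table keyed by data type. This is a minimal sketch under stated assumptions. `MODEL_REGISTRY`, `detect_type` and `toy_text_model` are hypothetical names. The text model is a placeholder that counts characters into buckets, not a real word or sentence vector model. The per-digit one-hot layout is also an assumed reading of "one-hot encoding model".

```python
from typing import Callable, Dict, List

Vector = List[int]
DIGITS = "0123456789"

def one_hot_digits(value: str) -> Vector:
    """Numeric type (assumed layout): one 10-slot one-hot block per digit."""
    vec: Vector = []
    for ch in value:
        if ch not in DIGITS:
            raise ValueError(f"non-digit character {ch!r} in numeric data")
        block = [0] * 10
        block[int(ch)] = 1
        vec.extend(block)
    return vec

def toy_text_model(value: str, dim: int = 8) -> Vector:
    """HYPOTHETICAL stand-in for a pre-trained word/sentence vector model:
    counts characters into `dim` buckets by code point."""
    vec = [0] * dim
    for ch in value.lower():
        vec[ord(ch) % dim] += 1
    return vec

# Hypothetical pre-stored correspondence between data types and models.
MODEL_REGISTRY: Dict[str, Callable[[str], Vector]] = {
    "numeric": one_hot_digits,
    "text": toy_text_model,
}

def detect_type(value: str) -> str:
    """Determine the target data type (numeric if every character is a digit)."""
    return "numeric" if value and all(ch in DIGITS for ch in value) else "text"

def to_vector(value: str) -> Vector:
    model = MODEL_REGISTRY[detect_type(value)]  # target vector conversion model
    return model(value)
```

Both devices must use the same registry and the same models so that their vectors are comparable component by component.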
Further, obtaining the encrypted distance between the first vector and the second vector, determined based on the first encrypted vector and the second encrypted vector, includes:
receiving the second encrypted vector sent by the second device, wherein the second encrypted vector is obtained by the second device by homomorphically encrypting the second vector with the first target public key; and determining the encrypted distance between the first vector and the second vector based on the first encrypted vector and the second encrypted vector.
Further, sending the first target public key to the second device includes:
sending the first encrypted vector and the first target public key to the second device;
obtaining the encrypted distance between the first vector and the second vector, determined based on the first encrypted vector and the second encrypted vector, includes:
receiving, from the second device, the encrypted distance between the first vector and the second vector determined based on the first encrypted vector and the second encrypted vector, wherein the second encrypted vector is obtained by the second device by homomorphically encrypting the second vector with the first target public key.
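The message flow of the variant just described can be sketched end to end. In this variant, the first device sends its public key and encrypted vector, the second device returns an encrypted distance, and the first device decrypts it. The sketch rests on several assumptions. The source does not name a homomorphic scheme, so a toy Paillier cryptosystem (additively homomorphic) is swapped in, with deliberately tiny, insecure primes. Squared Euclidean distance is assumed, and each component is followed by its square, as in the insertion-rule embodiment. Because Paillier cannot multiply two ciphertexts, the second device forms the cross term from its own plaintext y_i instead of from the encrypted y_i. A scheme supporting ciphertext multiplication would be needed to follow the text literally. All function names are hypothetical. The code requires Python 3.9+ for `math.lcm`.

```python
import math
import secrets

# Toy Paillier cryptosystem. ASSUMPTION: the patent does not name its
# homomorphic scheme; Paillier and these tiny primes are for illustration only.

def keygen(p: int = 1_000_003, q: int = 1_000_033):
    """Return (public key n, private key (lam, mu, n)) for g = n + 1."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # with g = n + 1, L(g^lam mod n^2) = lam mod n
    return n, (lam, mu, n)

def encrypt(n: int, m: int) -> int:
    n2 = n * n
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    return pow(n + 1, m % n, n2) * pow(r, n, n2) % n2

def decrypt(sk, c: int) -> int:
    lam, mu, n = sk
    return (pow(c, lam, n * n) - 1) // n * mu % n

def he_add(n: int, c1: int, c2: int) -> int:
    return c1 * c2 % (n * n)       # Enc(a) * Enc(b) = Enc(a + b)

def he_scale(n: int, c: int, k: int) -> int:
    return pow(c, k % n, n * n)    # Enc(a) ** k = Enc(k * a)

def first_device_encrypt(x):
    """First device: key pair plus Enc of [x1, x1^2, x2, x2^2, ...]."""
    pk, sk = keygen()
    enc_x = [encrypt(pk, v) for xi in x for v in (xi, xi * xi)]
    return pk, sk, enc_x

def second_device_distance(pk: int, enc_x, y):
    """Second device: Enc(sum_i (x_i - y_i)^2) under the first device's key."""
    total = encrypt(pk, 0)
    for i, yi in enumerate(y):
        e_x, e_x2 = enc_x[2 * i], enc_x[2 * i + 1]
        e_y2 = encrypt(pk, yi * yi)               # second encrypted square component
        cross = he_scale(pk, e_x, -2 * yi)        # Enc(-2 x_i y_i); uses plaintext y_i
        sub = he_add(pk, he_add(pk, e_x2, cross), e_y2)  # Enc((x_i - y_i)^2)
        total = he_add(pk, total, sub)
    return total

def run_protocol(x, y, first_threshold: int):
    pk, sk, enc_x = first_device_encrypt(x)          # pk and enc_x sent to device 2
    enc_dist = second_device_distance(pk, enc_x, y)  # encrypted distance sent back
    target = decrypt(sk, enc_dist)                   # device 1 decrypts
    return target, target < first_threshold
```

For example, `run_protocol([1, 0, 3], [1, 2, 1], 10)` returns `(8, True)`, because the component differences are 0, -2 and 2. A real deployment would use a vetted library and a modulus of at least 2048 bits.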
Further, the method also includes:
receiving a third encrypted vector sent by the second device and a second target public key generated by the second device, wherein the third encrypted vector is obtained by the second device by homomorphically encrypting the second vector with the second target public key;
homomorphically encrypting the first vector with the second target public key to generate a fourth encrypted vector;
determining the encrypted distance between the second vector and the first vector based on the third encrypted vector and the fourth encrypted vector, and sending this encrypted distance to the second device, so that the second device decrypts it with a second target private key corresponding to the second target public key to obtain the target distance between the second vector and the first vector, and determines whether the first data and the second data match based on that target distance and the preset first distance threshold.
Further, after the target distance between the first vector and the second vector is determined, the method also includes:
sending the target distance between the first vector and the second vector to the second device, so that the second device determines whether the first data and the second data match based on the target distance and the preset first distance threshold.
Further, determining the target distance between the first vector and the second vector based on the encrypted distance and the first target private key corresponding to the first target public key includes:
decrypting the encrypted distance between the first vector and the second vector with the first target private key, which corresponds to the first target public key generated by the first device, to obtain the target distance between the first vector and the second vector.
Further, inputting the first data to be matched into the pre-trained vector conversion model to obtain the first vector corresponding to the first data includes:
for each piece of first sub-data in the first data, inputting the first sub-data into a pre-trained vector conversion model to obtain a first sub-vector corresponding to the first sub-data, where the first sub-vector of each piece of first sub-data has a first preset length;
and concatenating the first sub-vectors of all pieces of first sub-data to obtain the first vector corresponding to the first data.
Further, the first vector and the second vector both have a second preset length.
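Concatenating fixed-length sub-vectors into a vector of fixed total length can be sketched as below. Fixed lengths matter because the distance is computed component by component, so both devices must lay out their vectors identically. The zero-padding and truncation policy is an assumption, as the source only states that the lengths are preset. The sub-data are imagined as separate record fields (e.g., a name and a phone number), which is also an assumption, and `fit_length` and `splice` are hypothetical names.

```python
from typing import List, Sequence

def fit_length(vec: Sequence[int], length: int) -> List[int]:
    """Zero-pad or truncate to exactly `length` components (assumed policy)."""
    return list(vec[:length]) + [0] * max(0, length - len(vec))

def splice(sub_vectors: Sequence[Sequence[int]],
           first_len: int, second_len: int) -> List[int]:
    """Concatenate per-sub-data vectors, each fixed at first_len components,
    then fix the whole vector at second_len components."""
    out: List[int] = []
    for sv in sub_vectors:
        out.extend(fit_length(sv, first_len))
    return fit_length(out, second_len)
```

For example, `splice([[1, 2], [3]], 3, 8)` yields `[1, 2, 0, 3, 0, 0, 0, 0]`.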
Further, homomorphically encrypting the first vector with the self-generated first target public key to generate the first encrypted vector includes:
for each first component in the first vector, determining the first square component corresponding to that first component (i.e., its square);
inserting the first square component of each first component into the first vector according to a preset insertion rule, and taking the vector obtained after the insertion as the updated first vector;
and homomorphically encrypting each first component and each first square component in the first vector separately with the first target public key to generate the first encrypted vector.
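One possible preset insertion rule, placing each square directly after its component, can be sketched together with its inverse. The inverse is what the distance computation later uses to regroup components and squares. The interleaving order is an assumption, since the source does not fix the rule. Both devices only need to agree on the rule. `insert_squares` and `groups` are hypothetical names, and encryption is omitted here (each entry would be encrypted separately).

```python
from typing import List, Sequence, Tuple

def insert_squares(vec: Sequence[int]) -> List[int]:
    """Assumed insertion rule: [x1, x2, ...] -> [x1, x1^2, x2, x2^2, ...]."""
    out: List[int] = []
    for x in vec:
        out.extend((x, x * x))
    return out

def groups(expanded: Sequence[int]) -> List[Tuple[int, int]]:
    """Inverse of the rule: recover (component, square component) pairs."""
    if len(expanded) % 2:
        raise ValueError("expanded vector must have even length")
    return [(expanded[i], expanded[i + 1]) for i in range(0, len(expanded), 2)]
```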
Further, obtaining the encrypted distance between the first vector and the second vector, determined based on the first encrypted vector and the second encrypted vector, includes:
extracting, according to the preset insertion rule, each group consisting of a first encrypted component and its first encrypted square component from the first encrypted vector; and extracting, according to the preset insertion rule, each group consisting of a second encrypted component and its second encrypted square component from the second encrypted vector;
determining, according to the preset insertion rule, each corresponding group of a first encrypted component, a first encrypted square component, a second encrypted component, and a second encrypted square component;
determining an encrypted sub-distance from each such group;
and determining the encrypted distance between the first vector and the second vector as the sum of all encrypted sub-distances.
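One way to read why both each component and its square are encrypted is the expansion below. It assumes the distance is the squared Euclidean distance, which the source does not state. Here Enc denotes encryption under the first target public key. The operator ⊕ is homomorphic addition, ⊙ is multiplication by a plaintext constant, and ⊗ is ciphertext-by-ciphertext multiplication, which the chosen scheme would have to support (for example a leveled scheme; the source does not name one). The encrypted square components supply the x_i² and y_i² terms, and only the cross term needs the components themselves.

```latex
\begin{aligned}
d(\mathbf{x},\mathbf{y}) &= \sum_{i=1}^{n} (x_i - y_i)^2
  = \sum_{i=1}^{n} \bigl( x_i^2 + y_i^2 - 2\,x_i y_i \bigr), \\
\mathrm{Enc}(d_i) &= \mathrm{Enc}(x_i^2) \oplus \mathrm{Enc}(y_i^2)
  \oplus (-2) \odot \bigl( \mathrm{Enc}(x_i) \otimes \mathrm{Enc}(y_i) \bigr), \\
\mathrm{Enc}(d) &= \bigoplus_{i=1}^{n} \mathrm{Enc}(d_i).
\end{aligned}
```

As a worked check, for x = (1, 0, 3) and y = (1, 2, 1) the sub-distances are 0, 4 and 4, so decrypting Enc(d) yields d = 8.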
Further, determining whether the first data and the second data match based on the target distance and the preset first distance threshold includes:
determining whether the target distance is smaller than the preset first distance threshold;
if so, determining that the first data matches the second data;
otherwise, determining that the first data does not match the second data.
Further, after determining that the first data matches the second data, the method also includes:
determining whether the target distance is equal to a preset second distance threshold and, if so, determining that the first data is identical to the second data.
Example 25:
On the basis of the foregoing embodiments, an embodiment of the present application further provides a computer-readable storage medium storing a computer program executable by an electronic device. When the program runs on the electronic device, it causes the electronic device to perform the following steps:
inputting second data to be matched into a pre-trained vector conversion model to obtain a second vector corresponding to the second data;
receiving a first target public key sent by a first device, and homomorphically encrypting the second vector with the first target public key to generate a second encrypted vector;
obtaining a target distance between a first vector and the second vector determined based on a first encrypted vector and the second encrypted vector, wherein the first encrypted vector is obtained by homomorphically encrypting the first vector with the first target public key, and the first vector is obtained by inputting first data into a pre-trained vector conversion model in the first device;
and determining whether the first data and the second data match based on the target distance and a preset first distance threshold.
Further, inputting the second data to be matched into the pre-trained vector conversion model to obtain the second vector corresponding to the second data includes:
determining a second target data type corresponding to the second data;
determining a pre-trained second target vector conversion model corresponding to the second data according to the second target data type and a pre-stored correspondence between data types and pre-trained vector conversion models;
and inputting the second data into the pre-trained second target vector conversion model to obtain the second vector corresponding to the second data.
Further, the second target data type is a text type or a numeric type.
Further, if the second target data type is the text type, the corresponding pre-trained second target vector conversion model is a word vector model or a sentence vector model; if the second target data type is the numeric type, the corresponding pre-trained second target vector conversion model is a one-hot encoding model.
Further, receiving the first target public key sent by the first device includes:
receiving the first target public key and the first encrypted vector sent by the first device, wherein the first encrypted vector is obtained by homomorphically encrypting the first vector with the first target public key;
obtaining the target distance between the first vector and the second vector determined based on the first encrypted vector and the second encrypted vector includes:
determining the encrypted distance between the first vector and the second vector according to the first encrypted vector and the second encrypted vector;
sending the encrypted distance between the first vector and the second vector to the first device;
and receiving the target distance sent by the first device, wherein the target distance is obtained by the first device by decrypting the encrypted distance with the first target private key corresponding to the first target public key that the first device generated.
Further, obtaining the target distance between the first vector and the second vector determined based on the first encrypted vector and the second encrypted vector includes:
sending the second encrypted vector to the first device, so that the first device determines the encrypted distance between the first vector and the second vector based on the second encrypted vector and the first encrypted vector;
receiving the target distance between the first vector and the second vector sent by the first device, wherein the target distance is obtained by the first device by decrypting the encrypted distance with the first target private key corresponding to the first target public key that the first device generated.
Further, obtaining the target distance between the first vector and the second vector determined based on the first encrypted vector and the second encrypted vector includes:
homomorphically encrypting the second vector with a second target public key generated by the second device to obtain a third encrypted vector, and sending the second target public key and the third encrypted vector to the first device;
receiving, from the first device, the encrypted distance between the second vector and the first vector determined based on the third encrypted vector and a fourth encrypted vector, wherein the fourth encrypted vector is generated by homomorphically encrypting the first vector with the second target public key;
determining the target distance between the first vector and the second vector based on this encrypted distance and a second target private key corresponding to the second target public key, and determining whether the first data and the second data match based on the target distance and the preset first distance threshold.
Further, inputting the second data to be matched into the pre-trained vector conversion model to obtain the second vector corresponding to the second data includes:
for each piece of second sub-data in the second data, inputting the second sub-data into a pre-trained vector conversion model to obtain a second sub-vector corresponding to the second sub-data, where the second sub-vector of each piece of second sub-data has the first preset length;
and concatenating the second sub-vectors of all pieces of second sub-data to obtain the second vector corresponding to the second data.
Further, the first vector and the second vector both have a second preset length.
Further, homomorphically encrypting the second vector with the first target public key to generate the second encrypted vector includes:
for each second component in the second vector, determining the second square component corresponding to that second component (i.e., its square);
inserting the second square component of each second component into the second vector according to the preset insertion rule, and taking the vector obtained after the insertion as the updated second vector;
and homomorphically encrypting each second component and each second square component in the second vector separately with the first target public key to generate the second encrypted vector.
Further, determining the encrypted distance between the first vector and the second vector according to the first encrypted vector and the second encrypted vector includes:
extracting, according to the preset insertion rule, each group consisting of a second encrypted component and its second encrypted square component from the second encrypted vector; and extracting, according to the preset insertion rule, each group consisting of a first encrypted component and its first encrypted square component from the first encrypted vector;
determining, according to the preset insertion rule, each corresponding group of a first encrypted component, a first encrypted square component, a second encrypted component, and a second encrypted square component;
determining an encrypted sub-distance from each such group;
and determining the encrypted distance between the first vector and the second vector as the sum of all encrypted sub-distances.
Further, determining whether the first data and the second data match based on the target distance and the preset first distance threshold includes:
determining whether the target distance is smaller than the preset first distance threshold;
if so, determining that the first data matches the second data;
otherwise, determining that the first data does not match the second data.
Further, after determining that the first data matches the second data, the method also includes:
determining whether the target distance is equal to a preset second distance threshold and, if so, determining that the first data is identical to the second data.
In the embodiments of the present application, the first data and the second data to be matched are each input into a pre-trained vector conversion model to obtain a first vector corresponding to the first data and a second vector corresponding to the second data. The first vector and the second vector are encrypted with a first target public key to obtain a first encrypted vector and a second encrypted vector, from which the encrypted distance between the first vector and the second vector is determined. The target distance between the first vector and the second vector is then determined from the encrypted distance and the first target private key corresponding to the first target public key, and whether the first data and the second data match is determined based on the target distance and a preset first distance threshold. Fuzzy matching is therefore possible even when the first data and the second data are not identical, which widens the range of usage scenarios. Because the first target public key and the first target private key are used for homomorphic encryption and decryption, respectively, during fuzzy matching, a secure intersection is achieved and the matching process is protected. Throughout the matching process, neither the first data nor the second data leaves the first device or the second device in raw form, so fuzzy matching is achieved without the raw data ever leaving its owner's storage, which further secures the matching process.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (32)

1. A data matching method applied to a first device, the method comprising:
inputting first data to be matched into a pre-trained vector conversion model to obtain a first vector corresponding to the first data;
homomorphically encrypting the first vector with a first target public key generated by the first device itself to generate a first encrypted vector, and sending the first target public key to a second device;
obtaining an encrypted distance between the first vector and a second vector, determined based on the first encrypted vector and a second encrypted vector, wherein the second encrypted vector is obtained by homomorphically encrypting the second vector with the first target public key, and the second vector is obtained by inputting second data into a pre-trained vector conversion model in the second device;
determining a target distance between the first vector and the second vector based on the encrypted distance and a first target private key corresponding to the first target public key, and determining whether the first data and the second data match based on the target distance and a preset first distance threshold.
2. The method according to claim 1, wherein inputting the first data to be matched into the pre-trained vector conversion model to obtain the first vector corresponding to the first data comprises:
determining a first target data type corresponding to the first data;
determining a pre-trained first target vector conversion model corresponding to the first data according to the first target data type and a pre-stored correspondence between data types and pre-trained vector conversion models;
and inputting the first data into the pre-trained first target vector conversion model to obtain the first vector corresponding to the first data.
3. The method of claim 2, wherein the first target data type is a text type or a numeric type.
4. The method of claim 3, wherein if the first target data type is the text type, the corresponding pre-trained first target vector conversion model is a word vector model or a sentence vector model; and if the first target data type is the numeric type, the corresponding pre-trained first target vector conversion model is a one-hot encoding model.
5. The method of claim 1, wherein obtaining the encrypted distance between the first vector and the second vector, determined based on the first encrypted vector and the second encrypted vector, comprises:
receiving the second encrypted vector sent by the second device, wherein the second encrypted vector is obtained by the second device by homomorphically encrypting the second vector with the first target public key;
determining the encrypted distance between the first vector and the second vector based on the first encrypted vector and the second encrypted vector.
6. The method of claim 1, wherein sending the first target public key to the second device comprises:
sending the first encrypted vector and the first target public key to the second device;
obtaining the encrypted distance between the first vector and the second vector, determined based on the first encrypted vector and the second encrypted vector, comprises:
receiving, from the second device, the encrypted distance between the first vector and the second vector determined based on the first encrypted vector and the second encrypted vector, wherein the second encrypted vector is obtained by the second device by homomorphically encrypting the second vector with the first target public key.
7. The method of claim 6, further comprising:
receiving a third encrypted vector sent by the second device and a second target public key generated by the second device, wherein the third encrypted vector is obtained by the second device by homomorphically encrypting the second vector with the second target public key;
homomorphically encrypting the first vector with the second target public key to generate a fourth encrypted vector;
determining the encrypted distance between the second vector and the first vector based on the third encrypted vector and the fourth encrypted vector, and sending this encrypted distance to the second device, so that the second device decrypts it with a second target private key corresponding to the second target public key to obtain the target distance between the second vector and the first vector, and determines whether the first data and the second data match based on that target distance and the preset first distance threshold.
8. The method of claim 1, 5 or 6, wherein after the target distance between the first vector and the second vector is determined, the method further comprises:
sending the target distance between the first vector and the second vector to the second device, so that the second device determines whether the first data and the second data match based on the target distance and the preset first distance threshold.
9. The method of claim 1, 5 or 6, wherein determining the target distance between the first vector and the second vector based on the encrypted distance and the first target private key corresponding to the first target public key comprises:
decrypting the encrypted distance between the first vector and the second vector with the first target private key, which corresponds to the first target public key generated by the first device, to obtain the target distance between the first vector and the second vector.
10. The method according to claim 1, wherein inputting the first data to be matched into the pre-trained vector conversion model to obtain the first vector corresponding to the first data comprises:
for each piece of first sub-data in the first data, inputting the first sub-data into a pre-trained vector conversion model to obtain a first sub-vector corresponding to the first sub-data, where the first sub-vector of each piece of first sub-data has a first preset length;
and concatenating the first sub-vectors of all pieces of first sub-data to obtain the first vector corresponding to the first data.
11. The method of claim 1 or 10, wherein the first vector and the second vector both have a second preset length.
12. The method of claim 1, wherein homomorphically encrypting the first vector with the self-generated first target public key to generate the first encrypted vector comprises:
for each first component in the first vector, determining the first square component corresponding to that first component (i.e., its square);
inserting the first square component of each first component into the first vector according to a preset insertion rule, and taking the vector obtained after the insertion as the updated first vector;
and homomorphically encrypting each first component and each first square component in the first vector separately with the first target public key to generate the first encrypted vector.
13. The method of claim 12, wherein obtaining the encrypted distance between the first vector and the second vector, determined based on the first encrypted vector and the second encrypted vector, comprises:
extracting, according to the preset insertion rule, each group consisting of a first encrypted component and its first encrypted square component from the first encrypted vector; and extracting, according to the preset insertion rule, each group consisting of a second encrypted component and its second encrypted square component from the second encrypted vector;
determining, according to the preset insertion rule, each corresponding group of a first encrypted component, a first encrypted square component, a second encrypted component, and a second encrypted square component;
determining an encrypted sub-distance from each such group;
and determining the encrypted distance between the first vector and the second vector as the sum of all encrypted sub-distances.
14. The method of claim 1, wherein determining whether the first data and the second data match based on the target distance and the preset first distance threshold comprises:
determining whether the target distance is smaller than the preset first distance threshold;
if so, determining that the first data matches the second data;
otherwise, determining that the first data does not match the second data.
15. The method of claim 14, wherein after determining that the first data matches the second data, the method further comprises:
determining whether the target distance is equal to a preset second distance threshold and, if so, determining that the first data is identical to the second data.
16. A data matching method applied to a second device, the method comprising:
inputting second data to be matched into a pre-trained vector conversion model to obtain a second vector corresponding to the second data;
receiving a first target public key sent by a first device, and homomorphically encrypting the second vector with the first target public key to generate a second encrypted vector;
obtaining a target distance between a first vector and the second vector determined based on a first encrypted vector and the second encrypted vector, wherein the first encrypted vector is obtained by homomorphically encrypting the first vector with the first target public key, and the first vector is obtained by inputting first data into a pre-trained vector conversion model in the first device;
and determining whether the first data and the second data match based on the target distance and a preset first distance threshold.
17. The method of claim 16, wherein inputting the second data to be matched into a pre-trained vector conversion model to obtain the second vector corresponding to the second data comprises:
determining a second target data type corresponding to the second data;
determining a pre-trained second target vector conversion model corresponding to the second data according to the second target data type and a pre-stored correspondence between data types and pre-trained vector conversion models;
and inputting the second data into the pre-trained second target vector conversion model to obtain the second vector corresponding to the second data.
18. The method of claim 17, wherein the second target data type is a text type or a numeric type.
19. The method of claim 18, wherein if the second target data type is a text type, the corresponding pre-trained second target vector conversion model is a word vector model or a sentence vector model; and if the second target data type is a numeric type, the corresponding pre-trained second target vector conversion model is a one-hot encoding model.
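Claims 17 to 19 select the conversion model by looking up the data type in a pre-stored table. The sketch below shows that lookup under several assumptions:

- The type detection rule is hypothetical.
- The "numeric" model is assumed to be a per-digit 10-way one-hot encoding.
- The "text" model is a deterministic hash-based stand-in, not a real pre-trained word or sentence vector model.

All names in the sketch are illustrative.

```python
import hashlib
from typing import Callable, Dict, List

Vector = List[float]
DIGITS = "0123456789"

def one_hot_digits(value: str) -> Vector:
    """Hypothetical numeric model: a 10-way one-hot block per digit, concatenated."""
    vec: Vector = []
    for ch in value:
        if ch not in DIGITS:
            raise ValueError(f"non-digit character {ch!r} in numeric data")
        block = [0.0] * 10
        block[DIGITS.index(ch)] = 1.0
        vec.extend(block)
    return vec

def toy_text_embedding(value: str, dim: int = 8) -> Vector:
    """Stand-in for a pre-trained word/sentence vector model (NOT a real one):
    averages hash-derived pseudo-embeddings of whitespace-separated tokens."""
    tokens = value.split() or [""]
    acc = [0.0] * dim
    for tok in tokens:
        digest = hashlib.sha256(tok.encode("utf-8")).digest()
        for i in range(dim):
            acc[i] += digest[i] / 255.0
    return [x / len(tokens) for x in acc]

# Hypothetical pre-stored correspondence between data types and models.
MODEL_REGISTRY: Dict[str, Callable[[str], Vector]] = {
    "text": toy_text_embedding,
    "numeric": one_hot_digits,
}

def detect_type(data: str) -> str:
    """Assumption: non-empty all-ASCII-digit strings are numeric, the rest is text."""
    return "numeric" if data and all(ch in DIGITS for ch in data) else "text"

def to_vector(data: str) -> Vector:
    return MODEL_REGISTRY[detect_type(data)](data)
```

Both devices must hold the same registry and the same models. Otherwise vectors for identical data would differ, and the distance comparison would be meaningless.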
20. The method of claim 16, wherein receiving the first target public key sent by the first device comprises:
receiving the first target public key and the first encrypted vector sent by the first device, wherein the first encrypted vector is obtained by homomorphically encrypting the first vector with the first target public key;
and the obtaining of the target distance between the first vector and the second vector, determined based on the first encrypted vector and the second encrypted vector, comprises:
determining the encrypted distance between the first vector and the second vector according to the first encrypted vector and the second encrypted vector;
sending the encrypted distance between the first vector and the second vector to the first device;
and receiving the target distance sent by the first device, wherein the target distance is obtained by the first device decrypting the encrypted distance with a first target private key that the first device generated and that corresponds to the first target public key.
21. The method of claim 16, wherein obtaining the target distance between the first vector and the second vector, determined based on the first encrypted vector and the second encrypted vector, comprises:
sending the second encrypted vector to the first device, so that the first device determines the encrypted distance between the first vector and the second vector based on the second encrypted vector and the first encrypted vector;
receiving the target distance between the first vector and the second vector sent by the first device, wherein the target distance is obtained by the first device decrypting the encrypted distance with a first target private key that the first device generated and that corresponds to the first target public key.
22. The method of claim 16, wherein obtaining the target distance between the first vector and the second vector, determined based on the first encrypted vector and the second encrypted vector, comprises:
homomorphically encrypting the second vector with a second target public key generated by the second device to obtain a third encrypted vector, and sending the second target public key and the third encrypted vector to the first device;
receiving the third encrypted vector and a fourth encrypted vector sent by the first device, and determining the encrypted distance between the second vector and the first vector, wherein the fourth encrypted vector is generated by homomorphically encrypting the first vector with the second target public key;
determining the target distance between the first vector and the second vector based on the encrypted distance and a second target private key corresponding to the second target public key, and determining whether the first data and the second data match based on the target distance and the preset first distance threshold.
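Claims 20, 21 and 22 are three variants of the same exchange. They differ in whose key pair is used, which device evaluates the encrypted distance, and which device decrypts it. The sketch below records only what the claims state, as message lists. The notation (a, b, d, Enc1, Enc2, PK1, PK2) and the `Flow` structure are illustrative assumptions, and no cryptography is performed:

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple

Message = Tuple[str, str, str]  # (sender, receiver, payload)

@dataclass(frozen=True)
class Flow:
    key_owner: str             # device that generated the key pair in use
    distance_computed_by: str  # device that evaluates the encrypted distance
    decrypted_by: str          # device that decrypts it with its private key
    messages: Tuple[Message, ...]

# a, b: first and second vectors; d: plaintext target distance.
# Enc1 / Enc2: encryption under the first / second target public key.
FLOWS: Dict[int, Flow] = {
    20: Flow("first", "second", "first", (
        ("first", "second", "PK1, Enc1(a)"),
        ("second", "first", "Enc1(d)"),
        ("first", "second", "d"),
    )),
    21: Flow("first", "first", "first", (
        ("first", "second", "PK1"),
        ("second", "first", "Enc1(b)"),
        ("first", "second", "d"),
    )),
    22: Flow("second", "second", "second", (
        ("first", "second", "PK1"),
        ("second", "first", "PK2, Enc2(b)"),
        ("first", "second", "Enc2(b), Enc2(a)"),
    )),
}

def learns_distance(flow: Flow) -> Set[str]:
    """Devices that end up holding the plaintext target distance d."""
    parties = {flow.decrypted_by}
    for _sender, receiver, payload in flow.messages:
        if payload == "d":
            parties.add(receiver)
    return parties
```

In every variant only the key owner decrypts. In claims 20 and 21 the first device decrypts and then returns d, so both devices learn it. Claim 22 does not say whether the second device reports its result back, so the model leaves d with the second device only.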
23. The method of claim 16, wherein inputting the second data to be matched into a pre-trained vector conversion model to obtain the second vector corresponding to the second data comprises:
for each piece of second sub-data in the second data, inputting the second sub-data into a pre-trained vector conversion model to obtain a second sub-vector corresponding to the second sub-data, each second sub-vector having a first preset length;
and concatenating the second sub-vectors of all pieces of second sub-data to obtain the second vector corresponding to the second data.
24. The method of claim 16 or 23, wherein the first vector and the second vector both have a second preset length.
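Claims 23 and 24 build each vector by concatenating fixed-length sub-vectors, one per piece of sub-data (for example, one per record field). Both final vectors must have the same second preset length. Below is a minimal sketch of that check. The function name is hypothetical, and the sketch assumes the second preset length equals the number of fields times the first preset length:

```python
from typing import List, Sequence

def splice_subvectors(subvectors: Sequence[Sequence[float]],
                      sub_len: int,
                      total_len: int) -> List[float]:
    """Concatenate per-field sub-vectors (claim 23) and check that the result
    has the agreed total length (claim 24). Names are illustrative."""
    out: List[float] = []
    for idx, sv in enumerate(subvectors):
        if len(sv) != sub_len:
            raise ValueError(f"sub-vector {idx} has length {len(sv)}, expected {sub_len}")
        out.extend(sv)
    if len(out) != total_len:
        raise ValueError(f"spliced length {len(out)} != preset length {total_len}")
    return out
```

Both devices must concatenate the fields in the same order. Otherwise the component-wise distance would compare unrelated fields even when the lengths agree.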
25. The method of claim 16, wherein homomorphically encrypting the second vector using the first target public key to generate a second encrypted vector comprises:
for each second component in the second vector, determining a second square component corresponding to that second component;
inserting the second square component of each second component into the second vector according to a preset insertion rule, and taking the resulting vector as the updated second vector;
and homomorphically encrypting each second component and each second square component in the second vector separately based on the first target public key to generate the second encrypted vector.
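The claims do not define the "preset insertion rule". The sketch below assumes one simple hypothetical rule: each square is placed immediately after its component, giving [x1, x1², x2, x2², ...]. Both devices must apply the same rule so that claim 26 can regroup the encrypted values. The `encrypt` callable is a placeholder for whatever homomorphic scheme is used. Integer components are assumed, since schemes such as Paillier operate on integers (real-valued vectors would need fixed-point scaling first).

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def insert_squares(vec: Sequence[int]) -> List[int]:
    """Hypothetical insertion rule: place each component's square right after it."""
    out: List[int] = []
    for x in vec:
        out.extend((x, x * x))
    return out

def encrypt_all(extended: Sequence[int], encrypt: Callable[[int], T]) -> List[T]:
    """Encrypt every component and every square component separately (claim 25)."""
    return [encrypt(v) for v in extended]

def groups(extended: Sequence[T]) -> List[Tuple[T, T]]:
    """Apply the same rule in reverse to recover (component, square) pairs (claim 26)."""
    if len(extended) % 2:
        raise ValueError("extended vector must have even length under this rule")
    return [(extended[i], extended[i + 1]) for i in range(0, len(extended), 2)]
```

The grouping works identically on plaintexts and ciphertexts, because it depends only on positions.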
26. The method of claim 20 or 25, wherein determining the encrypted distance between the first vector and the second vector according to the first encrypted vector and the second encrypted vector comprises:
acquiring each group of a second encrypted component and a second encrypted square component according to the preset insertion rule and each second component in the second encrypted vector; acquiring each group of a first encrypted component and a first encrypted square component according to the preset insertion rule and each first component in the first encrypted vector;
determining, according to the preset insertion rule, each corresponding group of a first encrypted component, a first encrypted square component, a second encrypted component and a second encrypted square component;
determining each encrypted sub-distance according to each such group;
and determining the encrypted distance between the first vector and the second vector according to the sum of the encrypted sub-distances.
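Claim 26 sums per-component encrypted sub-distances. The patent does not name a homomorphic scheme, so the sketch below substitutes textbook Paillier with deliberately tiny, insecure primes. It also assumes the square-after insertion rule and squared Euclidean sub-distances.

Paillier is only additively homomorphic. The evaluating device therefore uses its own plaintext second components for the cross term -2*x_i*y_i. Computing that term from two ciphertexts, as the claim wording suggests, would need a scheme that supports ciphertext multiplication. Components are integers, and all function names are illustrative.

```python
import math
import secrets
from typing import Sequence

# Toy Paillier key pair. These primes are far too small for real security.
P, Q = 1_000_003, 1_000_033
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)  # modular inverse (Python 3.8+); valid for generator g = n + 1

def encrypt(m: int) -> int:
    """Enc(m) = (1 + m*n) * r^n mod n^2; negative m is encoded modulo n."""
    r = secrets.randbelow(N - 1) + 1
    while math.gcd(r, N) != 1:
        r = secrets.randbelow(N - 1) + 1
    return (1 + (m % N) * N) * pow(r, N, N2) % N2

def decrypt(c: int) -> int:
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m  # map back to a signed integer

def he_add(c1: int, c2: int) -> int:
    return c1 * c2 % N2  # Enc(x) * Enc(y) decrypts to x + y

def he_scale(c: int, k: int) -> int:
    return pow(c, k % N, N2)  # Enc(x) ** k decrypts to k * x

def encrypted_distance(enc_first: Sequence[int], second: Sequence[int]) -> int:
    """enc_first: [Enc(x1), Enc(x1^2), Enc(x2), Enc(x2^2), ...] (square-after rule).
    second: plaintext [y1, y2, ...] held by the evaluating device.
    Returns Enc(sum_i (x_i - y_i)^2), accumulated sub-distance by sub-distance."""
    if len(enc_first) != 2 * len(second):
        raise ValueError("vector lengths do not match")
    total = encrypt(0)
    for i, y in enumerate(second):
        enc_x, enc_x_sq = enc_first[2 * i], enc_first[2 * i + 1]
        sub = he_add(enc_x_sq, he_scale(enc_x, -2 * y))  # Enc(x^2 - 2xy)
        sub = he_add(sub, encrypt(y * y))                # Enc(x^2 - 2xy + y^2)
        total = he_add(total, sub)                       # add the sub-distance
    return total

if __name__ == "__main__":
    x = [3, -1, 4]  # first device's vector
    y = [2, 1, 4]   # second device's vector
    enc_x = [encrypt(v) for c in x for v in (c, c * c)]
    print(decrypt(encrypted_distance(enc_x, y)))  # 5
```

This corresponds to the claim 20 arrangement. The first device holds the key pair and sends its encrypted components and squares. The second device evaluates the encrypted distance and sends it back for decryption.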
27. The method of claim 16, wherein determining whether the first data and the second data match based on the target distance and a preset first distance threshold comprises:
determining whether the target distance is smaller than the preset first distance threshold;
if so, determining that the first data matches the second data;
otherwise, determining that the first data does not match the second data.
28. The method of claim 27, wherein after determining that the first data matches the second data, the method further comprises:
and determining whether the target distance is equal to a preset second distance threshold, and if so, determining that the first data is the same as the second data.
29. A data matching apparatus, applied to a first device, the apparatus comprising:
a first acquisition module, configured to input first data to be matched into a pre-trained vector conversion model to obtain a first vector corresponding to the first data;
a first processing module, configured to homomorphically encrypt the first vector with a first target public key generated by the first device to generate a first encrypted vector, and to send the first target public key to a second device;
the first acquisition module being further configured to obtain the encrypted distance between the first vector and a second vector, determined based on the first encrypted vector and a second encrypted vector, wherein the second encrypted vector is obtained by homomorphically encrypting the second vector with the first target public key, and the second vector is obtained by inputting second data into a pre-trained vector conversion model in the second device;
a first determining module, configured to determine a target distance between the first vector and the second vector based on the encrypted distance between the first vector and the second vector and a first target private key corresponding to the first target public key, and determine whether the first data and the second data match based on the target distance and a preset first distance threshold.
30. A data matching apparatus, applied to a second device, the apparatus comprising:
a second acquisition module, configured to input second data to be matched into a pre-trained vector conversion model to obtain a second vector corresponding to the second data;
a second processing module, configured to receive a first target public key sent by a first device and to homomorphically encrypt the second vector with the first target public key to generate a second encrypted vector;
the second acquisition module being further configured to obtain a target distance between a first vector and the second vector, determined based on a first encrypted vector and the second encrypted vector, wherein the first encrypted vector is obtained by homomorphically encrypting the first vector with the first target public key, and the first vector is obtained by inputting first data into a pre-trained vector conversion model in the first device;
and a second determining module, configured to determine whether the first data and the second data match based on the target distance and a preset first distance threshold.
31. An electronic device, characterized in that the electronic device comprises a processor and a memory, the memory being adapted to store program instructions, and the processor being adapted to carry out, when executing a computer program stored in the memory, the steps of the data matching method of any one of claims 1-15 or the steps of the data matching method of any one of claims 16-28.
32. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, carries out the steps of the data matching method of any one of claims 1-15 or the steps of the data matching method of any one of claims 16-28.
CN202210191650.3A 2022-02-28 2022-02-28 Data matching method, device, equipment and medium Pending CN114817943A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202210191650.3A CN114817943A (en) 2022-02-28 2022-02-28 Data matching method, device, equipment and medium
PCT/CN2022/112616 WO2023159888A1 (en) 2022-02-28 2022-08-15 Data matching method and apparatus, device, and medium
TW111135467A TWI835300B (en) 2022-02-28 2022-09-20 A data matching method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210191650.3A CN114817943A (en) 2022-02-28 2022-02-28 Data matching method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN114817943A (en) 2022-07-29

Family

ID=82528992

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210191650.3A Pending CN114817943A (en) 2022-02-28 2022-02-28 Data matching method, device, equipment and medium

Country Status (3)

Country Link
CN (1) CN114817943A (en)
TW (1) TWI835300B (en)
WO (1) WO2023159888A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115687870A (en) * 2023-01-03 2023-02-03 四川易利数字城市科技有限公司 Place name matching method based on matrix operation
WO2023159888A1 (en) * 2022-02-28 2023-08-31 中国银联股份有限公司 Data matching method and apparatus, device, and medium
WO2024027066A1 (en) * 2022-08-04 2024-02-08 中国银联股份有限公司 Data matching method, apparatus and system, and device and medium
WO2024031886A1 (en) * 2022-08-09 2024-02-15 中国银联股份有限公司 Data matching method, apparatus and system, and device and medium

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
US9749128B2 (en) * 2014-05-15 2017-08-29 Xerox Corporation Compact fuzzy private matching using a fully-homomorphic encryption scheme
US11430434B1 (en) * 2017-02-15 2022-08-30 Amazon Technologies, Inc. Intelligent privacy protection mediation
CN108881204A (en) * 2018-06-08 2018-11-23 浙江捷尚人工智能研究发展有限公司 Secret protection cluster data mining method, electronic equipment, storage medium and system
CN108897810A (en) * 2018-06-19 2018-11-27 苏州大学 A kind of Methodology for Entities Matching, system, medium and equipment
CN111027981B (en) * 2019-12-13 2021-04-27 支付宝(杭州)信息技术有限公司 Method and device for multi-party joint training of risk assessment model for IoT (Internet of things) machine
CN113434878B (en) * 2021-06-25 2023-07-07 平安科技(深圳)有限公司 Modeling and application method, device, equipment and storage medium based on federal learning
CN113722753B (en) * 2021-08-25 2024-05-10 银清科技有限公司 Private data processing method, device and system based on blockchain
CN113904808A (en) * 2021-09-08 2022-01-07 北京信安世纪科技股份有限公司 Private key distribution and decryption method, device, equipment and medium
CN114817943A (en) * 2022-02-28 2022-07-29 中国银联股份有限公司 Data matching method, device, equipment and medium

Cited By (5)

Publication number Priority date Publication date Assignee Title
WO2023159888A1 (en) * 2022-02-28 2023-08-31 中国银联股份有限公司 Data matching method and apparatus, device, and medium
WO2024027066A1 (en) * 2022-08-04 2024-02-08 中国银联股份有限公司 Data matching method, apparatus and system, and device and medium
TWI833528B (en) * 2022-08-04 2024-02-21 大陸商中國銀聯股份有限公司 Data matching methods, devices, systems, equipment and media
WO2024031886A1 (en) * 2022-08-09 2024-02-15 中国银联股份有限公司 Data matching method, apparatus and system, and device and medium
CN115687870A (en) * 2023-01-03 2023-02-03 四川易利数字城市科技有限公司 Place name matching method based on matrix operation

Also Published As

Publication number Publication date
TW202336617A (en) 2023-09-16
TWI835300B (en) 2024-03-11
WO2023159888A1 (en) 2023-08-31

Similar Documents

Publication Publication Date Title
CN114817943A (en) Data matching method, device, equipment and medium
US10284372B2 (en) Method and system for secure management of computer applications
CN108463968B (en) Fast format-preserving encryption of variable length data
US10476662B2 (en) Method for operating a distributed key-value store
CN106612172B (en) A kind of data tampering recovery algorithms can verify that restoring data authenticity in cloud storage
CN113159327A (en) Model training method and device based on federal learning system, and electronic equipment
CN114036565B (en) Private information retrieval system and private information retrieval method
CN111191255B (en) Information encryption processing method, server, terminal, device and storage medium
Hu et al. Batch image encryption using generated deep features based on stacked autoencoder network
CN112527273A (en) Code completion method, device and related equipment
CN114661318A (en) Efficient post-quantum security software updates customized for resource constrained devices
CN113542228A (en) Data transmission method and device based on federal learning and readable storage medium
CN115730333A (en) Security tree model construction method and device based on secret sharing and homomorphic encryption
CN113836559A (en) Sample alignment method, device, equipment and storage medium in federated learning
CN110704875B (en) Method, device, system, medium and electronic equipment for processing client sensitive information
CN110234082B (en) Addressing method and device of mobile terminal, storage medium and server
Mahdioui et al. On a System of Generalized Mixed Equilibrium Problems Involving Variational‐Like Inequalities in Banach Spaces: Existence and Algorithmic Aspects
TWI832640B (en) A data matching method, device, system, equipment and medium
CN111368314A (en) Modeling and predicting method, device, equipment and storage medium based on cross features
CN112417468B (en) Data processing method, device, electronic equipment and computer storage medium
CN115643090A (en) Longitudinal federal analysis method, device, equipment and medium based on privacy retrieval
KR20180005578A (en) Apparatus and method for detecting leakage of information
US10630470B2 (en) Zone based key version encoding
US20200228310A1 (en) Circuit concealing apparatus, calculation apparatus, and program
CN115801258B (en) Data processing method, device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40069149

Country of ref document: HK