CN114398671B - Privacy calculation method, system and readable storage medium based on feature engineering IV value - Google Patents
- Publication number
- CN114398671B (application number CN202111654397.2A)
- Authority
- CN
- China
- Prior art keywords
- value
- sample data
- ciphertext
- participant
- group
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/50—Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Bioethics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Computer Security & Cryptography (AREA)
- Medical Informatics (AREA)
- Molecular Biology (AREA)
- Computer Hardware Design (AREA)
- Databases & Information Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Complex Calculations (AREA)
Abstract
The invention provides a privacy calculation method, a privacy calculation system and a readable storage medium based on a feature engineering IV value, wherein the method comprises the following steps: the participant A generates a public and private key pair of the participant A and discloses the public key to the participant B; the participant A encrypts the tag value of each sample data by using the public key and sends the ciphertext tag value to the participant B; the participant B groups a plurality of sample data based on the characteristic values, and calculates a ciphertext IV value of each group by combining the ciphertext tag value of each sample data and the public key of the participant A; the participant B accumulates the ciphertext IV values of each group to obtain a final ciphertext IV value; the participant B scrambles the final ciphertext IV value and sends the final ciphertext IV value to the participant A; the participant A decrypts the final ciphertext IV scrambling value by using the private key to obtain a plaintext IV scrambling value and sends the plaintext IV scrambling value to the participant B; and the party B descrambles the plaintext IV scrambling value to obtain the plaintext IV value of the feature. The invention can realize privacy calculation of the IV value of the multiparty feature engineering.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a privacy calculation method, a privacy calculation system and a readable storage medium based on a feature engineering IV value.
Background
In traditional machine learning, when the number of input features is large, feature engineering IV values (IV values for short) are calculated to screen the features: the higher the IV value, the more information the feature carries and the better suited it is for training a machine learning model. As data privacy and security protection become increasingly important, calculating IV values on plaintext data can no longer meet privacy protection requirements, so methods that calculate IV values using privacy protection technology are receiving increasing attention. Conventional approaches mostly adopt one of two privacy protection schemes, MPC and TEE.
As a privacy protection technology, MPC can be used for encrypted calculation of the feature engineering IV value. Its basic idea is that the input data is secret-shared, each of a plurality of participants holds one share of the ciphertext data, and the encrypted calculation is carried out on those shares. Finally, the ciphertext results of the participants are combined to recover the plaintext result, and no participant can deduce the plaintext input data of the other participants during the ciphertext calculation. However, MPC places high demands on communication bandwidth: when network bandwidth is low and latency is high, the running time is far longer than that of the plaintext algorithm, and the bandwidth requirement increases exponentially with the number of participants, so MPC is not suitable for algorithms that transmit large amounts of data. Calculating the feature engineering IV value requires a large amount of data exchange, so the ciphertext calculation takes too long with MPC.
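The secret-sharing idea described above can be illustrated with a minimal sketch of additive secret sharing. The modulus `MOD`, the function names `share` and `reconstruct`, and the three-participant example are assumptions made for illustration; they do not come from the patent, and real MPC protocols involve considerably more machinery.

```python
import secrets

MOD = 2**61 - 1  # share arithmetic is done modulo this value (illustrative choice)

def share(value, parties):
    """Split `value` into `parties` additive shares that sum to it modulo MOD."""
    shares = [secrets.randbelow(MOD) for _ in range(parties - 1)]
    shares.append((value - sum(shares)) % MOD)
    return shares

def reconstruct(shares):
    """Recover the plaintext by summing all shares modulo MOD."""
    return sum(shares) % MOD

# Two inputs are shared among three participants; each participant adds
# the two shares it holds, and only the combined result is reconstructed.
x_shares = share(120, 3)
y_shares = share(45, 3)
sum_shares = [(a + b) % MOD for a, b in zip(x_shares, y_shares)]
total = reconstruct(sum_shares)
```

Each individual share is a uniformly random value, so a single participant learns nothing about the inputs, yet every computation step requires shares to be exchanged, which is where the bandwidth cost comes from.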
As a privacy protection technology, the TEE (trusted execution environment) allows plaintext calculation on data to be performed inside a secure environment, where the plaintext data is invisible to any attacker outside that environment, so the privacy and security of the data can be ensured. However, a privacy calculation program based on TEE technology must run on a CPU that supports it, which requires absolute trust in the CPU manufacturer; the security model of TEE technology therefore depends on the CPU manufacturer.
In addition, the conventional practice after calculating the feature engineering IV value is to directly screen out the sample data whose IV value is larger than a preset value and store it for training. This training mode depends too heavily on the sample data of a single library, so the model training and optimization effect is poor. It is also constrained by the inherent limitations of neural network machine learning: even when training uses sample data with larger IV values, the output prediction still carries a corresponding error. How to further optimize the model parameters and improve the accuracy of the predicted result is a problem that urgently needs to be solved.
Disclosure of Invention
In order to solve at least one of the above technical problems, the invention provides a privacy calculation method, a privacy calculation system and a readable storage medium based on a feature engineering IV value, which can realize privacy calculation of the multiparty feature engineering IV value and select suitable training samples based on the feature engineering IV value, so as to further optimize model parameters and improve the prediction precision of a machine learning algorithm.
The first aspect of the present invention proposes a privacy calculation method based on a feature engineering IV value, the method comprising:
presetting a participant A and a participant B which are subjected to characteristic engineering IV value joint calculation, and a plurality of sample data, wherein the participant A holds tag values of the sample data, and the participant B holds characteristic values of the sample data;
the participant A generates a public and private key pair of the participant A and discloses the public key to the participant B;
the participant A encrypts the tag value of each sample data by using the public key to generate a ciphertext tag value of each sample data, and sends the ciphertext tag value of each sample data to the participant B;
aiming at a certain characteristic, a participant B groups a plurality of sample data based on the characteristic value, and calculates a ciphertext IV value of each group by combining the ciphertext tag value of each sample data and the public key of the participant A;
the participant B accumulates the ciphertext IV values of each group to obtain a final ciphertext IV value of the feature;
the participant B scrambles the final ciphertext IV value of the feature to obtain a final ciphertext IV scrambling value and sends the final ciphertext IV scrambling value to the participant A;
the participant A decrypts the final ciphertext IV scrambling value of the feature by using the private key of the participant A, obtains the plaintext IV scrambling value of the feature and sends the plaintext IV scrambling value to the participant B;
party B descrambles the plaintext IV scrambling value of the feature to obtain the plaintext IV value of the feature.
In this scheme, for a certain feature, participant B groups a plurality of sample data based on the feature value and calculates a ciphertext IV value of each group by combining the ciphertext tag value of each sample data and the public key of participant A, which specifically includes:
presetting m sample data and n features, and denoting the label value of each sample as $L_i \in \{0,1\}$, where $L_i = 1$ denotes a positive sample, $L_i = 0$ denotes a negative sample, and i is the serial number of a sample among the m sample data; the total positive sample number is denoted $G_{total}$ and the total negative sample number is denoted $B_{total}$, both of which are known to party A and party B;
dividing the m sample data of the feature into N groups, wherein the number of samples in each group is $m_l$, l is the sequence number of the group and $l \in [1, N]$; the total positive sample number contained in each group is denoted $G_l$ and the total negative sample number is denoted $B_l$, and they satisfy [...];
acquiring the feature value $F_{l,j}$ of each sample data in each group, and encrypting $F_{l,j}$ with the public key of party A to obtain the ciphertext feature value [...], where l denotes the l-th group, j denotes the j-th sample data of the l-th group, and $F_{l,j}$ denotes the feature value of the j-th sample data of the l-th group;
obtaining the ciphertext tag value [...] of each sample data in each group from the ciphertext tag values received from party A, and combining it with the ciphertext feature value [...] to calculate the total positive sample number ciphertext value [...] of each group;
encrypting the sample number $m_l$ of each group with the public key of party A to obtain the sample number ciphertext value [...] of each group, and combining it with the total positive sample number ciphertext value [...] of each group to calculate the total negative sample number ciphertext value [...] of each group;
calculating the WOE value of each group, denoted [...], and calculating the IV value of each group from its WOE value, [...]; denoting [...], then encrypting $a_l$ with the public key of party A to obtain the ciphertext value [...];
transforming [...] to obtain [...], and transforming [...] to obtain [...]; the ciphertext value of $\ln(1+g_l)/\ln 10$ is denoted [...], and the ciphertext value of $\ln(1+b_l)/\ln 10$ is denoted [...].
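The per-group counting steps above can be sketched with an additively homomorphic cryptosystem. The source does not name the encryption scheme; a toy Paillier implementation is assumed here because it allows ciphertexts to be added and subtracted. The primes, the helper names (`encrypt`, `decrypt`, `add`, `sub`, `encrypted_group_counts`) and the six-sample example are all hypothetical, the sketch simply sums the ciphertext labels of each group's members, and the key size is far too small for real use.

```python
import math
import random

# Toy Paillier key held by party A (assumption: two small known Mersenne primes).
P, Q = 2**31 - 1, 2**61 - 1
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private key part
MU = pow(LAM, -1, N)                               # private key part

def encrypt(m):
    """Encrypt integer m under the public key N (generator g = N + 1)."""
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (1 + (m % N) * N) * pow(r, N, N2) % N2

def decrypt(c):
    """Decrypt with the private key (LAM, MU); only party A can do this."""
    return (pow(c, LAM, N2) - 1) // N * MU % N

def add(c1, c2):
    """Ciphertext of the sum of the two plaintexts."""
    return c1 * c2 % N2

def sub(c1, c2):
    """Ciphertext of the difference of the two plaintexts (modulo N)."""
    return c1 * pow(c2, N - 1, N2) % N2

# Party A: encrypt each label L_i and send the ciphertexts to party B.
labels = {"s1": 1, "s2": 0, "s3": 1, "s4": 1, "s5": 0, "s6": 0}
enc_labels = {sid: encrypt(v) for sid, v in labels.items()}

# Party B: group sample ids by feature value (grouping chosen for illustration).
groups = {1: ["s1", "s2", "s3"], 2: ["s4", "s5", "s6"]}

def encrypted_group_counts(groups, enc_labels):
    """Return {l: (Enc(G_l), Enc(B_l))} without ever decrypting a label."""
    out = {}
    for l, ids in groups.items():
        g_enc = encrypt(0)
        for sid in ids:
            g_enc = add(g_enc, enc_labels[sid])   # Enc(G_l) = sum of Enc(L_i)
        b_enc = sub(encrypt(len(ids)), g_enc)     # Enc(B_l) = Enc(m_l) - Enc(G_l)
        out[l] = (g_enc, b_enc)
    return out

counts = encrypted_group_counts(groups, enc_labels)
```

Party B only ever handles ciphertexts here; the decryption calls used to check the result would be possible only for party A.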
In this scheme, party B scrambles the final ciphertext IV value of the feature to obtain a final ciphertext IV scrambling value and sends it to party A, which specifically includes:
party B generates a random number e and encrypts it with the public key of party A to obtain the ciphertext value $e_{enc}$ of the random number e; denoting the final ciphertext IV value as $IV_{enc}$, the final ciphertext IV scrambling value is calculated as $IV_{enc\_err} = IV_{enc} + e_{enc}$;
party B sends the final ciphertext IV scrambling value $IV_{enc\_err}$ to party A.
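The scrambling round trip can be sketched as follows, again assuming a toy Paillier scheme (the source does not name the cryptosystem). The fixed-point factor `SCALE`, the helper names `scramble` and `descramble`, and the sample IV value 0.231487 are hypothetical; in the method itself $IV_{enc}$ would come from party B's ciphertext calculation rather than from a direct encryption.

```python
import math
import secrets

# Toy Paillier key held by party A (assumption; far too small for real use).
P, Q = 2**31 - 1, 2**61 - 1
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)

def encrypt(m):
    """Encrypt integer m under party A's public key N."""
    r = secrets.randbelow(N - 1) + 1
    while math.gcd(r, N) != 1:
        r = secrets.randbelow(N - 1) + 1
    return (1 + (m % N) * N) * pow(r, N, N2) % N2

def decrypt(c):
    """Decrypt with party A's private key (LAM, MU)."""
    return (pow(c, LAM, N2) - 1) // N * MU % N

SCALE = 10**6  # fixed-point encoding of the real-valued IV (assumption)

def scramble(iv_enc):
    """Party B: IV_enc_err = IV_enc + e_enc, computed on ciphertexts."""
    e = secrets.randbelow(N)          # random mask known only to party B
    e_enc = encrypt(e)
    return iv_enc * e_enc % N2, e     # ciphertext product = plaintext sum

def descramble(iv_err, e):
    """Party B: remove the mask from the plaintext scrambling value."""
    return (iv_err - e) % N / SCALE

iv_enc = encrypt(round(0.231487 * SCALE))  # stand-in for B's ciphertext IV
iv_enc_err, e = scramble(iv_enc)           # B sends iv_enc_err to A
iv_err = decrypt(iv_enc_err)               # A decrypts, sees only the masked value
iv = descramble(iv_err, e)                 # B recovers the plaintext IV
```

Because the mask e is drawn uniformly from the plaintext space, the value party A decrypts is itself uniformly distributed and reveals nothing about the IV value.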
In this scheme, party B generates a random number e, which specifically includes:
presetting a random number supporting party C provided with K random sources; the random number supporting party C holds K character strings $R_x$, where x denotes the serial number of the x-th random source and $x \in [1, K]$; each character string comprises p characters arranged in sequence, where p is an even number;
the random number supporting party C carries out pairwise pairing on p characters of each character string in a random mode to form p/2 pairing groups, wherein each pairing group comprises front characters and rear characters;
the random number supporting party C sends the front characters of p/2 pairing groups of each character string to the party B and performs local pre-storage;
when the party B needs to generate a random number, instruction information is sent to the random number supporting party C;
the random number supporting party C, triggered by the instruction information, uses the K character strings $R_x$ to modulate photon strings, specifically: the front characters of the p/2 pairing groups in each character string serve as a first selection source from which each modulation basis of the modulated photon string is randomly selected, and the rear characters of the p/2 pairing groups in each character string serve as a second selection source for the modulation initial signals of the modulated photon string; for each character string, a front character is randomly selected from the first selection source as a modulation basis, the rear character paired with that front character is selected from the second selection source as the modulation initial signal, and each modulation basis modulates its corresponding modulation initial signal into the polarization state of a photon; the polarization states of all photons derived from the same character string are combined to form the corresponding photon string, and the K photon strings, together with the random selection mode of the modulation bases corresponding to each character string, are transmitted to party B through quantum communication;
party B receives the K photon strings and the random selection mode of the modulation bases corresponding to each character string, matches each photon string to its character string identifier, finds the corresponding front characters among the front characters of the p/2 pairing groups of each character string according to that random selection mode and uses them as measurement bases, and measures the corresponding photon string with the measurement bases to obtain a measurement result;
taking the measurement result as the input of the xth random source, processing the measurement result through a chaotic function, and outputting the initial random number of the xth random source;
and obtaining the random number of the current moment of the x random source through a plurality of chaotic motions, and carrying out exclusive OR operation on the random numbers of the current moment of each random source to output the random number e of the current moment.
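The last two steps (chaotic processing of each source's measurement result, then an exclusive OR across sources) can be sketched as below. The source does not specify the chaotic function; the logistic map, its parameter 3.99, the iteration count, the 32-bit output width and the bit-string representation of a measurement result are all assumptions, and the quantum transmission itself is not modelled. This is an illustration of the data flow, not a cryptographically vetted generator.

```python
def logistic_step(x, r=3.99):
    """One iteration of the logistic map, a simple chaotic function (assumption)."""
    return r * x * (1.0 - x)

def source_random(measurement_bits, rounds=64, width=32):
    """Turn one source's measurement result (a bit string) into a random integer."""
    seed_int = int(measurement_bits, 2)
    # Map the measurement result into the open interval (0, 1).
    x = (seed_int + 1) / (2 ** len(measurement_bits) + 1)
    for _ in range(rounds):              # "a plurality of chaotic motions"
        x = logistic_step(x)
    return int(x * 2**width) % 2**width

def combined_random(measurements):
    """XOR the current random numbers of all K sources into the random number e."""
    e = 0
    for bits in measurements:
        e ^= source_random(bits)
    return e

# K = 3 hypothetical measurement results, one per random source.
measurements = ["10110010", "01101101", "11100001"]
e = combined_random(measurements)
```

Combining the sources with XOR means the output stays unpredictable as long as at least one source is unpredictable, which is a common reason for mixing several sources this way.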
In this solution, after the party B descrambles the plaintext IV scrambling value of the feature to obtain the plaintext IV value of the feature, the method further includes:
presetting a plurality of sample data sets, and respectively calculating IV values of the characteristic in the plurality of sample data sets;
judging whether the IV value of a certain sample data set is larger than a first preset threshold value, and if so, taking that sample data set as a selected sample data set and extracting it into a sample training library, the sample training library being preset to contain S selected sample data sets;
Training a preset neural network machine learning model by adopting each selected sample data set in a sample training library in sequence to obtain corresponding optimization parameters;
taking a certain selected sample data set in the sample training library as the target selected sample data set, and differencing the optimization parameters corresponding to the target selected sample data set with the optimization parameters corresponding to each of the other selected sample data sets one by one to obtain S-1 difference values;
judging whether the absolute value of each difference value is larger than a second preset threshold value, and if so, marking the target selected sample data set once as a suspected invalid sample data set; after the target selected sample data set has been compared with all the other selected sample data sets, recording the total number of times it was marked as a suspected invalid sample data set;
judging whether the total number of times is larger than a third preset threshold value; if so, the target selected sample data set is judged to be an invalid sample data set, and if not, it is judged to be a valid sample data set;
comparing each selected sample data set in the sample training library with the optimized parameters corresponding to the rest selected sample data sets, and screening out all invalid sample data sets;
eliminating the optimization parameters corresponding to all the invalid sample data sets, and reserving the optimization parameters corresponding to all the remaining valid sample data sets;
And carrying out average value calculation on the optimized parameters corresponding to all the effective sample data sets to obtain an optimized parameter average value, and placing the optimized parameter average value into a neural network machine learning model to complete the training process.
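The screening and averaging of optimization parameters described above can be sketched as follows. For simplicity each selected sample data set is assumed to yield a single scalar optimization parameter; the function name `screen_and_average` and the example values and thresholds are hypothetical.

```python
def screen_and_average(params, diff_threshold, count_threshold):
    """Drop parameters that disagree with too many others, then average the rest.

    params          -- one optimization parameter per selected sample data set
    diff_threshold  -- the second preset threshold (absolute difference)
    count_threshold -- the third preset threshold (number of "suspected" marks)
    """
    valid = []
    for i, target in enumerate(params):
        # Compare the target with the other S-1 parameters one by one.
        suspected = sum(
            1
            for j, other in enumerate(params)
            if j != i and abs(target - other) > diff_threshold
        )
        if suspected <= count_threshold:   # not marked often enough to be invalid
            valid.append(target)
    if not valid:
        raise ValueError("every sample data set was judged invalid")
    return sum(valid) / len(valid), valid

# Five selected sample data sets; the fourth yields an outlying parameter.
params = [0.50, 0.52, 0.49, 0.90, 0.51]
mean, valid = screen_and_average(params, diff_threshold=0.1, count_threshold=2)
```

In this example the parameter 0.90 differs from all four others by more than 0.1, so it is marked four times and removed, while each remaining parameter is marked only once (against 0.90) and kept.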
In this aspect, after the optimized parameter average value is placed in the neural network machine learning model, the method further includes:
acquiring current appearance data information, inputting the appearance data information into a neural network machine learning model for machine learning, and obtaining a result predicted value;
acquiring a plurality of historical data from a historical database, wherein each historical data at least comprises historical appearance data information and corresponding historical result true values;
performing feature analysis on the historical appearance data information of each piece of historical data to obtain a first feature quantity for each piece of historical data;
performing feature analysis on the current appearance data information to obtain a second feature quantity;
comparing the first feature quantity of the historical appearance data information of each piece of historical data with the second feature quantity of the current appearance data information to obtain a difference rate;
adding historical data with the difference rate smaller than a fourth preset threshold value into a correction library;
inputting the historical appearance data information of each piece of historical data in the correction library into the neural network machine learning model, and outputting a historical result predicted value corresponding to each piece of historical appearance data information in the correction library;
for each historical data in the correction library, respectively carrying out difference between a corresponding historical result predicted value and a corresponding historical result true value to obtain a corresponding difference value;
carrying out average calculation on the difference values between the historical result predicted values of all the historical data in the correction library and the corresponding historical result true values to obtain correction values;
and correcting and optimizing the prediction result of the neural network machine learning model according to the correction value.
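The correction procedure above can be sketched as follows. The source does not define the difference rate or say how the correction value is applied; here the difference rate is assumed to be the relative difference between feature quantities, and the correction is assumed to be subtracted from the prediction. The record layout, the function name `correction_value` and the stand-in model `predict` are hypothetical.

```python
def correction_value(history, current_feature, predict, rate_threshold):
    """Average prediction error over historical records similar to the current input.

    history         -- list of dicts with keys "feature", "data", "truth"
    current_feature -- second feature quantity (of the current appearance data)
    predict         -- the trained model, mapping appearance data to a prediction
    rate_threshold  -- the fourth preset threshold on the difference rate
    """
    # Build the correction library from sufficiently similar historical data.
    library = [
        h for h in history
        if abs(h["feature"] - current_feature) / abs(current_feature) < rate_threshold
    ]
    if not library:
        return 0.0
    # Difference between historical predicted value and historical true value.
    diffs = [predict(h["data"]) - h["truth"] for h in library]
    return sum(diffs) / len(diffs)

def predict(x):
    """Stand-in for the trained neural network model (assumption)."""
    return 2 * x + 1

history = [
    {"feature": 10.0, "data": 10.0, "truth": 20.0},
    {"feature": 10.5, "data": 10.5, "truth": 21.5},
    {"feature": 30.0, "data": 30.0, "truth": 50.0},
]
correction = correction_value(history, current_feature=10.2,
                              predict=predict, rate_threshold=0.05)
corrected_prediction = predict(10.2) - correction
```

Only the first two records fall within the 5% difference rate, so the correction is the mean of their prediction errors (1.0 and 0.5).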
The second aspect of the present invention further provides a privacy computing system based on a feature engineering IV value, including a memory and a processor, where the memory includes a privacy computing method program based on a feature engineering IV value, and when the privacy computing method program based on a feature engineering IV value is executed by the processor, the following steps are implemented:
presetting a participant A and a participant B which are subjected to characteristic engineering IV value joint calculation, and a plurality of sample data, wherein the participant A holds tag values of the sample data, and the participant B holds characteristic values of the sample data;
The participant A generates a public and private key pair of the participant A and discloses the public key to the participant B;
the participant A encrypts the tag value of each sample data by using the public key to generate a ciphertext tag value of each sample data, and sends the ciphertext tag value of each sample data to the participant B;
aiming at a certain characteristic, a participant B groups a plurality of sample data based on the characteristic value, and calculates a ciphertext IV value of each group by combining the ciphertext tag value of each sample data and the public key of the participant A;
the participant B accumulates the ciphertext IV values of each group to obtain a final ciphertext IV value of the feature;
the participant B scrambles the final ciphertext IV value of the feature to obtain a final ciphertext IV scrambling value and sends the final ciphertext IV scrambling value to the participant A;
the participant A decrypts the final ciphertext IV scrambling value of the feature by using the private key of the participant A, obtains the plaintext IV scrambling value of the feature and sends the plaintext IV scrambling value to the participant B;
and the party B descrambles the plaintext IV scrambling value of the feature to obtain the plaintext IV value of the feature.
In this scheme, for a certain feature, participant B groups a plurality of sample data based on the feature value and calculates a ciphertext IV value of each group by combining the ciphertext tag value of each sample data and the public key of participant A, which specifically includes:
presetting m sample data and n features, and denoting the label value of each sample as $L_i \in \{0,1\}$, where $L_i = 1$ denotes a positive sample, $L_i = 0$ denotes a negative sample, and i is the serial number of a sample among the m sample data; the total positive sample number is denoted $G_{total}$ and the total negative sample number is denoted $B_{total}$, both of which are known to party A and party B;
dividing the m sample data of the feature into N groups, wherein the number of samples in each group is $m_l$, l is the sequence number of the group and $l \in [1, N]$; the total positive sample number contained in each group is denoted $G_l$ and the total negative sample number is denoted $B_l$, and they satisfy [...];
acquiring the feature value $F_{l,j}$ of each sample data in each group, and encrypting $F_{l,j}$ with the public key of party A to obtain the ciphertext feature value [...], where l denotes the l-th group, j denotes the j-th sample data of the l-th group, and $F_{l,j}$ denotes the feature value of the j-th sample data of the l-th group;
obtaining the ciphertext tag value [...] of each sample data in each group from the ciphertext tag values received from party A, and combining it with the ciphertext feature value [...] to calculate the total positive sample number ciphertext value [...] of each group;
encrypting the sample number $m_l$ of each group with the public key of party A to obtain the sample number ciphertext value [...] of each group, and combining it with the total positive sample number ciphertext value [...] of each group to calculate the total negative sample number ciphertext value [...] of each group;
calculating the WOE value of each group, denoted [...], and calculating the IV value of each group from its WOE value, [...]; denoting [...], then encrypting $a_l$ with the public key of party A to obtain the ciphertext value [...];
transforming [...] to obtain [...], and transforming [...] to obtain [...]; the ciphertext value of $\ln(1+g_l)/\ln 10$ is denoted [...], and the ciphertext value of $\ln(1+b_l)/\ln 10$ is denoted [...].
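For reference, the quantity being protected can be computed in plaintext as below. This sketch uses the conventional definitions of WOE and IV with natural logarithms; the patent's own formulas are not recoverable from the text and, judging by the $\ln(1+g_l)/\ln 10$ terms, may use a smoothed base-10 form, so the two can differ in detail. The function name `woe_iv` and the example counts are hypothetical, and every count is assumed to be non-zero.

```python
import math

def woe_iv(group_counts, g_total, b_total):
    """Return (per-group WOE values, total IV) for one feature.

    group_counts -- list of (G_l, B_l) pairs, one per group
    g_total      -- total number of positive samples
    b_total      -- total number of negative samples
    """
    woes = []
    iv = 0.0
    for g_l, b_l in group_counts:
        p_good = g_l / g_total            # share of all positives in this group
        p_bad = b_l / b_total             # share of all negatives in this group
        woe = math.log(p_good / p_bad)    # weight of evidence of the group
        woes.append(woe)
        iv += (p_good - p_bad) * woe      # the group's contribution to the IV
    return woes, iv

# Two groups holding (2 positive, 1 negative) and (1 positive, 2 negative) samples.
woes, iv = woe_iv([(2, 1), (1, 2)], g_total=3, b_total=3)
```

Each group's contribution is non-negative, so accumulating the per-group values yields the feature's total IV, matching the accumulation step performed by party B on ciphertexts.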
In this scheme, party B scrambles the final ciphertext IV value of the feature to obtain a final ciphertext IV scrambling value and sends it to party A, which specifically includes:
party B generates a random number e and encrypts it with the public key of party A to obtain the ciphertext value $e_{enc}$ of the random number e; denoting the final ciphertext IV value as $IV_{enc}$, the final ciphertext IV scrambling value is calculated as $IV_{enc\_err} = IV_{enc} + e_{enc}$;
party B sends the final ciphertext IV scrambling value $IV_{enc\_err}$ to party A.
The third aspect of the present invention also proposes a computer readable storage medium, in which a privacy calculation method program based on a feature engineering IV value is included, which when executed by a processor, implements the steps of a privacy calculation method based on a feature engineering IV value as described above.
The privacy calculation method, the privacy calculation system and the privacy calculation computer-readable storage medium based on the characteristic engineering IV value can realize privacy calculation of the multiparty characteristic engineering IV value, and a proper training sample is selected based on the characteristic engineering IV value so as to further optimize model parameters and improve prediction accuracy of a machine learning algorithm.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
FIG. 1 illustrates a flow chart of a privacy calculation method based on feature engineering IV values of the present invention;
fig. 2 illustrates a block diagram of a privacy computing system based on feature engineering IV values of the present invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description. It should be noted that, in the case of no conflict, the embodiments of the present application and the features in the embodiments may be combined with each other.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those described herein, and therefore the scope of the present invention is not limited to the specific embodiments disclosed below.
Fig. 1 shows a flowchart of a privacy calculation method based on feature engineering IV values of the present invention.
As shown in fig. 1, a first aspect of the present invention proposes a privacy calculation method based on a feature engineering IV value, the method comprising:
S102, presetting a participant A and a participant B for joint calculation of the feature engineering IV value, and a plurality of sample data, wherein participant A holds the tag value of each sample data and participant B holds the feature value of each sample data;
S104, participant A generates its public and private key pair and discloses the public key to participant B;
S106, participant A encrypts the tag value of each sample data by using the public key to generate a ciphertext tag value of each sample data, and sends the ciphertext tag value of each sample data to participant B;
S108, for a certain feature, participant B groups a plurality of sample data based on the feature value, and calculates a ciphertext IV value of each group by combining the ciphertext tag value of each sample data and the public key of participant A;
S110, participant B accumulates the ciphertext IV values of each group to obtain a final ciphertext IV value of the feature;
S112, participant B scrambles the final ciphertext IV value of the feature to obtain a final ciphertext IV scrambling value and sends the final ciphertext IV scrambling value to participant A;
S114, participant A decrypts the final ciphertext IV scrambling value of the feature by using its private key, obtains the plaintext IV scrambling value of the feature and sends the plaintext IV scrambling value to participant B;
S116, participant B descrambles the plaintext IV scrambling value of the feature to obtain the plaintext IV value of the feature.
It should be noted that the present invention calculates the feature IV value using public-private key encryption technology, which ensures that, while the IV value result remains correct, the data of each participant is not leaked to the other participants. Specifically, the tag value data provider (i.e., party A) cannot learn the feature values of the feature data provider (i.e., party B), and likewise the feature data provider (i.e., party B) cannot learn the tag value data of the tag value data provider (i.e., party A). Privacy calculation of the feature engineering IV value is thereby achieved.
It can be understood that the present invention is a privacy protection technique that generates a public key and a private key, encrypts plaintext data with the public key, computes on the ciphertext to obtain a ciphertext result, and decrypts the ciphertext result with the private key to obtain the plaintext result. Because the feature IV value is calculated using public-private key encryption, the bandwidth overhead between the participants is far smaller than that of MPC (secure multi-party computation) techniques, which saves calculation cost and improves the efficiency of the privacy calculation.
Based on the above, the invention applies public-private key encryption to IV value calculation and realizes encrypted IV value calculation over multi-party feature data. It protects the privacy of each participant's data while keeping the IV value result correct, and its low network bandwidth requirement during calculation further improves calculation efficiency and saves calculation cost.
According to an embodiment of the present invention, for a given feature, participant B grouping the plurality of sample data based on the feature value and calculating a ciphertext IV value of each group by combining the ciphertext tag value of each sample data with the public key of participant A specifically includes:
presetting m sample data with n features, and denoting the tag value of each sample as $L_i \in \{0, 1\}$, where $L_i = 1$ denotes a positive sample, $L_i = 0$ denotes a negative sample, and i is the serial number of a sample among the m sample data; the total positive sample number is denoted $G_{total}$ and the total negative sample number is denoted $B_{total}$, and both $G_{total}$ and $B_{total}$ are known to participant A and participant B;
dividing the m sample data of the feature into N groups, where the number of samples in each group is $m_l$, l is the serial number of the group with $l \in [1, N]$, the total positive sample number contained in each group is denoted $G_l$ and the total negative sample number is denoted $B_l$, and these quantities satisfy a relation [formula not reproduced in the source text];
acquiring the feature value $F_{l,j}$ of each sample data in each group, and encrypting the feature value $F_{l,j}$ with the public key of participant A to obtain the ciphertext feature value of that sample data, where l denotes the l-th group, j denotes the j-th sample data of the l-th group, and $F_{l,j}$ denotes the feature value of the j-th sample data of the l-th group;
obtaining the ciphertext tag value of each sample data in each group from the ciphertext tag values received from participant A and, in combination with the ciphertext feature values, calculating the ciphertext value of the total positive sample number of each group [formula not reproduced in the source text];
encrypting the sample number $m_l$ of each group with the public key of participant A to obtain the ciphertext value of the sample number of each group, and calculating the ciphertext value of the total negative sample number of each group from the ciphertext value of the sample number of each group and the ciphertext value of the total positive sample number of each group [formula not reproduced in the source text];
calculating the WOE value of each group [formula not reproduced in the source text], and calculating the IV value of each group from the WOE value of each group [formula not reproduced in the source text]; a quantity $a_l$ is defined [definition not reproduced in the source text], and $a_l$ is encrypted with the public key of participant A to obtain its ciphertext value;
transforming two intermediate expressions [expressions not reproduced in the source text], and recording the ciphertext value of $\ln(1+g_l)/\ln 10$ and the ciphertext value of $\ln(1+b_l)/\ln 10$ [symbols not reproduced in the source text].
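For orientation, the plaintext quantities that the ciphertext steps above evaluate can be sketched with the textbook definitions of WOE and IV. The formulas of this embodiment are not reproduced in this text, so the definitions used here (natural logarithm, positive share over negative share), the function name `woe_iv` and the sample counts are assumptions made purely for illustration.

```python
import math

def woe_iv(groups, g_total, b_total):
    """Return per-group (WOE, IV) pairs and the summed IV.

    groups: list of (G_l, B_l) positive/negative counts per group.
    Textbook definitions (an assumption, not taken from the patent):
      WOE_l = ln((G_l / G_total) / (B_l / B_total))
      IV_l  = (G_l / G_total - B_l / B_total) * WOE_l
    """
    per_group = []
    for g_l, b_l in groups:
        p_pos = g_l / g_total          # share of all positives in group l
        p_neg = b_l / b_total          # share of all negatives in group l
        woe = math.log(p_pos / p_neg)
        per_group.append((woe, (p_pos - p_neg) * woe))
    return per_group, sum(iv for _, iv in per_group)

# Hypothetical counts for N = 3 groups with G_total = B_total = 100.
per_group, iv = woe_iv([(20, 40), (30, 30), (50, 30)], 100, 100)
```

A group whose positive and negative shares are equal contributes a WOE of zero and therefore nothing to the IV; groups with empty positive or negative counts would need separate handling, which this sketch omits.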
Preferably, when the plurality of sample data are grouped based on a given feature, an equidistant grouping mode is adopted, that is, the sample data are grouped uniformly according to the size of the feature value. Assuming the number of groups is N, the groups are denoted $G_l$ and the number of samples per group is [formula not reproduced in the source text]; the grouping mode is not limited thereto.
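A minimal sketch of one way to realize such uniform grouping is given here. It assumes that "uniform" means groups of nearly equal size after sorting by feature value; the passage could also be read as equal-width binning, and the function name `group_by_feature` is hypothetical.

```python
def group_by_feature(values, n_groups):
    """Sort sample indices by feature value and split them into n_groups chunks.

    Assumption: groups have (nearly) equal size; when m is not divisible by N,
    the first m % N groups receive one extra sample.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    base, extra = divmod(len(order), n_groups)
    groups, start = [], 0
    for l in range(n_groups):
        size = base + (1 if l < extra else 0)
        groups.append(order[start:start + size])
        start += size
    return groups

# Seven hypothetical feature values split into N = 3 groups of sample indices.
groups = group_by_feature([0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8], 3)
```

Returning indices rather than values lets participant B pick out the matching ciphertext tag values for each group without touching their contents.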
In practical applications, the logarithm calculation on ciphertext in the present invention uses a Taylor series expansion (s = 10) [formula not reproduced in the source text].
The ciphertext value of $\ln(1+g_l)/\ln 10$ and the ciphertext value of $\ln(1+b_l)/\ln 10$ are then recorded [symbols not reproduced in the source text].
According to a specific embodiment of the present invention, participant B accumulating the ciphertext IV values of each group to obtain the final ciphertext IV value of the feature specifically includes:
participant B accumulates the ciphertext IV values of the N groups according to the formula [formula not reproduced in the source text] to obtain the final ciphertext IV value $IV_{enc}$ of the feature.
According to an embodiment of the invention, participant B scrambling the final ciphertext IV value of the feature to obtain a final ciphertext IV scrambling value and sending it to participant A specifically includes:
participant B generates a random number e and encrypts it with the public key of participant A to obtain the ciphertext value $e_{enc}$ of the random number e; denoting the final ciphertext IV value as $IV_{enc}$, the final ciphertext IV scrambling value is calculated as $IV_{enc\_err} = IV_{enc} + e_{enc}$;
participant B sends the final ciphertext IV scrambling value $IV_{enc\_err}$ to participant A.
According to a specific embodiment of the invention, after participant B sends the final ciphertext IV scrambling value $IV_{enc\_err}$ to participant A, the method further comprises:
participant A decrypts the final ciphertext IV scrambling value $IV_{enc\_err}$ with its own private key to obtain the plaintext scrambling value $IV_{dec} = IV + e$, and sends the plaintext scrambling value $IV_{dec}$ to participant B;
participant B descrambles the plaintext scrambling value $IV_{dec}$ to obtain the feature engineering IV value $IV = IV_{dec} - e$.
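The scramble, decrypt and descramble exchange can be sketched with any additively homomorphic public-key scheme. The text does not name the scheme, so the toy Paillier construction below, its tiny key size, the fixed-point `SCALE` and all function names are assumptions for illustration only and are not secure parameters.

```python
import math
import secrets

# Two Mersenne primes; toy parameters, far too small for real use.
P, Q = 2**31 - 1, 2**61 - 1
SCALE = 10**6  # fixed-point scale for the real-valued IV (assumption)

def keygen(p=P, q=Q):
    """Return (public key n, private key (n, phi, mu)) for Paillier with g = n + 1."""
    n, phi = p * q, (p - 1) * (q - 1)
    return n, (n, phi, pow(phi, -1, n))

def encrypt(n, m):
    """Enc(m) = (1 + n)^m * r^n mod n^2, using (1 + n)^m = 1 + m*n mod n^2."""
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) % (n * n) * pow(r, n, n * n) % (n * n)

def add(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts modulo n."""
    return c1 * c2 % (n * n)

def decrypt(priv, c):
    n, phi, mu = priv
    return (pow(c, phi, n * n) - 1) // n * mu % n

n, priv = keygen()                            # participant A
iv_enc = encrypt(n, round(0.2408 * SCALE))    # stands in for IV_enc computed by B
e = secrets.randbelow(n)                      # participant B's random number e
iv_enc_err = add(n, iv_enc, encrypt(n, e))    # IV_enc_err = IV_enc + e_enc
iv_dec = decrypt(priv, iv_enc_err)            # participant A sees only IV + e (mod n)
iv = (iv_dec - e) % n / SCALE                 # participant B removes the mask
```

Because e is drawn uniformly from the whole plaintext space and the sum is taken modulo n, the value participant A decrypts reveals nothing about IV in this sketch, while participant B never holds the private key.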
According to an embodiment of the present invention, participant B generating the random number e specifically includes:
presetting a random number supporting party C and K random sources; the random number supporting party C holds K character strings $R_x$, where x denotes the serial number of the x-th random source and $x \in [1, K]$; each character string comprises p characters arranged in sequence, where p is an even number;
the random number supporting party C pairs the p characters of each character string two by two in a random manner to form p/2 pairing groups, each pairing group comprising a front character and a rear character;
the random number supporting party C sends the front characters of the p/2 pairing groups of each character string to participant B, which pre-stores them locally;
when participant B needs to generate a random number, it sends instruction information to the random number supporting party C;
based on the instruction information, the random number supporting party C modulates photon strings using the K character strings $R_x$, specifically: the front characters of the p/2 pairing groups in each character string serve as a first selection source from which the modulation bases of the photon string are randomly selected, and the rear characters of the p/2 pairing groups in each character string serve as a second selection source for the modulation initial signals of the photon string; for each character string, a front character is randomly selected from the first selection source as a modulation basis, the rear character corresponding to that front character is selected from the second selection source as the modulation initial signal, and each modulation basis modulates its corresponding modulation initial signal into the polarization state of a photon; the polarization states of all photons derived from the same character string are combined to form the corresponding photon string, and the K photon strings, together with the random selection manner of the modulation bases corresponding to each character string, are transmitted to participant B through quantum communication;
participant B receives the K photon strings and the random selection manner of the modulation bases corresponding to each character string, matches each photon string to its character string identifier, finds, according to the random selection manner of the modulation bases corresponding to each character string, the corresponding front characters among the front characters of the p/2 pairing groups of that character string to serve as measurement bases, and measures the corresponding photon string with the measurement bases to obtain a measurement result;
the measurement result is taken as the input of the x-th random source and processed by a chaotic function to output the initial random number of the x-th random source;
the random number of the x-th random source at the current moment is obtained through a plurality of chaotic iterations, and the random numbers of all random sources at the current moment are XORed to output the random number e at the current moment.
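The last two steps (chaotic processing per random source, then XOR across sources) can be sketched classically. The text does not name the chaotic function, so the logistic map, its parameter r = 3.99, the burn-in length, the bit-harvesting rule and the hypothetical measurement bit strings below are all assumptions; the quantum measurement itself is not modelled.

```python
def measurement_to_seed(bits):
    """Map a measured bit string to a seed strictly inside (0, 1)."""
    return (int(bits, 2) + 1) / (2 ** len(bits) + 1)

def logistic_stream(seed, burn_in=100, n_bits=32, r=3.99):
    """Iterate the logistic map x -> r*x*(1-x) and harvest one bit per step."""
    x = seed
    for _ in range(burn_in):           # several chaotic iterations before output
        x = r * x * (1.0 - x)
    value = 0
    for _ in range(n_bits):
        x = r * x * (1.0 - x)
        value = (value << 1) | (1 if x >= 0.5 else 0)
    return value

def random_number(measurements):
    """XOR the current output of every random source to obtain e."""
    e = 0
    for bits in measurements:
        e ^= logistic_stream(measurement_to_seed(bits))
    return e

# K = 3 hypothetical measurement results, one per random source.
e = random_number(["10110010", "01101100", "11100001"])
```

The output is fully determined by the measurement results, so in this sketch all unpredictability must come from the measurements; the chaotic map only spreads that input over the output bits.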
It can be understood that the invention combines quantum communication technology with chaotic operations over a plurality of random sources, so that the generated random number has higher randomness and the risk of it being cracked is effectively reduced.
It should be noted that each character string has a corresponding character string identifier. When the random number supporting party C sends the K photon strings and the random selection manner of the modulation bases corresponding to each character string to participant B, it should also include the character string identifiers; participant B determines the corresponding character string from the identifier and then selects the measurement bases from that character string.
It can be understood that each front character has a corresponding serial number, and the random number supporting party C may select the modulation bases in serial-number order or reorder the serial numbers of the selected front characters. In either case, the random selection manner must be notified to participant B so that it can select the corresponding measurement bases.
According to an embodiment of the present invention, after the party B descrambles the plaintext IV scrambling value of the feature to obtain the plaintext IV value of the feature, the method further includes:
presetting a plurality of sample data sets, and respectively calculating IV values of the characteristic in the plurality of sample data sets;
judging whether the IV value of a given sample data set is larger than a first preset threshold value, and if so, taking that sample data set as a selected sample data set and extracting it into a sample training library, the sample training library holding S selected sample data sets;
training a preset neural network machine learning model with each selected sample data set in the sample training library in sequence to obtain the corresponding optimization parameters;
taking a given selected sample data set in the sample training library as the target selected sample data set, and differencing the optimization parameters corresponding to the target selected sample data set with the optimization parameters corresponding to each of the other selected sample data sets one by one to obtain S-1 difference values;
judging whether the absolute value of each difference value is larger than a second preset threshold value, and if so, counting the target selected sample data set as a suspected invalid sample data set once; after the target selected sample data set has been compared with all the remaining selected sample data sets, recording the total number of times it was counted as a suspected invalid sample data set;
judging whether the total number of times is larger than a third preset threshold value; if so, the target selected sample data set is judged to be an invalid sample data set, and if not, it is judged to be a valid sample data set;
comparing each selected sample data set in the sample training library with the optimized parameters corresponding to the rest selected sample data sets, and screening out all invalid sample data sets;
eliminating the optimization parameters corresponding to all the invalid sample data sets, and reserving the optimization parameters corresponding to all the remaining valid sample data sets;
and carrying out average value calculation on the optimized parameters corresponding to all the effective sample data sets to obtain an optimized parameter average value, and placing the optimized parameter average value into a neural network machine learning model to complete the training process.
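The screening and averaging steps above can be sketched as follows. For brevity the sketch assumes each data set yields a single scalar optimization parameter; the function name `screen_and_average`, the threshold values and the sample numbers are hypothetical.

```python
def screen_and_average(iv_values, params, iv_min, diff_max, count_max):
    """Select data sets by IV, drop outlying parameter sets, average the rest.

    iv_values[k]: IV value of the feature in data set k
    params[k]:    scalar optimization parameter trained on data set k
    iv_min, diff_max, count_max: first, second and third preset thresholds
    """
    selected = [k for k, iv in enumerate(iv_values) if iv > iv_min]
    valid = []
    for k in selected:
        # Count how often data set k disagrees with another selected data set.
        suspect = sum(1 for j in selected
                      if j != k and abs(params[k] - params[j]) > diff_max)
        if suspect <= count_max:
            valid.append(k)
    if not valid:
        raise ValueError("no valid sample data set remains")
    return valid, sum(params[k] for k in valid) / len(valid)

valid, mean = screen_and_average(
    iv_values=[0.30, 0.05, 0.25, 0.40, 0.35],
    params=[1.00, 9.00, 1.10, 0.90, 5.00],
    iv_min=0.1, diff_max=1.0, count_max=1)
```

With real models the optimization parameters are vectors, in which case the absolute difference would have to be replaced by a norm or applied per component; the text does not say which.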
It can be understood that, by eliminating the invalid sample data sets, more accurate optimization parameters can be calculated from the valid sample data sets, thereby improving the prediction accuracy of the neural network machine learning model.
According to an embodiment of the present invention, after placing the optimized parameter average value into the neural network machine learning model, the method further comprises:
acquiring current appearance data information, inputting the appearance data information into a neural network machine learning model for machine learning, and obtaining a result predicted value;
Acquiring a plurality of historical data from a historical database, wherein each historical data at least comprises historical appearance data information and corresponding historical result true values;
performing feature analysis on the historical appearance data information of each piece of historical data to obtain a first feature quantity for each piece of historical data;
performing feature analysis on the current appearance data information to obtain a second feature quantity;
calculating the difference rate between the first feature quantity of the historical appearance data information of each piece of historical data and the second feature quantity of the current appearance data information;
adding historical data with the difference rate smaller than a fourth preset threshold value into a correction library;
predicting on the historical appearance data information of each piece of historical data in the correction library with the neural network machine learning model, and outputting the historical result predicted value corresponding to each piece of historical appearance data information in the correction library;
for each historical data in the correction library, respectively carrying out difference between a corresponding historical result predicted value and a corresponding historical result true value to obtain a corresponding difference value;
carrying out average calculation on the difference values between the historical result predicted values of all the historical data in the correction library and the corresponding historical result true values to obtain correction values;
And correcting and optimizing the prediction result of the neural network machine learning model according to the correction value.
It will be appreciated that the appearance data information may be image data information and, correspondingly, the prediction result may be an image recognition result.
According to a specific embodiment of the present invention, correction optimization is performed on a prediction result of a neural network machine learning model according to a correction value, and specifically includes:
and adding the result predicted value obtained by predicting the neural network machine learning model with the correction value to obtain a corrected result predicted value.
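The correction procedure can be sketched as follows. The text does not fix the sign of the difference or the form of the difference rate, so this sketch assumes the difference is true value minus predicted value (so that adding the correction moves the prediction toward the truth) and that the difference rate is the relative difference of scalar feature quantities; `corrected_prediction`, `model` and `feature` are hypothetical names.

```python
def corrected_prediction(model, current, history, feature, rate_max):
    """Predict on `current` and add the mean residual of similar historical data.

    history:  list of (appearance_data, true_value) pairs
    feature:  maps appearance data to a scalar feature quantity
    rate_max: fourth preset threshold on the difference rate
    """
    f_now = feature(current)
    # Correction library: historical data whose features are close to the current ones.
    library = [(x, y) for x, y in history
               if abs(feature(x) - f_now) / abs(f_now) < rate_max]
    if not library:
        return model(current)
    correction = sum(y - model(x) for x, y in library) / len(library)
    return model(current) + correction

# Hypothetical model that underestimates the true relation y = 2x + 1 by 1.
prediction = corrected_prediction(
    model=lambda x: 2 * x,
    current=1.05,
    history=[(1.0, 3.0), (1.1, 3.2), (5.0, 11.0)],
    feature=lambda x: x,
    rate_max=0.1)
```

Only the two historical records near the current input enter the correction library here, so the correction reflects the model's local bias rather than its average error over all history.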
It can be understood that, because neural network machine learning has inherent limitations, the invention further calculates a correction value from the differences between true values and predicted values and uses it to correct the output of the neural network machine learning model, so that a more accurate predicted value is obtained.
According to a specific embodiment of the present invention, before the party a generates its public-private key pair and discloses the public key to the party B, the method further comprises:
participant A and participant B perform mutual identity authentication.
According to a specific embodiment of the present invention, the mutual identity authentication between the party a and the party B specifically includes:
participant A and participant B agree on a first character string and a second character string, each comprising q characters arranged in sequence, where q is an even number;
participant A and participant B each pair the q characters of the first character string and of the second character string two by two in the same random manner to form q/2 pairing groups per character string, where each pairing group of the first character string comprises a first front character and a first rear character, and each pairing group of the second character string comprises a second front character and a second rear character;
participant A takes the first front characters of the q/2 pairing groups of the first character string as a third random selection source for the modulation bases of a photon string, and takes the first rear characters of the q/2 pairing groups of the first character string as a fourth random selection source for the modulation initial signals of the photon string; participant A randomly selects first front characters from the third random selection source as first modulation bases, selects from the fourth random selection source the first rear characters corresponding to the selected first front characters as first modulation initial signals, has each first modulation basis modulate its corresponding first modulation initial signal into the polarization state of a photon, combines the polarization states of all photons to form a first photon string, and sends the first photon string and the selection manner of the corresponding first modulation bases to participant B through quantum communication;
participant B receives the first photon string and the selection manner of the corresponding first modulation bases, finds, based on that selection manner, the corresponding first front characters in the q/2 pairing groups of the first character string to serve as first measurement bases, measures the first photon string with the first measurement bases to obtain a first measurement result, and judges whether the first measurement result is identical to the first rear characters corresponding to the found first front characters; if so, participant B's authentication of participant A is achieved;
the participant B takes the second front characters of the q/2 matched groups of the second character string as fifth random selection sources for modulating all modulation bases of the photon string, takes the second rear characters of the q/2 matched groups in the second character string as sixth random selection sources for modulating initial signals of the photon string, randomly selects corresponding second front characters from the fifth random selection sources as second modulation bases, selects second rear characters corresponding to the selected second front characters from the sixth random selection sources as second modulation initial signals, modulates the corresponding second modulation initial signals into polarization states of photons by all the second modulation bases respectively, combines all the polarization states of photons to form a second photon string, and sends the second photon string and the selection mode of the corresponding second modulation bases to the participant A through quantum communication;
participant A receives the second photon string and the selection manner of the corresponding second modulation bases, finds, based on that selection manner, the corresponding second front characters in the q/2 pairing groups of the second character string to serve as second measurement bases, measures the second photon string with the second measurement bases to obtain a second measurement result, and judges whether the second measurement result is identical to the second rear characters corresponding to the found second front characters; if so, participant A's authentication of participant B is achieved.
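One direction of this exchange (participant B authenticating participant A) can be mimicked with a purely classical mock. Everything here is an illustrative assumption: a "photon" is modelled as a (basis, signal) pair, an ideal measurement returns the signal when the basis matches and a random symbol otherwise, and the shared string, the seed and all function names are hypothetical. No quantum behaviour or security property is demonstrated.

```python
import random

def pair_up(string, seed):
    """Pair the characters of `string` at random; the same seed gives the same pairing."""
    idx = list(range(len(string)))
    random.Random(seed).shuffle(idx)
    return [(string[idx[i]], string[idx[i + 1]]) for i in range(0, len(idx), 2)]

def prover_message(pairs, rng):
    """Pick pairing groups in a random order; each mock photon carries a rear character."""
    order = list(range(len(pairs)))
    rng.shuffle(order)                      # the "selection manner" sent alongside
    photons = [pairs[k] for k in order]     # (modulation basis, initial signal)
    return photons, order

def measure(photon, basis, rng):
    """Ideal measurement: a matching basis recovers the signal, otherwise noise."""
    prepared_basis, signal = photon
    return signal if basis == prepared_basis else rng.choice("0123456789abcdef")

def verify(pairs, photons, order, rng):
    """Measure each photon with the front character named by `order` and compare."""
    return all(measure(ph, pairs[k][0], rng) == pairs[k][1]
               for ph, k in zip(photons, order))

shared = "3a7f19c4e2b80d56"                 # hypothetical agreed first string, q = 16
a_pairs = pair_up(shared, seed=42)          # participant A's pairing groups
b_pairs = pair_up(shared, seed=42)          # participant B pairs in the same random manner
photons, order = prover_message(a_pairs, random.Random())
ok = verify(b_pairs, photons, order, random.Random())
```

The reverse direction would repeat the same flow with the second character string and the roles of the two participants swapped.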
It will be appreciated that when participant A communicates with participant B, mutual identity authentication may be performed to ensure that both identities are legitimate. The invention adopts quantum computing technology to realize this mutual authentication, which can effectively resist quantum attacks and improve the credibility of the authentication.
Fig. 2 illustrates a block diagram of a privacy computing system based on feature engineering IV values of the present invention.
As shown in fig. 2, the second aspect of the present invention further proposes a privacy calculation system 2 based on a feature engineering IV value, including a memory 21 and a processor 22, where the memory includes a privacy calculation method program based on a feature engineering IV value, and when the privacy calculation method program based on a feature engineering IV value is executed by the processor, the following steps are implemented:
Presetting a participant A and a participant B which are subjected to characteristic engineering IV value joint calculation, and a plurality of sample data, wherein the participant A holds tag values of the sample data, and the participant B holds characteristic values of the sample data;
the participant A generates a public and private key pair of the participant A and discloses the public key to the participant B;
the participant A encrypts the tag value of each sample data by using the public key to generate a ciphertext tag value of each sample data, and sends the ciphertext tag value of each sample data to the participant B;
aiming at a certain characteristic, a participant B groups a plurality of sample data based on the characteristic value, and calculates a ciphertext IV value of each group by combining the ciphertext tag value of each sample data and the public key of the participant A;
the participant B accumulates the ciphertext IV values of each group to obtain a final ciphertext IV value of the feature;
the participant B scrambles the final ciphertext IV value of the feature to obtain a final ciphertext IV scrambling value and sends the final ciphertext IV scrambling value to the participant A;
the participant A decrypts the final ciphertext IV scrambling value of the feature by using the private key of the participant A, obtains the plaintext IV scrambling value of the feature and sends the plaintext IV scrambling value to the participant B;
and the party B descrambles the plaintext IV scrambling value of the feature to obtain the plaintext IV value of the feature.
According to an embodiment of the present invention, for a given feature, participant B grouping the plurality of sample data based on the feature value and calculating a ciphertext IV value of each group by combining the ciphertext tag value of each sample data with the public key of participant A specifically includes:
presetting m sample data with n features, and denoting the tag value of each sample as $L_i \in \{0, 1\}$, where $L_i = 1$ denotes a positive sample, $L_i = 0$ denotes a negative sample, and i is the serial number of a sample among the m sample data; the total positive sample number is denoted $G_{total}$ and the total negative sample number is denoted $B_{total}$, and both $G_{total}$ and $B_{total}$ are known to participant A and participant B;
dividing the m sample data of the feature into N groups, where the number of samples in each group is $m_l$, l is the serial number of the group with $l \in [1, N]$, the total positive sample number contained in each group is denoted $G_l$ and the total negative sample number is denoted $B_l$, and these quantities satisfy a relation [formula not reproduced in the source text];
acquiring the feature value $F_{l,j}$ of each sample data in each group, and encrypting the feature value $F_{l,j}$ with the public key of participant A to obtain the ciphertext feature value of that sample data, where l denotes the l-th group, j denotes the j-th sample data of the l-th group, and $F_{l,j}$ denotes the feature value of the j-th sample data of the l-th group;
obtaining the ciphertext tag value of each sample data in each group from the ciphertext tag values received from participant A and, in combination with the ciphertext feature values, calculating the ciphertext value of the total positive sample number of each group [formula not reproduced in the source text];
encrypting the sample number $m_l$ of each group with the public key of participant A to obtain the ciphertext value of the sample number of each group, and calculating the ciphertext value of the total negative sample number of each group from the ciphertext value of the sample number of each group and the ciphertext value of the total positive sample number of each group [formula not reproduced in the source text];
calculating the WOE value of each group [formula not reproduced in the source text], and calculating the IV value of each group from the WOE value of each group [formula not reproduced in the source text]; a quantity $a_l$ is defined [definition not reproduced in the source text], and $a_l$ is encrypted with the public key of participant A to obtain its ciphertext value;
transforming two intermediate expressions [expressions not reproduced in the source text], and recording the ciphertext value of $\ln(1+g_l)/\ln 10$ and the ciphertext value of $\ln(1+b_l)/\ln 10$ [symbols not reproduced in the source text].
According to an embodiment of the invention, participant B scrambling the final ciphertext IV value of the feature to obtain a final ciphertext IV scrambling value and sending it to participant A specifically includes:
participant B generates a random number e and encrypts it with the public key of participant A to obtain the ciphertext value $e_{enc}$ of the random number e; denoting the final ciphertext IV value as $IV_{enc}$, the final ciphertext IV scrambling value is calculated as $IV_{enc\_err} = IV_{enc} + e_{enc}$;
participant B sends the final ciphertext IV scrambling value $IV_{enc\_err}$ to participant A.
The third aspect of the present invention also proposes a computer readable storage medium, in which a privacy calculation method program based on a feature engineering IV value is included, which when executed by a processor, implements the steps of a privacy calculation method based on a feature engineering IV value as described above.
The privacy calculation method, system and computer-readable storage medium based on the feature engineering IV value provided by the invention realize privacy calculation of the multi-party feature engineering IV value, and select suitable training samples based on the feature engineering IV value so as to further optimize model parameters and improve the prediction accuracy of the machine learning algorithm.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division into units is only a division by logical function, and other divisions are possible in practice: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical or in other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units; can be located in one place or distributed to a plurality of network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present invention may be integrated in one processing unit, or each unit may be separately used as one unit, or two or more units may be integrated in one unit; the integrated units may be implemented in hardware or in hardware plus software functional units.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be implemented by hardware under the control of program instructions; the program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as a removable storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
Alternatively, the above-described integrated units of the present invention may be stored in a computer-readable storage medium if implemented in the form of software functional modules and sold or used as separate products. Based on such understanding, the technical solutions of the embodiments of the present invention may be embodied in essence or a part contributing to the prior art in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, ROM, RAM, magnetic or optical disk, or other medium capable of storing program code.
The foregoing is merely a specific embodiment of the present invention, and the protection scope of the present invention is not limited thereto; any variation or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed herein shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (9)
1. A privacy computing method based on feature engineering IV values, the method comprising:
presetting a participant A and a participant B that jointly calculate the feature engineering IV value, and a plurality of sample data, wherein the participant A holds the tag values of the sample data and the participant B holds the feature values of the sample data;
the participant A generates its own public-private key pair and discloses the public key to the participant B;
the participant A encrypts the tag value of each sample data by using the public key to generate a ciphertext tag value of each sample data, and sends the ciphertext tag value of each sample data to the participant B;
for a given feature, the participant B groups the plurality of sample data based on the feature value, and calculates a ciphertext IV value of each group by combining the ciphertext tag value of each sample data with the public key of the participant A;
the participant B accumulates the ciphertext IV values of all groups to obtain a final ciphertext IV value of the feature;
the participant B scrambles the final ciphertext IV value of the feature to obtain a final ciphertext IV scrambling value and sends the final ciphertext IV scrambling value to the participant A;
the participant A decrypts the final ciphertext IV scrambling value of the feature by using its private key, obtains the plaintext IV scrambling value of the feature, and sends the plaintext IV scrambling value to the participant B;
the participant B descrambles the plaintext IV scrambling value of the feature to obtain the plaintext IV value of the feature;
presetting a plurality of sample data sets, and respectively calculating the IV value of the feature in each of the plurality of sample data sets;
judging whether the IV value of a certain sample data set is larger than a first preset threshold value and, if so, taking that sample data set as a selected sample data set and extracting it into a sample training library, the sample training library being preset to contain S selected sample data sets;
training a preset neural network machine learning model with each selected sample data set in the sample training library in sequence to obtain corresponding optimization parameters;
taking a certain selected sample data set in the sample training library as a target selected sample data set, and differencing the optimization parameters corresponding to the target selected sample data set with the optimization parameters corresponding to each of the other selected sample data sets one by one to obtain S-1 difference values;
judging whether the absolute value of each difference value is larger than a second preset threshold value and, if so, counting the target selected sample data set once as a suspected invalid sample data set; after the target selected sample data set has been compared with all the other selected sample data sets, recording the total number of times it was counted as a suspected invalid sample data set;
judging whether the total number of times is larger than a third preset threshold value; if so, judging that the target selected sample data set is an invalid sample data set, and if not, judging that the target selected sample data set is a valid sample data set;
comparing, in the same manner, the optimization parameters of each selected sample data set in the sample training library with those of the remaining selected sample data sets, and screening out all invalid sample data sets;
eliminating the optimization parameters corresponding to all the invalid sample data sets, and retaining the optimization parameters corresponding to all the remaining valid sample data sets;
and averaging the optimization parameters corresponding to all the valid sample data sets to obtain an optimization parameter average value, and placing the optimization parameter average value into the neural network machine learning model to complete the training process.
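The screening and averaging steps at the end of claim 1 can be sketched as follows. This is only an illustration under simplifying assumptions: each data set's optimization parameters are reduced to a single float, and the function name `filter_and_average` and the threshold values are hypothetical, not taken from the patent.

```python
from statistics import mean

def filter_and_average(params, diff_threshold, count_threshold):
    """Screen out invalid data sets and average the remaining parameters.

    params          -- one optimization parameter per selected data set (S values)
    diff_threshold  -- the "second preset threshold" on |difference|
    count_threshold -- the "third preset threshold" on the suspect count
    """
    valid = []
    for i, target in enumerate(params):
        # Compare the target with each of the other S-1 parameters and count
        # how often the absolute difference exceeds the threshold.
        suspect_count = sum(
            1
            for j, other in enumerate(params)
            if j != i and abs(target - other) > diff_threshold
        )
        # A data set is invalid only if the count exceeds the third threshold.
        if suspect_count <= count_threshold:
            valid.append(i)
    if not valid:
        raise ValueError("no valid sample data set remains")
    return valid, mean(params[i] for i in valid)

# Hypothetical parameters from S = 4 selected data sets; the last is an outlier.
params = [0.50, 0.52, 0.49, 0.90]
valid, avg = filter_and_average(params, diff_threshold=0.1, count_threshold=1)
```

With real models the parameters are vectors, so the absolute difference would have to be replaced by some distance measure; the claims do not say which.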
2. The privacy calculation method based on feature engineering IV values according to claim 1, wherein, for a certain feature, the party B groups a plurality of sample data based on the feature value, and calculates the ciphertext IV value of each group by combining the ciphertext tag value of each sample data and the public key of the party A, specifically comprising:
presetting m sample data and n features, the tag value of each sample being denoted $L_i \in \{0,1\}$, where $L_i = 1$ denotes a positive sample, $L_i = 0$ denotes a negative sample, and $i$ is the serial number of a sample among the m sample data; the total positive sample number is denoted $G_{total}$ and the total negative sample number is denoted $B_{total}$, and both $G_{total}$ and $B_{total}$ are known to party A and party B;
dividing the m sample data of the feature into N groups, wherein the number of samples in each group is $m_l$, $l$ is the sequence number of the group and $l \in [1, N]$, the positive sample number contained in each group is denoted $G_l$, the negative sample number is denoted $B_l$, and these meet the following
acquiring the feature value $F_{l,j}$ of each sample data in each group, and encrypting the feature value $F_{l,j}$ of each sample data in each group with the public key of party A to obtain a ciphertext feature value, wherein $l$ denotes the $l$-th group, $j$ denotes the $j$-th sample data of the $l$-th group, and $F_{l,j}$ denotes the feature value of the $j$-th sample data of the $l$-th group;
obtaining the ciphertext tag value of each sample data in each group from the ciphertext tag values of the sample data received from party A and, in combination with the ciphertext feature value, calculating the ciphertext value of the positive sample number of each group;
encrypting the sample number $m_l$ of each group with the public key of party A to obtain the ciphertext value of the sample number of each group and, combining the ciphertext value of the sample number of each group with the ciphertext value of the positive sample number of each group, calculating the ciphertext value of the negative sample number of each group;
calculating the WOE value of each group, and calculating the IV value of each group based on the WOE value of each group; then encrypting $a_l$ with the public key of party A to obtain its ciphertext value;
transforming the resulting ciphertext expressions, wherein the ciphertext value of $\ln(1+G_l)/\ln 10$ and the ciphertext value of $\ln(1+B_l)/\ln 10$ are each recorded.
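For orientation, the quantities that claim 2 computes under encryption can be shown in plaintext. The sketch below uses the textbook WOE and IV definitions, $WOE_l = \ln\frac{G_l/G_{total}}{B_l/B_{total}}$ and $IV_l = (G_l/G_{total} - B_l/B_{total}) \cdot WOE_l$, which are an assumption here: the patent's own formulas are not reproduced in this text and may differ (the claim's $\ln(1+G_l)/\ln 10$ terms suggest a smoothed base-10 variant). The function name `iv_from_groups` is hypothetical.

```python
import math

def iv_from_groups(groups):
    """Plaintext IV of one feature from per-group (G_l, B_l) counts.

    groups -- list of (positive_count, negative_count) tuples, one per group;
              every count is assumed to be non-zero so the logarithm is defined.
    """
    g_total = sum(g for g, _ in groups)
    b_total = sum(b for _, b in groups)
    iv = 0.0
    for g, b in groups:
        p_pos = g / g_total           # share of all positive samples in this group
        p_neg = b / b_total           # share of all negative samples in this group
        woe = math.log(p_pos / p_neg)
        iv += (p_pos - p_neg) * woe   # per-group IV, accumulated over the groups
    return iv

# A feature that does not separate the classes has IV 0.
flat_iv = iv_from_groups([(10, 10), (10, 10)])
# A feature that separates them 3:1 and 1:3 has IV ln(3).
split_iv = iv_from_groups([(30, 10), (10, 30)])
```

In the claimed method none of these values are visible to party B in plaintext; only the final, descrambled IV is recovered.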
3. The privacy calculation method based on feature engineering IV values according to claim 1, wherein the party B scrambles the final ciphertext IV value of the feature to obtain a final ciphertext IV scrambling value, and sends the final ciphertext IV scrambling value to the party A, specifically comprising:
the party B generates a random number $e$, and encrypts the random number $e$ with the public key of the party A to obtain the ciphertext value $e_{enc}$ of the random number $e$; with the final ciphertext IV value denoted $IV_{enc}$, the final ciphertext IV scrambling value is calculated as $IV_{enc\_err} = IV_{enc} + e_{enc}$;
the party B sends the final ciphertext IV scrambling value $IV_{enc\_err}$ to the party A.
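The scrambling step works only if ciphertexts can be added. The sketch below assumes a Paillier-style additively homomorphic scheme, which the claims do not name, and uses toy primes that are far too small for real use. The helper names (`keygen`, `encrypt`, `decrypt`, `add`), the fixed-point factor `SCALE`, and the sample IV of 0.1234 are all illustrative assumptions.

```python
import math
import secrets

def keygen(p=10007, q=10009):
    """Toy Paillier key pair for participant A (insecure key size)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # inverse of lam mod n; valid for generator g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt an integer 0 <= m < n under the public key n."""
    n2 = n * n
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    return (1 + m * n) * pow(r, n, n2) % n2

def decrypt(n, priv, c):
    """Decrypt with the private key (lam, mu)."""
    lam, mu = priv
    return (pow(c, lam, n * n) - 1) // n * mu % n

def add(n, c1, c2):
    """Homomorphic addition: the product of ciphertexts decrypts to the sum."""
    return c1 * c2 % (n * n)

SCALE = 10_000  # assumed fixed-point factor, since the plaintexts are integers

n, priv = keygen()                           # participant A
iv_enc = encrypt(n, round(0.1234 * SCALE))   # stands in for B's final ciphertext IV
e = secrets.randbelow(1_000_000)             # B's random number e
iv_enc_err = add(n, iv_enc, encrypt(n, e))   # IV_enc_err = IV_enc + e_enc
iv_err = decrypt(n, priv, iv_enc_err)        # A decrypts the scrambled value IV + e
iv = (iv_err - e) / SCALE                    # B removes e to recover the IV
```

The range of `e` relative to the plaintext space determines how much the scrambled value reveals to participant A; the claims leave that choice open.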
4. A privacy calculation method based on feature engineering IV values according to claim 3, wherein party B generates a random number e, comprising:
presetting a random number supporting party C provided with K random sources; the random number supporting party C holds K character strings $R_x$, wherein $x$ denotes the serial number of the $x$-th random source and $x \in [1, K]$; each character string comprises p characters arranged in sequence, where p is an even number;
the random number supporting party C randomly pairs the p characters of each character string two by two to form p/2 pairing groups, each pairing group comprising a front character and a rear character;
the random number supporting party C sends the front characters of the p/2 pairing groups of each character string to the party B for local pre-storage;
when the party B needs to generate a random number, it sends instruction information to the random number supporting party C;
based on the instruction information, the random number supporting party C is triggered to modulate photon strings using the K character strings $R_x$, specifically: the front characters of the p/2 pairing groups in each character string serve as a first selection source from which the modulation bases of the photon string are randomly selected, and the rear characters of the p/2 pairing groups in each character string serve as a second selection source for the initial signals to be modulated; for each character string, a front character is randomly selected from the first selection source as a modulation base, the rear character paired with that front character is selected from the second selection source as the initial signal to be modulated, and each modulation base modulates its corresponding initial signal into the polarization state of a photon; the polarization states of all photons derived from the same character string are combined to form the corresponding photon string, and the K photon strings, together with the random selection pattern of the modulation bases corresponding to each character string, are transmitted to the party B through quantum communication;
the party B receives the K photon strings and the random selection pattern of the modulation bases corresponding to each character string, matches each photon string to its character string identifier, finds, according to the random selection pattern of the modulation bases corresponding to each character string, the corresponding front characters among the pre-stored front characters of the p/2 pairing groups of that character string to serve as measurement bases, and measures the corresponding photon string with the measurement bases to obtain a measurement result;
taking the measurement result as the input of the $x$-th random source, processing the measurement result through a chaotic function, and outputting the initial random number of the $x$-th random source;
and obtaining the random number of the $x$-th random source at the current moment through a plurality of chaotic motions, and performing an exclusive-OR operation on the current-moment random numbers of all the random sources to output the random number $e$ at the current moment.
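The last two steps of claim 4 (a chaotic function per random source, then an exclusive-OR across sources) can be sketched classically. The quantum modulation and measurement are not simulated; the measurement results are simply given as bit strings. The claim does not specify the chaotic function, so the logistic map with `r = 3.99`, the seed mapping, the step count, and the 32-bit output width are all assumptions, and every function name is hypothetical.

```python
from functools import reduce

def seed_from_bits(bits):
    """Map a measured bit string to a seed strictly inside (0, 1)."""
    value = int(bits, 2)
    return (value + 1) / (2 ** len(bits) + 1)

def logistic_steps(x, steps, r=3.99):
    """Iterate the logistic map x -> r * x * (1 - x) the given number of times."""
    for _ in range(steps):
        x = r * x * (1.0 - x)
    return x

def source_output(bits, steps=50, width=32):
    """Current random number of one random source, as a width-bit integer."""
    x = logistic_steps(seed_from_bits(bits), steps)
    return int(x * (1 << width)) & ((1 << width) - 1)

def combine(measurements):
    """XOR the current random numbers of all K random sources."""
    return reduce(lambda a, b: a ^ b, (source_output(m) for m in measurements))

# Hypothetical measurement results for K = 3 random sources.
measurements = ["10110010", "01101100", "11100001"]
e = combine(measurements)
```

A logistic map in floating point is not a cryptographically secure generator on its own; in the claimed scheme the unpredictability is meant to come from the measured photon strings.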
5. The privacy calculation method based on feature engineering IV values of claim 1, wherein after placing the optimized parameter average value into a neural network machine learning model, the method further comprises:
acquiring current appearance data information, inputting the appearance data information into a neural network machine learning model for machine learning, and obtaining a result predicted value;
acquiring a plurality of historical data from a historical database, wherein each historical data at least comprises historical appearance data information and corresponding historical result true values;
performing feature analysis on the historical appearance data information of each piece of historical data to respectively obtain a first feature quantity of the historical appearance data information of each piece of historical data;
performing feature analysis on the current appearance data information to obtain a second feature quantity;
calculating the difference rate between the first feature quantity of the historical appearance data information of each piece of historical data and the second feature quantity of the current appearance data information;
adding the historical data whose difference rate is smaller than a fourth preset threshold value into a correction library;
performing neural network machine learning on the historical appearance data information of each piece of historical data in the correction library by using the neural network machine learning model, and outputting a historical result predicted value corresponding to each piece of historical appearance data information in the correction library;
for each historical data in the correction library, respectively carrying out difference between a corresponding historical result predicted value and a corresponding historical result true value to obtain a corresponding difference value;
carrying out average calculation on the difference values between the historical result predicted values of all the historical data in the correction library and the corresponding historical result true values to obtain correction values;
and correcting and optimizing the prediction result of the neural network machine learning model according to the correction value.
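The correction procedure of claim 5 can be sketched as below. Several details are assumptions because the claim does not fix them: the difference rate is taken as the relative difference `|f1 - f2| / |f2|`, the correction is applied by subtracting the mean prediction error, and the model and feature extractor are plain callables. The name `corrected_prediction` is hypothetical.

```python
def corrected_prediction(model, feature, current_x, history, rate_threshold):
    """Predict for current_x and correct by the mean error on similar history.

    model          -- callable mapping appearance data to a predicted result
    feature        -- callable mapping appearance data to a scalar feature quantity
    history        -- list of (historical_x, true_result) pairs
    rate_threshold -- the "fourth preset threshold" on the difference rate
    """
    f_cur = feature(current_x)  # second feature quantity, assumed non-zero
    # Correction library: historical records whose feature quantity is close
    # to that of the current data.
    library = [
        (x, y)
        for x, y in history
        if abs(feature(x) - f_cur) / abs(f_cur) < rate_threshold
    ]
    raw = model(current_x)
    if not library:
        return raw  # nothing comparable in the history, so leave uncorrected
    # Correction value: mean of (predicted value - true value) over the library.
    correction = sum(model(x) - y for x, y in library) / len(library)
    return raw - correction

# Hypothetical model with a constant bias of +1 against the truth y = 2x.
biased_model = lambda x: 2 * x + 1
history = [(9, 18), (10, 20), (11, 22), (100, 200)]
result = corrected_prediction(biased_model, lambda x: x, 10, history, 0.2)
```

Here the record `(100, 200)` is excluded because its difference rate is far above the threshold, and the remaining three records yield a correction of 1.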
6. A privacy computing system based on feature engineering IV values, comprising a memory and a processor, wherein the memory stores a privacy calculation method program based on feature engineering IV values, and the program, when executed by the processor, implements the following steps:
presetting a participant A and a participant B that jointly calculate the feature engineering IV value, and a plurality of sample data, wherein the participant A holds the tag values of the sample data and the participant B holds the feature values of the sample data;
the participant A generates its own public-private key pair and discloses the public key to the participant B;
the participant A encrypts the tag value of each sample data by using the public key to generate a ciphertext tag value of each sample data, and sends the ciphertext tag value of each sample data to the participant B;
for a given feature, the participant B groups the plurality of sample data based on the feature value, and calculates a ciphertext IV value of each group by combining the ciphertext tag value of each sample data with the public key of the participant A;
the participant B accumulates the ciphertext IV values of all groups to obtain a final ciphertext IV value of the feature;
the participant B scrambles the final ciphertext IV value of the feature to obtain a final ciphertext IV scrambling value and sends the final ciphertext IV scrambling value to the participant A;
the participant A decrypts the final ciphertext IV scrambling value of the feature by using its private key, obtains the plaintext IV scrambling value of the feature, and sends the plaintext IV scrambling value to the participant B;
the participant B descrambles the plaintext IV scrambling value of the feature to obtain the plaintext IV value of the feature;
presetting a plurality of sample data sets, and respectively calculating the IV value of the feature in each of the plurality of sample data sets;
judging whether the IV value of a certain sample data set is larger than a first preset threshold value and, if so, taking that sample data set as a selected sample data set and extracting it into a sample training library, the sample training library being preset to contain S selected sample data sets;
training a preset neural network machine learning model with each selected sample data set in the sample training library in sequence to obtain corresponding optimization parameters;
taking a certain selected sample data set in the sample training library as a target selected sample data set, and differencing the optimization parameters corresponding to the target selected sample data set with the optimization parameters corresponding to each of the other selected sample data sets one by one to obtain S-1 difference values;
judging whether the absolute value of each difference value is larger than a second preset threshold value and, if so, counting the target selected sample data set once as a suspected invalid sample data set; after the target selected sample data set has been compared with all the other selected sample data sets, recording the total number of times it was counted as a suspected invalid sample data set;
judging whether the total number of times is larger than a third preset threshold value; if so, judging that the target selected sample data set is an invalid sample data set, and if not, judging that the target selected sample data set is a valid sample data set;
comparing, in the same manner, the optimization parameters of each selected sample data set in the sample training library with those of the remaining selected sample data sets, and screening out all invalid sample data sets;
eliminating the optimization parameters corresponding to all the invalid sample data sets, and retaining the optimization parameters corresponding to all the remaining valid sample data sets;
and averaging the optimization parameters corresponding to all the valid sample data sets to obtain an optimization parameter average value, and placing the optimization parameter average value into the neural network machine learning model to complete the training process.
7. The privacy computing system of claim 6, wherein, for a feature, party B groups a plurality of sample data based on the feature value and computes a ciphertext IV value for each group in combination with the ciphertext tag value of each sample data and the public key of party A, comprising:
presetting m sample data and n features, the tag value of each sample being denoted $L_i \in \{0,1\}$, where $L_i = 1$ denotes a positive sample, $L_i = 0$ denotes a negative sample, and $i$ is the serial number of a sample among the m sample data; the total positive sample number is denoted $G_{total}$ and the total negative sample number is denoted $B_{total}$, and both $G_{total}$ and $B_{total}$ are known to party A and party B;
dividing the m sample data of the feature into N groups, wherein the number of samples in each group is $m_l$, $l$ is the sequence number of the group and $l \in [1, N]$, the positive sample number contained in each group is denoted $G_l$, the negative sample number is denoted $B_l$, and these meet the following
acquiring the feature value $F_{l,j}$ of each sample data in each group, and encrypting the feature value $F_{l,j}$ of each sample data in each group with the public key of party A to obtain a ciphertext feature value, wherein $l$ denotes the $l$-th group, $j$ denotes the $j$-th sample data of the $l$-th group, and $F_{l,j}$ denotes the feature value of the $j$-th sample data of the $l$-th group;
obtaining the ciphertext tag value of each sample data in each group from the ciphertext tag values of the sample data received from party A and, in combination with the ciphertext feature value, calculating the ciphertext value of the positive sample number of each group;
encrypting the sample number $m_l$ of each group with the public key of party A to obtain the ciphertext value of the sample number of each group and, combining the ciphertext value of the sample number of each group with the ciphertext value of the positive sample number of each group, calculating the ciphertext value of the negative sample number of each group;
calculating the WOE value of each group, and calculating the IV value of each group based on the WOE value of each group; then encrypting $a_l$ with the public key of party A to obtain its ciphertext value;
transforming the resulting ciphertext expressions, wherein the ciphertext value of $\ln(1+G_l)/\ln 10$ and the ciphertext value of $\ln(1+B_l)/\ln 10$ are each recorded.
8. The privacy computing system of claim 6, wherein party B scrambles the final ciphertext IV value of the feature to obtain a final ciphertext IV scrambling value and sends the final ciphertext IV scrambling value to party A, comprising:
the party B generates a random number $e$, and encrypts the random number $e$ with the public key of the party A to obtain the ciphertext value $e_{enc}$ of the random number $e$; with the final ciphertext IV value denoted $IV_{enc}$, the final ciphertext IV scrambling value is calculated as $IV_{enc\_err} = IV_{enc} + e_{enc}$;
the party B sends the final ciphertext IV scrambling value $IV_{enc\_err}$ to the party A.
9. A computer readable storage medium, characterized in that the computer readable storage medium comprises a privacy calculation method program based on a feature engineering IV value, which when executed by a processor, implements the steps of a privacy calculation method based on a feature engineering IV value as claimed in any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111654397.2A CN114398671B (en) | 2021-12-30 | 2021-12-30 | Privacy calculation method, system and readable storage medium based on feature engineering IV value |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114398671A (en) | 2022-04-26 |
CN114398671B (en) | 2023-07-11 |
Family
ID=81229533
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111654397.2A Active CN114398671B (en) | 2021-12-30 | 2021-12-30 | Privacy calculation method, system and readable storage medium based on feature engineering IV value |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114398671B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115344894A (en) * | 2022-10-18 | 2022-11-15 | 翼方健数(北京)信息科技有限公司 | Feature engineering IV value privacy calculation method and device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110968886A (en) * | 2019-12-20 | 2020-04-07 | 支付宝(杭州)信息技术有限公司 | Method and system for screening training samples of machine learning model |
CN111563267A (en) * | 2020-05-08 | 2020-08-21 | 京东数字科技控股有限公司 | Method and device for processing federal characteristic engineering data |
CN113591133A (en) * | 2021-09-27 | 2021-11-02 | 支付宝(杭州)信息技术有限公司 | Method and device for performing feature processing based on differential privacy |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110032878B (en) * | 2019-03-04 | 2021-11-02 | 创新先进技术有限公司 | Safety feature engineering method and device |
US11139961B2 (en) * | 2019-05-07 | 2021-10-05 | International Business Machines Corporation | Private and federated learning |
CN110472424A (en) * | 2019-07-16 | 2019-11-19 | 西安理工大学 | The method of more image encryptions, certification based on intensity transmission equation and photon counting |
US20210143987A1 (en) * | 2019-11-13 | 2021-05-13 | International Business Machines Corporation | Privacy-preserving federated learning |
CN110990857B (en) * | 2019-12-11 | 2021-04-06 | 支付宝(杭州)信息技术有限公司 | Multi-party combined feature evaluation method and device for protecting privacy and safety |
CN112668046A (en) * | 2020-12-24 | 2021-04-16 | 深圳前海微众银行股份有限公司 | Feature interleaving method, apparatus, computer-readable storage medium, and program product |
CN113688354B (en) * | 2021-08-27 | 2023-06-09 | 华东师范大学 | Chi-square box dividing method based on safe multiparty calculation |
CN113704799A (en) * | 2021-09-08 | 2021-11-26 | 深圳前海微众银行股份有限公司 | Method, device, equipment, storage medium and program product for processing box data |
CN113704800A (en) * | 2021-09-08 | 2021-11-26 | 深圳前海微众银行股份有限公司 | Data binning processing method, device, equipment and storage medium based on confusion box |
CN113807736A (en) * | 2021-09-29 | 2021-12-17 | 河南星环众志信息科技有限公司 | Data quality evaluation method, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN114398671A (en) | 2022-04-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20230106151A1 (en) | Multi-party threshold authenticated encryption | |
CN107145791B (en) | K-means clustering method and system with privacy protection function | |
US6038315A (en) | Method and system for normalizing biometric variations to authenticate users from a public database and that ensures individual biometric data privacy | |
Yu et al. | Privacy-preserving data aggregation computing in cyber-physical social systems | |
CN111931249B (en) | Medical secret data statistical analysis method supporting transmission fault-tolerant mechanism | |
CN111581648B (en) | Method of federal learning to preserve privacy in irregular users | |
CN108337092B (en) | Method and system for performing collective authentication in a communication network | |
CN111162894A (en) | Statistical analysis method for outsourcing cloud storage medical data aggregation with privacy protection | |
EP4226568A1 (en) | Updatable private set intersection | |
CN116957064A (en) | Knowledge distillation-based federal learning privacy protection model training method and system | |
CN116318617B (en) | Medical rescue material charity donation method based on RFID and blockchain | |
CN114398671B (en) | Privacy calculation method, system and readable storage medium based on feature engineering IV value | |
CN113364595B (en) | Power grid private data signature aggregation method and device and computer equipment | |
Bhat et al. | Fuzzy extractor and chaos enhanced elliptic curve cryptography for image encryption and authentication | |
CN117439799A (en) | Anti-tampering method for http request data | |
Slimane et al. | A novel image encryption scheme using chaos, hyper-chaos systems and the secure Hash algorithm SHA-1 | |
CN117134945A (en) | Data processing method, system, device, computer equipment and storage medium | |
Li et al. | A Privacy-Preserving Federated Learning Scheme Against Poisoning Attacks in Smart Grid | |
CN111277406A (en) | Block chain-based safe two-direction quantity advantage comparison method | |
CN108462946B (en) | Multidimensional data query method and system based on wireless sensor network | |
CN112491840B (en) | Information modification method, device, computer equipment and storage medium | |
CN111971677A (en) | Tamper-resistant data encoding for mobile devices | |
Bose et al. | A Fully Decentralized Homomorphic Federated Learning Framework | |
CN114362917A (en) | Method for discovering safe verifiable data truth value in mobile crowd sensing | |
CN112913184B (en) | Computing key rotation periods for block cipher based encryption scheme systems and methods |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||