CN109241770B - Information value calculation method and device based on homomorphic encryption and readable storage medium - Google Patents

Information value calculation method and device based on homomorphic encryption and readable storage medium

Publication number
CN109241770B
Authority
CN
China
Prior art keywords
data
terminal
value
information
tag
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810918870.5A
Other languages
Chinese (zh)
Other versions
CN109241770A (en)
Inventor
范涛
马国强
刘洋
陈天健
杨强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd
Priority to CN201810918870.5A
Publication of CN109241770A
Application granted
Publication of CN109241770B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services

Abstract

The invention discloses a homomorphic encryption-based information value calculation method, a device, and a readable storage medium. The method comprises the following steps: after a second terminal determines the intersection sample data that carries the same data identifiers as a first terminal, the second terminal encrypts the data tags corresponding to the intersection sample data with a homomorphic encryption algorithm to obtain data tag values; the second terminal sends the data tag values and their corresponding data identifiers to the first terminal and detects whether information data sent by the first terminal is received, where the information data is obtained by the first terminal from the data identifiers and the data tag values; and after receiving the information data, the second terminal calculates, from the information data, the information value of the characteristic variable corresponding to the information data. In this way, through joint learning with the first terminal, the second terminal calculates the information value for the sample data that the first terminal holds in the intersection sample data, without either terminal leaking its own data.

Description

Information value calculation method and device based on homomorphic encryption and readable storage medium
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method and an apparatus for calculating an information value based on homomorphic encryption, and a readable storage medium.
Background
Before statistical modeling or machine learning is performed on data, a large amount of feature engineering is required; that is, the data that matters for modeling or machine learning must be selected from a large volume of data. Calculating the importance of data features is therefore particularly important.
With the development of science and technology, privacy protection of data has become increasingly important. However, many modeling tasks require joint learning over data held by multiple parties. How to calculate the information value of data through joint learning without any party revealing its own data is therefore an urgent problem, where the Information Value (IV) is an index representing the importance of a data feature.
Disclosure of Invention
The invention mainly aims to provide a homomorphic encryption-based information value calculation method, equipment and a readable storage medium, and aims to solve the technical problem of how to calculate the information value of data by a joint learning method under the condition that a plurality of parties do not reveal respective data.
In order to achieve the above object, the present invention provides a homomorphic encryption-based information value calculating method, including the steps of:
after a second terminal determines intersection sample data carrying the same data identification as the first terminal, the second terminal encrypts a data tag corresponding to the intersection sample data by adopting a homomorphic encryption algorithm to obtain a data tag value;
sending a data identifier corresponding to the data tag value and the data tag value to a first terminal, and detecting whether information data sent by the first terminal is received or not, wherein the information data is obtained by the first terminal according to the data identifier and the data tag value;
and after the information data are received, calculating the information value of the characteristic variable corresponding to the information data according to the information data, wherein each data identifier at least corresponds to one characteristic variable.
Preferably, after receiving the information data, the step of calculating the information value of the characteristic variable corresponding to the information data according to the information data includes:
when the information data is received, decrypting the information data to obtain the number of negative samples and the number of positive samples of sample data corresponding to the information data;
calculating the weight value of the characteristic variable corresponding to the information data according to the number of the negative samples and the number of the positive samples;
and calculating to obtain the information value of the characteristic variable corresponding to the information data through the weight value and a preset information value calculation formula.
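As a rough illustration of the last two steps, the sketch below uses the weight-of-evidence (WoE) form that is commonly used for information value. The text above only refers to a "preset" formula, so this particular formula, the function name `information_value`, and the sample counts are assumptions rather than the patent's definition. Categories with a zero count would need smoothing, which the sketch does not attempt.

```python
import math


def information_value(category_counts):
    """Compute IV from per-category (positive_count, negative_count) pairs.

    Assumed formula (common WoE definition, not taken from the source):
        woe_i = ln((pos_i / pos_total) / (neg_i / neg_total))
        iv    = sum((pos_i / pos_total - neg_i / neg_total) * woe_i)
    """
    pos_total = sum(pos for pos, _ in category_counts)
    neg_total = sum(neg for _, neg in category_counts)
    iv = 0.0
    for pos, neg in category_counts:
        pos_share = pos / pos_total
        neg_share = neg / neg_total
        woe = math.log(pos_share / neg_share)  # the "weight value" of this category
        iv += (pos_share - neg_share) * woe
    return iv


# Hypothetical counts for two categories of one characteristic variable.
example_iv = information_value([(3, 1), (1, 3)])
```

Swapping the roles of positive and negative samples flips the sign of each weight value but leaves the information value unchanged, so either convention gives the same ranking of characteristic variables.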
Preferably, before the step in which the second terminal, after determining the intersection sample data carrying the same data identifiers as the first terminal, encrypts the data tags corresponding to the intersection sample data with a homomorphic encryption algorithm to obtain the data tag values, the method further includes:
after the second terminal receives the encrypted first data identifier sent by the first terminal, the second terminal encrypts the first data identifier for the second time by adopting a preset public key to obtain a first encrypted value;
sending the second data identifier encrypted by the preset public key to the first terminal, and detecting whether a second encrypted value returned after the second data identifier is encrypted by the first terminal is received;
and after receiving the second encrypted value, determining intersection sample data carrying the same data identifier with the first terminal according to the first encrypted value and the second encrypted value.
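The double-encryption exchange above can be sketched with a commutative cipher. Note the substitution: the text speaks of a preset public key, whereas this sketch swaps in secret-exponent modular exponentiation over a prime modulus, chosen only because encrypting in either order gives the same value. The modulus, the helper names (`make_key`, `hash_to_group`, `intersect`), and the identifiers are all assumptions, and the parameters are not meant to be secure.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # a Mersenne prime, used here as a toy modulus


def make_key():
    """Pick a secret exponent coprime to P - 1 so encryption is one-to-one."""
    while True:
        key = secrets.randbelow(P - 3) + 2
        if math.gcd(key, P - 1) == 1:
            return key


def hash_to_group(data_id):
    """Map a data identifier to a non-zero residue modulo P."""
    digest = hashlib.sha256(data_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def encrypt(value, key):
    return pow(value, key, P)


def intersect(first_ids, second_ids):
    key_first = make_key()   # held only by the first terminal
    key_second = make_key()  # held only by the second terminal

    # First terminal sends its identifiers encrypted once.
    first_once = [encrypt(hash_to_group(i), key_first) for i in first_ids]
    # Second terminal encrypts them a second time: the "first encrypted values".
    first_twice = {encrypt(c, key_second) for c in first_once}

    # Second terminal sends its own identifiers encrypted once.
    second_once = [encrypt(hash_to_group(i), key_second) for i in second_ids]
    # First terminal encrypts them again and returns them in the same order:
    # the "second encrypted values".
    second_twice = [encrypt(c, key_first) for c in second_once]

    # Second terminal keeps the identifiers whose doubly encrypted values match.
    return {i for i, c in zip(second_ids, second_twice) if c in first_twice}


common = intersect(["id1", "id2", "id3"], ["id2", "id3", "id4"])
```

Because both orders of encryption yield the same value, matching doubly encrypted values identify shared identifiers while neither side sees the other's identifiers in the clear.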
Preferably, after the step of calculating the information value of the characteristic variable corresponding to the information data according to the information data after receiving the information data, the method further includes:
and after a modeling instruction is received, selecting a characteristic variable required by modeling according to the information value.
In addition, in order to achieve the above object, the present invention further provides a homomorphic encryption-based information value calculating method, including the steps of:
after a first terminal receives a data tag value sent by a second terminal and a data identifier corresponding to the data tag value, the first terminal determines the data tag value belonging to the same category according to the category to which each characteristic value in intersection sample data belongs;
summing the data tag values belonging to the same category to obtain the summed data tag values;
and sending the data identifier corresponding to the summed data tag value and the summed data tag value serving as information data to the second terminal so that the second terminal can calculate the information value of the characteristic variable corresponding to the information data according to the information data, wherein each data identifier corresponds to at least one characteristic variable.
Preferably, after the first terminal receives the data tag value sent by the second terminal and the data identifier corresponding to the data tag value, before the step of determining, by the first terminal, the data tag value belonging to the same class according to the class to which each feature value in the intersection sample data belongs, the method further includes:
after the first terminal determines intersection sample data carrying the same data identification with the second terminal, the first terminal classifies characteristic values corresponding to characteristic variables in the intersection sample data according to a preset mode to determine the category of the characteristic values;
after the first terminal receives the data tag value sent by the second terminal and the data identifier corresponding to the data tag value, the step of determining, by the first terminal, the data tag value belonging to the same category according to the category to which each feature value in the intersection sample data belongs includes:
and after the first terminal receives the data label value and the data identification sent by the second terminal, the first terminal determines the data identification belonging to the same category according to the category to which the characteristic value belongs, and determines the data label value belonging to the same category according to the data identification belonging to the same category.
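A minimal sketch of this classification step on the first terminal is given below, assuming the "preset mode" is a fixed set of bin edges. The edges, the function names, and the sample ages are hypothetical; the received data tag values are represented by placeholder strings because the first terminal only ever handles them in encrypted form.

```python
from bisect import bisect_right
from collections import defaultdict

BIN_EDGES = [18, 40]  # hypothetical cut points: below 18, 18 to 39, 40 and above


def category_of(feature_value):
    """Return the index of the category (bin) a feature value falls into."""
    return bisect_right(BIN_EDGES, feature_value)


def group_ids_by_category(feature_values):
    """Map each category to the data identifiers whose feature value falls in it."""
    groups = defaultdict(list)
    for data_id, value in feature_values.items():
        groups[category_of(value)].append(data_id)
    return dict(groups)


def tag_values_by_category(groups, tag_values):
    """Use the grouped identifiers to pick out the data tag values per category."""
    return {cat: [tag_values[i] for i in ids] for cat, ids in groups.items()}


# Hypothetical feature values (ages) held by the first terminal.
ages = {"id1": 0, "id2": 5, "id3": 16, "id4": 25, "id5": 50}
groups = group_ids_by_category(ages)

# Placeholders standing in for the encrypted tag values received per identifier.
received = {data_id: f"ciphertext-of-{data_id}" for data_id in ages}
per_category = tag_values_by_category(groups, received)
```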
Preferably, before the step of sending the data identifier corresponding to the summed data tag value and the summed data tag value as information data to the second terminal, the method further includes:
recording data identifications corresponding to the data label values belonging to the same category as target data identifications;
coding the target data identification to obtain the coded data identification;
the step of sending the data identifier corresponding to the summed data tag value and the summed data tag value as information data to the second terminal includes:
and sending the summed data label value and the coded data identifier as information data to the second terminal so that the second terminal can calculate the information value of the characteristic variable corresponding to the information data according to the information data.
Preferably, the step of summing the data tag values belonging to the same category to obtain the summed data tag values includes:
determining a first tag value and a second tag value of the data tag values belonging to the same class;
and summing the first label value and the second label value respectively to obtain the summed data label value.
Further, to achieve the above object, the present invention also provides a homomorphic encryption-based information value computing apparatus including a memory, a processor, and a homomorphic encryption-based information value computing program stored on the memory and executable on the processor, the homomorphic encryption-based information value computing program, when executed by the processor, implementing the steps of the homomorphic encryption-based information value computing method as described above.
Further, to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a homomorphic encryption-based information value calculation program which, when executed by a processor, implements the steps of the homomorphic encryption-based information value calculation method as described above.
In the present invention, after the second terminal determines the intersection sample data carrying the same data identifiers as the first terminal, the second terminal encrypts the data tags corresponding to the intersection sample data with a homomorphic encryption algorithm to obtain data tag values; sends the data tag values and their corresponding data identifiers to the first terminal, and detects whether information data sent by the first terminal is received; and, after receiving the information data, calculates the information value of the characteristic variable corresponding to the information data according to the information data, wherein each data identifier corresponds to at least one characteristic variable. The second terminal thereby calculates, through joint learning with the first terminal, the information value for the sample data held by the first terminal in the intersection sample data, without either terminal revealing its own data.
Drawings
FIG. 1 is a schematic diagram of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a first embodiment of a method for calculating an information value based on homomorphic encryption according to the present invention;
FIG. 3 is a flowchart illustrating a second embodiment of a method for calculating an information value based on homomorphic encryption according to the present invention;
FIG. 4 is a flow chart illustrating a third embodiment of a method for calculating an information value based on homomorphic encryption according to the present invention;
FIG. 5 is a flowchart illustrating a fourth embodiment of a method for calculating an information value based on homomorphic encryption according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
FIG. 1 is a schematic structural diagram of a hardware operating environment according to an embodiment of the present invention.
It should be noted that fig. 1 is a schematic structural diagram of a hardware operating environment of an information value computing device based on homomorphic encryption. The information value calculation device based on homomorphic encryption in the embodiment of the invention can be a terminal device such as a PC, a portable computer and the like.
As shown in fig. 1, the homomorphic encryption-based information value calculation apparatus may include a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the homomorphic encryption based information value computing device architecture shown in FIG. 1 does not constitute a limitation of homomorphic encryption based information value computing devices, and may include more or fewer components than shown, or some components in combination, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and an information value calculation program based on homomorphic encryption. The operating system is a program for managing and controlling hardware and software resources of the homomorphic encryption-based information value calculating device, and supports the operation of the homomorphic encryption-based information value calculating program and other software or programs.
In the homomorphic encryption-based information value calculation apparatus shown in fig. 1, when the homomorphic encryption-based information value calculation apparatus is a first terminal, the user interface 1003 is mainly used for connecting a second terminal to perform data communication with the second terminal; when the information value calculation device based on homomorphic encryption is the second terminal, the user interface 1003 is mainly used for connecting the first terminal and performing data communication with the first terminal; the network interface 1004 is mainly used for connecting to a background server and performing data communication with the background server. When the homomorphic encryption-based information value calculating apparatus is a second terminal, the processor 1001 may be configured to call the homomorphic encryption-based information value calculating program stored in the memory 1005, and perform the following operations:
after intersection sample data carrying the same data identification with the first terminal is determined, encrypting a data tag corresponding to the intersection sample data by adopting a homomorphic encryption algorithm to obtain a data tag value;
sending a data identifier corresponding to the data tag value and the data tag value to a first terminal, and detecting whether information data sent by the first terminal is received or not, wherein the information data is obtained by the first terminal according to the data identifier and the data tag value;
and after the information data are received, calculating the information value of the characteristic variable corresponding to the information data according to the information data, wherein each data identifier at least corresponds to one characteristic variable.
Further, after receiving the information data, the step of calculating the information value of the characteristic variable corresponding to the information data according to the information data includes:
when the information data is received, decrypting the information data to obtain the number of negative samples and the number of positive samples of sample data corresponding to the information data;
calculating the weight value of the characteristic variable corresponding to the information data according to the number of the negative samples and the number of the positive samples;
and calculating to obtain the information value of the characteristic variable corresponding to the information data through the weight value and a preset information value calculation formula.
Further, before the step of determining intersection sample data carrying the same data identifier with the first terminal, and encrypting the data tag corresponding to the intersection sample data by using a homomorphic encryption algorithm to obtain a data tag value, the processor 1001 may be further configured to invoke an information value calculation program based on homomorphic encryption stored in the memory 1005, and execute the following steps:
after receiving the encrypted first data identifier sent by the first terminal, secondarily encrypting the first data identifier by adopting a preset public key to obtain a first encrypted value;
sending the second data identifier encrypted by the preset public key to the first terminal, and detecting whether a second encrypted value returned after the second data identifier is encrypted by the first terminal is received;
and after receiving the second encrypted value, determining intersection sample data carrying the same data identifier with the first terminal according to the first encrypted value and the second encrypted value.
Further, after the step of calculating the information value of the feature variable corresponding to the information data according to the information data after receiving the information data, the processor 1001 may be further configured to call an information value calculation program based on homomorphic encryption stored in the memory 1005, and perform the following steps:
and after a modeling instruction is received, selecting a characteristic variable required by modeling according to the information value.
Further, when the homomorphic encryption-based information value calculation apparatus is a first terminal, the processor 1001 may be further configured to call a homomorphic encryption-based information value calculation program stored in the memory 1005, and perform the following steps:
after receiving a data tag value sent by a second terminal and a data identifier corresponding to the data tag value, determining the data tag value belonging to the same category according to the category to which each characteristic value in the intersection sample data belongs;
summing the data tag values belonging to the same category to obtain the summed data tag values;
and sending the data identifier corresponding to the summed data tag value and the summed data tag value serving as information data to the second terminal so that the second terminal can calculate the information value of the characteristic variable corresponding to the information data according to the information data, wherein each data identifier corresponds to at least one characteristic variable.
Further, before the step of determining the data tag value belonging to the same class according to the class to which each feature value in the intersection sample data belongs after receiving the data tag value sent by the second terminal and the data identifier corresponding to the data tag value, the processor 1001 may be further configured to call an information value calculation program based on homomorphic encryption stored in the memory 1005, and execute the following steps:
after intersection sample data carrying the same data identification with the second terminal is determined, classifying feature values corresponding to feature variables in the intersection sample data according to a preset mode to determine the category of the feature values;
after receiving the data tag value sent by the second terminal and the data identifier corresponding to the data tag value, the step of determining the data tag value belonging to the same category according to the category to which each feature value in the intersection sample data belongs includes:
and after receiving the data label value and the data identification sent by the second terminal, determining the data identification belonging to the same category according to the category to which the characteristic value belongs, and determining the data label value belonging to the same category according to the data identification belonging to the same category.
Further, before the step of sending the data identifier corresponding to the summed data tag value and the summed data tag value as information data to the second terminal, the processor 1001 may be further configured to call an information value calculation program based on homomorphic encryption stored in the memory 1005, and execute the following steps:
recording data identifications corresponding to the data label values belonging to the same category as target data identifications;
coding the target data identification to obtain the coded data identification;
the step of sending the data identifier corresponding to the summed data tag value and the summed data tag value as information data to the second terminal includes:
and sending the summed data label value and the coded data identifier as information data to the second terminal so that the second terminal can calculate the information value of the characteristic variable corresponding to the information data according to the information data.
Further, the step of summing the data tag values belonging to the same category to obtain the summed data tag values includes:
determining a first tag value and a second tag value of the data tag values belonging to the same class;
and summing the first label value and the second label value respectively to obtain the summed data label value.
Based on the above structure, various embodiments of the information value calculation method based on homomorphic encryption are proposed.
Referring to fig. 2, fig. 2 is a flowchart illustrating a first embodiment of a method for calculating an information value based on homomorphic encryption according to the present invention.
While a logical order is illustrated in the flow charts, in some cases, the steps shown or described may be performed in an order different than presented.
The homomorphic encryption-based information value calculation method is applied to a second terminal, which may include a mobile terminal such as a mobile phone, a tablet computer, a notebook computer, a palm top computer, a Personal Digital Assistant (PDA), etc., and a fixed terminal such as a Digital TV, a desktop computer, etc. The information value calculation method based on homomorphic encryption comprises the following steps:
Step S10, after the second terminal determines intersection sample data carrying the same data identification as the first terminal, the second terminal encrypts the data tag corresponding to the intersection sample data by adopting a homomorphic encryption algorithm to obtain a data tag value.
After the second terminal determines the intersection sample data carrying the same data identifiers as the first terminal, it determines the data tags corresponding to the intersection sample data and encrypts them with a homomorphic encryption algorithm to obtain the data tag values. It should be noted that both the first terminal and the second terminal hold sample data. In the second terminal, each sample corresponds to one data identifier and one data tag; in the first terminal, each sample has a data identifier but no data tag. Each sample corresponds to at least one characteristic variable, and each characteristic variable corresponds to at least one characteristic value. The first terminal and the second terminal set the data identifiers of the sample data according to the same rule.
For example, if the sample data of the first terminal is {<id1: x1, x2>, <id2: x1, x2>, <id3: x1, x2>} and the sample data of the second terminal is {<id2: x3, x4>, <id3: x3, x4>, <id4: x3, x4>}, then the intersection sample data in the second terminal is {<id2: x3, x4>, <id3: x3, x4>} and the intersection sample data in the first terminal is {<id2: x1, x2>, <id3: x1, x2>}. Here id1, id2, id3 and id4 are data identifiers, and x1, x2, x3 and x4 are the feature variables corresponding to the sample data. Each feature variable has corresponding feature values; for example, if the feature variable x1 represents age and its feature values are 0, 5, 16, 25 and 50, this is denoted x1 = {0, 5, 16, 25, 50}.
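The example can be written out as follows. The feature values are invented for illustration, since the text only names the feature variables; the identifiers match those in the paragraph above.

```python
# Hypothetical feature values; only the identifiers and variable names come from the text.
first_terminal = {
    "id1": {"x1": 25, "x2": 3},
    "id2": {"x1": 16, "x2": 7},
    "id3": {"x1": 50, "x2": 1},
}
second_terminal = {
    "id2": {"x3": 0.4, "x4": 12},
    "id3": {"x3": 0.9, "x4": 30},
    "id4": {"x3": 0.1, "x4": 8},
}

# Data identifiers present on both sides.
common_ids = sorted(first_terminal.keys() & second_terminal.keys())

# Each terminal keeps only its own features for the shared identifiers.
first_intersection = {i: first_terminal[i] for i in common_ids}
second_intersection = {i: second_terminal[i] for i in common_ids}
```

This plain set intersection only shows what each side ends up holding; it does not protect the identifiers while they are being compared.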
Homomorphic encryption is a cryptographic technique based on the computational complexity theory of mathematical problems. When homomorphically encrypted data is processed to produce an output and that output is decrypted, the result is the same as the output obtained by applying the same processing to the unencrypted original data.
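To make this property concrete, here is a toy additively homomorphic scheme in the style of Paillier. The text does not name a specific algorithm, so the choice of Paillier, the tiny primes, and all function names are assumptions; the key size is far too small for real use. The sketch needs Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import secrets


def keygen(p=1009, q=1013):
    """Build a toy key pair from two small primes (illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to n + 1
    return n, (lam, mu)


def encrypt(n, message):
    """Encrypt an integer 0 <= message < n with fresh randomness."""
    n_squared = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return pow(n + 1, message, n_squared) * pow(r, n, n_squared) % n_squared


def decrypt(n, private_key, ciphertext):
    lam, mu = private_key
    u = pow(ciphertext, lam, n * n)
    return (u - 1) // n * mu % n


def add_encrypted(n, c1, c2):
    """Adding plaintexts corresponds to multiplying ciphertexts modulo n^2."""
    return c1 * c2 % (n * n)


n, private_key = keygen()
encrypted_sum = add_encrypted(n, encrypt(n, 3), encrypt(n, 4))
plain_sum = decrypt(n, private_key, encrypted_sum)
```

Decrypting `encrypted_sum` gives the same result as adding 3 and 4 directly, which is the property the paragraph describes.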
In the present embodiment, there are two kinds of data tags, the first data tag is denoted by "0" and the second data tag is denoted by "1". In other embodiments, three data tags or four data tags may be provided.
Step S20, sending the data identifier corresponding to the data tag value and the data tag value to a first terminal, and detecting whether information data sent by the first terminal is received, where the information data is obtained by the first terminal according to the data identifier and the data tag value.
After the second terminal obtains the data tag values, it obtains the data identifiers corresponding to the intersection sample data, sends the data tag values and the data identifiers to the first terminal, and detects whether information data sent by the first terminal is received, where the information data is obtained by the first terminal from the data identifiers and the data tag values. In the second terminal, one data sample corresponds to one data tag and one data identifier, and one data tag corresponds to one data tag value; one data tag value therefore corresponds to one data identifier, so obtaining the data identifiers corresponding to the intersection sample data amounts to obtaining the data identifier corresponding to each data tag value. It should be noted that the second terminal sends each data identifier together with its data tag value, so that after receiving a data tag value the first terminal can determine the data identifier to which that data tag value corresponds.
After the first terminal receives the data identifier and the data tag value sent by the second terminal, the first terminal determines the data tag value belonging to the same category, sums the data tag values belonging to the same category to obtain a summed data tag value, and sends the data identifier corresponding to the data tag value belonging to the same category and the corresponding summed data tag value as information data to the second terminal.
In this embodiment, Encry(y) denotes the data tag value for a data tag of "1", Encry(1-y) denotes the data tag value for a data tag of "0", and id denotes the corresponding data identifier. The data identifier and data tag values sent by the second terminal to the first terminal can then be expressed as {id, Encry(y), Encry(1-y)}.
Step S30, after receiving the information data, calculating an information value of a characteristic variable corresponding to the information data according to the information data, where each data identifier corresponds to at least one characteristic variable.
After the second terminal receives the information data sent by the first terminal, it calculates the information value of the characteristic variable corresponding to the information data according to the information data, where each data identifier corresponds to at least one characteristic variable. It should be noted that the information value of the characteristic variable corresponding to the information data is the information value of the characteristic variable corresponding to the data identifiers belonging to the same category.
Further, step S30 includes:
Step a, after the information data is received, decrypting the information data to obtain the number of negative samples and the number of positive samples of the sample data corresponding to the information data.
After the second terminal receives the information data sent by the first terminal, the second terminal decrypts the information data to obtain the number of negative samples and the number of positive samples of the sample data corresponding to the information data. The information data can be represented as {id_set_i, sum_1, sum_2}, where id_set_i represents the data identifiers in the information data, sum_1 represents the sum of the data tag values whose data tag is "1", and sum_2 represents the sum of the data tag values whose data tag is "0". For example, if id_set_i = {id3, id6, id7, id8}, the data tag corresponding to id3 and id8 is "0", and the data tag corresponding to id6 and id7 is "1", then sum_1 = Encry(y6) + Encry(y7) and sum_2 = Encry(1-y3) + Encry(1-y8), where Encry(y6) is the data tag value corresponding to id6, Encry(y7) is the data tag value corresponding to id7, Encry(1-y3) is the data tag value corresponding to id3, and Encry(1-y8) is the data tag value corresponding to id8.
In this embodiment, the number of negative samples is the number of samples carrying a data tag of "0" in a certain category of the first terminal, and the number of positive samples is the number of samples carrying a data tag of "1" in a certain category of the first terminal. After the information data is decrypted, that is, after sum_1 and sum_2 in the information data are decrypted, the data tag values in sum_1 and sum_2 are obtained, and the number of positive samples and the number of negative samples can be determined from the number of data tag values corresponding to sum_1 and sum_2, respectively. For example, if sum_1 contains 3 data tag values, the number of positive samples is determined to be 3; if sum_2 contains 4 data tag values, the number of negative samples is determined to be 4.
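The encrypt, sum, and decrypt round trip described above can be illustrated with a minimal sketch. It assumes a toy Paillier scheme as the homomorphic encryption algorithm, which the text does not specify; the key sizes are far too small for real use, and the function names `encry`, `he_add`, and `decry` are hypothetical. The sketch also assumes the first terminal sums Encry(y) and Encry(1-y) over every identifier in the category, which decrypts to the same totals because encrypted zeros contribute nothing.

```python
import math
import secrets

# Toy key material (assumption): two known Mersenne primes, insecure sizes.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
PHI = (P - 1) * (Q - 1)
MU = pow(PHI, -1, N)  # modular inverse, Python 3.8+


def encry(m):
    """Paillier encryption of integer m with generator g = N + 1."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (1 + m * N) % N2 * pow(r, N, N2) % N2


def he_add(c1, c2):
    """Multiplying two ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % N2


def decry(c):
    """Paillier decryption using the private values PHI and MU."""
    return (pow(c, PHI, N2) - 1) // N * MU % N


# Second terminal: data tags of the samples in one category (ids from the example).
labels = {"id3": 0, "id6": 1, "id7": 1, "id8": 0}
sent = {i: (encry(y), encry(1 - y)) for i, y in labels.items()}

# First terminal: sums ciphertexts without seeing any data tag.
sum_1 = sum_2 = encry(0)
for enc_y, enc_not_y in sent.values():
    sum_1 = he_add(sum_1, enc_y)
    sum_2 = he_add(sum_2, enc_not_y)

# Second terminal: decrypts the two sums into sample counts.
positives, negatives = decry(sum_1), decry(sum_2)
```

With the tags in this example, the decrypted sums give the number of positive samples (id6, id7) and the number of negative samples (id3, id8) in the category.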
Further, in order to improve the security of the data transmitted between the first terminal and the second terminal, the first terminal may encode the data identifiers in the information data to obtain encoded data identifiers, and send the encoded data identifiers to the second terminal together with sum_1 and sum_2. After the second terminal receives the encoded data identifiers, it can decode them to obtain the original data identifiers.
Step b, calculating the weight value of the characteristic variable corresponding to the information data according to the number of the negative samples and the number of the positive samples.
After the second terminal obtains the number of negative samples and the number of positive samples, it calculates the weight value of the characteristic variable corresponding to the information data from these two numbers. Specifically, the second terminal divides the number of negative samples by the total number of samples in the intersection sample data that carry the same data tag as the negative samples, obtaining the negative-sample weight value of the corresponding category of the characteristic variable; and it divides the number of positive samples by the total number of samples in the intersection sample data that carry the same data tag as the positive samples, obtaining the positive-sample weight value of the corresponding category. After obtaining the two weight values, the second terminal calculates the weight value of the category through a preset weight formula. The weight formula is: Woe_i = 100 × log(distpos_i / distneg_i), where distpos_i is the positive-sample weight value, distneg_i is the negative-sample weight value, and Woe_i represents the weight value corresponding to a certain category in the intersection sample data of the first terminal, that is, the weight value of the characteristic variable corresponding to the information data.
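As a minimal sketch of the weight calculation, the following function computes the category weight from the decrypted counts. It assumes that "log" in the weight formula is the natural logarithm, which the text does not state; the function and parameter names are hypothetical.

```python
import math


def woe(pos_in_category, neg_in_category, pos_total, neg_total):
    """Category weight: 100 * log(distpos_i / distneg_i), natural log assumed."""
    if min(pos_in_category, neg_in_category) <= 0:
        raise ValueError("the weight is undefined when a count is zero")
    distpos = pos_in_category / pos_total  # positive-sample weight value
    distneg = neg_in_category / neg_total  # negative-sample weight value
    return 100 * math.log(distpos / distneg)


# 3 of 10 positive samples and 4 of 20 negative samples fall in this category.
example_woe = woe(3, 4, 10, 20)
```

A category with no positive or no negative samples makes the ratio zero or undefined; raising an error here is one possible handling, not something the text prescribes.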
Step c, calculating the information value of the characteristic variable corresponding to the information data through the weight value and a preset information value calculation formula.
After the second terminal calculates the weight value, it calculates the information value of the category of the characteristic variable corresponding to the information data according to the calculated weight value and a preset information value calculation formula. The preset information value calculation formula is as follows:
[Formula image BDA0001761539880000111; not reproduced in the text]
that is, the preset information value calculation formula is
[Formula image BDA0001761539880000112; not reproduced in the text]
IV denotes the corresponding information value. It should be noted that the IV value in this embodiment is only the information value of a certain category of a characteristic variable; the information value of the characteristic variable is equal to the sum of the information values of all its categories. For example, when the characteristic variable x1 corresponds to 4 categories, that is, the characteristic values of x1 fall into 4 categories, and the information values of these 4 categories are IV1, IV2, IV3, and IV4, respectively, the information value of the characteristic variable x1 is IV1 + IV2 + IV3 + IV4.
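The summation over categories can be sketched as follows. Because the formula images are not available in the text, the per-category expression used here is the commonly used definition of information value, (distpos_i - distneg_i) × ln(distpos_i / distneg_i); this is an assumption, and the patent's own formula may differ, for example by a scaling factor. The function names are hypothetical.

```python
import math


def category_iv(distpos, distneg):
    """Information value of one category (conventional definition, assumed)."""
    return (distpos - distneg) * math.log(distpos / distneg)


def feature_iv(categories):
    """Information value of a characteristic variable: sum over its categories."""
    return sum(category_iv(distpos, distneg) for distpos, distneg in categories)


# Characteristic variable x1 with 4 categories: (distpos_i, distneg_i) pairs.
x1_categories = [(0.1, 0.4), (0.2, 0.3), (0.3, 0.2), (0.4, 0.1)]
iv_x1 = feature_iv(x1_categories)  # IV1 + IV2 + IV3 + IV4
```

Under this definition each category term is non-negative, so a characteristic variable whose categories separate positive and negative samples more strongly receives a larger total.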
In this embodiment, after the second terminal determines the intersection sample data carrying the same data identifiers as the first terminal, the second terminal encrypts the data tags corresponding to the intersection sample data by using a homomorphic encryption algorithm to obtain data tag values; sends the data tag values and their corresponding data identifiers to the first terminal, and detects whether information data sent by the first terminal is received; and, after receiving the information data, calculates the information value of the characteristic variable corresponding to the information data according to the information data, where each data identifier corresponds to at least one characteristic variable. In this way, through joint learning with the first terminal, the second terminal calculates the information value corresponding to each sample data in the intersection sample data of the first terminal without either terminal revealing its own data.
Further, a second embodiment of the method for computing an information value based on homomorphic encryption according to the present invention is proposed.
The second embodiment of the homomorphic encryption-based information value computing method differs from the first embodiment of the homomorphic encryption-based information value computing method in that, with reference to fig. 3, the homomorphic encryption-based information value computing method further includes:
Step S40, after the second terminal receives the encrypted first data identifier sent by the first terminal, the second terminal encrypts the first data identifier a second time by using a preset public key to obtain a first encrypted value.
After the second terminal receives the encrypted first data identifier sent by the first terminal, the second terminal encrypts it a second time with the preset public key and records the twice-encrypted first data identifier as the first encrypted value. It should be noted that the encrypted first data identifier sent by the first terminal is obtained by the first terminal encrypting the data identifier corresponding to the sample data it holds; specifically, the first terminal may encrypt the first data identifier with a public key that it generated in advance. The public keys used for encryption by the first terminal and the second terminal are generated by an asymmetric encryption algorithm.
Step S50, sending the second data identifier encrypted by using the preset public key to the first terminal, and detecting whether a second encrypted value returned after the second data identifier is encrypted by the first terminal is received.
The second terminal sends the second data identifier, encrypted with the preset public key, to the first terminal, and detects whether a second encrypted value, returned after the first terminal encrypts the second data identifier again, is received. The second data identifier is the data identifier corresponding to the sample data of the second terminal. After the first terminal receives the encrypted second data identifier sent by the second terminal, the first terminal encrypts it a second time with its own public key, records the twice-encrypted second data identifier as the second encrypted value, and sends the second encrypted value to the second terminal.
Step S60, after receiving the second encrypted value, determining intersection sample data carrying the same data identification with the first terminal according to the first encrypted value and the second encrypted value.
After the second terminal receives the second encrypted value sent by the first terminal, it judges whether the first encrypted value is equal to the second encrypted value. If they are equal, the second terminal determines the sample data carrying the corresponding second data identifier to be intersection sample data; if they are not equal, the second terminal determines that the sample data carrying the second data identifier is not intersection sample data. It will be appreciated that when the first encrypted value is equal to the second encrypted value, the first data identifier corresponding to the first encrypted value is the same as the second data identifier corresponding to the second encrypted value.
If the public key of the first terminal is pub_a and the public key of the second terminal is pub_b, the process of determining the intersection sample data is as follows: (1) The first terminal encrypts id_a (the first data identifier) with its public key pub_a: id_a_fa = f(id_a, pub_a), and then sends id_a_fa to the second terminal; the second terminal encrypts the encrypted id_a string again with its public key pub_b to obtain id_a_fa_fb = f(id_a_fa, pub_b). (2) The second terminal encrypts id_b (the second data identifier) with its public key pub_b: id_b_fb = f(id_b, pub_b), and then sends id_b_fb to the first terminal; the first terminal encrypts the encrypted id_b string again with its public key pub_a: id_b_fb_fa = f(id_b_fb, pub_a), and then sends id_b_fb_fa to the second terminal. (3) The second terminal compares id_a_fa_fb (the first encrypted value) with id_b_fb_fa (the second encrypted value); if the two encrypted strings are equal, id_a and id_b are the same.
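Step (3) only works if encrypting with one key and then the other gives the same result in either order. The text does not specify the function f, so the sketch below assumes a commutative scheme based on modular exponentiation over hashed identifiers, with each terminal holding a secret exponent in place of the keys pub_a and pub_b. The modulus, the helper names `keygen` and `to_group`, and the use of SHA-256 are all illustrative assumptions, not the patent's stated method.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # a Mersenne prime; toy-sized modulus (assumption)


def keygen():
    """Pick a secret exponent invertible modulo P - 1, so f is one-to-one."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def to_group(identifier):
    """Map a data identifier string to an integer modulo P."""
    digest = hashlib.sha256(identifier.encode()).digest()
    return int.from_bytes(digest, "big") % P


def f(value, key):
    """Commutative encryption: f(f(x, a), b) == f(f(x, b), a)."""
    return pow(value, key, P)


def intersect(ids_a, ids_b, key_a, key_b):
    # (1) First terminal encrypts its ids; second terminal encrypts them again.
    id_a_fa = [f(to_group(i), key_a) for i in ids_a]
    id_a_fa_fb = {f(v, key_b) for v in id_a_fa}
    # (2) Second terminal encrypts its ids; first terminal encrypts them again.
    id_b_fb = [f(to_group(i), key_b) for i in ids_b]
    id_b_fb_fa = [f(v, key_a) for v in id_b_fb]
    # (3) Second terminal keeps its own ids whose double encryption matches.
    return [i for i, v in zip(ids_b, id_b_fb_fa) if v in id_a_fa_fb]


common = intersect(["id1", "id2", "id3"], ["id2", "id3", "id4"], keygen(), keygen())
```

In a real deployment the two halves of `intersect` would run on different terminals and only the encrypted lists would cross the network; they are combined in one function here only to keep the sketch short.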
In this embodiment, the intersection of the sample data of the first terminal and the sample data of the second terminal is obtained without revealing the data owned by either terminal, so that the data security of the first terminal and the second terminal is improved in the process of calculating the information value.
Further, a third embodiment of the inventive method for computing an information value based on homomorphic encryption is proposed.
The third embodiment of the homomorphic encryption-based information value calculating method differs from the first or second embodiment of the homomorphic encryption-based information value calculating method in that, with reference to fig. 4, the homomorphic encryption-based information value calculating method further includes:
Step S70, after receiving a modeling instruction, selecting the characteristic variables required for modeling according to the information values.
When the second terminal receives the modeling instruction, it selects the characteristic variables required for modeling according to the information values. The modeling instruction can be triggered by a user as needed. Specifically, when selecting the characteristic variables required for modeling, the second terminal may judge whether the information value corresponding to each characteristic variable is greater than or equal to a preset threshold. When the information value of a characteristic variable is greater than or equal to the preset threshold, the second terminal takes the characteristic variable as a modeling data source; when it is smaller than the preset threshold, the second terminal does not consider the characteristic variable in the modeling process, or reduces its weight in the modeling process.
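The threshold rule can be sketched in a few lines. The variable names, the information values, and the threshold of 0.1 below are hypothetical illustrations; the text only says that the threshold is preset.

```python
def select_features(information_values, threshold):
    """Keep characteristic variables whose information value meets the threshold."""
    return [name for name, iv in information_values.items() if iv >= threshold]


# Hypothetical information values for three characteristic variables.
ivs = {"x1": 0.42, "x2": 0.03, "x3": 0.10}
selected = select_features(ivs, threshold=0.1)
```

This sketch implements only the "exclude" branch; the alternative mentioned in the text, reducing the weight of a low-value characteristic variable instead of dropping it, would depend on the modeling method used.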
According to the method and the device, the data required by modeling are selected through the information values, so that the accuracy of the established model is improved, and the modeling efficiency is improved.
In addition, an embodiment of the present invention further provides a homomorphic encryption-based information value calculation method, referred to below as the fourth embodiment. It should be noted that, although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from the one given here.
The homomorphic encryption-based information value calculation method is applied to a first terminal, which may include a mobile terminal such as a mobile phone, a tablet computer, a notebook computer, a palmtop computer, or a Personal Digital Assistant (PDA), and a fixed terminal such as a digital TV or a desktop computer. Referring to fig. 5, the homomorphic encryption-based information value calculation method includes:
Step S110, after the first terminal receives the data tag value sent by the second terminal and the data identifier corresponding to the data tag value, the first terminal determines the data tag values belonging to the same category according to the category to which each characteristic value in the intersection sample data belongs.
After the first terminal receives the data tag value sent by the second terminal and the data identifier corresponding to the data tag value, the first terminal determines the category of each characteristic value in intersection sample data carrying the same data identifier with the second terminal, determines the data tag value belonging to the same category in the received data tag values according to the category to which each characteristic value belongs, and divides the data tag values corresponding to the characteristic values belonging to the same category into one category. It should be noted that each feature value has a corresponding data identifier, and each data tag value also has a corresponding data identifier, so that the data tag values belonging to the same category can be determined by the data identifiers.
For example, when the feature values in the first terminal intersection sample data are a, b, c, d and e, the corresponding data identifiers are id1, id2, id3, id4 and id5, the data tag values received by the first terminal are encry (a), encry (b), encry (c), encry (d) and encry (e), and the corresponding data identifiers are id1, id2, id3, id4 and id5, if a, b and e are one type and c and d are one type in the first terminal, encry (a), encry (b) and encry (e) can be determined as one type, and encry (c) and encry (d) are one type.
It should be noted that the principle of the process of determining the intersection sample data by the first terminal is the same as that of the process of determining the intersection sample data by the second terminal, and details are not repeated in this embodiment. It is understood that in the intersection sample data of the first terminal and the second terminal, the corresponding data identifications are the same, but the feature variables corresponding to the same data identifications may not be the same.
Step S120, summing the data label values belonging to the same category to obtain the summed data label values.
When the first terminal has determined the data tag values belonging to the same category among the received data tag values, it sums the data tag values belonging to the same category to obtain the summed data tag values.
Further, step S120 includes:
Step d, determining a first tag value and a second tag value among the data tag values belonging to the same category.
Specifically, the first terminal determines a first tag value and a second tag value among the data tag values belonging to the same category. In this embodiment, only the first tag value and the second tag value exist because only two kinds of data tags exist. If there were three kinds of data tags, there would be a first tag value, a second tag value, and a third tag value. In this embodiment, the first tag value and the second tag value are distinguished according to the data tags corresponding to the data tag values. For example, a data tag value with a data tag of "0" may be used as the first tag value, and a data tag value with a data tag of "1" may be used as the second tag value.
Step e, summing the first tag values and the second tag values respectively to obtain the summed data tag values.
After the first terminal determines a first tag value and a second tag value in the data tag values belonging to the same category, the first terminal sums the first tag value and the second tag value respectively, that is, the first tag values are added, and the second tag values are added to obtain the summed data tag value.
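The grouping and summation on the first terminal can be sketched as follows. The function takes the homomorphic addition as a parameter `he_add`, so it does not depend on a particular encryption scheme; in the usage example plain integers and `operator.add` stand in for ciphertexts and homomorphic addition, purely so the result can be inspected. All names, the category labels, and the layout of the returned information data are hypothetical.

```python
import operator
from collections import defaultdict


def sum_by_category(category_of, tag_values, he_add):
    """Group data identifiers by category and sum their tag values per group.

    category_of: data identifier -> category of its characteristic value
    tag_values:  data identifier -> (Encry(y), Encry(1-y)) from the second terminal
    he_add:      function that adds two ciphertexts homomorphically
    """
    groups = defaultdict(list)
    for data_id in tag_values:
        groups[category_of[data_id]].append(data_id)

    information_data = {}
    for category, ids in groups.items():
        sum_1, sum_2 = tag_values[ids[0]]
        for data_id in ids[1:]:
            enc_y, enc_not_y = tag_values[data_id]
            sum_1 = he_add(sum_1, enc_y)
            sum_2 = he_add(sum_2, enc_not_y)
        information_data[category] = (ids, sum_1, sum_2)
    return information_data


# Stand-in data: plain integers instead of ciphertexts (assumption for the demo).
category_of = {"id1": "A", "id2": "A", "id3": "B", "id4": "B", "id5": "A"}
labels = {"id1": 1, "id2": 0, "id3": 1, "id4": 1, "id5": 0}
tag_values = {i: (y, 1 - y) for i, y in labels.items()}
info = sum_by_category(category_of, tag_values, operator.add)
```

Each entry of `info` holds the data identifiers of one category and the two sums for that category, without the first terminal ever needing the data tags themselves.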
Step S130, sending the data identifier corresponding to the summed data tag value and the summed data tag value as information data to the second terminal, so that the second terminal calculates an information value of a characteristic variable corresponding to the information data according to the information data, where each data identifier corresponds to at least one characteristic variable.
After the first terminal obtains the summed data tag values, the first terminal determines the data identifiers corresponding to the summed data tag values, and sends the data identifiers corresponding to the summed data tag values and the summed data tag values as information data to the second terminal, so that the second terminal can calculate the information values of the characteristic variables corresponding to the information data according to the information data, namely the second terminal calculates the information values of the characteristic variables corresponding to the data identifiers according to the summed data tag values, wherein each data identifier at least corresponds to one characteristic variable, and one characteristic variable at least corresponds to one characteristic value. It should be noted that only the corresponding data identifier exists in the sample data of the first terminal, and the corresponding data tag does not exist, and the corresponding data identifier and the corresponding data tag exist in the sample data of the second terminal.
In this embodiment, the first terminal obtains the information data according to the data tag values and the data identifiers sent by the second terminal and sends the information data to the second terminal, so that the second terminal can calculate the information value of the characteristic variable corresponding to the information data. In this way, through joint learning with the first terminal, the second terminal calculates the information value corresponding to each sample data in the intersection sample data of the first terminal without either terminal revealing its own data.
Further, a fifth embodiment of the method for computing an information value based on homomorphic encryption according to the present invention is proposed.
The fifth embodiment of the homomorphic encryption-based information value calculating method is different from the fourth embodiment of the homomorphic encryption-based information value calculating method in that the homomorphic encryption-based information value calculating method further includes:
Step f, after the first terminal determines the intersection sample data carrying the same data identifiers as the second terminal, the first terminal classifies the characteristic values corresponding to the characteristic variables in the intersection sample data according to a preset mode, so as to determine the categories to which the characteristic values belong.
After the first terminal determines the intersection sample data carrying the same data identifiers as the second terminal, the first terminal classifies the characteristic values corresponding to the characteristic variables in the intersection sample data according to a preset mode to determine the categories to which the characteristic values belong. Specifically, one characteristic variable may correspond to one or more characteristic values, and the first terminal may classify the characteristic values corresponding to the characteristic variable by an equal-width (equidistant) or equal-frequency method. It should be noted that, in the first terminal, all characteristic values of one characteristic variable are classified in the same preset mode; for example, the characteristic values of the characteristic variable "age" are classified at intervals of 10 years, and the characteristic values of the characteristic variable "price" are classified at intervals of 1000 yuan.
It is understood that the first terminal may also classify the characteristic values corresponding to the characteristic variables according to specific needs. For example, if the characteristic values corresponding to the characteristic variable x1 are 0, 5, 16, 25, and 50, denoted as x1 = {0, 5, 16, 25, 50}, they are classified into the form x1 = {[0-10], [0-10], (10-20], (20-40], >40}; that is, 0 and 5 belong to the category [0-10], 16 belongs to the category (10-20], 25 belongs to the category (20-40], and 50 belongs to the category greater than 40.
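The x1 example can be reproduced with a short sketch that assigns each characteristic value to a category by its upper bin edge. The function name and the use of the standard-library `bisect` module are illustrative choices, not part of the described method.

```python
import bisect

# Upper edges of the closed-on-the-right bins and the matching category labels.
UPPER_EDGES = [10, 20, 40]
CATEGORIES = ["[0-10]", "(10-20]", "(20-40]", ">40"]


def categorize(value, upper_edges=UPPER_EDGES, categories=CATEGORIES):
    """Return the category whose interval contains the characteristic value."""
    return categories[bisect.bisect_left(upper_edges, value)]


x1 = [0, 5, 16, 25, 50]
x1_categories = [categorize(v) for v in x1]
```

`bisect_left` places a value equal to an edge in the lower bin, which matches intervals that are closed on the right, such as (10-20].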
Step S110 includes:
Step g, after the first terminal receives the data tag values and the data identifiers sent by the second terminal, the first terminal determines the data identifiers belonging to the same category according to the categories to which the characteristic values belong, and determines the data tag values belonging to the same category according to the data identifiers belonging to the same category.
When the first terminal receives the data tag values and the data identifiers sent by the second terminal, the first terminal determines the data identifiers belonging to the same category according to the categories to which the characteristic values belong, and determines the data tag values belonging to the same category according to the data identifiers belonging to the same category. It is understood that the data identifiers corresponding to characteristic values belonging to the same category also belong to the same category.
In this embodiment, the feature values corresponding to the feature variables in the intersection sample data of the first terminal are classified, and then the data identifier and the data tag value belonging to the same class are determined according to the class to which the feature values belong, so that the joint learning of the first terminal and the second terminal is realized on the basis that the first terminal and the second terminal do not provide respective sample data.
Further, a sixth embodiment of the homomorphic encryption-based information value calculation method of the present invention is proposed.
The sixth embodiment of the homomorphic encryption-based information value calculating method differs from the fourth or fifth embodiment in that the homomorphic encryption-based information value calculating method further includes:
Step h, recording the data identifiers corresponding to the data tag values belonging to the same category as target data identifiers.
Step i, encoding the target data identifiers to obtain encoded data identifiers.
After the first terminal determines the data tag values and the data identifiers belonging to the same category, the first terminal records the data identifiers corresponding to the data tag values belonging to the same category as target data identifiers and encodes the target data identifiers to obtain encoded data identifiers. In this embodiment, the manner in which the first terminal encodes the target data identifiers is not particularly limited.
The step S130 includes:
Step j, sending the summed data tag values and the encoded data identifiers as information data to the second terminal, so that the second terminal can calculate the information value of the characteristic variable corresponding to the information data according to the information data.
After the first terminal obtains the encoded data identifiers, the first terminal sends the summed data tag values and the encoded data identifiers as information data to the second terminal, so that the second terminal can calculate the information value of the characteristic variable corresponding to the information data according to the information data.
In the embodiment, the data identifier of a certain category is encoded, and the encoded data identifier is sent to the second terminal, so that the data identifier sent to the second terminal is protected, and the security of data transmission of the first terminal and the second terminal in the process of calculating the characteristic variable information value is improved.
Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, on which an information value calculation program based on homomorphic encryption is stored, and when being executed by a processor, the information value calculation program based on homomorphic encryption realizes the steps of the information value calculation method based on homomorphic encryption as described above.
The specific implementation of the computer-readable storage medium of the present invention is substantially the same as the embodiments of the above-mentioned information value calculation method based on homomorphic encryption, and will not be described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (9)

1. An information value calculating method based on homomorphic encryption, which is characterized by comprising the following steps:
after a second terminal determines intersection sample data carrying the same data identification as the first terminal, the second terminal encrypts a data tag corresponding to the intersection sample data by adopting a homomorphic encryption algorithm to obtain a data tag value;
sending a data identifier corresponding to the data tag value and the data tag value to a first terminal, and detecting whether information data sent by the first terminal is received or not, wherein the information data is obtained by the first terminal according to the data identifier and the data tag value;
after the information data are received, calculating the information values of the characteristic variables corresponding to the information data according to the information data, wherein each data identifier at least corresponds to one characteristic variable;
after the information data is received, the step of calculating the information value of the characteristic variable corresponding to the information data according to the information data comprises the following steps:
when the information data is received, decrypting the information data to obtain the number of negative samples and the number of positive samples of sample data corresponding to the information data;
calculating the weight value of the characteristic variable corresponding to the information data according to the number of the negative samples and the number of the positive samples;
and calculating to obtain the information value of the characteristic variable corresponding to the information data through the weight value and a preset information value calculation formula.
2. The method for calculating an information value based on homomorphic encryption according to claim 1, wherein before the step in which, after the second terminal determines the intersection sample data carrying the same data identifier as the first terminal, the second terminal encrypts the data tag corresponding to the intersection sample data by using a homomorphic encryption algorithm to obtain the data tag value, the method further comprises:
after the second terminal receives the encrypted first data identifier sent by the first terminal, the second terminal encrypts the first data identifier for the second time by adopting a preset public key to obtain a first encrypted value;
sending the second data identifier encrypted by the preset public key to the first terminal, and detecting whether a second encrypted value returned after the second data identifier is encrypted by the first terminal is received;
and after receiving the second encrypted value, determining intersection sample data carrying the same data identifier with the first terminal according to the first encrypted value and the second encrypted value.
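The double-encryption exchange above can be sketched in Python as follows. The claim speaks of a "preset public key" and does not name a cipher; the sketch swaps in a commutative exponentiation scheme (each terminal raises a hashed identifier to its own secret exponent modulo a prime), which is an assumption chosen because it makes the two doubly encrypted values comparable. The modulus, the helper names, and the single-process simulation of both terminals are hypothetical, and the parameters are toy-sized.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # a Mersenne prime; toy-sized modulus, not suitable for real use

def new_key():
    """Pick a secret exponent that is invertible modulo P - 1."""
    while True:
        key = secrets.randbelow(P - 3) + 2
        if math.gcd(key, P - 1) == 1:
            return key

def hash_to_group(identifier):
    """Map an identifier string to an integer in 1..P-1."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1

def encrypt(value, key):
    """Commutative encryption: encrypt(encrypt(x, a), b) == encrypt(encrypt(x, b), a)."""
    return pow(value, key, P)

def intersect(first_ids, second_ids):
    """Return the second terminal's identifiers that the first terminal also holds."""
    key_first, key_second = new_key(), new_key()
    # First terminal encrypts its identifiers and sends them to the second terminal.
    first_once = [encrypt(hash_to_group(i), key_first) for i in first_ids]
    # Second terminal encrypts them a second time: the first encrypted values.
    first_twice = {encrypt(v, key_second) for v in first_once}
    # Second terminal encrypts its own identifiers and sends them to the first terminal.
    second_once = [encrypt(hash_to_group(i), key_second) for i in second_ids]
    # First terminal encrypts them and returns them in order: the second encrypted values.
    second_twice = [encrypt(v, key_first) for v in second_once]
    # Second terminal keeps the identifiers whose doubly encrypted values match.
    return [i for i, v in zip(second_ids, second_twice) if v in first_twice]

common = intersect(["u1", "u2", "u3"], ["u2", "u3", "u4"])
```

Neither side sees the other's identifiers in the clear in this sketch; only values encrypted under at least one key the receiver does not hold are exchanged.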
3. The information value calculation method based on homomorphic encryption according to claim 1 or 2, wherein, after the step of calculating, after the information data is received, the information value of the characteristic variable corresponding to the information data according to the information data, the method further comprises:
and after a modeling instruction is received, selecting a characteristic variable required by modeling according to the information value.
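As an illustration of selecting characteristic variables by information value, a minimal Python sketch follows. The claim does not say how the selection is made; keeping every variable whose information value reaches a threshold is an assumption, and the function name, the default threshold of 0.1, and the example variable names are hypothetical.

```python
def select_features(information_values, threshold=0.1):
    """Return the names of variables whose information value meets the threshold,
    ordered from the highest information value to the lowest."""
    kept = [(name, iv) for name, iv in information_values.items() if iv >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in kept]

selected = select_features({"age": 0.35, "zip_code": 0.01, "income": 0.12})
```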
4. An information value calculation method based on homomorphic encryption, characterized by comprising the following steps:
after a first terminal receives a data tag value sent by a second terminal and a data identifier corresponding to the data tag value, the first terminal determines the data tag value belonging to the same category according to the category to which each characteristic value in intersection sample data belongs, wherein the second terminal encrypts the data tag corresponding to the intersection sample data by adopting a homomorphic encryption algorithm to obtain the data tag value;
summing the data tag values belonging to the same category to obtain the summed data tag values;
sending the summed data tag value and the data identifier corresponding to the summed data tag value to the second terminal as information data, so that the second terminal decrypts the information data to obtain the number of negative samples and the number of positive samples of the sample data corresponding to the information data, calculates the weight value of the characteristic variable corresponding to the information data according to the number of negative samples and the number of positive samples, and calculates the information value of the characteristic variable corresponding to the information data from the weight value and a preset information value calculation formula, wherein each data identifier corresponds to at least one characteristic variable.
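The per-category summation of encrypted data tag values can be sketched in Python with a toy additively homomorphic scheme. The claims say only "homomorphic encryption algorithm"; the sketch assumes a Paillier-style scheme with fixed, very small demonstration primes, and all function names are hypothetical. Encrypting both the label and its complement, so that positive and negative counts can be summed separately, is one reading of the "first tag value" and "second tag value" wording in the claims, not something the claims spell out.

```python
import secrets
from collections import defaultdict
from math import gcd

# Toy Paillier parameters built from two Mersenne primes; far too small for real use.
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
PHI = (P - 1) * (Q - 1)   # private to the second terminal
MU = pow(PHI, -1, N)      # private to the second terminal (needs Python 3.8+)

def encrypt(m):
    """Encrypt a small non-negative integer under the public modulus N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if gcd(r, N) == 1:
            break
    return (1 + m * N) * pow(r, N, N2) % N2

def decrypt(c):
    return (pow(c, PHI, N2) - 1) // N * MU % N

def add(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % N2

def second_terminal_encrypt(labels):
    # labels: {identifier: 1 for a positive sample, 0 for a negative sample}
    return {i: (encrypt(y), encrypt(1 - y)) for i, y in labels.items()}

def first_terminal_sum(tag_values, category_of):
    # category_of: {identifier: category of its characteristic value}
    sums, ids = {}, defaultdict(list)
    for i, (pos_c, neg_c) in tag_values.items():
        cat = category_of[i]
        ids[cat].append(i)
        if cat in sums:
            sums[cat] = (add(sums[cat][0], pos_c), add(sums[cat][1], neg_c))
        else:
            sums[cat] = (pos_c, neg_c)
    # Information data: identifiers plus summed ciphertexts per category.
    return {cat: (ids[cat], sums[cat]) for cat in sums}

def second_terminal_decrypt(info):
    # Returns {category: (number of positive samples, number of negative samples)}.
    return {cat: (decrypt(p), decrypt(n)) for cat, (_, (p, n)) in info.items()}

labels = {"u1": 1, "u2": 0, "u3": 1, "u4": 0, "u5": 0}
category_of = {"u1": "low", "u2": "low", "u3": "high", "u4": "high", "u5": "high"}
counts = second_terminal_decrypt(
    first_terminal_sum(second_terminal_encrypt(labels), category_of))
```

In this sketch the first terminal only ever multiplies ciphertexts, so it learns nothing about individual labels, while the second terminal sees only per-category totals and not the underlying characteristic values.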
5. The information value calculation method based on homomorphic encryption according to claim 4, wherein, before the step in which the first terminal, after receiving the data tag value sent by the second terminal and the data identifier corresponding to the data tag value, determines the data tag values belonging to the same category according to the category to which each characteristic value in the intersection sample data belongs, the method further comprises:
after the first terminal determines the intersection sample data carrying the same data identifier as the second terminal, the first terminal classifies the characteristic values corresponding to the characteristic variables in the intersection sample data in a preset manner to determine the category to which each characteristic value belongs;
and the step in which the first terminal, after receiving the data tag value sent by the second terminal and the data identifier corresponding to the data tag value, determines the data tag values belonging to the same category according to the category to which each characteristic value in the intersection sample data belongs comprises:
after the first terminal receives the data tag value and the data identifier sent by the second terminal, the first terminal determines the data identifiers belonging to the same category according to the category to which the characteristic values belong, and determines the data tag values belonging to the same category according to the data identifiers belonging to the same category.
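The classification and grouping steps can be sketched in Python as follows. The claim leaves the "preset manner" open; the sketch assumes numeric characteristic values split at fixed cut points, which is only one possible choice, and the function names and sample data are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict

def categorize(value, edges):
    """Return the index of the interval containing value.

    With sorted edges [e0, e1, ...], index 0 covers value < e0,
    index 1 covers e0 <= value < e1, and so on.
    """
    return bisect_right(edges, value)

def group_identifiers(feature_values, edges):
    """Group data identifiers by the category of their characteristic value."""
    groups = defaultdict(list)
    for identifier, value in feature_values.items():
        groups[categorize(value, edges)].append(identifier)
    return dict(groups)

# Example: an age variable split at 30 and 50.
groups = group_identifiers({"u1": 22, "u2": 35, "u3": 58, "u4": 30}, [30, 50])
```

Once identifiers are grouped this way, the data tag values that belong to a category are simply those received for the identifiers in that group.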
6. The information value calculation method based on homomorphic encryption according to claim 4, wherein, before the step of sending the summed data tag value and the data identifier corresponding to the summed data tag value to the second terminal as information data, the method further comprises:
denoting the data identifiers corresponding to the data tag values belonging to the same category as target data identifiers;
encoding the target data identifiers to obtain encoded data identifiers;
and the step of sending the summed data tag value and the data identifier corresponding to the summed data tag value to the second terminal as information data comprises:
sending the summed data tag value and the encoded data identifiers to the second terminal as information data, so that the second terminal calculates the information value of the characteristic variable corresponding to the information data according to the information data.
7. The information value calculation method based on homomorphic encryption according to any one of claims 4 to 6, wherein the step of summing the data tag values belonging to the same category to obtain the summed data tag values comprises:
determining a first tag value and a second tag value of the data tag values belonging to the same category;
and summing the first tag values and the second tag values separately to obtain the summed data tag values.
8. An information value calculation device based on homomorphic encryption, characterized in that the device comprises a memory, a processor, and an information value calculation program based on homomorphic encryption that is stored in the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of the information value calculation method based on homomorphic encryption according to any one of claims 1 to 3 or any one of claims 4 to 7.
9. A computer-readable storage medium storing an information value calculation program based on homomorphic encryption, wherein the program, when executed by a processor, implements the steps of the information value calculation method based on homomorphic encryption according to any one of claims 1 to 3 or any one of claims 4 to 7.
CN201810918870.5A 2018-08-10 2018-08-10 Information value calculation method and device based on homomorphic encryption and readable storage medium Active CN109241770B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810918870.5A CN109241770B (en) 2018-08-10 2018-08-10 Information value calculation method and device based on homomorphic encryption and readable storage medium

Publications (2)

Publication Number Publication Date
CN109241770A (en) 2019-01-18
CN109241770B (en) 2021-11-09

Family

ID=65071197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810918870.5A Active CN109241770B (en) 2018-08-10 2018-08-10 Information value calculation method and device based on homomorphic encryption and readable storage medium

Country Status (1)

Country Link
CN (1) CN109241770B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110032878B (en) * 2019-03-04 2021-11-02 创新先进技术有限公司 Safety feature engineering method and device
CN110851869B (en) * 2019-11-14 2023-09-19 深圳前海微众银行股份有限公司 Sensitive information processing method, device and readable storage medium
CN111047051B (en) * 2019-12-20 2023-03-31 支付宝(杭州)信息技术有限公司 Method and system for screening training samples of machine learning model
CN110968886B (en) * 2019-12-20 2022-12-02 支付宝(杭州)信息技术有限公司 Method and system for screening training samples of machine learning model
CN111563267B (en) * 2020-05-08 2024-04-05 京东科技控股股份有限公司 Method and apparatus for federal feature engineering data processing
CN111371544B (en) * 2020-05-27 2020-09-08 支付宝(杭州)信息技术有限公司 Prediction method and device based on homomorphic encryption, electronic equipment and storage medium
CN112529101A (en) * 2020-12-24 2021-03-19 深圳前海微众银行股份有限公司 Method and device for training classification model, electronic equipment and storage medium
CN112711765A (en) * 2020-12-30 2021-04-27 深圳前海微众银行股份有限公司 Sample characteristic information value determination method, terminal, device and storage medium
CN112468521B (en) * 2021-02-01 2021-05-07 支付宝(杭州)信息技术有限公司 Data processing method and device based on privacy protection and server

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1310825A (en) * 1998-06-23 2001-08-29 微软公司 Methods and apparatus for classifying text and for building a text classifier
CN103294959A (en) * 2013-05-29 2013-09-11 南京信息工程大学 Text information hiding method resistant to statistic analysis
CN103559205A (en) * 2013-10-09 2014-02-05 山东省计算中心 Parallel feature selection method based on MapReduce
CN104463208A (en) * 2014-12-09 2015-03-25 北京工商大学 Multi-view semi-supervised collaboration classification algorithm with combination of agreement and disagreement label rules
CN105577379A (en) * 2014-10-16 2016-05-11 阿里巴巴集团控股有限公司 Information processing method and apparatus thereof
CN107133628A (en) * 2016-02-26 2017-09-05 阿里巴巴集团控股有限公司 A kind of method and device for setting up data identification model
CN108022146A (en) * 2017-11-14 2018-05-11 深圳市牛鼎丰科技有限公司 Characteristic item processing method, device, the computer equipment of collage-credit data
CN108345567A (en) * 2018-01-31 2018-07-31 天津大学 Feature selection approach based on conditional mutual information

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6265783B2 (en) * 2014-03-06 2018-01-24 キヤノン株式会社 Encryption / decryption system, control method therefor, and program
US10481919B2 (en) * 2015-03-23 2019-11-19 Tibco Software Inc. Automatic optimization of continuous processes
US9946799B2 (en) * 2015-04-30 2018-04-17 Microsoft Technology Licensing, Llc Federated search page construction based on machine learning
CN106856441A (en) * 2017-01-23 2017-06-16 北京市天元网络技术股份有限公司 VIM systems of selection and device in NFVO
CN107864116A (en) * 2017-06-22 2018-03-30 平安科技(深圳)有限公司 Data transmission method, terminal and computer-readable recording medium
CN107871087B (en) * 2017-11-08 2020-10-30 广西师范大学 Personalized differential privacy protection method for high-dimensional data release in distributed environment

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
New feature selection method based on neural network and machine learning; Nicole Challita et al.; 2016 IEEE International Multidisciplinary Conference on Engineering Technology (IMCET); 2016-12-08; full text *
Feature selection method based on conditional relevance; Liu Jie et al.; Journal of Jilin University (Engineering and Technology Edition); 2018-05-31; Vol. 48, No. 3; pp. 874-881 *
Feature selection using information value (Information Value); stardsd; https://www.cnblogs.com/sddai/p/6113992.html; 2016-11-29; full text *

Also Published As

Publication number Publication date
CN109241770A (en) 2019-01-18

Similar Documents

Publication Publication Date Title
CN109241770B (en) Information value calculation method and device based on homomorphic encryption and readable storage medium
CN109325357B (en) RSA-based information value calculation method, device and readable storage medium
WO2020238677A1 (en) Data processing method and apparatus, and computer readable storage medium
EP3418950A1 (en) Data exchange method, data exchange device and computing device
CN107786331B (en) Data processing method, device, system and computer readable storage medium
JP2018054765A (en) Data processing device, data processing method, and program
CN110851869A (en) Sensitive information processing method and device and readable storage medium
US20050105719A1 (en) Personal information control and processing
US20140033267A1 (en) Type mining framework for automated security policy generation
CN104838388A (en) Versatile and reliable intelligent package
KR20220041704A (en) Multi-model training method and device based on feature extraction, an electronic device, and a medium
CN112287372B (en) Method and apparatus for protecting clipboard privacy
CN108848058A (en) Intelligent contract processing method and block catenary system
US20210143975A1 (en) System and method for performing homomorphic aggregation over encrypted data
CN111191255A (en) Information encryption processing method, server, terminal, device and storage medium
WO2023216494A1 (en) Federated learning-based user service strategy determination method and apparatus
CN114218322B (en) Data display method, device, equipment and medium based on ciphertext transmission
CN110738323B (en) Method and device for establishing machine learning model based on data sharing
CN112149706A (en) Model training method, device, equipment and medium
CN111368196A (en) Model parameter updating method, device, equipment and readable storage medium
WO2023134055A1 (en) Privacy-based federated inference method and apparatus, device, and storage medium
CN110516467B (en) Data distribution method and device, storage medium and terminal
CN114881247A (en) Longitudinal federal feature derivation method, device and medium based on privacy computation
CN111414636A (en) Method, device and equipment for updating recognition model and storage medium
CN112329057A (en) Document management method, device, equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant