CN113704800A

CN113704800A - Data binning processing method, device, equipment and storage medium based on confusion box

Info

Publication number: CN113704800A
Application number: CN202111048168.6A
Authority: CN
Inventors: 谭明超; 马国强; 范涛; 杨强
Original assignee: WeBank Co Ltd
Current assignee: WeBank Co Ltd
Priority date: 2021-09-08
Filing date: 2021-09-08
Publication date: 2021-11-26

Abstract

The invention discloses a data binning processing method, device, equipment and storage medium based on a confusion box. The method comprises: obtaining a plurality of data identifications sent by a second party together with the encrypted label and encrypted anti-label corresponding to each data identification; performing a binning operation on the data identifications according to the locally stored characteristic variables corresponding to them to obtain a plurality of real boxes, and randomly generating a plurality of confusion boxes; for each box, calculating the corresponding encrypted positive sample proportion and encrypted negative sample proportion; sending indication information of each box to the second party; obtaining the encrypted intermediate result corresponding to each box from the second party and selecting the intermediate results corresponding to the real boxes; and calculating the encrypted evidence weight and/or information value according to the intermediate results corresponding to the real boxes.

Description

Data binning processing method, device, equipment and storage medium based on confusion box

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to a data binning processing method, device, equipment and storage medium based on a confusion box.

Background

In the field of artificial intelligence, federated learning is increasingly used. Federated learning can combine a plurality of participants to carry out model training, improving the effectiveness of artificial intelligence. In the process of constructing a federated learning model, the Weight of Evidence (WOE), referred to herein as the evidence weight, and the Information Value (IV) of the characteristic variables need to be considered.

When the evidence weight and the information value are determined, the participant holding the labels sends the labels to the participant holding the characteristic variables; the latter performs a binning operation on the samples according to the characteristic variables and returns the sum of the labels in each box; and the participant holding the labels can then calculate the evidence weight and the information value from those sums.

However, if the party holding the labels maliciously encodes them with one-hot encoding, the binning information can be deduced in reverse from the returned label sums, and data of the party holding the characteristic variables may be leaked, so security in determining the evidence weight and the information value is poor.

Disclosure of Invention

The main object of the invention is to provide a data binning processing method, device, equipment and storage medium based on a confusion box, with the aim of improving security when determining the evidence weight and the information value.

In order to achieve the above object, the present invention provides a data binning processing method based on a confusion box, wherein the method is applied to a first participant, and the method comprises the following steps:

acquiring a plurality of data identifications sent by a second party and an encrypted label and an anti-label corresponding to each data identification;

according to the characteristic variables which are locally stored and correspond to the plurality of data identifications, carrying out box separation operation on the plurality of data identifications which are locally stored and correspond to the characteristic variables to obtain a plurality of real boxes, and randomly generating a plurality of confusion boxes;

for each box in the plurality of real boxes and the plurality of confusion boxes, calculating the encrypted positive sample proportion and the encrypted negative sample proportion corresponding to the box according to the encrypted label and the encrypted anti-label corresponding to the data identifier in the box;

sending the indication information of each box to the second party so that the second party determines an intermediate result according to the indication information and encrypts the intermediate result; wherein the indication information is related to the encrypted positive sample ratio and the encrypted negative sample ratio, and the intermediate result is used for calculating the evidence weight and/or the information value;

acquiring encrypted intermediate results corresponding to each box sent by the second participant, and selecting intermediate results corresponding to a plurality of real boxes from the encrypted intermediate results;

and calculating encrypted evidence weight and/or information value according to the intermediate results corresponding to the plurality of real boxes and sending an encrypted final result to the second party, wherein the final result comprises the change trend of the evidence weight and/or the information value.

Optionally, sending the indication information of each box to the second party includes:

for each box, generating a random number, multiplying the encrypted positive sample ratio and the encrypted negative sample ratio by the corresponding random number respectively to obtain indication information, and sending the indication information to the second party;

accordingly, calculating an encrypted evidence weight and/or an information value according to the intermediate results corresponding to the plurality of real boxes and sending an encrypted final result to the second participant, comprising:

and calculating the encrypted evidence weight and/or the encrypted information value according to the intermediate results corresponding to the plurality of real boxes and the corresponding random numbers, and sending the encrypted final result to the second participant.

Optionally, for each bin, the indication information specifically includes a product of an encrypted positive sample fraction and a first random number, and a product of an encrypted negative sample fraction and a second random number;

the intermediate result is determined through logarithm operation, the intermediate result comprises the difference between a first logarithm value and a second logarithm value corresponding to the box, the first logarithm value is a logarithm value of the product of the positive sample proportion and the first random number, and the second logarithm value is a logarithm value of the product of the negative sample proportion and the second random number;

correspondingly, calculating the encrypted evidence weight and/or the encrypted information value according to the intermediate results corresponding to the plurality of real boxes and the corresponding random numbers, and sending the encrypted final result to the second participant, wherein the method comprises the following steps:

for each box in the plurality of real boxes, eliminating the influence of the first random number and the second random number on the encrypted intermediate result corresponding to the box to obtain the encrypted evidence weight;

and dividing the encrypted evidence weight corresponding to each box by a third random number to obtain an encrypted final result and sending the encrypted final result to the second party, wherein the final results of the plurality of real boxes are used for representing the variation trend of the evidence weight.
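The random-number masking described above can be illustrated with a plaintext sketch. It omits encryption entirely and works on ordinary floats, so it only shows why the first and second random numbers can be removed after the logarithm and why dividing by a third random number still preserves the trend. The function names, the sample fractions, and the choice of a single positive third random number shared by all boxes are assumptions made for illustration, not details confirmed by the text.

```python
import math
import random

rng = random.Random(0)  # fixed seed for reproducibility; not cryptographically secure

# Hypothetical (positive fraction, negative fraction) pairs for three real boxes.
fractions = [(0.5, 0.25), (0.25, 0.25), (0.25, 0.5)]

def first_party_mask(a, b):
    """Pick positive random numbers r1, r2 and return the masked fractions."""
    r1, r2 = rng.uniform(1.0, 10.0), rng.uniform(1.0, 10.0)
    return (a * r1, b * r2), (r1, r2)

def second_party_intermediate(masked_a, masked_b):
    """Difference between the logarithms of the two masked fractions."""
    return math.log(masked_a) - math.log(masked_b)

def first_party_unmask(intermediate, r1, r2):
    """log(a*r1) - log(b*r2) = WOE + log(r1) - log(r2), so subtract the offset."""
    return intermediate - (math.log(r1) - math.log(r2))

r3 = rng.uniform(1.0, 10.0)  # assumed: one positive third random number for all boxes
woe_values, trend_values = [], []
for a, b in fractions:
    masked, (r1, r2) = first_party_mask(a, b)
    inter = second_party_intermediate(*masked)
    woe = first_party_unmask(inter, r1, r2)
    woe_values.append(woe)
    trend_values.append(woe / r3)  # scaled values keep the ordering, hide the magnitude
```

Because the scaling factor is positive and common to all boxes in this sketch, the scaled values rise and fall in the same order as the evidence weights while their absolute magnitudes stay hidden.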

Optionally, for each bin, the indication information includes a product of the encrypted positive sample fraction and the first random number, a product of the encrypted negative sample fraction and the second random number, a product of a difference between the encrypted positive sample fraction and the encrypted negative sample fraction and the third random number;

the intermediate result is determined by a logarithmic operation and includes a product of a difference between a positive sample fraction and a negative sample fraction, a third random number, a difference between a first logarithm value and a second logarithm value; the first logarithm value is a logarithm value of a product of a positive sample proportion and a first random number, and the second logarithm value is a logarithm value of a product of a negative sample proportion and a second random number;

and for each box in the plurality of real boxes, eliminating the influence of the first random number, the second random number and the third random number on the intermediate result corresponding to the box, obtaining the encrypted information value and sending it to the second participant.
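The information-value variant can be sketched in the same plaintext style. The sketch below assumes that the per-box contribution to the information value is (A - B) multiplied by (log A - log B), and that the first party removes the third random number by division and the first two by subtracting (A - B)(log r1 - log r2). In the actual scheme the first party would hold A and B only as ciphertexts; assuming an additively homomorphic scheme, the subtraction step would be carried out on ciphertexts, which this sketch does not attempt. All names and numbers are hypothetical.

```python
import math
import random

rng = random.Random(1)  # fixed seed for reproducibility; not cryptographically secure

# Hypothetical (positive fraction, negative fraction) pairs for three real boxes.
fractions = [(0.5, 0.25), (0.25, 0.25), (0.25, 0.5)]

def iv_term_via_masking(a, b):
    """Recover one box's information-value term after a masked round trip."""
    r1, r2, r3 = (rng.uniform(1.0, 10.0) for _ in range(3))
    # First party -> second party: indication information for the box.
    masked_a, masked_b, masked_diff = a * r1, b * r2, (a - b) * r3
    # Second party -> first party: intermediate result built from logarithms.
    inter = masked_diff * (math.log(masked_a) - math.log(masked_b))
    # First party: divide out r3, then remove the (a - b) * (log r1 - log r2) offset.
    return inter / r3 - (a - b) * (math.log(r1) - math.log(r2))

iv = sum(iv_term_via_masking(a, b) for a, b in fractions)
expected = sum((a - b) * (math.log(a) - math.log(b)) for a, b in fractions)
```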

The invention also provides a data binning processing method based on the confusion box, which is applied to a second party and comprises the following steps:

sending a plurality of data identifications and encrypted tags and anti-tags corresponding to each data identification to a first participant, so that the first participant performs box separation operation on the plurality of data identifications which are locally stored and correspond to the characteristic variables according to the characteristic variables which are locally stored and correspond to the plurality of data identifications to obtain a plurality of real boxes, and randomly generating a plurality of confusion boxes;

acquiring indication information of each of the plurality of real boxes and the plurality of confusion boxes sent by the first participant; the indication information of each box is related to the encrypted positive sample proportion and the encrypted negative sample proportion corresponding to the box, and the encrypted positive sample proportion and the encrypted negative sample proportion are determined by the first party according to the encrypted tag and the encrypted anti-tag of the data identifier in the box;

determining an intermediate result according to the indication information and encrypting the intermediate result; wherein the intermediate results are used to calculate evidence weights and/or information values;

sending the encrypted intermediate results of each box to the first participant, so that the first participant selects intermediate results corresponding to a plurality of real boxes from the encrypted intermediate results corresponding to each box, and calculates encrypted evidence weight and/or information value according to the intermediate results corresponding to the real boxes;

and acquiring an encrypted final result sent by the first participant, wherein the final result comprises the change trend of the evidence weight and/or the information value.

The invention also provides a data binning processing device based on the confusion box, which is applied to a first participant and comprises:

the data identifier acquisition module is used for acquiring a plurality of data identifiers sent by the second party and encrypted tags and anti-tags corresponding to each data identifier;

the box separation module is used for carrying out box separation operation on the locally stored data identifications corresponding to the characteristic variables according to the locally stored characteristic variables corresponding to the data identifications to obtain a plurality of real boxes and randomly generating a plurality of confusion boxes;

the calculation module is used for calculating the encrypted positive sample proportion and the encrypted negative sample proportion corresponding to each box according to the encrypted label and the encrypted anti-label corresponding to the data identifier in each box;

the indication information sending module is used for sending the indication information of each box to the second participant so that the second participant can determine an intermediate result according to the indication information and encrypt the intermediate result; wherein the indication information is related to the encrypted positive sample ratio and the encrypted negative sample ratio, and the intermediate result is used for calculating the evidence weight and/or the information value;

an intermediate result obtaining module, configured to obtain encrypted intermediate results corresponding to each box sent by the second party, and select intermediate results corresponding to multiple real boxes from the encrypted intermediate results;

and the final result sending module is used for calculating the encrypted evidence weight and/or the encrypted information value according to the intermediate results corresponding to the plurality of real boxes and sending the encrypted final result to the second party, wherein the final result comprises the change trend of the evidence weight and/or the information value.

The invention also provides a data binning processing device based on the confusion box, which is applied to a second party and comprises:

the data identification sending module is used for sending a plurality of data identifications and encrypted labels and anti-labels corresponding to the data identifications to a first participant so that the first participant performs box separation operation on the locally stored data identifications corresponding to the characteristic variables according to the locally stored characteristic variables corresponding to the data identifications to obtain a plurality of real boxes and randomly generates a plurality of confusion boxes;

an indication information obtaining module, configured to obtain indication information of each of the plurality of real boxes and the plurality of confusion boxes sent by the first party; the indication information of each box is related to the encrypted positive sample proportion and the encrypted negative sample proportion corresponding to the box, and the encrypted positive sample proportion and the encrypted negative sample proportion are determined by the first party according to the encrypted label and the encrypted anti-label of the data identifications in the box;

the intermediate result determining module is used for determining an intermediate result according to the indication information and encrypting the intermediate result; wherein the intermediate results are used to calculate evidence weights and/or information values;

the intermediate result sending module is used for sending the encrypted intermediate results of the boxes to the first participant so that the first participant selects intermediate results corresponding to a plurality of real boxes from the encrypted intermediate results corresponding to the boxes, and calculates encrypted evidence weight and/or information value according to the intermediate results corresponding to the real boxes;

and the final result acquisition module is used for acquiring the encrypted final result sent by the first participant, wherein the final result comprises the change trend of the evidence weight and/or the information value.

The invention also provides data binning processing equipment based on the confusion box, which comprises: a memory, a processor and a confusion-box-based data binning processing program stored on the memory and executable on the processor, which when executed by the processor implements the steps of the confusion-box-based data binning processing method as described in any of the preceding claims.

The present invention also provides a computer-readable storage medium having stored thereon a confusion-box-based data binning processing program, which when executed by a processor implements the steps of the confusion-box-based data binning processing method as described in any one of the preceding claims.

The invention also provides a computer program product comprising a computer program which, when executed by a processor, implements the confusion-box-based data binning processing method of any one of the preceding claims.

The invention provides a data binning processing method, device, equipment and storage medium based on a confusion box. A plurality of data identifications sent by a second party, together with the encrypted label and encrypted anti-label corresponding to each data identification, are obtained. A binning operation is performed on the locally stored data identifications according to the locally stored characteristic variables corresponding to them to obtain a plurality of real boxes, and a plurality of confusion boxes are randomly generated. For each real box and each confusion box, the encrypted positive sample proportion and the encrypted negative sample proportion are calculated according to the encrypted labels and encrypted anti-labels corresponding to the data identifications in the box. Indication information of each box, which is related to the encrypted positive sample proportion and the encrypted negative sample proportion, is sent to the second party so that the second party determines an intermediate result according to the indication information and encrypts it; the intermediate result is used for calculating the evidence weight and/or the information value. The encrypted intermediate results corresponding to each box are obtained from the second party, the intermediate results corresponding to the real boxes are selected from them, the encrypted evidence weight and/or information value are calculated accordingly, and the encrypted final result, which comprises the change trend of the evidence weight and/or the information value, is sent to the second party. Because the binning result is obfuscated by the randomly generated confusion boxes, the second party cannot know which boxes are actually used, and the data of the first party are difficult to decipher. This improves security when determining the information value and the evidence weight, reduces the risk of data leakage, and improves the overall interaction security of federated learning.

Drawings

FIG. 1 is a schematic diagram of an application scenario according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of data interaction under a malicious Guest attack in the scenario shown in FIG. 1;

FIG. 3 is a schematic flow chart of a data binning processing method based on a confusion box according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of adding confusion boxes according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of an interaction in calculating an evidence weight according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of an interaction in calculating an information value according to an embodiment of the present invention;

FIG. 7 is a graph of the evidence weight change trend according to an embodiment of the present invention;

FIG. 8 is a schematic diagram of an information value distribution according to an embodiment of the present invention;

FIG. 9 is a flow chart of another data binning processing method based on a confusion bin according to an embodiment of the present invention;

FIG. 10 is a schematic structural diagram of a data binning processing apparatus based on a confusion bin according to an embodiment of the present invention;

FIG. 11 is a schematic structural diagram of another data binning processing apparatus based on a confusion bin according to an embodiment of the present invention;

FIG. 12 is a schematic structural diagram of data binning processing equipment based on a confusion box according to an embodiment of the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.

In the federated learning process, a plurality of participants each possess sample data, which may contain various types of characteristic variables. When model training is performed according to the characteristic variables in the sample data, the evidence weight and the information value are important feature indicators.

Specifically, multiple types of characteristic variables can be screened by evidence weight and information value to determine which characteristic variables are used in model training. The evidence weight and the information value reflect the predictive power of a characteristic variable and effectively measure its contribution to the model's prediction result, so the performance of the model can be improved. How to jointly calculate the evidence weight and the information value in the federated setting while ensuring data security is an important problem in federated learning.

Fig. 1 is a schematic diagram of an application scenario according to an embodiment of the present invention. As shown in Fig. 1, the first participant is the participant holding the characteristic variables, for example Host, and the second participant is the participant holding the labels, for example Guest. Both parties can hold overlapping data identifications. The second participant holds a plurality of data identifications with the corresponding labels Y and anti-labels (1-Y), encrypts Y and (1-Y) to obtain E(Y) and E(1-Y), and sends the data identifications with the corresponding E(Y) and E(1-Y) to the first participant. The first participant bins the data identifications according to the locally stored characteristic variables, calculates the sum of E(Y) and the sum of E(1-Y) for the data identifications in each box, and feeds back the box number and the calculation result of each box to the second participant. After decryption, the second participant can determine the evidence weight and the information value from the sum of Y and the sum of (1-Y).

However, this solution is relatively easy to defeat with a malicious attack. In particular, the second participant may apply a special encoding to Y, adding magic numbers to the data so as to extract the first participant's data.

Fig. 2 is a schematic diagram of data interaction under a malicious Guest attack in the scenario shown in Fig. 1. As shown in Fig. 2, if the label Y is one-hot encoded, the summation result reveals the box in which each sample of the first participant is located, so the complete data can be restored in the course of computing the evidence weight, resulting in data leakage.

For example, if the second party encodes Y of data id1, data id2 and data id3 as 0001, 0010 and 0100 respectively, and the summation result of Y in box 1 is 0101, it can be concluded that data id1 and data id3 are in box 1.
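The example above can be reproduced with a short sketch. It works on plaintext integers for clarity; in the attack as described, the one-hot values would be sent encrypted and the sum would be decrypted by the label holder before decoding. The dictionary and function names are hypothetical.

```python
# Each ID's label is replaced by a distinct power of two (a one-hot bit mask).
malicious_labels = {"id1": 0b0001, "id2": 0b0010, "id3": 0b0100}

def recover_members(bin_sum, labels):
    """Return the IDs whose one-hot bit is set in a box's label sum."""
    return sorted(i for i, mask in labels.items() if bin_sum & mask)

# The feature holder sums the "labels" of the IDs it placed in box 1.
box_1 = ["id1", "id3"]
box_1_sum = sum(malicious_labels[i] for i in box_1)

print(format(box_1_sum, "04b"))                       # 0101
print(recover_members(box_1_sum, malicious_labels))   # ['id1', 'id3']
```

Since every ID contributes a different bit, the sum is a lossless record of box membership, which is exactly the leak the confusion boxes are meant to obscure.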

To solve this problem, confusion boxes are added during binning, yielding a plurality of real boxes and a plurality of confusion boxes. The first participant obtains the encrypted Y and (1-Y), calculates the sum of Y and the sum of (1-Y) for each real box and each confusion box, divides them by the sum of Y and the sum of (1-Y) over all data respectively, and sends the results to the second participant. The second participant performs a logarithm operation on the received data and feeds the result back to the first participant. The first participant selects the data corresponding to the real boxes, calculates the evidence weight and the information value from them, and sends the information value and the change trend of the evidence weight to the second participant. Because the real boxes and the confusion boxes are used together in the calculation, the second participant cannot tell which of the boxes it receives are actually used, which makes the first participant's data harder to decipher. Adding confusion boxes thus effectively improves the security of computing the information value and the evidence weight.

Some embodiments of the invention are described in detail below with reference to the accompanying drawings. The features of the embodiments and examples described below may be combined with each other without conflict between the embodiments.

Fig. 3 is a schematic flow chart of a data binning processing method based on a confusion box according to an embodiment of the present invention. The execution subject of the method provided by the embodiment may be the first participant holding the feature variable. As shown in fig. 3, the method includes:

step 301, acquiring a plurality of data identifications sent by the second party and an encrypted tag and an anti-tag corresponding to each data identification.

The first participant and the second participant hold a plurality of identical data identifications, which can be identity card numbers, bank card numbers, user account numbers, serial numbers and the like. The first participant also holds characteristic variables corresponding to the data identifications, such as age, income, number of transactions, deposits, recent consumption amount, educational background and region, and the second participant holds labels corresponding to the data identifications, for example whether a payment is overdue.

Table 1 example of sample data

As shown in Table 1, the data identification is denoted by ID and X is a characteristic variable; specifically, X1, X2 and X3 are age, income and number of transactions respectively, and more characteristic variables can be added on this basis. The label Y may be used to indicate whether a payment is overdue, with 1 indicating overdue and 0 indicating not overdue.

The sample data of the first party may include ID and X, and the sample data of the second party may include ID and Y. The second party can homomorphically encrypt the label Y and the anti-label (1-Y) and send the plaintext of the ID, the ciphertext of the label Y and the ciphertext of the anti-label (1-Y) to the first party.

Step 302, according to the locally stored characteristic variables corresponding to the plurality of data identifications, performing box separation operation on the locally stored plurality of data identifications corresponding to the characteristic variables to obtain a plurality of real boxes, and randomly generating a plurality of confusion boxes.

The binning operation can be completed according to the characteristic variable corresponding to each data identification. Optionally, the value of the characteristic variable may be divided into a plurality of ranges, and a corresponding real box may be generated for each range, where the real box includes the data identifications whose values fall in that range.

It will be appreciated that in some embodiments the data identifications held by the second party are not identical to those held by the first party, and sample alignment or the like may be performed to screen the data identifications before the binning data processing of the present invention is performed. For convenience of description, it is assumed by default that the data identifications sent by the second party are the same as those held by the first party when the binning data processing of the present invention is performed.

Taking the number of transactions as the characteristic variable, its values can be divided into three intervals: 0-20, 21-50, and 51 and above, corresponding to box 1, box 2 and box 3 respectively. ID1 and ID4 in Table 1 are placed in box 1, ID2 and ID5 in box 2, and ID3 and ID6 in box 3. Box 1, box 2 and box 3 are the plurality of real boxes generated from the characteristic variable.
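A minimal sketch of this interval-based binning follows. The transaction counts are hypothetical values chosen only so that the IDs land in the boxes named above; they are not taken from Table 1. The helper name `assign_box` is likewise an assumption.

```python
from bisect import bisect_left

# Hypothetical transaction counts per ID (not the actual Table 1 values).
transaction_counts = {1: 5, 2: 30, 3: 80, 4: 12, 5: 45, 6: 60}

# Inclusive upper bounds of box 1 and box 2; box 3 holds everything above 50.
upper_bounds = [20, 50]

def assign_box(value):
    """Map a value to a 1-based box number: 0-20 -> 1, 21-50 -> 2, 51+ -> 3."""
    return bisect_left(upper_bounds, value) + 1

real_boxes = {}
for data_id, count in transaction_counts.items():
    real_boxes.setdefault(assign_box(count), []).append(data_id)

print(real_boxes)  # {1: [1, 4], 2: [2, 5], 3: [3, 6]}
```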

In addition, a plurality of confusion boxes are generated randomly. The confusion boxes need not follow the binning rule; for example, the data identifications can be divided at random to generate them.

Fig. 4 is a schematic diagram of adding confusion boxes according to an embodiment of the present invention. As shown in Fig. 4, ID1 to ID6 may be divided into a plurality of real boxes according to their corresponding characteristic variables, and a plurality of confusion boxes may be randomly generated. When the intermediate results are calculated, the data of both the real boxes and the confusion boxes are used, which reduces the risk of data leakage; when the final result is calculated, only the data of the real boxes are used, which ensures the accuracy of the final result.
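One way to picture the random generation and later filtering of confusion boxes is the sketch below. The random partition, the shuffling of real and confusion boxes into a single list, and the use of list positions as the secret record of which boxes are real are all assumptions made for illustration; the text only states that confusion boxes are random and that real boxes can be selected afterwards. A seeded `random.Random` is used for reproducibility; a deployment would presumably draw from a cryptographically secure source.

```python
import random

def make_confusion_boxes(data_ids, num_boxes, rng):
    """Randomly partition the IDs into num_boxes boxes, ignoring feature values."""
    shuffled = list(data_ids)
    rng.shuffle(shuffled)
    return [shuffled[k::num_boxes] for k in range(num_boxes)]

def mix_boxes(real_boxes, confusion_boxes, rng):
    """Shuffle real and confusion boxes together; keep the real positions secret."""
    tagged = [(box, True) for box in real_boxes]
    tagged += [(box, False) for box in confusion_boxes]
    rng.shuffle(tagged)
    all_boxes = [box for box, _ in tagged]
    real_positions = {idx for idx, (_, is_real) in enumerate(tagged) if is_real}
    return all_boxes, real_positions

rng = random.Random(0)
real_boxes = [[1, 4], [2, 5], [3, 6]]
confusion_boxes = make_confusion_boxes(range(1, 7), 3, rng)
all_boxes, real_positions = mix_boxes(real_boxes, confusion_boxes, rng)

# Later, only the results at the real positions are kept by the first party.
selected = [all_boxes[idx] for idx in sorted(real_positions)]
```

The second party would see all six boxes' indication information in the shuffled order, while only the first party knows `real_positions`.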

Step 303, for each of the plurality of real boxes and the plurality of confusion boxes, calculating the encrypted positive sample proportion and the encrypted negative sample proportion corresponding to the box according to the encrypted label and the encrypted anti-label corresponding to the data identifications in the box.

The positive sample proportion of each box is the ratio of the number of the positive samples in the box to the total number of all the positive samples, and the negative sample proportion is the ratio of the number of the negative samples in the box to the total number of all the negative samples. In the embodiment of the present invention, the positive sample is sample data whose corresponding label Y is a first numerical value, and the negative sample is sample data whose corresponding label Y is a non-first numerical value. Optionally, the first value is 1. The positive sample fraction and the negative sample fraction can be determined by the following equations.

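$$A_i = \frac{\sum_{j=1}^{N_i} Y_j}{Y_{total}} \qquad (1)$$

$$B_i = \frac{\sum_{j=1}^{N_i} (1-Y)_j}{(1-Y)_{total}} \qquad (2)$$

where $A_i$ is the positive sample fraction of the i-th box, $B_i$ is the negative sample fraction of the i-th box, $N_i$ is the number of data identifications in the i-th box, $Y_j$ is the Y corresponding to the j-th data identification, $(1-Y)_j$ is the (1-Y) corresponding to the j-th data identification, $Y_{total}$ is the sum of the Y corresponding to all data identifications, and $(1-Y)_{total}$ is the sum of the (1-Y) corresponding to all data identifications.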

The first participant sums Y and (1-Y) for each box in the ciphertext state, and can thereby obtain the encrypted positive sample proportion and the encrypted negative sample proportion.
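The ciphertext-state summation can be illustrated with a toy additively homomorphic scheme. The sketch below uses a textbook Paillier construction with tiny primes, which is an assumption: the text says only that the encryption is homomorphic and does not name a scheme. The label values are hypothetical, the key is far too small to be secure, and the final decryption and division are done here only to check the sums; how the division by the totals is carried out on ciphertexts is not shown.

```python
import math
import random

# Toy Paillier key with tiny primes (insecure; for illustration only).
p, q = 1009, 1013
n = p * q
n_sq = n * n
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
mu = pow(lam, -1, n)  # valid because the generator is g = n + 1

def encrypt(m, rng):
    """Encrypt integer m as (1 + m*n) * r^n mod n^2 with random r coprime to n."""
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (1 + m * n) * pow(r, n, n_sq) % n_sq

def decrypt(c):
    """Recover m from a ciphertext using the private values lam and mu."""
    return (pow(c, lam, n_sq) - 1) // n * mu % n

def add_encrypted(ciphertexts):
    """Homomorphic addition: multiply ciphertexts modulo n^2."""
    total = 1  # 1 is a valid encryption of 0
    for c in ciphertexts:
        total = total * c % n_sq
    return total

rng = random.Random(0)
labels = {1: 1, 2: 0, 3: 0, 4: 0, 5: 1, 6: 0}  # hypothetical Y values
enc_y = {i: encrypt(y, rng) for i, y in labels.items()}
enc_not_y = {i: encrypt(1 - y, rng) for i, y in labels.items()}

box = [1, 4]  # the first party never sees the plaintext labels of these IDs
enc_pos_sum = add_encrypted(enc_y[i] for i in box)
enc_neg_sum = add_encrypted(enc_not_y[i] for i in box)

# Check only: the key holder decrypts the sums and forms the fractions.
y_total = sum(labels.values())
pos_fraction = decrypt(enc_pos_sum) / y_total
neg_fraction = decrypt(enc_neg_sum) / (len(labels) - y_total)
```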

Step 304, sending the indication information of each box to the second party, so that the second party determines an intermediate result according to the indication information and encrypts the intermediate result; wherein the indication information is related to the encrypted positive and negative sample ratios, and the intermediate result is used for calculating the evidence weight and/or the information value.

The indication information sent by the first party may include indication information of the plurality of real boxes and indication information of the plurality of confusion boxes. The indication information may be data related to the positive and negative sample proportions; for example, it may include the positive and negative sample proportions themselves, or the products of the positive and negative sample proportions with random numbers.

After the second party acquires the indication information, the indication information can be processed through a formula to obtain an intermediate result, and the intermediate result is encrypted. For each of the plurality of real bins and the plurality of obfuscated bins, its corresponding intermediate result may be computed.

Step 305, acquiring the encrypted intermediate results corresponding to the boxes sent by the second participant, and selecting the intermediate results corresponding to the plurality of real boxes from the encrypted intermediate results.

Optionally, when the first party and the second party exchange information, the messages may carry the sequence numbers or identifiers of the boxes, and the first party may use these sequence numbers or identifiers to pick out the intermediate results corresponding to the real boxes from the obtained encrypted intermediate results.
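The filtering by sequence number can be sketched as follows. How the confusion boxes are populated is an assumption here (random subsets of the data identifications), and the helper names `add_confusion_bins` and `select_real` are hypothetical.

```python
import random


def add_confusion_bins(real_bins, all_ids, k, rng):
    """Mix k randomly generated confusion bins with the real bins.

    Returns (mixed, real_ids): `mixed` maps an opaque sequence number to a
    list of data identifications; `real_ids` is the set of sequence numbers
    of the real bins, which the first party keeps private.
    """
    candidates = [list(b) for b in real_bins]
    for _ in range(k):
        size = rng.randint(1, len(all_ids))
        candidates.append(rng.sample(all_ids, size))
    order = list(range(len(candidates)))
    rng.shuffle(order)  # hide which positions are real
    mixed, real_ids = {}, set()
    for seq, idx in enumerate(order):
        mixed[seq] = candidates[idx]
        if idx < len(real_bins):
            real_ids.add(seq)
    return mixed, real_ids


def select_real(results, real_ids):
    """Keep only the results whose sequence number belongs to a real bin."""
    return {seq: res for seq, res in results.items() if seq in real_ids}


rng = random.Random(0)
all_ids = ["u1", "u2", "u3", "u4", "u5", "u6"]
real_bins = [["u1", "u2"], ["u3", "u4", "u5"], ["u6"]]
mixed, real_ids = add_confusion_bins(real_bins, all_ids, k=2, rng=rng)

# Stand-in for the second party's reply: one value per sequence number.
results = {seq: len(ids) for seq, ids in mixed.items()}
kept = select_real(results, real_ids)
```

The second party sees five boxes with sequence numbers 0 to 4 and has no marker telling it which three are real.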

Step 306, calculating an encrypted evidence weight and/or an encrypted information value according to the intermediate results corresponding to the plurality of real boxes, and sending an encrypted final result to the second party, wherein the final result comprises a change trend of the evidence weight and/or an information value.

Optionally, the evidence weight and the information value may be calculated by the following formulas.

WOE＝logA-logB (3)

IV = \sum_{i=1}^{n} (A_i - B_i) * (logA_i - logB_i)

Where WOE is the evidence weight, IV is the information value, and n is the number of real bins. For each real box, a corresponding evidence weight may be calculated. The information value corresponding to each characteristic variable is calculated from the plurality of real boxes corresponding to that characteristic variable.
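A short plaintext sketch of these two quantities follows. It assumes natural logarithms (the text writes only "log") and uses made-up ratios; the function names are illustrative.

```python
import math


def woe(a, b):
    """Evidence weight of one bin: log(A) - log(B)."""
    return math.log(a) - math.log(b)


def information_value(ratios):
    """Sum of (A_i - B_i) * WOE_i over the real bins."""
    return sum((a - b) * woe(a, b) for a, b in ratios)


# Illustrative (A_i, B_i) pairs for three real bins.
ratios = [(0.2, 0.5), (0.3, 0.3), (0.5, 0.2)]
woes = [woe(a, b) for a, b in ratios]
iv = information_value(ratios)
```

A bin whose positive or negative ratio is zero makes the logarithm undefined; this sketch does not handle that case.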

As can be seen from the above formula, the calculation process of the evidence weight and the information value comprises the logarithmic operation of the positive sample ratio and the negative sample ratio. Because the intermediate result returned by the second party comprises the logarithm operation on the indication information, the first party can directly calculate the encrypted evidence weight and the information value corresponding to the real box according to the encrypted intermediate result after filtering the confusion box.

After the first party obtains the encrypted evidence weight and the encrypted information value, it can send the encrypted information value or the encrypted change trend of the evidence weight to the second party; the second party decrypts it to obtain the information value or the change trend of the evidence weight and displays it to the user. For each of the multiple characteristic variables, the evidence weight and/or the information value can be calculated by the method provided by the embodiment of the invention, so that the user can select, from the multiple characteristic variables, those to be used for federated model training according to the corresponding change trends of the evidence weight and/or information values.

In practical application, the first participant may calculate at least one of the evidence weight and the information value and send a corresponding final result to the second participant, and the second participant may select a feature variable for model training according to the information value, may select a feature variable for model training according to the evidence weight variation trend, or may select a feature variable in combination with the information value and the evidence weight variation trend.

The data binning processing method based on the confusion box provided in this embodiment proceeds as follows. A plurality of data identifications sent by the second party, together with the encrypted label and anti-label corresponding to each data identification, are obtained. A binning operation is performed on the data identifications according to the locally stored characteristic variables corresponding to them to obtain a plurality of real boxes, and a plurality of confusion boxes are randomly generated. For each real box and each confusion box, the encrypted positive sample ratio and the encrypted negative sample ratio are calculated according to the encrypted labels and anti-labels corresponding to the data identifications in the box. Indication information of each box is sent to the second party, so that the second party determines an intermediate result according to the indication information and encrypts it; the indication information is related to the encrypted positive and negative sample ratios, and the intermediate result is used for calculating the evidence weight and/or the information value. The encrypted intermediate results corresponding to each box are obtained from the second party, the intermediate results corresponding to the real boxes are selected from them, the encrypted evidence weight and/or information value is calculated according to the intermediate results corresponding to the real boxes, and the encrypted final result, which comprises the change trend of the evidence weight and/or the information value, is sent to the second party. Because the binning result is obfuscated by the randomly generated confusion boxes, the second party cannot know which boxes are real, and the data of the first party is difficult to infer. This improves the security of determining the information value and the evidence weight, reduces the risk of data leakage, and improves the overall security of interaction in federated learning.

On the basis of the technical solution provided by the above embodiment, optionally, sending the indication information of each box to the second party includes: for each box, generating random numbers, multiplying the encrypted positive sample ratio and the encrypted negative sample ratio by the corresponding random numbers respectively to obtain the indication information, and sending the indication information to the second party.

Specifically, the encrypted positive sample fraction may be multiplied by a random number, the encrypted negative sample fraction may be multiplied by another random number, and the indication information may include two multiplication results.

Accordingly, calculating an encrypted evidence weight and/or an information value according to the intermediate results corresponding to the plurality of real boxes and sending an encrypted final result to the second participant may include: and calculating the encrypted evidence weight and/or the encrypted information value according to the intermediate results corresponding to the plurality of real boxes and the corresponding random numbers, and sending the encrypted final result to the second participant.

Specifically, the intermediate result includes interference of the random number, and when the final result is calculated according to the intermediate result, the interference of the random number can be removed, so that a correct final result is obtained.

Perturbing the encrypted positive sample ratio and the encrypted negative sample ratio with random numbers makes it harder for the second party to infer the first party's data after decryption, which further improves the security of calculating the information value and the evidence weight.

Optionally, in calculating the evidence weight, the indication information specifically includes, for each bin, the product of the encrypted positive sample ratio and the first random number, and the product of the encrypted negative sample ratio and the second random number.

Wherein the first random number and the second random number of the plurality of real boxes and the plurality of confusion boxes may not all be the same.

The intermediate result is determined by a logarithm operation, and the intermediate result includes a difference between a first logarithm value and a second logarithm value corresponding to the bin, the first logarithm value is a logarithm value of a product of the positive sample proportion and the first random number, and the second logarithm value is a logarithm value of a product of the negative sample proportion and the second random number.

Correspondingly, calculating the encrypted evidence weight and/or the encrypted information value according to the intermediate results corresponding to the plurality of real boxes and the corresponding random numbers, and sending the encrypted final result to the second participant, wherein the method comprises the following steps: for each box in the plurality of real boxes, eliminating the influence of the first random number and the second random number on the encrypted intermediate result corresponding to the box to obtain the encrypted evidence weight; and dividing the encrypted evidence weight corresponding to each box by a third random number to obtain an encrypted final result and sending the encrypted final result to the second party, wherein the final results of the plurality of real boxes are used for representing the variation trend of the evidence weight.

Optionally, the eliminating the influence of the first random number and the second random number from the encrypted intermediate result corresponding to the bin may include: the logarithmic value of the ratio of the first random number to the second random number is subtracted from the encrypted intermediate result corresponding to the bin.

Specifically, as can be seen from formula (3), the evidence weight equals the difference between the logarithm of the positive sample ratio and the logarithm of the negative sample ratio. Because the intermediate result returned by the second party contains the random numbers, subtracting the logarithm of the ratio of the first random number to the second random number from the intermediate result yields the correct evidence weight, that is, the influence of the first random number and the second random number is eliminated. The encrypted evidence weight is then perturbed by the third random number and sent to the second party, so that the second party obtains the change trend of the evidence weight.

Fig. 5 is an interaction diagram for calculating an evidence weight according to an embodiment of the present invention. As shown in fig. 5, calculating the evidence weight may include the steps of:

step 501, the second party sends the encrypted Y and 1-Y to the first party.

Specifically, the first party may obtain a plurality of data identifications sent by the second party, and an encrypted tag Y and an encrypted anti-tag (1-Y) corresponding to each data identification.

Step 502, the first party calculates the encrypted A and B for each bin.

Wherein, A is the positive sample ratio, and B is the negative sample ratio. n real bins and k confusion bins may be provided, and for each of the n+k bins, the corresponding A and B may be calculated by equations (1) and (2).

Step 503, the first party generates two random numbers r1 and r2 for each box, and one random number r3 for each feature variable.

Wherein r1, r2, r3 may be the first random number, the second random number and the third random number used when calculating the evidence weight; r1 and r2 may differ from box to box, and r3 may differ from feature variable to feature variable.

Step 504, the first participant sends the encrypted A*r1 and B*r2.

Specifically, for each of the n+k bins, indication information may be sent, including the encrypted A*r1 and B*r2 corresponding to the bin.

Step 505, the second participant calculates log(A*r1)-log(B*r2).

Specifically, the second participant decrypts the encrypted A*r1 and B*r2 to obtain A*r1 and B*r2, and then calculates log(A*r1)-log(B*r2) as the intermediate result, where log(A*r1) is the first logarithm value and log(B*r2) is the second logarithm value; the intermediate result thus includes a logarithmic operation on the indication information.

Step 506, the second participant sends the encrypted log(A*r1)-log(B*r2).

Wherein, for each of the real and confusion bins, a corresponding log(A*r1)-log(B*r2) may be returned.

Step 507, the first participant selects the real boxes and calculates the encrypted log(A*r1)-log(B*r2)-log(r1/r2) to obtain the encrypted evidence weight.

The specific derivation process can be seen in equation (5).

log(A*r1)-log(B*r2)-log(r1/r2)＝(logA+logr1)-(logB+logr2)-(logr1-logr2)＝logA-logB＝WOE (5)

Step 508, the first party sends the encrypted WOE/r3.

In other alternative implementations, other random number interference modes may be used, for example, the encrypted WOE may be added to r3 and then divided by the random number r4, and the result is sent to the second participant.

Step 509, the second party decrypts to obtain the change trend of the WOE.

The second party feeds back the difference log(A*r1)-log(B*r2) between the first logarithm value and the second logarithm value, and the first party calculates the final result from this difference. In this way, the evidence weight can be calculated quickly and accurately while data security is ensured, which effectively improves the efficiency and accuracy of determining the evidence weight.
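The interaction of steps 502 to 509 can be traced with a plaintext simulation. Encryption and decryption are deliberately left out so that only the random-number masking is visible. The function names, the range of the random numbers and the choice of a positive r3 (so that the ordering of the WOE values is preserved) are assumptions of this sketch.

```python
import math
import random


def first_party_mask(ratios, rng):
    """Steps 503-504: draw r1, r2 per bin and build the indication info."""
    masks, indications = {}, {}
    for bin_id, (a, b) in ratios.items():
        r1, r2 = rng.uniform(1.0, 10.0), rng.uniform(1.0, 10.0)
        masks[bin_id] = (r1, r2)
        indications[bin_id] = (a * r1, b * r2)
    return masks, indications


def second_party_intermediate(indications):
    """Step 505: log(A*r1) - log(B*r2) for every bin, real or confusion."""
    return {bin_id: math.log(x) - math.log(y)
            for bin_id, (x, y) in indications.items()}


def first_party_finish(intermediate, masks, real_ids, r3):
    """Steps 507-508: keep real bins, remove r1 and r2, divide by r3."""
    final = {}
    for bin_id in real_ids:
        r1, r2 = masks[bin_id]
        woe = intermediate[bin_id] - math.log(r1 / r2)
        final[bin_id] = woe / r3
    return final


rng = random.Random(42)
# Bins 0-2 are real, bin 3 is a confusion bin (illustrative ratios).
ratios = {0: (0.2, 0.5), 1: (0.3, 0.3), 2: (0.5, 0.2), 3: (0.4, 0.1)}
real_ids = [0, 1, 2]
r3 = rng.uniform(1.0, 10.0)  # one r3 per feature variable

masks, indications = first_party_mask(ratios, rng)
intermediate = second_party_intermediate(indications)
final = first_party_finish(intermediate, masks, real_ids, r3)
```

The second party receives only WOE/r3 for the real bins, so it can read the trend across bins but not the exact evidence weights.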

Optionally, in calculating the information value, the indication information may include, for each bin, a product of the encrypted positive sample fraction and the first random number, a product of the encrypted negative sample fraction and the second random number, a product of a difference between the encrypted positive sample fraction and the encrypted negative sample fraction and the third random number.

Optionally, the first random number, the second random number, and the third random number of the plurality of real bins and the plurality of confusion bins are not all the same.

The intermediate result may include the product of the following three terms: the difference between the positive and negative sample ratios, the third random number, and the difference between the first and second logarithm values. The first logarithm value is the logarithm of the product of the positive sample ratio and the first random number, and the second logarithm value is the logarithm of the product of the negative sample ratio and the second random number.

Correspondingly, calculating the encrypted evidence weight and/or the encrypted information value according to the intermediate results corresponding to the plurality of real boxes and the corresponding random numbers, and sending the encrypted final result to the second participant, includes: for each box in the plurality of real boxes, eliminating the influence of the first random number, the second random number and the third random number from the intermediate result corresponding to the box to obtain the encrypted information value, and sending the encrypted information value to the second participant.

Optionally, eliminating the influence of the first random number, the second random number, and the third random number from the intermediate result corresponding to the bin may include: subtracting the encrypted negative sample ratio from the encrypted positive sample ratio corresponding to the box to obtain a first subtraction result; subtracting the logarithm value of the second random number from the logarithm value of the first random number to obtain a second subtraction result; multiplying the first subtraction result, the second subtraction result and the third random number to obtain an intermediate product; adding the encrypted intermediate products corresponding to the plurality of real boxes to obtain a first addition result, and adding the encrypted intermediate results corresponding to the plurality of real boxes to obtain a second addition result; and dividing the difference between the second addition result and the first addition result by the third random number.

Specifically, as can be seen from formula (4), the information value is related to the product of two differences: the difference between the logarithm of the positive sample ratio and the logarithm of the negative sample ratio, and the difference between the positive sample ratio and the negative sample ratio. Because the intermediate result returned by the second party contains logarithms perturbed by the random numbers, subtracting the logarithms of the random numbers from the encrypted intermediate result restores the encrypted logarithms of the positive and negative sample ratios and eliminates the influence of the first random number and the second random number. The first party also knows the encrypted difference between the positive sample ratio and the negative sample ratio, so the correct information value can be calculated.

Fig. 6 is a schematic diagram of interaction when calculating an information value according to an embodiment of the present invention. As shown in fig. 6, calculating the information value may include the steps of:

step 601, the second party sends the encrypted Y and 1-Y to the first party.

Specifically, the first party may obtain a plurality of data identifiers sent by the second party, and an encrypted target variable Y and an encrypted opposite variable (1-Y) corresponding to each data identifier.

Step 602, the first participant calculates encrypted a and B corresponding to each bin.

Wherein, A is the positive sample ratio, and B is the negative sample ratio. The specific calculation method can be seen in formula (1) and formula (2). n real bins and k confusion bins may be generated, and the corresponding A and B are calculated for each of the n+k bins.

Step 603, generating three random numbers r1, r2 and r3 for each box.

Wherein r1, r2, r3 may be the first random number, the second random number, the third random number when calculating the information value, and r1, r2, r3 of each box may be different.

When calculating both the evidence weight and the information value, the r1, r2, and r3 used for the evidence weight may be the same as or different from the r1, r2, and r3 used for the information value.

Step 604, the first participant sends the encrypted A*r1, B*r2 and (A-B)*r3.

Indication information is sent for each of the n+k boxes, the indication information comprising the encrypted A*r1, B*r2 and (A-B)*r3 corresponding to the box.

Step 605, the second party calculates IV'.

Specifically, the second participant decrypts the encrypted A*r1, B*r2 and (A-B)*r3 to obtain A*r1, B*r2 and (A-B)*r3, and then calculates IV' by the following formula.

IV′＝(A-B)*r3*(log(A*r1)-log(B*r2)) (4)

Wherein, IV' is the intermediate result used when calculating the information value, and is the product of the following three terms: the difference (A-B) between the positive and negative sample ratios, the third random number r3, and the difference log(A*r1)-log(B*r2) between the first and second logarithm values.

Step 606, the second party sends the encrypted IV'.

Wherein for each of the real and obfuscated bins, a corresponding IV' may be returned.

Step 607, the first party calculates the encrypted IV according to the encrypted IV'.

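Specifically, IV' can be corrected as follows to remove the interference of the random numbers.

Given that some of the data returned by the second party belongs to confusion boxes, the confusion box data is removed first, and the remaining results of the real boxes are then summed.

The specific derivation procedure is as follows.

IV'_i = (A_i-B_i)*r3*(log(A_i*r1)-log(B_i*r2)) = (A_i-B_i)*r3*(logA_i-logB_i) + (A_i-B_i)*r3*(logr1-logr2)

\sum_{i=1}^{n} IV'_i = r3*IV + \sum_{i=1}^{n} (A_i-B_i)*r3*(logr1-logr2)

From this, IV can be calculated by the following formula.

IV = (\sum_{i=1}^{n} IV'_i - \sum_{i=1}^{n} (A_i-B_i)*r3*(logr1-logr2)) / r3

Wherein, IV'_i, A_i, B_i respectively represent the IV', A, B corresponding to the i-th real box, r1 and r2 are the first and second random numbers of that box, and n is the number of real boxes.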

Specifically, for each of the plurality of real bins, the encrypted negative sample ratio is subtracted from the encrypted positive sample ratio corresponding to the bin to obtain a first subtraction result (A_i-B_i), the logarithm of the second random number is subtracted from the logarithm of the first random number to obtain a second subtraction result (logr1-logr2), and the first subtraction result (A_i-B_i), the second subtraction result (logr1-logr2) and the third random number r3 are multiplied to obtain an intermediate product (A_i-B_i)*r3*(logr1-logr2).

The encrypted intermediate products corresponding to the real boxes are added to obtain a first addition result, the encrypted intermediate results IV'_i corresponding to the real boxes are added to obtain a second addition result, and the difference between the second addition result and the first addition result is divided by the third random number to obtain the encrypted information value.

Optionally, the data that the first party acquires from the second party or obtains by calculation may be encrypted data; for example, A, B, logA and logB always exist in encrypted form at the first party. When a constant needs to be added to or subtracted from encrypted data, the constant may be encrypted first.

For example, logr1 and logr2 in the formula may specifically be the encrypted logr1 and logr2. Optionally, the first party may store a public key and the second party may store the public key and a private key; the first party may encrypt the constant using the public key, and the second party may decrypt using the public key and the private key. Because a homomorphic encryption scheme is adopted, adding a constant to the encrypted data or multiplying the encrypted data by a constant does not affect the correctness of the decrypted data.

Step 608, the first party sends the encrypted IV.

Step 609, the second party decrypts to obtain the IV.

The target variable provider feeds back the encrypted product of the difference between the positive and negative sample ratios, the third random number, and the difference between the first and second logarithm values, and the first participant then calculates the final result from it. In this way, the information value can be calculated quickly and accurately while data security is ensured, which effectively improves the efficiency and accuracy of information value calculation.
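The information value interaction of steps 602 to 609 can be traced in the same way. The sketch below is a plaintext simulation with encryption omitted. It assumes a single r3 shared by all bins of the feature variable, so that the single final division by the third random number applies; if r3 differed per bin, the division would have to be done bin by bin. All function names are hypothetical.

```python
import math
import random


def mask_for_iv(ratios, r3, rng):
    """Steps 603-604: build A*r1, B*r2 and (A-B)*r3 for every bin."""
    masks, indications = {}, {}
    for bin_id, (a, b) in ratios.items():
        r1, r2 = rng.uniform(1.0, 10.0), rng.uniform(1.0, 10.0)
        masks[bin_id] = (r1, r2)
        indications[bin_id] = (a * r1, b * r2, (a - b) * r3)
    return masks, indications


def second_party_iv_prime(indications):
    """Step 605, formula (4): IV' = (A-B)*r3*(log(A*r1) - log(B*r2))."""
    return {bin_id: d * (math.log(x) - math.log(y))
            for bin_id, (x, y, d) in indications.items()}


def recover_iv(iv_prime, ratios, masks, real_ids, r3):
    """Step 607: drop confusion bins and remove r1, r2 and r3."""
    second_sum = sum(iv_prime[b] for b in real_ids)
    first_sum = sum(
        (ratios[b][0] - ratios[b][1]) * r3
        * (math.log(masks[b][0]) - math.log(masks[b][1]))
        for b in real_ids
    )
    return (second_sum - first_sum) / r3


rng = random.Random(7)
# Bins 0-2 are real, bin 3 is a confusion bin (illustrative ratios).
ratios = {0: (0.2, 0.5), 1: (0.3, 0.3), 2: (0.5, 0.2), 3: (0.4, 0.1)}
real_ids = [0, 1, 2]
r3 = rng.uniform(1.0, 10.0)

masks, indications = mask_for_iv(ratios, r3, rng)
iv_prime = second_party_iv_prime(indications)
iv = recover_iv(iv_prime, ratios, masks, real_ids, r3)
```

The recovered value matches the unmasked sum of (A_i-B_i)*(logA_i-logB_i) over the real bins up to floating-point error.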

In other alternative implementations, the random numbers in the intermediate interaction process can be omitted to calculate the evidence weights and information values. Optionally, in calculating the evidence weight, the indication information specifically includes an encrypted positive sample fraction and an encrypted negative sample fraction for each of the plurality of real bins and the plurality of obfuscated bins; the intermediate result is determined by a logarithm operation, and the intermediate result includes a difference between a first logarithm value and a second logarithm value corresponding to the box, the first logarithm value is a logarithm value of a positive sample ratio, and the second logarithm value is a logarithm value of a negative sample ratio.

Accordingly, calculating an encrypted evidence weight and/or an information value according to the intermediate results corresponding to the plurality of real boxes and sending an encrypted final result to the second participant, comprising: and for each box in the plurality of real boxes, dividing the intermediate result corresponding to the box by the random number to obtain an encrypted final result and sending the encrypted final result to the second party, wherein the final results of the plurality of real boxes are used for representing the variation trend of the evidence weight.

Optionally, in calculating the information value, the indication information may include the following three items for each bin: an encrypted positive sample fraction, an encrypted negative sample fraction, a difference between the encrypted positive sample fraction and the encrypted negative sample fraction; the intermediate result may comprise a product of two: a difference between the positive sample fraction and the negative sample fraction, a difference between the first logarithm and the second logarithm; the first logarithm value is a logarithm value of a positive sample ratio, and the second logarithm value is a logarithm value of a negative sample ratio.

Accordingly, calculating an encrypted evidence weight and/or an information value according to the intermediate results corresponding to the plurality of real boxes and sending an encrypted final result to the second participant comprises: for each of the real boxes, subtracting the encrypted negative sample ratio from the encrypted positive sample ratio corresponding to the box to obtain a subtraction result; and summing the subtraction results corresponding to the plurality of real boxes to obtain the encrypted information value and sending it to the second participant.

When the random numbers are omitted, the corresponding random numbers simply need to be removed from the formulas; for the specific implementation principle, refer to the foregoing embodiments, which are not repeated here.

Although the interference of random numbers is reduced, the difficulty of the second party in cracking the data of the first party can be increased to a certain extent due to the existence of the confusion box, and the data security is improved. In addition, after the first party calculates the evidence weight, the evidence weight and the random number can still be added or multiplied and then fed back to the second party, so that the second party can only know the change trend of the evidence weight and can not directly deduce the sample data distribution condition stored by the first party.

On the basis of the technical solutions provided in the foregoing embodiments, optionally, the final result obtained by the second party from the first party includes final results corresponding to a plurality of feature variables, and the second party may display the change trend of the evidence weight and/or the information value corresponding to each feature variable, so that a user selects at least some of the feature variables for federated learning according to the change trend and/or the information value.

Wherein, the change trend of the evidence weight can be shown by a change trend chart. In the variation trend graph of the evidence weight of each feature variable, the horizontal axis can be used for representing each box obtained according to the feature variable, and the vertical axis is the product of the evidence weight corresponding to each box and a random number.

Fig. 7 is a graph of the evidence weight change trend provided by the embodiment of the present invention. As shown in fig. 7, for the feature variable of transaction frequency, bins 1 to 3 represent the 3 bins obtained from three intervals of the transaction frequency. The vertical axis may be the product of the evidence weight corresponding to each bin and a random number, which has the same change trend as the evidence weight itself.

After the change trend of the evidence weight is obtained, a suitable characteristic variable can be selected according to the change trend. For example, a characteristic variable whose evidence weight increases monotonically is selected for model training, so that a specific change rule is presented between the characteristic variable and the output of the model, and the interpretability of the model is improved.
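A monotonicity check on the masked values could be sketched as below. The values, the masking number and the function name `is_monotonic` are illustrative assumptions; the masking number is taken to be positive so that the trend of the masked values matches the trend of the evidence weight.

```python
def is_monotonic(values):
    """True if the sequence never decreases or never increases."""
    pairs = list(zip(values, values[1:]))
    non_decreasing = all(x <= y for x, y in pairs)
    non_increasing = all(x >= y for x, y in pairs)
    return non_decreasing or non_increasing


# Illustrative evidence weights for bins 1-3 of one feature variable.
woe_by_bin = [-0.92, 0.0, 0.92]
r3 = 3.7  # hypothetical positive masking number known only to the first party
masked = [w / r3 for w in woe_by_bin]  # what the second party plots

trend_ok = is_monotonic(masked)
```

The second party can run such a check on the values it decrypts without ever learning the unmasked evidence weights.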

Fig. 8 is a schematic diagram of information value distribution according to an embodiment of the present invention. As shown in fig. 8, the horizontal axis represents feature variables, such as age, income, transaction times, deposit, education level, etc., and the vertical axis represents the information value corresponding to each feature variable. The feature variables can be sorted by information value, and the top-ranked feature variables with higher information value are selected for model training.
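The ranking step can be written in a few lines. The information values below are invented for illustration and the function name `top_features` is hypothetical.

```python
def top_features(iv_by_feature, k):
    """Return the k feature variables with the highest information value."""
    ranked = sorted(iv_by_feature.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Made-up information values for the feature variables named above.
iv_by_feature = {
    "age": 0.12,
    "income": 0.35,
    "transaction_count": 0.28,
    "deposit": 0.05,
    "education": 0.02,
}
selected = top_features(iv_by_feature, 3)
```

How many feature variables to keep (the `k` above) is left to the user; the text does not fix a threshold.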

By displaying the change trend of the evidence weight and the information value, a user can quickly and accurately screen the characteristic variables accordingly to construct a federated model, which improves the overall efficiency of federated learning.

The technical scheme provided by the embodiment of the invention can be applied to various fields. For example, in the financial field, the federated model to be trained may be a model used to predict whether a user will become overdue. After the model is trained, the characteristic variables of the user to be analyzed can be input into the model to obtain a prediction result indicating whether the user will become overdue, or the likelihood of becoming overdue; the user's loan limit can then be adjusted according to the result, or prompt information can be sent to an auditor, and the like.

In the meteorological field, the model to be trained can be a model for predicting precipitation. Optional characteristic variables can include satellite cloud pictures, precipitation in the same period of the previous year, weather in the past week, region, altitude, season and the like; some characteristic variables are strongly correlated with precipitation, while others may be only weakly correlated. After model training is finished, precipitation can be predicted with the model.

The embodiments of the present invention can be applied to other fields besides the financial field and the meteorological field, and specific processing methods can be referred to the foregoing fields, which are not described herein again.

Fig. 9 is a schematic flow chart of another data binning processing method based on a confusion box according to an embodiment of the present invention. The method may be applied to a second party. As shown in fig. 9, the method includes:

step 901, sending a plurality of data identifications and the encrypted label and anti-label corresponding to each data identification to a first participant, so that the first participant performs a binning operation on the plurality of data identifications according to the locally stored characteristic variables corresponding to them to obtain a plurality of real boxes, and randomly generates a plurality of confusion boxes.

Step 902, obtaining indication information of each box in the plurality of real boxes and the plurality of confusion boxes sent by the first participant; the indication information of each box is related to the encrypted positive sample ratio and the encrypted negative sample ratio corresponding to the box, and the encrypted positive sample ratio and the encrypted negative sample ratio are determined by the first party according to the encrypted labels and encrypted anti-labels of the data identifications in the box.

Step 903, determining an intermediate result according to the indication information and encrypting the intermediate result; wherein the intermediate results are used to calculate an evidence weight and/or an information value.

Alternatively, the intermediate result may be determined by a logarithmic operation on the indication information.

Step 904, sending the encrypted intermediate results of each box to the first participant, so that the first participant selects the intermediate results corresponding to the plurality of real boxes from the encrypted intermediate results corresponding to each box, and calculates the encrypted evidence weight and/or information value according to the intermediate results corresponding to the plurality of real boxes.

Step 905, obtaining an encrypted final result sent by the first party, where the final result includes a change trend of the evidence weight and/or an information value.
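The role of the confusion boxes in steps 901 to 905 can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: encryption and random masking are omitted, each box is reduced to a plain pair (positive sample proportion, negative sample proportion), and the function names `build_boxes`, `second_party_intermediate` and `select_real` are hypothetical and do not appear in this document.

```python
import math
import random

def build_boxes(real_boxes, num_confusion, rng):
    """First participant: mix real boxes with randomly generated confusion boxes.

    Returns the shuffled (positive, negative) proportion pairs and, for each
    real box in its original order, the position it occupies after shuffling.
    """
    confusion = [(rng.uniform(0.01, 1.0), rng.uniform(0.01, 1.0))
                 for _ in range(num_confusion)]
    tagged = [(box, i) for i, box in enumerate(real_boxes)]
    tagged += [(box, None) for box in confusion]
    rng.shuffle(tagged)
    boxes = [box for box, _ in tagged]
    real_positions = [None] * len(real_boxes)
    for pos, (_, idx) in enumerate(tagged):
        if idx is not None:
            real_positions[idx] = pos
    return boxes, real_positions

def second_party_intermediate(boxes):
    """Second participant: one logarithmic intermediate result per box.

    The second participant processes every box the same way and is not told
    which positions hold real boxes.
    """
    return [math.log(p) - math.log(n) for p, n in boxes]

def select_real(intermediate, real_positions):
    """First participant: keep only the results that belong to real boxes."""
    return [intermediate[pos] for pos in real_positions]

rng = random.Random(0)
real = [(0.2, 0.5), (0.3, 0.3), (0.5, 0.2)]   # illustrative proportions
boxes, positions = build_boxes(real, num_confusion=4, rng=rng)
woe = select_real(second_party_intermediate(boxes), positions)
print(len(boxes), "boxes sent;", len(woe), "real results kept")
```

In this sketch the second participant sees seven boxes and cannot distinguish the three real ones, while the first participant recovers exactly the three results it needs from the recorded positions.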

Optionally, the indication information of each bin includes products obtained by multiplying the encrypted positive sample fraction and the encrypted negative sample fraction by corresponding random numbers, respectively.

Optionally, the indication information of each bin specifically includes a product of the encrypted positive sample fraction and the corresponding first random number, and a product of the encrypted negative sample fraction and the corresponding second random number;

correspondingly, determining an intermediate result according to the indication information and encrypting the intermediate result, including:

decrypting the indication information to obtain the product of the positive sample ratio and the first random number and the product of the negative sample ratio and the second random number;

for each box, calculating a logarithmic value of a product of the positive sample ratio corresponding to the box and the first random number to obtain a first logarithmic value, and calculating a logarithmic value of a product of the negative sample ratio corresponding to the box and the second random number to obtain a second logarithmic value;

and for each box, calculating the difference between the first logarithm value and the second logarithm value corresponding to the box to obtain an intermediate result corresponding to the box, and encrypting the intermediate result.
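The logarithm steps above can be checked numerically with a short Python sketch. Encryption is left out so that only the masking arithmetic is visible, and the unmasking function `remove_mask` is an assumption about how the first participant would cancel its own random numbers; all names and values are illustrative.

```python
import math
import random

def mask_indication(p, n, r1, r2):
    """First participant: multiply each proportion by its own random number."""
    return p * r1, n * r2

def intermediate_result(masked_p, masked_n):
    """Second participant: first logarithmic value minus second logarithmic value."""
    return math.log(masked_p) - math.log(masked_n)

def remove_mask(intermediate, r1, r2):
    """First participant (assumed step): subtract ln(r1) - ln(r2) to obtain ln(p / n)."""
    return intermediate - (math.log(r1) - math.log(r2))

rng = random.Random(1)
p, n = 0.25, 0.10                         # illustrative proportions for one box
r1, r2 = rng.uniform(1.0, 100.0), rng.uniform(1.0, 100.0)
z = intermediate_result(*mask_indication(p, n, r1, r2))
woe = remove_mask(z, r1, r2)
print(abs(woe - math.log(p / n)) < 1e-9)
```

Because ln(r1·p) − ln(r2·n) = ln(p/n) + ln(r1) − ln(r2), the second participant only ever sees a value shifted by an offset it does not know, while the holder of r1 and r2 can remove that offset exactly.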

Optionally, for each bin, the indication information specifically includes a product of the encrypted positive sample fraction and the first random number, a product of the encrypted negative sample fraction and the second random number, and a product of a difference between the encrypted positive sample fraction and the encrypted negative sample fraction and the third random number;

decrypting the indication information to obtain a product of the positive sample ratio and the first random number, a product of the negative sample ratio and the second random number, and a product of the difference between the positive sample ratio and the negative sample ratio and the third random number;

and for each box, subtracting the second logarithm value from the first logarithm value to obtain a difference value, multiplying the difference value by the product of the difference between the positive sample ratio and the negative sample ratio and the third random number to obtain an intermediate result, and encrypting the intermediate result.
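The three-random-number variant can be sketched in the same way. The Python example below works on plaintext values only and assumes a particular unmasking rule for the first participant (divide by the third random number, then subtract the offset introduced by the first two); in the actual scheme that step would have to be carried out on ciphertexts, which is not shown. The function names are hypothetical.

```python
import math
import random

def first_party_indication(p, n, r1, r2, r3):
    """First participant: three masked values for one box."""
    return p * r1, n * r2, (p - n) * r3

def second_party_iv_intermediate(masked_p, masked_n, masked_diff):
    """Second participant: (ln(r1*p) - ln(r2*n)) * (r3 * (p - n))."""
    return (math.log(masked_p) - math.log(masked_n)) * masked_diff

def first_party_iv_term(intermediate, p, n, r1, r2, r3):
    """Assumed unmasking: recover (p - n) * ln(p / n) for one box."""
    return intermediate / r3 - (p - n) * (math.log(r1) - math.log(r2))

rng = random.Random(2)
boxes = [(0.2, 0.5), (0.3, 0.3), (0.5, 0.2)]   # illustrative proportions
iv = 0.0
for p, n in boxes:
    r1, r2, r3 = (rng.uniform(1.0, 100.0) for _ in range(3))
    z = second_party_iv_intermediate(*first_party_indication(p, n, r1, r2, r3))
    iv += first_party_iv_term(z, p, n, r1, r2, r3)

expected = sum((p - n) * math.log(p / n) for p, n in boxes)
print(abs(iv - expected) < 1e-9)
```

Summing the recovered per-box terms gives the usual information value, the sum over boxes of (p − n)·ln(p/n), so the masking does not change the final figure.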

Optionally, the final result obtained from the first party includes final results corresponding to a plurality of feature variables; the method further comprises the following steps:

and displaying the change trend of the evidence weight and/or the information value corresponding to each characteristic variable, so that a user can select, according to the change trend and/or the information value, at least some of the plurality of characteristic variables for federal learning.
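As a rough illustration of how the displayed results might support that selection, the Python sketch below ranks characteristic variables by information value and labels the change trend of their evidence weights. The threshold `min_iv=0.02`, the feature names and all numbers are made-up illustration values, and the automatic filter merely stands in for the choice that the user makes here.

```python
def woe_trend(woe_values):
    """Classify the change trend of the evidence weight across ordered boxes."""
    steps = [b - a for a, b in zip(woe_values, woe_values[1:])]
    if all(s >= 0 for s in steps):
        return "increasing"
    if all(s <= 0 for s in steps):
        return "decreasing"
    return "non-monotonic"

def rank_features(final_results, min_iv=0.02):
    """Keep variables whose IV reaches min_iv, sorted by IV in descending order."""
    rows = [(name, res["iv"], woe_trend(res["woe"]))
            for name, res in final_results.items() if res["iv"] >= min_iv]
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Hypothetical final results for three characteristic variables.
final_results = {
    "age": {"woe": [-0.6, -0.1, 0.4], "iv": 0.21},
    "income": {"woe": [0.9, 0.2, -0.7], "iv": 0.35},
    "zip_code": {"woe": [0.01, -0.02, 0.01], "iv": 0.004},
}
for name, iv, trend in rank_features(final_results):
    print(f"{name}: IV={iv:.3f}, WOE trend={trend}")
```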

Specific implementation principles, processes and beneficial effects of the data binning processing method based on the confusion box provided by the embodiment can be seen in the foregoing embodiments, and are not described herein again.

Fig. 10 is a schematic structural diagram of a data binning processing apparatus based on a confusion box according to an embodiment of the present invention. The apparatus may be applied to a first party. As shown in fig. 10, the confusion box-based data binning processing apparatus may include:

a data identifier obtaining module 1001, configured to obtain a plurality of data identifiers sent by a second party and an encrypted tag and an anti-tag corresponding to each data identifier;

a binning module 1002, configured to perform a binning operation on the plurality of data identifiers according to the locally stored characteristic variables corresponding to the plurality of data identifiers to obtain a plurality of real boxes, and to randomly generate a plurality of confusion boxes;

a calculating module 1003, configured to calculate, for each of the multiple real boxes and the multiple confusion boxes, an encrypted positive sample proportion and an encrypted negative sample proportion corresponding to the box according to the encrypted tag and the encrypted anti-tag corresponding to the data identifier in the box;

an indication information sending module 1004, configured to send the indication information of each bin to the second party, so that the second party determines an intermediate result according to the indication information and encrypts the intermediate result; wherein the indication information is related to the encrypted positive sample ratio and the encrypted negative sample ratio, and the intermediate result is used for calculating the evidence weight and/or the information value;

an intermediate result obtaining module 1005, configured to obtain encrypted intermediate results corresponding to each bin sent by the second party, and select intermediate results corresponding to multiple real bins from the encrypted intermediate results;

a final result sending module 1006, configured to calculate an encrypted evidence weight and/or an encrypted information value according to the intermediate results corresponding to the plurality of real boxes, and send an encrypted final result to the second participant, where the final result includes a variation trend of the evidence weight and/or the encrypted information value.
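The work of the calculating module 1003 can be pictured with a toy additively homomorphic scheme. The Python sketch below assumes Paillier-style encryption, which this section does not name, and uses two small fixed primes that are far too small for real use. It shows only how encrypted tags and encrypted anti-tags can be summed per box without decryption; turning the encrypted counts into proportions would additionally need a fixed-point encoding that is omitted. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import random

def keygen(p=1789, q=1861):
    """Toy key pair from two small fixed primes (insecure, demonstration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)          # valid because the generator is g = n + 1
    return n, (lam, mu)

def encrypt(n, m, rng):
    """Encrypt integer m as c = (1 + m*n) * r^n mod n^2."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) % n2 * pow(r, n, n2) % n2

def decrypt(n, priv, c):
    """Recover m from c using the private values (lam, mu)."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n

def add(n, c1, c2):
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts."""
    return c1 * c2 % (n * n)

def encrypted_counts(n, box_ids, enc_tag, enc_anti):
    """First party: encrypted positive and negative counts for one box."""
    pos = neg = 1                 # 1 is a trivial encryption of 0
    for data_id in box_ids:
        pos = add(n, pos, enc_tag[data_id])
        neg = add(n, neg, enc_anti[data_id])
    return pos, neg

rng = random.Random(3)
n, priv = keygen()                # the second party would hold priv
tags = {"id1": 1, "id2": 0, "id3": 1, "id4": 0, "id5": 1}
enc_tag = {i: encrypt(n, y, rng) for i, y in tags.items()}
enc_anti = {i: encrypt(n, 1 - y, rng) for i, y in tags.items()}

boxes = [["id1", "id2"], ["id3", "id4", "id5"]]   # first party's binning
counts = [encrypted_counts(n, b, enc_tag, enc_anti) for b in boxes]
print([(decrypt(n, priv, c_pos), decrypt(n, priv, c_neg)) for c_pos, c_neg in counts])
```

The first party works only with ciphertexts and the public modulus; the decryption in the last line is included purely to show that the encrypted sums match the per-box counts.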

Optionally, the indication information sending module 1004 is specifically configured to:

accordingly, the final result sending module 1006 is specifically configured to:

optionally, the encrypted evidence weight corresponding to each box is divided by a third random number to obtain an encrypted final result, and the encrypted final result is sent to the second party, where the final results of the multiple real boxes are used to characterize a variation trend of the evidence weight.
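Why dividing by a random number still conveys the change trend can be seen in a small Python sketch. It assumes that a single positive random number is shared by all real boxes of one characteristic variable (with a different number per box the ordering would not be preserved) and it operates on plaintext values for readability; the names `mask_trend` and `direction` are hypothetical.

```python
import random

def mask_trend(woe_values, rng):
    """Divide every evidence weight by the same positive random number."""
    r3 = rng.uniform(1.0, 100.0)
    return [w / r3 for w in woe_values]

def direction(values):
    """Sign of each step between neighbouring boxes: 1 up, -1 down, 0 flat."""
    return [(b > a) - (b < a) for a, b in zip(values, values[1:])]

rng = random.Random(4)
woe = [-0.9, -0.2, 0.1, 0.8]      # illustrative evidence weights per real box
masked = mask_trend(woe, rng)
print(direction(masked) == direction(woe))
```

The receiver of the masked values learns whether the evidence weight rises or falls from box to box, and the sign of each value, but not the true magnitudes.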

The data binning processing apparatus based on the confusion box provided in this embodiment may be used to execute the methods provided in the embodiments shown in fig. 1 to fig. 8, and the implementation principle and the technical effect are similar, and are not described herein again.

Fig. 11 is a schematic structural diagram of another data binning processing apparatus based on a confusion box according to an embodiment of the present invention. The apparatus may be applied to a second party. As shown in fig. 11, the data binning processing apparatus based on a confusion bin may include:

a data identifier sending module 1101, configured to send a plurality of data identifiers, together with the encrypted tag and encrypted anti-tag corresponding to each data identifier, to a first party, so that the first party performs a binning operation on the plurality of data identifiers according to the locally stored characteristic variables corresponding to the plurality of data identifiers to obtain a plurality of real boxes, and randomly generates a plurality of confusion boxes;

an indication information obtaining module 1102, configured to obtain indication information of each of the plurality of real boxes and the plurality of confusion boxes sent by the first party; the indication information of each box is related to the encrypted positive sample proportion and the encrypted negative sample proportion corresponding to the box, and the encrypted positive sample proportion and the encrypted negative sample proportion are determined by the first party according to the encrypted tags and encrypted anti-tags of the data identifiers in the box;

an intermediate result determining module 1103, configured to determine an intermediate result according to the indication information and encrypt the intermediate result; wherein the intermediate results are used to calculate evidence weights and/or information values;

an intermediate result sending module 1104, configured to send the encrypted intermediate results of each bin to the first participant, so that the first participant selects intermediate results corresponding to multiple real bins from the encrypted intermediate results corresponding to each bin, and calculates an encrypted evidence weight and/or an encrypted information value according to the intermediate results corresponding to the multiple real bins;

a final result obtaining module 1105, configured to obtain an encrypted final result sent by the first party, where the final result includes a trend of change of the evidence weight and/or an information value.

correspondingly, the intermediate result determining module 1103 is specifically configured to:

Optionally, the final result obtained from the first party includes final results corresponding to a plurality of feature variables; the final result obtaining module 1105 is further configured to:

The data binning processing apparatus based on the confusion box provided in this embodiment may be used to execute the method provided in the embodiment shown in fig. 9, and the implementation principle and the technical effect are similar, and are not described herein again.

Fig. 12 is a schematic structural diagram of a data binning processing apparatus based on a confusion bin according to an embodiment of the present invention. As shown in fig. 12, the apparatus may include: a memory 1201, a processor 1202 and a confusion box based data binning processing program stored on the memory 1201 and executable on the processor 1202, which when executed by the processor 1202 implements the steps of the confusion box based data binning processing method according to any of the previous embodiments.

Optionally, the memory 1201 may be separate from the processor 1202 or integrated with it.

For the implementation principle and the technical effect of the device provided by this embodiment, reference may be made to the foregoing embodiments, and details are not described here.

An embodiment of the present invention further provides a computer-readable storage medium storing a confusion-box-based data binning processing program which, when executed by a processor, implements the steps of the confusion-box-based data binning processing method described in any one of the foregoing embodiments.

An embodiment of the present invention further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the method for data binning processing based on a confusion box is implemented.

In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division of the modules is only one logical division, and other divisions are possible in practice. For instance, a plurality of modules may be combined or integrated into another system, or some features may be omitted or not executed.

The integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some steps of the methods according to the embodiments of the present invention.

It should be understood that the processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the present invention may be executed directly by a hardware processor, or by a combination of hardware and software modules in the processor.

The memory may comprise a high-speed RAM, and may further comprise a non-volatile memory (NVM), such as at least one disk memory; it may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, an optical disk, etc.

The storage medium may be implemented by any type of volatile or non-volatile memory device, or a combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, or magnetic or optical disks. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.

An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an Application Specific Integrated Circuit (ASIC). Of course, the processor and the storage medium may also reside as discrete components in an electronic device or host device.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. The term "comprising" specifies the presence of the stated features, steps, operations, elements, or components, and does not exclude the presence of additional ones.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. A confusion box-based data binning method applied to a first participant, the method comprising:

2. The confusion box-based data binning method of claim 1, wherein sending indication information for each bin to the second party comprises:

calculating an encrypted evidence weight and/or an information value according to the intermediate results corresponding to the plurality of real boxes and sending an encrypted final result to the second participant, wherein the method comprises the following steps:

3. The confusion box-based data binning method of claim 2, wherein the indication information specifically comprises, for each bin, the product of the encrypted positive sample fraction and a first random number, and the product of the encrypted negative sample fraction and a second random number;

calculating an encrypted evidence weight and/or an encrypted information value according to the intermediate results corresponding to the plurality of real boxes and the corresponding random numbers, and sending an encrypted final result to the second participant, wherein the method comprises the following steps:

4. The confusion box-based data binning method of claim 2, wherein the indication information comprises, for each bin, the product of the encrypted positive sample fraction and a first random number, the product of the encrypted negative sample fraction and a second random number, and the product of the difference between the encrypted positive sample fraction and the encrypted negative sample fraction and a third random number;

5. A confusion box-based data binning method applied to a second participant, the method comprising:

6. A confusion box-based data binning apparatus for use by a first party, the apparatus comprising:

7. A confusion box-based data binning apparatus for use by a second party, the apparatus comprising:

8. A confusion-box-based data binning processing apparatus, comprising: a memory, a processor and a confusion-bin-based data binning processing program stored on the memory and executable on the processor, which when executed by the processor implements the steps of the confusion-bin-based data binning processing method as claimed in any of claims 1-7.

9. A computer-readable storage medium, on which a confusion-box-based data binning processing program is stored, which, when executed by a processor, implements the steps of the confusion-box-based data binning processing method according to any of claims 1-7.

10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the confusion box based data binning method as claimed in any of claims 1 to 7.