CN113407988A

CN113407988A - Method and device for determining effective value of service data characteristic of control traffic

Info

Publication number: CN113407988A
Application number: CN202110580162.7A
Authority: CN
Inventors: 刘颖婷; 陈超超; 王力
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2021-05-26
Filing date: 2021-05-26
Publication date: 2021-09-17

Abstract

Embodiments of this specification provide a method and a device for determining an effective value of a service data characteristic of control traffic. The service data is private data. The service data of a plurality of participants, if hypothetically spliced, forms joint data comprising the feature values of a plurality of objects for a plurality of feature items. Each party obtains a joint data fragment, predicted value fragments of the samples, and model parameter fragments. A selected participant among the parties reconstructs the complete predicted value data from the predicted value fragments of the parties. Based on the joint data fragments of the parties and the predicted value data of the selected participant, the parties use secure multi-party computation and interaction to determine the relevance data fragments respectively corresponding to the parties, where the relevance data fragments contain relevance data among the plurality of feature items. Based on corresponding data in the model parameter fragments and the relevance data fragments of the parties, a significance test method is applied through secure interaction among the parties to determine the effective value of the feature item corresponding to each model parameter in improving the effect of the service prediction model.

Description

Method and device for determining effective value of service data characteristic of control traffic

Technical Field

One or more embodiments of the present specification relate to the technical field of data security, and in particular, to a method and an apparatus for determining an effective value of a service data characteristic for controlling traffic.

Background

The data required for machine learning often spans multiple platforms and multiple domains. For example, in a machine-learning-based merchant classification scenario, an electronic payment platform holds the transaction flow data of merchants, an e-commerce platform stores their sales data, and a banking institution holds their loan data. To improve their services, the parties often jointly train a business prediction model while preserving the privacy and security of their business data.

As the amount of data grows, its feature dimensionality grows as well. High-dimensional feature data often contains redundant information, which can degrade the effect of machine learning and reduce the stability of the model. The feature data can therefore be reduced in dimension according to feature effectiveness: redundant features that contribute little to model performance are removed, losing as little information as possible, so that the data is converted into lower-dimensional features.

Therefore, an improved scheme is desired, which can improve the processing efficiency in the process of determining the effective value of the feature as much as possible while ensuring the privacy of the data.

Disclosure of Invention

One or more embodiments of the present specification describe a method and an apparatus for determining effective values of characteristics of service data for controlling traffic, which can determine effective values of characteristic items for service data distributed in multiple parties without revealing privacy data, and improve processing efficiency. The specific technical scheme is as follows.

In a first aspect, an embodiment provides a method for determining effective values of characteristics of service data for controlling traffic, where the service data is distributed among multiple participants, and the service data of the multiple participants forms joint data if hypothetically spliced, where the joint data includes feature values of multiple objects for multiple feature items; the method is performed by the device of any first participant, and comprises:

acquiring joint data fragments of a first participant, acquiring predicted value fragments corresponding to a plurality of objects respectively, and model parameter fragments corresponding to a plurality of feature items respectively; the predicted value fragment and the model parameter fragment are obtained based on a trained service prediction model;

reconstructing complete predicted value data in the device of a selected participant from a plurality of predicted value fragments, by transmitting predicted value fragments between the participant devices and the selected participant device;

determining, through interaction among the plurality of participant devices and using secure multi-party computation, relevance data fragments respectively corresponding to the plurality of participants based on the joint data fragments of the plurality of participants and the predicted value data of the selected participant, wherein the relevance data fragments comprise relevance data among the plurality of feature items;

and determining, by a significance test method and through secure interaction among the plurality of participant devices, based on corresponding data in the model parameter fragments and the relevance data fragments of the plurality of participants, the effective value of the feature item corresponding to each model parameter in improving the effect of the service prediction model.

In one embodiment, the step of acquiring the joint data fragment of the first participant includes:

using secret-sharing addition, performing splitting and splicing operations on the service data of the plurality of participants through interaction with the other participant devices, so that each of the plurality of participants obtains a joint data fragment; the joint data fragments of the plurality of participants would yield the joint data if reconstructed.
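The splitting and splicing step can be illustrated with a minimal two-party sketch of additive secret sharing for vertically sliced data. This is an illustration under stated assumptions, not the patented procedure: the ring size `MODULUS`, the helper names `split` and `reconstruct`, and the toy matrices are all hypothetical, and the feature values are assumed to be non-negative integers.

```python
import numpy as np

MODULUS = 2**31  # hypothetical ring size for the additive shares

def split(matrix, rng):
    """Additively share an integer matrix between two parties."""
    share_0 = rng.integers(0, MODULUS, size=matrix.shape, dtype=np.int64)
    share_1 = (matrix - share_0) % MODULUS
    return share_0, share_1

def reconstruct(share_0, share_1):
    """Recombine two shares; shown only to check the sketch."""
    return (share_0 + share_1) % MODULUS

rng = np.random.default_rng(0)
x_a = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.int64)  # party A: 3 objects, 2 feature items
x_b = np.array([[7], [8], [9]], dtype=np.int64)           # party B: same objects, 1 feature item

# Each party splits its own matrix, keeps one share and sends the other.
a_keep, a_send = split(x_a, rng)
b_keep, b_send = split(x_b, rng)

# Each party splices the shares it holds into its joint data fragment.
frag_a = np.hstack([a_keep, b_send])
frag_b = np.hstack([a_send, b_keep])

joint = reconstruct(frag_a, frag_b)
```

Neither fragment alone reveals the other party's feature values; only their sum modulo `MODULUS` equals the hypothetically spliced joint data, and in the scheme described here that sum is never actually formed.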

In one embodiment, the service prediction model is obtained by secure joint training based on the respective joint data fragments of the plurality of participants; the service prediction model is used to perform service prediction on the objects.

In an embodiment, the step of obtaining the predicted value fragments corresponding to the plurality of objects and the model parameter fragments corresponding to the plurality of feature items includes:

obtaining, locally in the first participant device, a model parameter fragment of the trained service prediction model;

through interaction among the plurality of participants, enabling each of the plurality of participants to determine predicted value fragments of the objects based on the joint data fragments of the plurality of participants and the trained service prediction model.

In one embodiment, the selected participant comprises a participant having label data; the step of reconstructing complete predicted value data comprises:

when the first participant is not the selected participant, the predicted value fragments of the first participant are sent to the selected participant equipment, so that the selected participant equipment utilizes the plurality of predicted value fragments to reconstruct complete predicted value data;

and when the first participant is the selected participant, receiving the predicted value fragments sent by other participants, and reconstructing the predicted value fragments of the first participant and the received predicted value fragments to obtain complete predicted value data.

In one embodiment, the relevance data comprises covariance matrix data, and the relevance data fragments comprise covariance matrix fragments;

the step of determining the relevance data segments corresponding to the multiple participants respectively includes:

determining intermediate matrix fragments corresponding to the multiple participants respectively based on joint data fragments of the multiple participants, predicted value data of the selected participants and a functional relation in the service prediction model;

and calculating, based on the intermediate matrix fragments of the plurality of participants, fragments of the inverse of the intermediate matrix respectively corresponding to the plurality of participants, to obtain the covariance matrix fragments respectively corresponding to the plurality of participants.

In one embodiment, the step of determining the intermediate matrix fragments respectively corresponding to the plurality of participants includes:

dividing a Hessian matrix expression obtained based on a functional relation in the service prediction model into a plurality of blocks according to the hypothetical splicing relation of the service data of the plurality of participants; the Hessian matrix expression contains the joint data and the predicted value data;

and, according to which participants' data each of the plurality of blocks involves, enabling the plurality of participants to respectively determine data fragments of the plurality of blocks using corresponding data in the joint data fragments of the plurality of participants and the predicted value data of the selected participant, and to respectively determine the corresponding Hessian matrix fragments, based on the data fragments of the blocks, as the intermediate matrix fragments.
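For the embodiment in which the service prediction model is a logistic regression model, the Hessian matrix expression can be written out explicitly. The following is a sketch under the standard maximum-likelihood formulation, which this passage does not itself spell out; the symbols $\hat{y}_i$ (predicted value of object $i$), $S$ and $\Sigma$ are introduced here for illustration only.

```latex
% Hessian of the logistic-regression negative log-likelihood (assumed form)
H = X^{\top} S X, \qquad
S = \operatorname{diag}\bigl(\hat{y}_1(1-\hat{y}_1), \dots, \hat{y}_N(1-\hat{y}_N)\bigr), \qquad
\Sigma \approx H^{-1}

% Block form when the joint data is spliced as X = (X_A, X_B)
H = \begin{pmatrix}
      X_A^{\top} S X_A & X_A^{\top} S X_B \\
      X_B^{\top} S X_A & X_B^{\top} S X_B
    \end{pmatrix}
```

Under this form, each block depends on the data of one or several participants together with the predicted value data, which is what decides whether a block can be computed locally or requires secure interaction.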

In one embodiment, the business data of any one participant comprises characteristic values of part of characteristic items of all objects;

the step of dividing the Hessian matrix expression obtained based on the functional relation in the service prediction model into a plurality of blocks includes:

dividing the Hessian matrix expression into a first block associated with the data of the selected participant and a second block associated with the data of a plurality of participants;

the step of enabling the plurality of participants to respectively determine the data fragments of the plurality of blocks and to respectively determine the corresponding Hessian matrix fragments based on the data fragments of the blocks includes:

when the first participant is the selected participant, determining the data fragment of the first block using the joint data fragment of the first participant and the predicted value data; when the first participant is not the selected participant, filling the first block with the value 0 to obtain the data fragment of the first block;

determining, using secret-sharing matrix multiplication (SMM) and through interaction among the plurality of participants, the data fragments of the second block respectively corresponding to the plurality of participants, using corresponding data in the joint data fragments of the plurality of participants and the predicted value data of the selected participant;

and splicing the data fragment of the first block and the data fragment of the second block of the first participant to obtain the Hessian matrix fragment of the first participant.
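The text names secret-sharing matrix multiplication (SMM) without detailing it at this point. A common way to realize such a product is with Beaver triples, and the sketch below uses that technique as an assumption; it is not confirmed to be the SMM of this specification. The helper names, the trusted dealer, the use of real-valued shares instead of fixed-point ring elements, and the logistic-regression weights `y_hat * (1 - y_hat)` are all illustrative, and the sketch operates on the raw matrices of two parties rather than on joint data fragments.

```python
import numpy as np

rng = np.random.default_rng(1)

def additive_shares(matrix):
    """Split a real matrix into two additive shares (illustrative, not fixed-point)."""
    mask = rng.normal(size=matrix.shape)
    return mask, matrix - mask

def smm(p_shares, q_shares):
    """Return additive shares of P @ Q using a dealer-supplied Beaver triple."""
    (p0, p1), (q0, q1) = p_shares, q_shares
    # Hypothetical trusted dealer: random U, V and Z = U @ V, distributed as shares.
    u = rng.normal(size=p0.shape)
    v = rng.normal(size=q0.shape)
    (u0, u1), (v0, v1), (z0, z1) = additive_shares(u), additive_shares(v), additive_shares(u @ v)
    # Only the masked matrices E = P - U and F = Q - V are opened.
    e = (p0 - u0) + (p1 - u1)
    f = (q0 - v0) + (q1 - v1)
    r0 = z0 + e @ v0 + u0 @ f + e @ f
    r1 = z1 + e @ v1 + u1 @ f
    return r0, r1

# Party A (selected participant) holds X_A and the predicted values; party B holds X_B.
x_a = rng.normal(size=(5, 2))
x_b = rng.normal(size=(5, 3))
y_hat = rng.uniform(0.1, 0.9, size=5)
left = x_a.T * (y_hat * (1.0 - y_hat))  # X_A^T diag(y_hat * (1 - y_hat)), known to A only

# A private input is a degenerate sharing: the owner holds it, the other party holds zeros.
cross_a, cross_b = smm((left, np.zeros_like(left)), (np.zeros_like(x_b), x_b))
```

Here `cross_a` and `cross_b` are the two parties' fragments of one cross block; a block that depends only on the selected participant's data would instead be computed locally by that participant and filled with zeros by the other.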

In one embodiment, the business data of any one participant comprises feature values of all feature items of a part of objects;

the step of dividing the Hessian matrix expression obtained based on the functional relation in the service prediction model into a plurality of blocks includes: dividing the Hessian matrix expression into a plurality of blocks respectively associated with the data of the plurality of participants;

determining the data fragment of the block associated with the data of the first participant using the joint data fragment and the predicted value data of the first participant, and filling the blocks associated with the data of the other participants with the value 0 to obtain their data fragments;

and splicing the data fragments of the plurality of blocks to obtain the Hessian matrix fragment of the first participant.

In an embodiment, the step of calculating, based on the intermediate matrix fragments of the plurality of participants, fragments of the inverse of the intermediate matrix respectively corresponding to the plurality of participants, to obtain the covariance matrix fragments respectively corresponding to the plurality of participants, includes:

obtaining, through iterative computation using a secret-sharing matrix inversion algorithm (SMI) and based on the intermediate matrix fragments of the plurality of participants, the covariance matrix fragments respectively corresponding to the plurality of participants.
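The text states only that SMI is iterative. One iteration that suits secret-shared data, because it uses nothing but matrix additions and multiplications, is the Newton-Schulz scheme; the plaintext sketch below assumes that scheme purely for illustration. The function name, the initial guess, the iteration count and the toy matrix are hypothetical, and in a secret-shared setting each product would itself be carried out with a secure multiplication.

```python
import numpy as np

def newton_schulz_inverse(a, iterations=30):
    """Approximate the inverse of a symmetric positive-definite matrix."""
    identity = np.eye(a.shape[0])
    # Scaling by the trace keeps the eigenvalues of a @ x in (0, 1], so the iteration converges.
    x = identity / np.trace(a)
    for _ in range(iterations):
        x = x @ (2.0 * identity - a @ x)
    return x

hessian = np.array([[4.0, 1.0], [1.0, 3.0]])   # toy intermediate matrix
covariance = newton_schulz_inverse(hessian)
```

The residual `I - a @ x` is squared at every step, so a well-conditioned matrix needs only a handful of iterations; an ill-conditioned one needs more.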

In an embodiment, the step of determining the effective value of the feature item corresponding to a model parameter in improving the effect of the service prediction model includes:

taking the diagonal elements of the covariance matrix fragments of the plurality of participants as variance fragments respectively corresponding to the plurality of model parameters;

for any model parameter, using a secret-sharing inverse square root algorithm (SNSI) and a significance test method, jointly performing a secure inverse square root operation through interaction among the plurality of participant devices, based on the corresponding model parameter fragment of the first participant and the corresponding variance fragments of the plurality of participants, and determining the significance test value fragment of the first participant for the model parameter; and determining the effective value of the feature item corresponding to the model parameter based on the significance test value fragments of the plurality of participants for the model parameter.
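In plaintext, combining a model parameter with the inverse square root of its variance corresponds to a Wald-type test statistic. The sketch below assumes a two-sided Wald z-test, which the text does not name explicitly, and computes it on reconstructed values rather than on fragments; `wald_significance` and the sample numbers are hypothetical.

```python
import math

def wald_significance(weights, variances):
    """Return (z statistic, two-sided p-value) for each model parameter."""
    results = []
    for weight, variance in zip(weights, variances):
        inv_sqrt = 1.0 / math.sqrt(variance)   # the quantity an inverse square root step would supply
        z = weight * inv_sqrt
        p_value = math.erfc(abs(z) / math.sqrt(2.0))
        results.append((z, p_value))
    return results

# Variances are the diagonal elements of the covariance matrix.
results = wald_significance(weights=[0.0, 2.0], variances=[1.0, 0.25])
```

A large absolute z (small p-value) suggests that the corresponding feature item contributes to the model; a z near zero suggests the feature item is a candidate for removal.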

In one embodiment, the method further comprises:

for any first feature item, obtaining the effective value fragments of the first feature item from the other participant devices;

and determining the reconstructed effective value of the first feature item based on the local effective value fragment of the first feature item and the obtained effective value fragments.

In one embodiment, the method further comprises:

and, based on the effective values, removing from the plurality of feature items those whose effective values do not meet a preset condition, so that the plurality of participants perform secure joint training of the service prediction model using the service data with those feature items removed.
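The removal step can be sketched as a simple filter. The sketch assumes that the effective value is a p-value and that the preset condition is a significance threshold; both are assumptions, as are the function name, the threshold 0.05 and the feature names.

```python
def select_features(p_values, alpha=0.05):
    """Keep the feature items whose p-value is below the threshold alpha."""
    return [name for name, p_value in p_values.items() if p_value < alpha]

p_values = {"age": 0.001, "region": 0.40, "income": 0.03}   # hypothetical effective values
kept = select_features(p_values)
```

The participants would then retrain the model using only the columns listed in `kept`.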

In one embodiment, the object comprises one of a user, a commodity, and an event; the feature items include at least one of: basic attribute information, association relation information, interaction information and historical behavior information; the business prediction model is used to perform business prediction on the object.

In one embodiment, the business prediction model is based on a logistic regression model.

In a second aspect, an embodiment provides an apparatus for determining effective values of characteristics of service data for controlling traffic, where the service data is distributed among multiple participants, and the service data of the multiple participants forms joint data if hypothetically spliced, where the joint data includes feature values of multiple objects for multiple feature items; the apparatus is deployed in the device of any first participant, and comprises:

the acquisition module is configured to acquire joint data fragments of a first participant, acquire predicted value fragments corresponding to a plurality of objects respectively, and model parameter fragments corresponding to a plurality of feature items respectively; the predicted value fragment and the model parameter fragment are obtained based on a trained service prediction model;

a reconstruction module configured to reconstruct complete predicted value data in the device of a selected participant from a plurality of predicted value fragments, by transmitting predicted value fragments between the participant devices and the selected participant device;

the interaction module is configured to determine, through interaction among the plurality of participant devices and using secure multi-party computation, relevance data fragments respectively corresponding to the plurality of participants based on the joint data fragments of the plurality of participants and the predicted value data of the selected participant, wherein the relevance data fragments comprise relevance data among the plurality of feature items;

and the verification module is configured to determine, by a significance test method and through secure interaction among the plurality of participant devices, based on corresponding data in the model parameter fragments and the relevance data fragments of the plurality of participants, the effective value of the feature item corresponding to each model parameter in improving the effect of the service prediction model.

In one embodiment, the obtaining module, when obtaining the federated data segment of the first participant, includes:

In one embodiment, the selected participant comprises a participant having tag data; the reconstruction module is specifically configured to:

the interaction module comprises:

the determining submodule is configured to determine intermediate matrix fragments corresponding to a plurality of participants respectively based on joint data fragments of the participants, predicted value data of the selected participants and a functional relation in the service prediction model;

and the calculation submodule is configured to calculate, based on the intermediate matrix fragments of the plurality of participants, fragments of the inverse of the intermediate matrix respectively corresponding to the plurality of participants, to obtain the covariance matrix fragments respectively corresponding to the plurality of participants.

In a third aspect, embodiments provide a computer-readable storage medium having a computer program stored thereon, which, when executed in a computer, causes the computer to perform the method of any of the first aspect.

In a fourth aspect, an embodiment provides a computing device, including a memory and a processor, where the memory stores executable code, and the processor executes the executable code to implement the method of any one of the first aspect.

According to the method and the device provided by the embodiments of this specification, through interaction among the plurality of participants, and based on the joint data fragment and predicted value data of the selected participant and the joint data fragments and predicted value fragments of the other participants, the participants obtain relevance data fragments using secure multi-party computation, and then determine the effective value of each feature item in improving the model effect using the model parameter fragments and the relevance data fragments. The participants perform secure multi-party computation on joint data fragments, the resulting relevance data is likewise held as fragments, and model parameter fragments are used when calculating the effective values. Because the effective values are determined from data fragments, private data is well protected against leakage, which improves the privacy and security of the data during processing. Meanwhile, reconstructing the predicted value data considerably reduces the amount of data exchanged during secure multi-party computation, which improves the efficiency of the overall processing.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.

FIG. 1 is a schematic diagram of an implementation scenario of an embodiment disclosed herein;

FIG. 2 is a schematic flowchart of a method for determining an effective value of a service data characteristic of control traffic according to an embodiment;

FIG. 3 is a schematic diagram of the calculation flow of secret-sharing matrix multiplication according to an embodiment;

FIG. 4 is a schematic block diagram of an apparatus for determining an effective value of a service data characteristic of control traffic according to an embodiment.

Detailed Description

The scheme provided by the specification is described below with reference to the accompanying drawings.

Fig. 1 is a schematic view of an implementation scenario of an embodiment disclosed in this specification. As shown in Fig. 1, in a shared learning scenario, a data set is jointly provided by a plurality of participants 1, 2, …, W (W is a natural number); each participant holds a part of the data in the data set, which forms the business data (i.e., the original matrix) of that participant. The data set may be a training data set for training a model, a testing data set for testing a model, or a data set to be predicted. The data set may include feature data of objects, and an object may be any of various business objects to be analyzed, such as a user, a commodity, or an event. The model may comprise a business prediction model trained by machine learning.

There are at least two ways the data set can be distributed. In the first, each participant holds different feature data for all objects. For example, all participants hold samples of the same N objects, the private data of each sample contains D features, and these features are distributed among W participants, each holding D/W features. As another example, two platforms have the same set of users but hold different user features in their business data. The participants hold different kinds of features, and the number of features per participant can be the same (for example, D/W each) or different. N, D and W are all natural numbers. This is the vertical slicing scenario for the data set, and Table 1 shows the distribution of service data in this scenario.

TABLE 1

Where xx represents a specific characteristic value, belonging to the private data of the participant. Each row in table 1 represents one sample data, each column represents the feature value of a feature item of N objects, and D feature items belong to W participants. The feature values of the D feature items of the N objects constitute the entire business data.

In the second distribution, each participant holds all the feature data of different objects. For example, there are N object samples, the business data of each sample includes D feature items, and the N samples are distributed among W participants; each participant holds a subset of the N samples, and every sample includes the same feature items. The number of object samples stored by different participants may be the same or different. As another example, two banks serve different groups of users, but both hold the same user credit features. This is the horizontal slicing scenario for the data set, and Table 2 shows the distribution of service data in this scenario.

TABLE 2

Where xx represents a specific characteristic value, belonging to the private data of the participant. Each row in table 2 represents one sample data, each column represents the feature value of a certain feature item of N objects, and N sample data belong to W participants. Different participants have different object samples. The feature values of the D feature items of the N objects constitute the entire business data.
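The two distributions can be contrasted on a toy joint matrix. The sketch below is illustrative only: the matrix values, the split points and the variable names are hypothetical, and in the scheme described here the joint matrix is never actually assembled in plaintext.

```python
import numpy as np

joint = np.arange(12).reshape(4, 3)   # N = 4 objects (rows), D = 3 feature items (columns)

# Vertical slicing (Table 1): same objects, different feature items per participant.
vertical_a, vertical_b = np.hsplit(joint, [2])

# Horizontal slicing (Table 2): same feature items, different objects per participant.
horizontal_a, horizontal_b = np.vsplit(joint, [3])
```

Splicing the vertical parts side by side, or stacking the horizontal parts, gives back the same hypothetical joint data.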

The business data owned by the participants may include a plurality of characteristic items. The feature item of the object may include at least one of: basic attribute information, association relation information, interaction information, historical behavior information and the like of the object. For example, when the object is a user, the basic attribute information may include gender, age, income, and the like of the user, the association information of the user may include other users, companies, regions, and the like, which have an association with the user, the interaction information of the user may include information of clicking, viewing, participating in a certain activity, and the like of the user at a certain website, and the historical behavior information of the user may include historical transaction behavior, payment behavior, purchase behavior, and the like of the user.

When the object is a commodity, the basic attribute information may include a category, a place of production, a price, and the like of the commodity, the association relationship information of the commodity may include a user, a shop, or other commodities, and the like, which have an association relationship with the commodity, the interaction information of the commodity may include interaction characteristics between the user, the shop, and the commodity, and the historical behavior information of the commodity may include information that the commodity is purchased, transferred, returned, and the like.

When the object is an event, the event may include a transaction event, a login event, a purchase event, a social event, and the like. The basic attribute information of the event may be text information for describing the event, the association relation information may include text having a contextual relation with the event, other event information having an association with the event, and the like, and the historical behavior information may include record information of the event developing and changing in a time dimension, and the like.

The participants may correspond to different service platforms, which may include various enterprises, institutions, organizations, and the like. The service data is often the private data of the service platform, and a high level of privacy and security must be maintained while it is processed. Regardless of how the data is distributed, the feature values (i.e., the feature data) corresponding to the feature items of the objects are private data and can be stored as a private data matrix. To keep the private data secure, each participant must keep its private data local, must not output it in plaintext, and must not aggregate it in plaintext.

In order to protect the private data of each participant from leakage, in one embodiment the participants can use secure multi-party computation and, through interaction with a third party, use the predicted values and the original matrix of each participant so that the third party obtains covariance matrix data representing the relevance among the plurality of feature items. The third party then uses the covariance matrix data and the model parameters, with a significance test method, to determine the effective value of the feature item corresponding to each model parameter in improving the effect of the service prediction model.

The covariance matrix data itself contains some private information, so the security of private data can be improved further by changing how the covariance matrix data is handled. In another embodiment, no third party is used: the plurality of participants alone determine relevance data fragments from their various data fragments using secure multi-party computation, and from these determine the effective values of the feature items.

When processing in this manner, the data exists as fragments distributed among the participants, and the participants must exchange a large amount of data when determining the relevance data fragments using secure multi-party computation. The security of private data is protected to the greatest extent, but the processing efficiency is very low.

To balance processing efficiency against privacy protection, that is, to improve processing efficiency by slightly relaxing the privacy requirement where this is permissible, the embodiments of this specification provide a corresponding implementation. Referring to Fig. 1, in the embodiments of this specification each participant stores its own data fragments, including its joint data fragment, the predicted value fragments corresponding to the plurality of objects, and the model parameter fragments corresponding to the plurality of feature items. The selected participant device reconstructs the complete predicted value data from the predicted value fragments of the plurality of participants. The participant devices then interact on the basis of secure multi-party computation and use the joint data fragments and the predicted value data of the selected participant to determine the relevance data fragment corresponding to each participant, where the relevance data fragments contain relevance data among the plurality of feature items. Finally, each participant applies a significance test method and, through secure interaction among the participant devices, determines the effective values of the feature items in improving the effect of the service prediction model based on corresponding data in the model parameter fragments and relevance data fragments of the plurality of participants.

In this embodiment, the participants perform secure multi-party computation on the joint data fragments, the resulting relevance data is also held as fragments, and model parameter fragments are used when calculating the effective values. Throughout the process, the joint data, the relevance data, the model parameters and other data carrying important private information exist only as fragments and are never reconstructed, so data privacy is well protected against leakage. Having the selected participant reconstruct the predicted value data substantially reduces the amount of data exchanged during secure multi-party computation, controls the communication traffic between the participants, and improves the overall processing efficiency. Privacy protection and processing efficiency are thereby both taken into account.

In this specification, a plurality of participants have corresponding participant apparatuses, respectively, and the operations in the embodiments of the specification are performed using the corresponding participant apparatuses. Participant devices include, but are not limited to, any apparatus, device, platform, cluster of devices, etc. having computing, processing capabilities. The following describes embodiments of the present invention with reference to specific examples.

Fig. 2 is a flowchart illustrating a method for determining an effective value of a service data characteristic of controlling traffic according to this embodiment. The service data is distributed among a plurality of participants, and the service data of the plurality of participants constitutes joint data if hypothetically spliced. The business data of each participant is highly private: it cannot be sent in plaintext among the participants, and it is never actually spliced into joint data. The joint data is only the hypothetical data set formed by the business data of the plurality of participants. For example, Tables 1 and 2 above are specific forms of the joint data in the vertical slicing and horizontal slicing scenarios, respectively. The joint data includes feature values of a plurality of objects for a plurality of feature items; for example, it may include feature values of N objects for D feature items, where N and D are both natural numbers.

For convenience of description, the following examples use two participants: a first participant A, which corresponds to a first participant device, and a second participant B, which corresponds to a second participant device. A participant device performs the operations of its participant and stores that participant's data. In particular embodiments, the participant device may also obtain its participant's data from other devices. The method of this embodiment includes the following steps S210 to S240.

Step S210: the first participant device obtains the joint data fragment of the first participant A, the predicted value fragments corresponding to the plurality of objects, and the model parameter fragments corresponding to the plurality of feature items. Likewise, the second participant device obtains the joint data fragment of the second participant B, the predicted value fragments corresponding to the plurality of objects, and the model parameter fragments corresponding to the plurality of feature items.

Each participant holds its own service data, which is both its raw data and private data. In the vertical segmentation scenario, the participants hold different feature items for the same objects. Each participant may represent its raw data as an original matrix. For example, the original matrices of the first participant A and the second participant B may be denoted $X_A$ and $X_B$, their numbers of feature items $d_A$ and $d_B$, and their numbers of objects $n_A$ and $n_B$. The total number of feature items of the joint data is then $D = d_A + d_B$, and the total number of objects (samples) is $N = n_A = n_B$. When the columns of the original matrix represent feature items and the rows represent objects (samples), hypothetically splicing the service data of the participants, such as the first participant A and the second participant B, horizontally gives joint data of the form $X = (X_A, X_B)$. This case corresponds to the data distribution in Table 1. In other embodiments, the columns of the original matrix may represent objects and the rows may represent feature items; in that case, hypothetically splicing the service data of the participants vertically gives joint data of the form

$$X = \begin{pmatrix} X_A \\ X_B \end{pmatrix}$$

In the horizontal segmentation scenario, the participants hold the same feature items for different objects. The original matrices of the first participant A and the second participant B are $X_A$ and $X_B$, their numbers of feature items are $d_A = d_B$, and their numbers of objects are $n_A$ and $n_B$. The total number of feature items of the joint data is then $D = d_A = d_B$, and the total number of objects (samples) is $N = n_A + n_B$. When the rows of a participant's original matrix represent objects and the columns represent feature items, hypothetically splicing the service data of the participants, such as the first participant A and the second participant B, vertically gives joint data of the form

$$X = \begin{pmatrix} X_A \\ X_B \end{pmatrix}$$

This may correspond to the data distribution scenario in Table 2. When the rows of the original matrix represent feature items and the columns represent objects, hypothetically splicing the service data of the participants, such as the first participant A and the second participant B, horizontally gives joint data of the form $X = (X_A, X_B)$.
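The hypothetical splicing in the two scenarios can be sketched as follows. This is a minimal illustration only: the helper names `hstack` and `vstack` and the toy matrices are hypothetical, rows are assumed to represent objects and columns feature items, and in the method itself the splicing is never performed on plaintext data.

```python
def hstack(left, right):
    """Horizontal splice: same objects (rows), different feature items (columns)."""
    if len(left) != len(right):
        raise ValueError("row counts must match")
    return [row_l + row_r for row_l, row_r in zip(left, right)]


def vstack(top, bottom):
    """Vertical splice: same feature items (columns), different objects (rows)."""
    if len(top[0]) != len(bottom[0]):
        raise ValueError("column counts must match")
    return [list(r) for r in top] + [list(r) for r in bottom]


# Vertical segmentation: N = n_A = n_B = 2, D = d_A + d_B = 2 + 1.
X_A_v = [[1, 2], [3, 4]]
X_B_v = [[5], [6]]
joint_v = hstack(X_A_v, X_B_v)

# Horizontal segmentation: D = d_A = d_B = 2, N = n_A + n_B = 1 + 2.
X_A_h = [[1, 2]]
X_B_h = [[3, 4], [5, 6]]
joint_h = vstack(X_A_h, X_B_h)
```

In both cases the joint matrix has N rows and D columns; only the direction of the splice differs.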

To let each participant obtain a joint data fragment, the participants may use additive secret sharing: each participant splits its service data into random numbers, and fragmentation is completed by exchanging these random numbers among the participants. Specifically, to obtain the joint data fragment of the first participant A, the first participant device may use additive secret sharing and, through interaction with the other participant devices, perform splitting and splicing operations on the service data of the participants, so that each participant obtains its own joint data fragment. The second participant B obtains its joint data fragment in the same way.

Additive secret sharing splits an original matrix into random matrices, and fragmentation is completed by exchanging the random matrices among the participants. Taking two participants as an example, the first participant A and the second participant B hold the original matrices $X_A$ and $X_B$ of their service data, respectively. The first participant device may generate a random matrix $R_A$ over a finite field and compute $X_2 = X_A - R_A$; it then sends either of the two random matrices $R_A$ and $X_2$, for example $X_2$, to the second participant device. Similarly, the second participant device generates a random matrix $R_B$ over the finite field, computes $X_3 = X_B - R_B$, and sends either of the two random matrices $R_B$ and $X_3$, for example $X_3$, to the first participant device.

The first participant device may splice $R_A$ and the matrix $X_3$ received from the second participant device into its joint data fragment, and the second participant device may splice $R_B$ and the matrix $X_2$ received from the first participant device into its joint data fragment. In practical application scenarios the number of participants is often 3 or more, and additive secret sharing extends readily to three or more parties. The data exchanged among the participants consists only of random matrices, so the private data in the original matrices is not disclosed.
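The two-party splitting and splicing described above can be sketched as follows for the vertical segmentation case. This is an illustrative sketch under stated assumptions: the modulus `P`, the helper names, and the toy matrices are hypothetical, and the final reconstruction is performed only to check the sketch, since the method itself never reconstructs the joint data.

```python
import random

P = 2**61 - 1  # assumed prime modulus of the finite field


def random_matrix(rows, cols, rng):
    """Uniformly random matrix over the field."""
    return [[rng.randrange(P) for _ in range(cols)] for _ in range(rows)]


def sub_mod(a, b):
    return [[(x - y) % P for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def add_mod(a, b):
    return [[(x + y) % P for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def hstack(left, right):
    return [l + r for l, r in zip(left, right)]


rng = random.Random(0)
X_A = [[1, 2], [3, 4]]  # first participant: 2 objects, 2 feature items
X_B = [[5], [6]]        # second participant: same 2 objects, 1 feature item

# Each participant splits its own matrix into two additive shares.
R_A = random_matrix(2, 2, rng)
X_2 = sub_mod(X_A, R_A)  # sent from A to B
R_B = random_matrix(2, 1, rng)
X_3 = sub_mod(X_B, R_B)  # sent from B to A

# Splice in the column order of the hypothetical joint data (X_A, X_B).
share_A = hstack(R_A, X_3)  # <X>_A
share_B = hstack(X_2, R_B)  # <X>_B

# Check only: adding the fragments recovers the hypothetical joint data.
reconstructed = add_mod(share_A, share_B)
```

Each fragment on its own is a uniformly random matrix, which is why exchanging `X_2` and `X_3` reveals nothing about `X_A` or `X_B`.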

The joint data fragments of the participants would yield the joint data if reconstructed. Reconstruction may be implemented by adding the data fragments of the parties; a specific reconstruction may also apply further matrix transformations on top of the addition, for example multiplication by a preset value. Because the joint data contains private data, the participants do not aggregate it in plaintext: the joint data is only a hypothetical representation, and in practice the participants' data fragments are not brought together and reconstructed. This meaning of reconstruction applies throughout this description.

The joint data fragment of the first participant A may be denoted $\langle X \rangle_A$ and the joint data fragment of the second participant B may be denoted $\langle X \rangle_B$, so that the joint data is $X = \langle X \rangle_A + \langle X \rangle_B$. Here $\langle X \rangle$ denotes a fragment of the parameter $X$, and the subscript indicates the participant that holds the fragment. For uniformity, the fragment of a piece of data held by a given participant is written below in this form of angle brackets plus subscript.

In this embodiment, the federated data segments of the participants are obtained based on the business data of the multiple participants, and the sum of the federated data segments of the multiple participants is conceptually or theoretically equal to the federated data.

In step S210, the predicted value segment and the model parameter segment are data obtained based on the trained service prediction model. The service prediction model is obtained by performing safe joint training based on joint data fragments of a plurality of participants. The business prediction model can be obtained by pre-training. The business prediction model can be a model obtained by training based on a logistic regression model, and can also be obtained by training based on other types of models. The business prediction model is used for performing business prediction on the object, for example, classification prediction or regression prediction can be performed on input feature data of the object.

The participant devices can obtain the predicted value fragments and the model parameter fragments from the trained service prediction model. For example, the first participant device may obtain its local model parameter fragment of the trained service prediction model, and, through secure interaction among the participant devices, each participant may determine its predicted value fragments for the objects based on the joint data fragments of the participants and the trained service prediction model.

And the plurality of participant devices take N objects in the joint data fragments as samples to train a service prediction model. After training, the model parameter fragment of the service prediction model in the present participant device can be obtained. Through the safe interaction among a plurality of participant devices, the joint data fragments of the participants are input into a service prediction model, and each participant device can determine the predicted value fragments of a plurality of objects of the participant.

Therefore, in the data obtained by one participant, each object corresponds to one predicted value fragment, so N objects correspond to N predicted value fragments, which can be arranged as the elements of a vector of predicted value fragments. When the service data contains D feature items, the trained service prediction model contains a plurality of model parameters corresponding to the D feature items. For any predicted value, the corresponding predicted value fragments held by the participants would yield that predicted value if reconstructed; likewise, for any model parameter, the corresponding model parameter fragments held by the participants would yield that model parameter if reconstructed.

Step S220, reconstructing complete predicted value data in the selected participant device using the plurality of predicted value slices through transmission of the predicted value slices between the participant device and the selected participant device.

The selected party C may be any one of a plurality of parties, or may be selected from a plurality of parties according to a certain selection rule. The selected participant may be preselected or set by an administrator, and the identity of the selected participant may be pre-sent to other participants.

In one embodiment, the selected participant may be the participant who owns the tag data. Taking the first party a as an example, the step of reconstructing the complete predicted value data may include:

and when the first participant A is not the selected participant, transmitting the predicted value fragment of the first participant A to the selected participant equipment, so that the selected participant equipment reconstructs complete predicted value data by using the plurality of predicted value fragments.

And when the first participant A is the selected participant, receiving the predicted value fragments sent by other participants, and reconstructing the predicted value fragments of the first participant A and the received predicted value fragments to obtain complete predicted value data.

Tag data may be included in the business data of the participants, for example, at least one tag data exists in the business data corresponding to each object. For example, when the object is a user, the business data of user 1 may include tag data of whether user 1 is a high-risk user, and tag data 1 and 0 may represent yes or no, respectively.

One or more participants may hold the tag data. In the service data distribution shown in Table 1, one of participants 1 to W may hold the tag data of all objects; for example, participant 1 holds the tag data of all N objects. When the predicted value data is reconstructed, the selected participant obtains the predicted value fragments of the other participants and reconstructs the predicted value data from all predicted value fragments of the N objects. For example, suppose there are 3 participants A, B, and C, each holding service data for the same 10 objects, and participant A holds the tag data and is the selected participant. Participants B and C each send their 10 predicted value fragments to participant A. The device of participant A then holds all 30 predicted value fragments, 3 per object, and reconstructs the predicted value of each object from its 3 fragments, giving 10 predicted values in total for the 10 objects.
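The reconstruction at the selected participant can be sketched as follows for three participants. This is a minimal sketch under stated assumptions: the predicted values are encoded as fixed-point integers with a hypothetical scale `SCALE`, shared additively modulo a hypothetical prime `P`, and three toy objects are used instead of ten.

```python
import random

P = 2**61 - 1   # assumed prime modulus of the finite field
SCALE = 10**6   # assumed fixed-point scale for encoding predicted values


def share(value, n_parties, rng):
    """Split one fixed-point encoded value into n additive shares mod P."""
    encoded = round(value * SCALE) % P
    shares = [rng.randrange(P) for _ in range(n_parties - 1)]
    shares.append((encoded - sum(shares)) % P)
    return shares


def reconstruct(shares):
    """Add the shares mod P and decode the fixed-point result."""
    return (sum(shares) % P) / SCALE


rng = random.Random(1)
predictions = [0.125, 0.5, 0.875]  # hypothetical predicted values

# per_object[i] holds the fragments of object i for participants A, B, C.
per_object = [share(p, 3, rng) for p in predictions]
shares_A = [s[0] for s in per_object]  # stays with selected participant A
shares_B = [s[1] for s in per_object]  # sent by B to A
shares_C = [s[2] for s in per_object]  # sent by C to A

# Participant A reconstructs one predicted value per object from 3 fragments.
rebuilt = [reconstruct(t) for t in zip(shares_A, shares_B, shares_C)]
```

Only the predicted value fragments travel to participant A; the joint data fragments and model parameter fragments stay where they are.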

In the service data distribution as shown in table 2, the participating parties 1 to W respectively possess tag data of partial objects. In this case, the predictive value data is reconstructed in the different parties for the different traffic data, with the different parties acting as selected parties. For example, there are 3 participants A, B and C, each having business data for 5 objects, for a total of 15 objects. The business data of party a includes tag data of objects 1 to 5, the business data of party B includes tag data of objects 6 to 10, and the business data of party C includes tag data of objects 11 to 15. When reconstructing the predicted value data, for 5 objects from the object 1 to the object 5, the participant B and the participant C transmit the predicted value fragments of the 5 objects to the participant a, and the participant a reconstructs the predicted value data corresponding to the object 1 to the object 5 by using the 5 predicted value fragments of the participant a and the received 10 predicted value fragments transmitted by other participants, and the total number of the predicted value data is 5. Similarly, for the objects 6 to 10, the participant B serves as a selected participant, and the predicted value data of the part of the objects are reconstructed; for the objects 11 to 15, the participant C serves as a selected participant, and the prediction value data of the part of the objects is reconstructed.

The participant having the tag data serves as a selected participant, and the step of reconstructing the predicted value data is executed, so that the predicted value data is kept secret relative to the non-tag owner, and other participants cannot acquire the predicted value data, thereby protecting the privacy of the predicted value data.

Step S230, determining, by using multi-party security computation, through interaction among multiple participant devices, relevance data segments corresponding to multiple participants respectively based on joint data segments of the multiple participants and predicted value data of a selected participant, where the relevance data segments include relevance data among multiple feature items.

If reconstructed, the relevance data fragments of the participants would yield the relevance data, that is, the relevance data among feature items. This covers relevance data between feature items held by the same participant and between feature items held by different participants, and it includes both the relevance between different feature items and the relevance of a feature item with itself.

When the step is implemented, the relevance data fragments corresponding to a plurality of participants respectively can be determined by utilizing the joint data fragments and the predictive value data of the selected participants and in a multi-party safe calculation mode based on the existing formula for calculating the relevance data between the characteristic items. The formula capable of expressing the correlation data between the feature items may include a covariance matrix formula, a correlation coefficient formula, and the like.

Secure multi-party computation (MPC) is an existing data privacy protection technology for computations involving multiple parties; specific implementations include homomorphic encryption, garbled circuits, oblivious transfer, secret sharing, and the like. Using secure multi-party computation, the participant devices can securely and interactively compute on the joint data fragments and the predicted value data, so that each participant can determine its corresponding relevance data fragment.
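As one illustration of secure computation on secret-shared values, the sketch below multiplies two additively shared numbers using a Beaver multiplication triple. This commonly used technique is an assumption here: the specification does not state which protocol is used, and the trusted dealer, the modulus `P`, and all function names are hypothetical.

```python
import random

P = 2**61 - 1  # assumed prime modulus of the finite field


def share2(value, rng):
    """Split a value into two additive shares mod P."""
    r = rng.randrange(P)
    return (r, (value - r) % P)


def beaver_multiply(x_sh, y_sh, rng):
    """Return additive shares of x*y given additive shares of x and y."""
    # A trusted dealer (assumed) distributes shares of a random triple c = a*b.
    a, b = rng.randrange(P), rng.randrange(P)
    a_sh, b_sh, c_sh = share2(a, rng), share2(b, rng), share2(a * b % P, rng)
    # Each party publishes x_i - a_i and y_i - b_i; only d and e become public.
    d = (x_sh[0] - a_sh[0] + x_sh[1] - a_sh[1]) % P
    e = (y_sh[0] - b_sh[0] + y_sh[1] - b_sh[1]) % P
    # Each party computes its product share locally.
    z_sh = [(c_sh[i] + d * b_sh[i] + e * a_sh[i]) % P for i in range(2)]
    z_sh[0] = (z_sh[0] + d * e) % P  # the public term d*e is added by one party only
    return tuple(z_sh)


rng = random.Random(2)
x_shares = share2(12, rng)
y_shares = share2(34, rng)
z_shares = beaver_multiply(x_shares, y_shares, rng)

# Check only: the product is recovered by adding the two output shares.
product = sum(z_shares) % P
```

Because `a` and `b` are uniformly random, the opened values `d` and `e` reveal nothing about `x` or `y`; matrix products on fragments can be built from the same idea.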

Step S240: using a significance test method, determine the effective value of the feature item corresponding to each model parameter for improving the effect of the service prediction model, through secure interaction among the participant devices and based on the corresponding data in the model parameter fragments and relevance data fragments of the participants.

The significance test method may include a Wald test, a Likelihood Ratio (LR) test, a Lagrange Multiplier (LM) test, and the like. After the existing formula provided by the significance test method is transformed, the model parameter fragments and the relevance data fragments of a plurality of participants are safely calculated through the safety interaction among the devices of the participants, and the effective value fragments corresponding to the participants are determined.

In this embodiment, the feature items correspond to model parameters, and data corresponding to the feature items exist in both the model parameter patches and the correlation data patches. By using the corresponding data in the model parameter fragment and the correlation data fragment and adopting a significance test method, the significance test value fragments corresponding to the plurality of model parameters respectively, namely the significance test value fragments of the corresponding plurality of feature items, can be determined, and the effective value fragment can be determined based on the significance test value fragments.

When the valid value of a certain feature item needs to be determined, for example, for an arbitrary first feature item, the first participant device may obtain a valid value fragment of the first feature item from other participant devices, and determine a reconstructed valid value of the first feature item based on the local valid value fragment of the first feature item and the obtained valid value fragment. The valid value of the feature item may also be reconstructed in the second participant device or in another participant device, and this embodiment is described only by taking the reconstruction of the valid value in the first participant device as an example.

After obtaining the effective values of the plurality of feature items, the first participant device may further remove, from the plurality of feature items, the feature item whose effective value does not satisfy the preset condition based on the plurality of effective values, so that the plurality of participants perform safe joint training on the service prediction model by using the service data from which the feature item is removed. The service data after the feature items are removed realizes the dimension reduction processing of the original matrix, so that the feature items are more refined, and the safety of the private data is ensured without leakage.
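The feature removal step can be sketched as follows. This is a hypothetical illustration: the preset condition is assumed to be "effective value at least `threshold`", the function names are invented for the example, and each participant would apply the column removal only to the feature items in its own local service data.

```python
def kept_feature_indices(effective_values, threshold=1):
    """Indices of feature items whose effective value meets the assumed condition."""
    return [j for j, v in enumerate(effective_values) if v >= threshold]


def drop_removed_columns(matrix, kept):
    """Keep only the listed columns (rows are objects, columns are feature items)."""
    return [[row[j] for j in kept] for row in matrix]


effective_values = [1, 0, 1]                # hypothetical effective values
local_data = [[10, 20, 30], [40, 50, 60]]   # hypothetical local service data

kept = kept_feature_indices(effective_values)
reduced = drop_removed_columns(local_data, kept)
```

The reduced matrix has fewer columns, which is the dimension reduction of the original matrix that the subsequent joint training benefits from.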

One embodiment is described in detail below. It gives specific implementations of determining the relevance data fragments in step S230 and of determining the effective values of the feature items in step S240 when the service prediction model includes a logistic regression model and the significance test method is the Wald test.

The application of the Wald test to logistic regression is first explained below. When a logistic regression model is used to regress on the feature data of a sample, the predicted value is calculated as

$$\pi(X) = \frac{e^{\beta^{T} X}}{1 + e^{\beta^{T} X}} \qquad (1)$$

where $X$ is the feature data of the sample and serves as the independent variable; $\pi(X)$ is the predicted value function of the sample and serves as the dependent variable; $\beta$ is the model parameter, i.e., the coefficients of the feature items; and $e$ is the natural constant.
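The logistic prediction function can be sketched in plaintext as follows. The function name `predict` and the absence of an intercept term are assumptions of this sketch; it uses the equivalent form $1 / (1 + e^{-\beta^{T} X})$.

```python
import math


def predict(beta, x):
    """Logistic prediction pi(X) = 1 / (1 + exp(-beta . x))."""
    z = sum(b * v for b, v in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-z))


p_neutral = predict([0.0, 0.0], [1.0, 2.0])   # zero coefficients
p_positive = predict([1.0], [2.0])            # beta . x = 2
```

With all coefficients equal to zero the feature data has no influence and the prediction is 0.5, which is the situation the null hypothesis of the Wald test describes for a single coefficient.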

The null hypothesis and the alternative hypothesis of the Wald test are:

$H_0$: $\omega_j = 0$ ($j = 1, 2, \ldots, k$), where $\omega_j$ is the coefficient of the $j$-th independent variable; i.e., the independent variable is assumed to have no influence on the probability that the dependent variable occurs, that is, no influence on the estimated value of the dependent variable;

$H_1$: $\omega_j \neq 0$.

If the null hypothesis is rejected, the dependent variable changes with the independent variable $j$.

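The test statistic of the Wald test is

$$\mathrm{Wald}_k = \left( \frac{\hat{\beta}_k}{SE(\hat{\beta}_k)} \right)^{2} \qquad (2)$$

$\mathrm{Wald}_k$ is the significance test value, which follows a chi-square distribution with 1 degree of freedom. Here $\hat{\beta}_k$ is the $k$-th model parameter and $SE(\hat{\beta}_k)$ is its standard error, which equals the square root of the corresponding diagonal element of the covariance matrix:

$$SE(\hat{\beta}_k) = \sqrt{\operatorname{Cov}(\hat{\beta})_{kk}}$$

The diagonal elements of the covariance matrix are the variances of the feature items. The covariance matrix of the model parameters, $\operatorname{Cov}(\hat{\beta})$, is the inverse of the negative Hessian matrix of the log-likelihood function $\ell(\beta)$ evaluated at $\hat{\beta}$:

$$\operatorname{Cov}(\hat{\beta}) = H^{-1}, \qquad H = -\left. \frac{\partial^{2} \ell(\beta)}{\partial \beta \, \partial \beta^{T}} \right|_{\beta = \hat{\beta}}$$

where

$$H_{kr} = -\frac{\partial^{2} \ell(\beta)}{\partial \beta_k \, \partial \beta_r} = \sum_{i=1}^{N} x_{ik} \, x_{ir} \, \pi(X_i) \bigl(1 - \pi(X_i)\bigr) \qquad (5)$$

is the expression of an element of the matrix $H$, referred to below as the Hessian matrix. The indices $k$ and $r$ are natural numbers not exceeding $D$, $x_{ik}$ and $x_{ir}$ are elements of the joint data $X$, and $X_i$ denotes the feature data of the $i$-th sample.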

Deriving from the above formula, the matrix $H$ can be expressed as $H = X^{T} M X$, in which

$$X \in \mathbb{R}^{N \times D}, \qquad M = \operatorname{diag}\bigl( \pi(X_1)(1 - \pi(X_1)), \ \ldots, \ \pi(X_N)(1 - \pi(X_N)) \bigr)$$

where $N$ is the total number of samples, i.e., the total number of objects, $D$ is the dimension of the feature data, $\pi(X_N)$ is the predicted value of the logistic regression model for sample $X_N$, and $M$ is a diagonal matrix obtained from the predicted values, which may also be called the predicted value matrix.
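The equivalence between the element-wise expression and the matrix form $H = X^{T} M X$ can be checked with a small plaintext sketch. It assumes the standard logistic regression form in which the $i$-th diagonal entry of $M$ is $\pi(X_i)(1 - \pi(X_i))$; the function names and toy data are hypothetical, and the method itself performs this computation on fragments rather than in plaintext.

```python
def hessian_elementwise(X, pi):
    """H_kr = sum_i x_ik * x_ir * pi_i * (1 - pi_i)."""
    N, D = len(X), len(X[0])
    return [[sum(X[i][k] * X[i][r] * pi[i] * (1 - pi[i]) for i in range(N))
             for r in range(D)] for k in range(D)]


def hessian_matrix_form(X, pi):
    """H = X^T M X with M = diag(pi_i * (1 - pi_i))."""
    N, D = len(X), len(X[0])
    m = [p * (1 - p) for p in pi]                        # diagonal of M
    MX = [[m[i] * X[i][j] for j in range(D)] for i in range(N)]
    return [[sum(X[i][k] * MX[i][r] for i in range(N))
             for r in range(D)] for k in range(D)]


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # N = 3 samples, D = 2 feature items
pi = [0.5, 0.25, 0.75]                     # hypothetical predicted values

H_elem = hessian_elementwise(X, pi)
H_mat = hessian_matrix_form(X, pi)
```

Both functions return the same symmetric D × D matrix, so the predicted values enter the computation only through the diagonal of `M`.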

From the above equation (2),

$$\mathrm{Wald}_k = \left( \frac{\hat{\beta}_k}{SE(\hat{\beta}_k)} \right)^{2},$$

it can be seen that, for the $k$-th model parameter, the larger its standard deviation, that is, the larger the value in the $k$-th row and $k$-th column of the covariance matrix, the greater the oscillation that the model parameter causes in the logistic regression model, and the smaller the Wald test value corresponding to that model parameter.

After the significance test value $\mathrm{Wald}_k$ of the $k$-th model parameter is determined, the $z_k$ statistic can also be obtained according to

$$z_k = \frac{\hat{\beta}_k}{SE(\hat{\beta}_k)}$$

and the p value can be obtained according to $p\_value = 2\,[1 - \mathrm{norm.cdf}(|z_k|)]$, where norm.cdf is the cumulative distribution function of the normal distribution. When the p value is smaller than the significance level threshold, the null hypothesis is rejected, the model parameter can be retained for modeling, and the effective value of the feature item corresponding to the model parameter may be 1 or another higher value. When the p value is not smaller than the significance level threshold, the null hypothesis is accepted, the model parameter is not retained, and the effective value of the feature item corresponding to the model parameter may be 0 or another lower value. The significance level threshold may typically be 0.05, 0.01, or the like.
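The Wald test for a single model parameter can be sketched in plaintext as follows. The function names are hypothetical, the standard normal cumulative distribution function is computed from `math.erf`, and the effective value is assumed to be 1 when the null hypothesis is rejected and 0 otherwise; the method itself computes these quantities on fragments.

```python
import math


def norm_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def wald_test(beta_k, var_k, alpha=0.05):
    """Return (Wald value, p value, effective value) for one model parameter.

    beta_k: estimated model parameter
    var_k:  its variance, i.e. the k-th diagonal element of the covariance matrix
    alpha:  significance level threshold
    """
    se = math.sqrt(var_k)                 # standard error
    z = beta_k / se                       # z statistic
    wald = z * z                          # chi-square with 1 degree of freedom
    p_value = 2.0 * (1.0 - norm_cdf(abs(z)))
    effective = 1 if p_value < alpha else 0
    return wald, p_value, effective


strong = wald_test(2.0, 1.0)   # z = 2
weak = wald_test(0.5, 1.0)     # z = 0.5
```

For a fixed parameter value, a larger variance gives a smaller Wald value and a larger p value, so the corresponding feature item is more likely to be removed.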

Logistic regression analysis is a statistical method that distinguishes independent variables from dependent variables and defines the relationship between them. The regression equation that is built is meaningful only if the independent and dependent variables are in fact related. Therefore, the questions that regression analysis must answer are whether the independent variables serving as influencing factors are related to the dependent variable serving as the prediction target, how strong that relationship is, and how reliably that strength can be determined. Logistic regression analysis may use the Wald test to check the regression coefficients one by one. If the Wald test indicates that certain independent variables are significant, they should be included in the model; if it indicates that they are not significant, they may be omitted from the model. The model parameters of the service prediction model can thus be evaluated using logistic regression analysis and the Wald test, and the feature items of the object samples can then be screened based on the evaluation results, which achieves dimension reduction of the service data.

In this embodiment, in step S230, the correlation data includes covariance matrix data, and the correlation data slices include covariance matrix slices. Covariance matrix patches of multiple participants can constitute a covariance matrix assuming reconstruction. The covariance matrix is a matrix formed by the covariance between two feature items in a plurality of feature items in the joint data, wherein the elements on the main diagonal are the variances of the plurality of feature items, and the elements on the off-diagonal are the covariance between the two feature items. The covariance matrix is a symmetric matrix, and when there are D feature entries in the joint data, the covariance matrix may be a symmetric matrix in D × D dimensions.

When determining the pieces of correlation data corresponding to the plurality of participants, respectively, in step S230, that is, determining the pieces of covariance matrix corresponding to the plurality of participants, respectively, the participant devices of the plurality of participants may perform the following steps 1 and 2.

Step 1: determine the intermediate matrix fragments corresponding to the participants based on the joint data fragments of the participants, the predicted value data of the selected participant, and the functional relation in the service prediction model. For example, the first participant A obtains the intermediate matrix fragment $\langle H \rangle_A$ and the second participant B obtains the intermediate matrix fragment $\langle H \rangle_B$; the intermediate matrix fragments would yield the intermediate matrix $H$ if reconstructed. The participants do not actually reconstruct the intermediate matrix fragments; this only expresses the relationship between the fragments.

Step 2: based on the intermediate matrix fragments of the participants, calculate the fragments of the inverse of the intermediate matrix corresponding to the participants, thereby obtaining the covariance matrix fragments corresponding to the participants. For example, the first participant A obtains the fragment $\langle H^{-1} \rangle_A$ of the inverse of the intermediate matrix and the second participant B obtains the fragment $\langle H^{-1} \rangle_B$; these fragments would yield the inverse $H^{-1}$ of the intermediate matrix if reconstructed. The participants do not actually reconstruct the fragments of the inverse; this only expresses the relationship between the fragments.
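What Step 2 computes can be illustrated in plaintext as follows. This sketch inverts a small matrix by Gauss-Jordan elimination and reads the variances off the diagonal; it is not the secure protocol, which the specification performs on fragments and does not detail here, and the function name and toy matrix are hypothetical.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity matrix.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Choose the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate the current column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


H = [[4.0, 1.0], [1.0, 3.0]]     # hypothetical intermediate matrix
cov = invert(H)                  # covariance matrix as the inverse of H
variances = [cov[k][k] for k in range(len(cov))]
```

The diagonal entries of `cov` are the variances whose square roots serve as the standard errors in the Wald test.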

In the step 1, when determining the intermediate matrix segments corresponding to the multiple participants respectively, the following steps 1_1 and 1_2 may be included.

Step 1_1, dividing a Hessian matrix expression obtained based on a functional relation in a service prediction model into a plurality of blocks according to an assumed splicing relation of service data of a plurality of participants. The hessian matrix expression includes joint data and predictive value data.

Step 1_2, according to the participant data associated with the multiple blocks, the multiple participants respectively determine the data slices of the multiple blocks by using the joint data slices of the multiple participants and the corresponding data in the predictive value data of the selected participant, and respectively determine the corresponding Hessian matrix slices as middle matrix slices based on the data slices of the blocks.

When the service prediction model is a logistic regression model, the functional relation of the service prediction model, that is, the functional relation of the model's predicted value, is given by formula (1) above. After the logistic regression model is trained, the corresponding model parameters, such as $\beta$, are obtained. The Hessian matrix expression is in fact a second derivative with respect to the model parameter $\beta$, i.e., equation (5):

$$H_{kr} = \sum_{i=1}^{N} x_{ik} \, x_{ir} \, \pi(X_i) \bigl(1 - \pi(X_i)\bigr) \qquad (5)$$

where $H$ is the Hessian matrix obtained based on the functional relation in the service prediction model, $H_{kr}$ is the expression of the element in the $k$-th row and $r$-th column of $H$, $x_{ik}$ and $x_{ir}$ are elements of the joint data $X$, $X_i$ denotes the feature data of the $i$-th sample, $\pi(X_i)$ is the predicted value data, and $i$ ranges from 1 to $N$.

In this embodiment, the hessian matrix H may be divided into a plurality of blocks according to an assumed splicing relationship of the service data of a plurality of participants. The assumed splicing relation of the business data of the multiple participants comprises a splicing relation that the business data of the multiple participants are spliced into joint data based on the horizontal or vertical assumed splicing of the given participant sequence.

In the application scenario of data vertical slicing, the business data of any one participant includes feature values of part of feature items of all objects. Referring to table 1, where the service data of any one participant includes feature values of partial feature items of N objects, the service data of multiple participants may be combined based on horizontal concatenation to obtain joint data. Table 1 is merely an example of data distribution in which rows represent objects and columns represent feature items. In other embodiments, rows may be used to represent feature items and columns may be used to represent objects. In the following examples, the division of the hessian matrix in the data vertical slicing scenario is illustrated by taking the data distribution shown in table 1 as an example.

In such an application scenario, when the hessian matrix expression is partitioned, the hessian matrix expression may be partitioned into a first partition associated with data of a selected participant and a second partition associated with a plurality of participants.

For example, taking two participants as an example, the business data of the first participant a and the second participant B are conceptually spliced into joint data as shown in table 3. It should be noted that the splicing here is a virtual splicing and is not executed in practice.

TABLE 3

The service data of the first participant A contains $d_A$ feature items, and the first participant A is assumed to be the selected participant, whose service data includes the tag data. The service data of the second participant B contains $d_B$ feature items, and the second participant B is not the selected participant. Expressed in matrix form, the service data of the first participant A and the second participant B can be hypothetically spliced into a joint data matrix $X$ similar to Table 3, with $N$ rows in total and $d_A + d_B$ columns in total, denoted $X_{N \times (d_A + d_B)}$.

Dividing the Hessian matrix $H$ according to the expression of formula (5) gives

$$H = \begin{pmatrix} [A] & [AB] \\ [BA] & [B] \end{pmatrix} \qquad (9)$$

Here the Hessian matrix $H$ is divided into 4 blocks, each of which is a submatrix. Note that the total number of columns of the joint data is $D = d_A + d_B$; for $x_{ik}$ and $x_{ir}$ in formula (5), $i$ takes values in rows 1 to $N$ of the joint data, and $k$ and $r$ take values in columns 1 to $D$ of the joint data. This is the theoretical case.

For the upper-left block $[A]$, the data required to determine an element $H_{kr}$ is taken from the data of the first participant A, and the predicted value data $\pi(X_i)$ is also owned by the first participant A. That is, when $k$ and $r$ both take values in columns $1$ to $d_A$, $x_{ik}$ and $x_{ir}$ in formula (5) both take values from the service data of the first participant A. Thus, the upper-left block $[A]$ can be calculated independently by the first participant A. The upper-left block $[A]$ is the first block, associated with the data of the selected participant.

In formula (9), the upper-right block $[AB]$ and the lower-left block $[BA]$ are transposes of each other, so once one of them is determined, the other can be obtained by transposing it. The following takes the determination of block $[AB]$ as an example. The data required for an element $H_{kr}$ in block $[AB]$ is taken from the service data of both the first participant A and the second participant B. That is, when one of $k$ and $r$ takes values in columns $1$ to $d_A$ and the other takes values in columns $d_A+1$ to $d_A+d_B$, one of $x_{ik}$ and $x_{ir}$ in formula (5) takes values from the service data $X_A$ of the first participant A, and the other takes values from the service data $X_B$ of the second participant B. Thus, the upper-right block $[AB]$ can be jointly calculated by the first participant A and the second participant B. The upper-right block $[AB]$ belongs to the second block, associated with the data of both the first participant A and the second participant B.

The data required to determine an element $H_{kr}$ in the lower-right block $[B]$ is taken from the data of both the first participant A and the second participant B. That is, when $k$ and $r$ both take values in columns $d_A+1$ to $d_A+d_B$, $x_{ik}$ and $x_{ir}$ in formula (5) both take values from the service data $X_B$ of the second participant B, but the predicted value data is owned by the first participant A, so the lower-right block $[B]$ also requires joint calculation by the first participant A and the second participant B. The lower-right block $[B]$ belongs to the second block, associated with the data of both the first participant A and the second participant B.

Thus, the Hessian matrix $H$ is divided into 4 blocks. Accordingly, the Hessian matrix shard $\langle H\rangle_A$ of the first participant A and the Hessian matrix shard $\langle H\rangle_B$ of the second participant B are each divided into 4 blocks, and the corresponding blocks in the two Hessian matrix shards make up the blocks of the Hessian matrix under hypothetical reconstruction.

In the Hessian matrix shard $\langle H\rangle_A$ of the first participant A: the data shard $\langle [A]\rangle_A$ of the upper-left block (belonging to the first block) is determined from data in the joint data shard $\langle X\rangle_A$ owned by the first participant A and the predicted value data $\pi(X_i)$; the data shard $\langle [AB]\rangle_A$ of the upper-right block (belonging to the second block) is obtained through multi-party secure computation from data in the joint data shard $\langle X\rangle_A$ and the predicted value data $\pi(X_i)$ owned by the first participant A and the joint data shard $\langle X\rangle_B$ owned by the second participant B; the data shard $\langle [BA]\rangle_A$ of the lower-left block (belonging to the second block) is obtained by transposing the data shard $\langle [AB]\rangle_A$ of the upper-right block; the data shard $\langle [B]\rangle_A$ of the lower-right block (belonging to the second block) is obtained through multi-party secure computation from the predicted value data $\pi(X_i)$ owned by the first participant A and data in the joint data shard $\langle X\rangle_B$ owned by the second participant B.

In the Hessian matrix shard $\langle H\rangle_B$ of the second participant B: the values in the data shard $\langle [A]\rangle_B$ of the upper-left block (belonging to the first block) can be obtained by filling with 0 values; the data shard $\langle [AB]\rangle_B$ of the upper-right block (belonging to the second block) is obtained through multi-party secure computation from data in the joint data shard $\langle X\rangle_A$ and the predicted value data $\pi(X_i)$ owned by the first participant A and the joint data shard $\langle X\rangle_B$ owned by the second participant B; the data shard $\langle [BA]\rangle_B$ of the lower-left block (belonging to the second block) is obtained by transposing the data shard $\langle [AB]\rangle_B$ of the upper-right block; the data shard $\langle [B]\rangle_B$ of the lower-right block (belonging to the second block) is obtained through multi-party secure computation from the predicted value data $\pi(X_i)$ owned by the first participant A and data in the joint data shard $\langle X\rangle_B$ owned by the second participant B.

To summarize, for the first block: the selected participant device (e.g., that of the first participant A) may determine its data shard of the first block using its joint data shard and the predicted value data; the devices of the participants other than the selected participant (e.g., the second participant B) may fill the first block with 0 values to obtain their data shards.

For the second block, secret sharing matrix multiplication (SMM) in multi-party secure computation may be used: through interaction among the multiple participants, the data shards of the second block corresponding to the multiple participants are determined from the corresponding data in the joint data shards of the multiple participants and the predicted value data of the selected participant.

The first participant device may splice the first participant A's data shards of the first block and of the second block to obtain the Hessian matrix shard $\langle H\rangle_A$ of the first participant A. The second participant device may splice the second participant B's data shards of the first block and of the second block to obtain the Hessian matrix shard $\langle H\rangle_B$ of the second participant B.
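As a plaintext illustration of the four-block layout, the following Python sketch computes each block from toy data and checks that the assembled blocks match the Hessian of the conceptually spliced matrix. It assumes the element form $H_{kr} = \sum_i x_{ik}\,\pi(X_i)[\pi(X_i)-1]\,x_{ir}$ from formula (5); the names `hessian_block`, `X_A`, `X_B`, `pi` and all numbers are hypothetical, and no secret sharing is performed, so the sketch only shows which inputs each block depends on.

```python
# Plaintext illustration only: a real deployment would never pool X_A and X_B.

def hessian_block(left, right, pi):
    """Block with entries sum_i left[i][k] * pi_i * (pi_i - 1) * right[i][r]."""
    n = len(pi)
    return [[sum(left[i][k] * pi[i] * (pi[i] - 1.0) * right[i][r] for i in range(n))
             for r in range(len(right[0]))]
            for k in range(len(left[0]))]

def transpose(m):
    return [list(row) for row in zip(*m)]

# Toy data (hypothetical): N = 3 objects, d_A = 2 features at A, d_B = 1 feature at B.
X_A = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
X_B = [[4.0], [1.0], [-2.0]]
pi = [0.2, 0.7, 0.5]  # predicted values, held by the selected participant A

block_A = hessian_block(X_A, X_A, pi)    # needs only A's data
block_AB = hessian_block(X_A, X_B, pi)   # needs A's and B's data
block_BA = transpose(block_AB)           # obtained by transposition
block_B = hessian_block(X_B, X_B, pi)    # B's features, A's predicted values

# Assemble the 2x2 block layout and compare with the Hessian of the spliced matrix.
H_blocks = ([ra + rab for ra, rab in zip(block_A, block_AB)]
            + [rba + rb for rba, rb in zip(block_BA, block_B)])
X = [a + b for a, b in zip(X_A, X_B)]
H_full = hessian_block(X, X, pi)
```

In the scheme described above, only `block_A` would be computed locally; the other blocks would be produced as shards through secure computation rather than in the clear.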

The following describes an embodiment of using SMM to determine, through interaction among the multiple participants, the data shards of the second block corresponding to the multiple participants from the corresponding data in the joint data shards of the multiple participants and the predicted value data of the selected participant. The example used is determining, by SMM, the data shards $\langle [AB]\rangle_A$ and $\langle [AB]\rangle_B$ of the upper-right block for the first participant A and the second participant B, from the joint data shard $\langle X\rangle_A$ and the predicted value data $\pi(X_i)$ owned by the first participant A and the joint data shard $\langle X\rangle_B$ owned by the second participant B.

When computing the term $x_{ik}\,\pi(X_i)[\pi(X_i)-1]\,x_{ir}$, part of the data in the joint data shard $\langle X\rangle_A$ of the first participant A, the predicted value data $\pi(X_i)$, and the joint data shard $\langle X\rangle_B$ of the second participant B are needed. Through SMM joint computation, the first participant A and the second participant B determine the data shards $\langle [AB]\rangle_A$ and $\langle [AB]\rangle_B$ respectively.

To make the description more concise and clear, the quantity to be calculated is simplified as follows. The first participant A owns the matrix data shards $\langle x\rangle_A$ and $\langle y\rangle_A$, the second participant B owns the matrix data shards $\langle x\rangle_B$ and $\langle y\rangle_B$, and the target to be calculated is $xy$, where $x = \langle x\rangle_A + \langle x\rangle_B$ and $y = \langle y\rangle_A + \langle y\rangle_B$. Through SMM joint computation, the first participant A obtains $\langle Y\rangle_A$ and the second participant B obtains $\langle Y\rangle_B$, such that $xy = \langle Y\rangle_A + \langle Y\rangle_B$. The calculation may follow the flow shown in Fig. 3.

Fig. 3 is a schematic diagram of the calculation flow of secret sharing matrix multiplication in this embodiment, which includes the following steps.

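Step 1: both participants obtain a random matrix triple. The first participant A obtains $\langle u\rangle_A$, $\langle v\rangle_A$, $\langle z\rangle_A$, and the second participant B obtains $\langle u\rangle_B$, $\langle v\rangle_B$, $\langle z\rangle_B$, satisfying $z = u \times v$, where $z = \langle z\rangle_A + \langle z\rangle_B$, $u = \langle u\rangle_A + \langle u\rangle_B$, and $v = \langle v\rangle_A + \langle v\rangle_B$.

Step 2: the first participant A splits its private data using the random numbers, which masks the private data and yields secret matrices. The first participant A computes $\langle d\rangle_A = \langle x\rangle_A - \langle u\rangle_A$ and $\langle e\rangle_A = \langle y\rangle_A - \langle v\rangle_A$. Likewise, the second participant B splits its private data using the random numbers to obtain secret matrices. The second participant B computes $\langle d\rangle_B = \langle x\rangle_B - \langle u\rangle_B$ and $\langle e\rangle_B = \langle y\rangle_B - \langle v\rangle_B$.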

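Step 3: the participants send each other their secret matrices and process them based on their own and the received secret matrices. The first participant A sends $\langle d\rangle_A$ and $\langle e\rangle_A$ to the second participant B, and the second participant B sends $\langle d\rangle_B$ and $\langle e\rangle_B$ to the first participant A. The first participant A computes $d = \langle d\rangle_A + \langle d\rangle_B$ and $e = \langle e\rangle_A + \langle e\rangle_B$, and the second participant B computes the same $d$ and $e$, so that $d = x - u$ and $e = y - v$.

Step 4: the participants each compute their own data shards. The first participant A computes $\langle Y\rangle_A = \langle z\rangle_A + \langle u\rangle_A \times e + d \times \langle v\rangle_A + d \times e$, and the second participant B computes $\langle Y\rangle_B = \langle z\rangle_B + \langle u\rangle_B \times e + d \times \langle v\rangle_B$. In theory, $\langle Y\rangle_A + \langle Y\rangle_B = x \times y$ is satisfied.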

In this way, based on part of the data in the joint data shard $\langle X\rangle_A$ of the first participant A and the predicted value data, together with the joint data shard $\langle X\rangle_B$ of the second participant B, SMM can be used to determine the shards, corresponding to each of the multiple participants, of the product between those parts of the joint data and the predicted value data.
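The four steps can be simulated in plain Python to check that the two output shards add up to the product. This is a minimal sketch under stated assumptions: integer toy matrices stand in for a fixed-point ring, a trusted dealer (not specified in the text) generates the triple, the masked values are opened as sums so that $d = x - u$ and $e = y - v$, and all helper names (`matmul`, `share`, `smm`) are hypothetical.

```python
import random

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def add(a, b):
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def sub(a, b):
    return [[p - q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def rand_matrix(rows, cols, rng):
    return [[rng.randint(-50, 50) for _ in range(cols)] for _ in range(rows)]

def share(m, rng):
    """Split m into two additive shares with m = share_a + share_b."""
    s_a = rand_matrix(len(m), len(m[0]), rng)
    return s_a, sub(m, s_a)

def smm(x_a, y_a, x_b, y_b, rng):
    rows, inner, cols = len(x_a), len(y_a), len(y_a[0])
    # Step 1: an assumed dealer hands out shares of u, v and z = u * v.
    u, v = rand_matrix(rows, inner, rng), rand_matrix(inner, cols, rng)
    (u_a, u_b), (v_a, v_b) = share(u, rng), share(v, rng)
    z_a, z_b = share(matmul(u, v), rng)
    # Step 2: each side masks its own shares.
    d_a, e_a = sub(x_a, u_a), sub(y_a, v_a)
    d_b, e_b = sub(x_b, u_b), sub(y_b, v_b)
    # Step 3: masked shares are exchanged and opened: d = x - u, e = y - v.
    d, e = add(d_a, d_b), add(e_a, e_b)
    # Step 4: each side computes its product shard locally.
    Y_a = add(add(add(z_a, matmul(u_a, e)), matmul(d, v_a)), matmul(d, e))
    Y_b = add(add(z_b, matmul(u_b, e)), matmul(d, v_b))
    return Y_a, Y_b

rng = random.Random(0)
x, y = rand_matrix(2, 3, rng), rand_matrix(3, 2, rng)
(x_a, x_b), (y_a, y_b) = share(x, rng), share(y, rng)
Y_a, Y_b = smm(x_a, y_a, x_b, y_b, rng)
```

Only the masked matrices `d` and `e` cross the wire in this simulation; neither side sees the other's `x` or `y` shard directly.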

This embodiment provides a way of determining the Hessian matrix shards block by block in the data vertical slicing scenario. Some of the blocks require data interaction among multiple participants, while others are calculated locally, which saves a large amount of interaction data and makes the overall processing flow more efficient.

In the application scenario of data horizontal slicing, the business data of any one participant includes feature values of all feature items of part of the objects. Referring to Table 2, where the service data of any one participant includes feature values of D feature items of part of the objects, the service data of multiple participants may be combined by vertical concatenation to obtain joint data. Table 2 is merely an example of data distribution in which rows represent objects and columns represent feature items. In other embodiments, rows may represent feature items and columns may represent objects. In the following examples, the division of the Hessian matrix in the data horizontal slicing scenario is illustrated using the data distribution shown in Table 2.

In such an application scenario, when the hessian matrix expression is divided, it may be divided into a plurality of blocks respectively associated with data of a plurality of parties.

For example, with two participants, the business data of the first participant A and the second participant B are conceptually spliced into joint data as shown in Table 4. Note that this splicing is virtual and is not actually performed.

TABLE 4

Here, the service data of the first participant A contains $n_A$ objects; the first participant A is the selected participant for these objects, and its service data includes the label data of these objects. The service data of the second participant B contains $n_B$ objects; the second participant B is the selected participant for these objects, and its service data includes the label data of these objects. Expressed in matrix form, the service data of the first participant A and the second participant B can be conceptually spliced into a joint data matrix $X$ similar to Table 4, where the total number of rows is $N = n_A + n_B$ and the total number of columns is $D$, i.e., $X_{(n_A + n_B) \times D}$.

Here, the Hessian matrix $H$ is divided into 2 blocks, each of which is a sub-matrix. Note that the total number of rows of the joint data is $N = n_A + n_B$. For $x_{ik}$ and $x_{ir}$ in formula (5), $i$ ranges over rows 1 to $N$ of the joint data, and $k$ and $r$ range over columns 1 to $D$ of the joint data. This is the theoretical case.

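For an element $H_{kr}$ of the Hessian matrix,

$$H_{kr} = \sum_{i=1}^{n_A} x_{ik}\,\pi(X_i)[\pi(X_i)-1]\,x_{ir} + \sum_{i=n_A+1}^{n_A+n_B} x_{ik}\,\pi(X_i)[\pi(X_i)-1]\,x_{ir},$$

where the first sum, $\langle H_{kr}\rangle_A$, is calculated by the first participant A, and the second sum, $\langle H_{kr}\rangle_B$, is calculated by the second participant B. Each is stored locally, and $H_{kr} = \langle H_{kr}\rangle_A + \langle H_{kr}\rangle_B$. In the same way, $\langle H\rangle_A$ and $\langle H\rangle_B$ can each be calculated locally by the corresponding participant.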

Specifically, according to formula (5), all the data required to determine the elements of the upper block $[A]$ in formula (10) is held by the first participant A. That is, when $i$ takes values in rows $1$ to $n_A$, $x_{ik}$ and $x_{ir}$ in formula (5) both take values from the service data of the first participant A, and the predicted value data $\pi(X_i)$ is also owned by the first participant A. Thus, the upper block $[A]$ can be calculated independently by the first participant A. The upper block $[A]$ is the block associated with the data of the first participant A.

According to formula (5), all the data required to determine the elements of the lower block $[B]$ in formula (10) is held by the second participant B. That is, when $i$ takes values in rows $n_A+1$ to $n_A+n_B$, $x_{ik}$ and $x_{ir}$ in formula (5) both take values from the service data of the second participant B, and the corresponding predicted value data $\pi(X_i)$ is also owned by the second participant B. Thus, the lower block $[B]$ can be calculated independently by the second participant B. The lower block $[B]$ is the block associated with the data of the second participant B.

Thus, the Hessian matrix $H$ is divided into 2 blocks. Accordingly, the Hessian matrix shard $\langle H\rangle_A$ of the first participant A and the Hessian matrix shard $\langle H\rangle_B$ of the second participant B are each divided into 2 blocks, and the corresponding blocks in the two Hessian matrix shards make up the blocks of the Hessian matrix under hypothetical reconstruction.

In the Hessian matrix shard $\langle H\rangle_A$ of the first participant A, the data shard $\langle [A]\rangle_A$ of the upper block is determined from data in the joint data shard $\langle X\rangle_A$ owned by the first participant A and the predicted value data $\pi(X_i)$; the values in the data shard $\langle [B]\rangle_A$ of the lower block can be obtained by filling with 0 values.

In the Hessian matrix shard $\langle H\rangle_B$ of the second participant B, the data shard $\langle [B]\rangle_B$ of the lower block is determined from data in the joint data shard $\langle X\rangle_B$ owned by the second participant B and the predicted value data $\pi(X_i)$; the values in the data shard $\langle [A]\rangle_B$ of the upper block can be obtained by filling with 0 values.

In summary, for the first participant A, the first participant device determines the data shard of the block associated with the data of the first participant A using the joint data shard and predicted value data of the first participant A, fills the blocks associated with the data of other participants with 0 values to obtain their data shards, and splices the data shards of the multiple blocks to obtain the Hessian matrix shard of the first participant A.

For the second participant B, the second participant device determines the data shard of the block associated with the data of the second participant B using the joint data shard and predicted value data of the second participant B, fills the blocks associated with the data of other participants with 0 values to obtain their data shards, and splices the data shards of the multiple blocks to obtain the Hessian matrix shard of the second participant B.

In this embodiment, an implementation manner is provided for dividing the hessian matrix in a scene of data horizontal segmentation and determining hessian matrix fragments in a block form, where hessian matrix fragments can be respectively determined without mutual interaction among multiple participants, so that the amount of interaction data among the participants is saved in the overall process.
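The local computation in the horizontal slicing scenario can be illustrated in plaintext. The sketch below assumes the summand $x_{ik}\,\pi(X_i)[\pi(X_i)-1]\,x_{ir}$ from formula (5) and treats each participant's local sum over its own rows as its Hessian shard; `local_hessian` and the toy data are hypothetical, and the final reconstruction is done only to check the result.

```python
def local_hessian(rows, pi):
    """D x D matrix with entries sum over local rows of x_k * pi * (pi - 1) * x_r."""
    D = len(rows[0])
    return [[sum(x[k] * p * (p - 1.0) * x[r] for x, p in zip(rows, pi))
             for r in range(D)]
            for k in range(D)]

# Toy data (hypothetical): n_A = 2 objects at A, n_B = 3 objects at B, D = 2 features.
X_A, pi_A = [[1.0, 2.0], [0.5, -1.0]], [0.2, 0.7]
X_B, pi_B = [[3.0, 0.0], [4.0, 1.0], [-2.0, 2.0]], [0.5, 0.9, 0.4]

H_A = local_hessian(X_A, pi_A)   # computed by A without any interaction
H_B = local_hessian(X_B, pi_B)   # computed by B without any interaction

# Hypothetical reconstruction, performed here only to verify the shards.
H_sum = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(H_A, H_B)]
H_full = local_hessian(X_A + X_B, pi_A + pi_B)
```

Because each participant holds both the rows and the matching predicted values, neither shard requires any data from the other side.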

Returning to step 2, the step of calculating, based on the intermediate matrix shards $\langle H\rangle$ of the multiple participants, the shards $\langle H^{-1}\rangle$ of the inverse of the intermediate matrix corresponding to each participant, and thereby obtaining the covariance matrix shards $\langle Cov\rangle$ corresponding to each participant, may be performed with a Secret sharing Matrix Inversion (SMI) algorithm: based on the intermediate matrix shards $\langle H\rangle$ of the multiple participants, the covariance matrix shards $\langle Cov\rangle$ corresponding to each participant are obtained through iterative computation. The covariance matrix equals the inverse of the intermediate matrix, $Cov = H^{-1}$.

For example, given the intermediate matrix shard $\langle H\rangle_A$ of the first participant A and the intermediate matrix shard $\langle H\rangle_B$ of the second participant B, the shards $\langle H^{-1}\rangle_A$ and $\langle H^{-1}\rangle_B$ can be calculated iteratively using SMI. The intermediate matrix shards $\langle H\rangle_A$ and $\langle H\rangle_B$ yield the intermediate matrix $H$ under hypothetical reconstruction, and $H^{-1}$ is the inverse of $H$, but the first participant A and the second participant B do not reconstruct $H$. It is therefore necessary, given $\langle H\rangle_A$ and $\langle H\rangle_B$ and without reconstructing $H$, for the first participant A and the second participant B to determine $\langle H^{-1}\rangle_A$ and $\langle H^{-1}\rangle_B$ respectively. Not reconstructing the intermediate matrix $H$ avoids leakage of private data.

The following describes the process of iteratively calculating the covariance matrix shards using SMI, taking two participants as an example. Known: the first participant A owns the intermediate matrix shard $\langle H\rangle_A$ and the second participant B owns the intermediate matrix shard $\langle H\rangle_B$, with $H = \langle H\rangle_A + \langle H\rangle_B$. Goal: the first participant A obtains $\langle H^{-1}\rangle_A$ and the second participant B obtains $\langle H^{-1}\rangle_B$, with $H^{-1} = \langle H^{-1}\rangle_A + \langle H^{-1}\rangle_B$.

During initialization, the first participant A and the second participant B obtain $L_0$ through joint calculation:

$$L_0 = \mathrm{tr}(H)^{-1} = [\mathrm{tr}(\langle H\rangle_A) + \mathrm{tr}(\langle H\rangle_B)]^{-1}$$

where $\mathrm{tr}$ denotes the trace of a matrix.

In any one iteration, the multiple participants use SMM and each compute according to the following iteration formula:

$$L_{k+1} = L_k(2I - H L_k) = (\langle L_k\rangle_A + \langle L_k\rangle_B)[2I - (\langle H\rangle_A + \langle H\rangle_B)(\langle L_k\rangle_A + \langle L_k\rangle_B)]$$

where $I$ is the identity matrix and $k$ is the iteration index. Each iteration requires 2 SMM operations. The number of iteration rounds may be preset, for example to between 20 and 32.
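The iteration can be tried on plaintext data to see that it converges to the inverse. The sketch below is an illustration under assumptions, not the shared protocol: it works on the reconstructed matrix, reads the initial value as the matrix $L_0 = I / \mathrm{tr}(H)$, and uses a hypothetical symmetric positive definite $H$; `matmul` and `iterative_inverse` are illustrative names.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def iterative_inverse(H, rounds=30):
    n = len(H)
    trace = sum(H[i][i] for i in range(n))
    # Assumed initial value: L_0 = I / tr(H), a matrix reading of L_0 = tr(H)^-1.
    L = [[1.0 / trace if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(rounds):
        HL = matmul(H, L)  # first product (one SMM in the shared setting)
        R = [[(2.0 if i == j else 0.0) - HL[i][j] for j in range(n)] for i in range(n)]
        L = matmul(L, R)   # second product (the other SMM)
    return L

H = [[4.0, 1.0], [1.0, 3.0]]   # hypothetical symmetric positive definite matrix
L = iterative_inverse(H)
```

The two calls to `matmul` per round correspond to the 2 SMM operations per iteration mentioned above; in the shared setting each would operate on shards instead of plaintext matrices.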

Returning to step S230, when determining, based on the model parameter shards and covariance matrix shards of the multiple participants, the effective value of the feature item corresponding to a model parameter in improving the effect of the service prediction model, formula (2) of the Wald test or formula (8) may be used to calculate the significance test value (or significance level value) of the $k$-th model parameter. The effective value of the feature item corresponding to the model parameter is then determined based on the significance test value and the initial hypothesis.

When determining $Wald_k$ or $z_k$, the numerator is the model parameter and the denominator is the standard deviation of the model parameter, which can be obtained as the square root of the variance of the model parameter; the diagonal elements of the covariance matrix are the variances of the corresponding model parameters. The effective value of the feature item corresponding to the model parameter may then be determined based on the model parameter shards and covariance matrix shards of the multiple participants using a secret sharing root inverse (SNSI) algorithm. Specifically, the following steps 1b and 2b may be included.

Step 1b: the multiple participant devices take the diagonal elements of the covariance matrix shards of the multiple participants as the variance shards corresponding to the respective model parameters. The diagonal elements here refer to the main diagonal elements. In the covariance matrix, each main diagonal element is the variance for the corresponding feature item. Correspondingly, in a covariance matrix shard, the main diagonal elements are the variance shards for the feature items.

Step 2b: for any model parameter, the first participant device uses the SNSI algorithm and the significance test method and, through interaction among the multiple participant devices, jointly performs a secure root inverse operation based on the corresponding model parameter shard of the first participant A and the corresponding variance shards of the multiple participants, thereby determining the significance test value shard of the first participant A for that model parameter. The effective value of the feature item corresponding to the model parameter is then determined based on the significance test value shards of the multiple participants for that model parameter.

Similarly, for any model parameter, the second participant device uses the SNSI algorithm and the significance test method and, through interaction among the multiple participant devices, jointly performs the secure root inverse operation based on the corresponding model parameter shard of the second participant B and the corresponding variance shards of the multiple participants, thereby determining the significance test value shard of the second participant B for that model parameter.

In one embodiment, the significance test value shards of the multiple participants may be sent to one of the participant devices or to a third-party device, which reconstructs the significance test value; based on the significance test value, the effective value of the corresponding feature item may be determined according to a predetermined transformation. In another embodiment, the significance test value shards of the multiple participants may also be used directly as effective value shards, and the effective value can be obtained by reconstructing the multiple significance test value shards.

The significance test value can be calculated based on formula (2) or formula (8) above, or on the p-value formula, and the resulting significance test value shard can be, but is not limited to, a $Wald_k$ value shard, a $z_k$ value shard, or a p-value shard.
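For orientation, the plaintext counterpart of these statistics can be sketched as follows. This is an illustration on reconstructed values, which the protocol itself avoids producing: it assumes the $z_k$ statistic is the model parameter divided by the square root of the matching covariance diagonal element, and it assumes a two-sided p-value under a standard normal reference distribution, which the text does not spell out. The function name and the toy numbers are hypothetical.

```python
import math

def z_and_p(beta, cov):
    """Return (z_k, two-sided normal p-value) for each model parameter."""
    results = []
    for k, b in enumerate(beta):
        z = b / math.sqrt(cov[k][k])             # parameter over its standard deviation
        p = math.erfc(abs(z) / math.sqrt(2.0))   # assumed two-sided normal p-value
        results.append((z, p))
    return results

beta = [0.8, -0.05]                   # hypothetical model parameters
cov = [[0.04, 0.0], [0.0, 0.25]]      # hypothetical covariance matrix
stats = z_and_p(beta, cov)
```

With these toy numbers the first parameter has a large |z| and a small p-value, while the second does not, which is the kind of contrast the effective value is meant to capture.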

The model parameter shards of the multiple participants yield the model parameter under hypothetical reconstruction. For example, for any model parameter $\beta_1$, the model parameter shard $\langle\beta_1\rangle_A$ of the first participant A and the model parameter shard $\langle\beta_1\rangle_B$ of the second participant B yield the model parameter $\beta_1$ under hypothetical reconstruction. The model parameter shards are not actually reconstructed; this description only illustrates the relationship between the model parameter shards and the model parameter.

It can be seen that, in the embodiment, when the significance test value is calculated, diagonal elements in covariance matrix fragments of multiple participants are used, and data in the covariance matrix is not reconstructed, so that security of private data in the covariance matrix can be well protected.

The following describes, for any model parameter $\beta_k$, the part of step 2b in which the first participant device uses the SNSI algorithm and the significance test method and, through interaction among the multiple participant devices, jointly performs the secure root inverse operation based on the model parameter shard $\langle\beta_k\rangle_A$ of the first participant A and the variance shards of the multiple participants, to determine the significance test value shard of the first participant A for the model parameter $\beta_k$. In the same way, the second participant device can determine the significance test value shard of the second participant B for the model parameter $\beta_k$.

Take formula (8) of the significance test method as an example. For the first participant, formula (8) can be rewritten as

$$\langle z_k\rangle_A = \frac{\langle\beta_k\rangle_A}{\sqrt{\langle Cov_{kk}\rangle_A + \langle Cov_{kk}\rangle_B}} \quad (11)$$

where $\langle z_k\rangle_A$ is the significance test value shard of the first participant A for the model parameter $\beta_k$, and the numerator is the model parameter shard of the first participant A. In the denominator, $\langle Cov_{kk}\rangle_A$ is the variance shard corresponding to the model parameter $\beta_k$ owned by the first participant A, which is also the $k$-th diagonal element of the covariance matrix shard of the first participant A, and $\langle Cov_{kk}\rangle_B$ is the variance shard corresponding to the model parameter $\beta_k$ owned by the second participant B, which is also the $k$-th diagonal element of the covariance matrix shard of the second participant B.

The numerator is owned by the first participant A, while the denominator is owned jointly by the first participant A and the second participant B. The problem therefore reduces to how to calculate the root inverse in formula (11). In this embodiment, the SNSI algorithm is used to determine the root inverse of the sum of the variance shard of the first participant A for the model parameter $\beta_k$ and the variance shard of the second participant B for the model parameter $\beta_k$; based on this root inverse and the model parameter shard $\langle\beta_k\rangle_A$ of the first participant A, the significance test value shard of the first participant A for the model parameter $\beta_k$ can be obtained. The root inverse in formula (11) is

$$(\langle Cov_{kk}\rangle_A + \langle Cov_{kk}\rangle_B)^{-1/2}$$

Next, steps 1c to 3c describe how to calculate this root inverse using the SNSI algorithm. For convenience of description, let $n_a = \langle Cov_{kk}\rangle_A$ and $n_b = \langle Cov_{kk}\rangle_B$, and let $n$ denote the variance of the model parameter $\beta_k$, i.e., $n = n_a + n_b$. The goal is for the first participant device to obtain $c_a$ and the second participant device to obtain $c_b$ such that $c_a + c_b = (n_a + n_b)^{-1/2} = n^{-1/2}$.

Step 1c: the first participant device and the second participant device convert the additive shards into multiplicative shards through interaction.

The first participant device locally generates a random number $x_a$ and computes $x_{ba1} = n_a / x_a$;

the first participant device and the second participant device jointly compute $n_b \times (1/x_a)$ through secret sharing matrix multiplication, obtaining $x_{ba2}$ and $x_{bb}$ respectively;

the first participant device computes $x_{ba} = x_{ba1} + x_{ba2}$ and sends $x_{ba}$ to the second participant device ($x_{ba1}$ and $x_{ba2}$ must not be sent separately);

the second participant device computes $x_b = x_{ba} + x_{bb}$, so that $n = x_a \times x_b$, which converts the additive sharing $n = n_a + n_b$ into the multiplicative sharing $n = x_a \times x_b$. At this point, the first participant A owns $x_a$ and the second participant B owns $x_b$.
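The conversion in step 1c can be checked numerically. The sketch below rests on an assumption about the intermediate quantities: that the first participant computes $x_{ba1} = n_a / x_a$ locally and that the secret-shared product is $n_b \times (1/x_a)$, which is the choice that makes $x_a \times x_b = n$ hold. The secure multiplication is replaced by a plain additive split, and the function name, sampling ranges, and toy variances are hypothetical.

```python
import random

def additive_to_multiplicative(n_a, n_b, rng):
    x_a = rng.uniform(1.0, 2.0)       # A's random factor (range is an assumption)
    x_ba1 = n_a / x_a                 # computed locally by A (assumed form)
    # Stand-in for the secret-shared product n_b * (1 / x_a): split it additively.
    product = n_b * (1.0 / x_a)
    x_ba2 = rng.uniform(-1.0, 1.0)    # A's share of the product
    x_bb = product - x_ba2            # B's share of the product
    x_ba = x_ba1 + x_ba2              # A sends only this sum to B
    x_b = x_ba + x_bb                 # B's multiplicative share
    return x_a, x_b

rng = random.Random(1)
n_a, n_b = 0.03, 0.01                 # hypothetical variance shards
x_a, x_b = additive_to_multiplicative(n_a, n_b, rng)
```

Sending only the sum `x_ba` matters: with `x_ba1` alone and knowledge of `x_a`'s distribution, the receiver would learn more about `n_a` than the masked sum reveals.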

Step 2c: the two participant devices each initialize the iterative estimate locally.

Taking the first participant A as an example, the first participant device reads the 64-bit floating-point number $x_a$ as a 64-bit integer and shifts it right by one bit (dividing by 2 and rounding down), denoted $int_a$; it then calculates 0x5fe6eb50c7b537a9 − $int_a$ and reads the result back in the storage format of a 64-bit floating-point number, denoted $y_a$. In this way, $x_a$ is initialized to $y_a$.

Similarly, the second participant device performs the same initialization, so that $x_b$ is initialized to $y_b$. At this point, the first participant A owns $y_a$ and the second participant B owns $y_b$.
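The bit-level initialization can be reproduced with Python's `struct` module. This is a minimal sketch of the reinterpretation trick as described, applied to a single positive double; the function name is hypothetical, and the only constant used is the one quoted in the text.

```python
import struct

MAGIC = 0x5FE6EB50C7B537A9  # constant quoted in the text

def initial_inverse_sqrt(x):
    """Rough estimate of x ** -0.5 for a positive 64-bit float x."""
    # Reinterpret the double's bytes as a signed 64-bit integer.
    bits = struct.unpack("<q", struct.pack("<d", x))[0]
    # Shift right by one bit and subtract from the magic constant.
    guess_bits = MAGIC - (bits >> 1)
    # Reinterpret the integer's bytes as a double again.
    return struct.unpack("<d", struct.pack("<q", guess_bits))[0]

y = initial_inverse_sqrt(4.0)   # close to, but not exactly, 0.5
```

The estimate is only a starting point with an error of a few percent, which is why a refinement iteration follows it.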

Step 3c: the two participants jointly use Newton's method to iteratively calculate $n^{-1/2}$.

The initial value of the iteration is $Y_0 = Y_{0a} \times Y_{0b} = y_a \times y_b$, whose factors are owned by the two participants respectively. The iteration formula is as follows

In the iteration process, secret sharing matrix multiplication is used twice and 1 iteration is performed in total, after which the first participant A and the second participant B obtain the floating-point numbers $c_a$ and $c_b$ respectively.
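As a plaintext reference for the refinement, the sketch below uses the textbook Newton update for $f(y) = 1/y^2 - n$, namely $y \leftarrow y\,(3 - n y^2)/2$. Whether this matches the exact form used in the embodiment is an assumption, and the sketch works on a single reconstructed value instead of shares. The function name, the starting value, and the round count are hypothetical.

```python
def newton_inverse_sqrt(n, y0, rounds=5):
    """Refine an initial estimate y0 of n ** -0.5 with Newton's method."""
    y = y0
    for _ in range(rounds):
        # Textbook update for f(y) = 1 / y**2 - n (assumed form).
        y = y * (3.0 - n * y * y) / 2.0
    return y

# Start from a rough estimate of 4 ** -0.5 such as a bit-level guess would give.
c = newton_inverse_sqrt(4.0, 0.48)
```

Each round costs two multiplications involving the current estimate, which is consistent with two secret-shared multiplications per iteration in the shared setting.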

Step 2b may also be implemented in other ways. For example, the variance shard of the first participant A and the variance shard of the second participant B may first be securely normalized, an initial iteration value may then be obtained by linear approximation, and the iteration may finally be performed based on the Goldschmidt algorithm. In this embodiment, the secret sharing matrix multiplication may be performed based on the variance shard of the first participant A and the variance shard of the second participant B, followed by the other operations.

In this specification, "first" in terms such as the first participant and the first feature item, and the corresponding "second", are used only for convenience of distinction and description and have no limiting meaning.

In this specification, the number of the plurality of participants may be 2, 3 or more, each participant performs various operations through a corresponding participant device, and the participant device may be implemented by any device, platform, device cluster, etc. having computing and processing capabilities.

In the embodiments of this specification, the case of two participants is described in more detail. For the multi-party secure computation algorithms described in the embodiments, such as secret sharing matrix multiplication, secret sharing root inverse, and secret sharing matrix inversion, the two-party implementation can readily be extended to scenarios with more participants, and the detailed process is not repeated.

The foregoing describes certain embodiments of the present specification, and other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily have to be in the particular order shown or in sequential order to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

Fig. 4 is a schematic block diagram of an apparatus for determining a valid value of a service data characteristic of control traffic according to an embodiment. The business data are distributed in a plurality of participants, the business data of each of the plurality of participants form joint data under the condition of splicing, and the joint data comprise characteristic values of a plurality of objects for a plurality of characteristic items. The apparatus 400 is deployed in any first participant device, which may be implemented by any apparatus, device, platform, cluster of devices, etc. having computing and processing capabilities. This embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2. The apparatus 400 comprises:

an obtaining module 410, configured to obtain joint data fragments of the first participant, obtain predicted value fragments corresponding to the plurality of objects, respectively, and model parameter fragments corresponding to the plurality of feature items, respectively; the predicted value fragment and the model parameter fragment are obtained based on a trained service prediction model;

a reconstruction module 420 configured to reconstruct complete predictor data in a selected participant device using a plurality of predictor slices through transmission of predictor slices between the participant device and the selected participant device;

an interaction module 430, configured to determine, by using multi-party security computation, through interaction between multiple participant devices, relevance data segments corresponding to multiple participants respectively based on joint data segments of the multiple participants and predicted value data of a selected participant, where the relevance data segments include relevance data between multiple feature items;

the checking module 440 is configured to determine, by using a significance checking method, effective values of feature items corresponding to model parameters in improving the effect of the business prediction model based on the model parameter segments of the multiple participants and corresponding data in the relevance data segments through secure interaction among the multiple participant devices.

In one embodiment, the obtaining module 410, when obtaining the joint data fragment of the first participant, includes:

In one embodiment, the obtaining module 410, when obtaining the predicted value fragments respectively corresponding to the plurality of objects and the model parameter fragments respectively corresponding to the plurality of feature items, includes:

In one embodiment, the selected participant comprises a participant having label data; the reconstruction module 420 is specifically configured to:

In one embodiment, the correlation data comprise covariance matrix data, and the correlation data fragments comprise covariance matrix fragments; the interaction module 430 includes:

a determining submodule 431, configured to determine intermediate matrix fragments respectively corresponding to the plurality of participants, based on the joint data fragments of the plurality of participants, the predicted value data of the selected participant, and a functional relation in the business prediction model;

a calculating submodule 432, configured to calculate, based on the intermediate matrix fragments of the plurality of participants, fragments of the inverse of the intermediate matrix respectively corresponding to the plurality of participants, thereby obtaining the covariance matrix fragments respectively corresponding to the plurality of participants.
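The relationship between the intermediate matrix and the covariance matrix can be sketched in plaintext. The example below assumes, for illustration only, that the business prediction model is a logistic regression model and that the intermediate matrix is its Hessian, H = Xᵀ·diag(pᵢ(1 − pᵢ))·X, whose inverse serves as the covariance matrix of the model parameters. It omits the multi-party secure computation entirely: in the apparatus these quantities exist only as fragments. The function names `hessian` and `invert` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def hessian(X: Matrix, p: List[float]) -> Matrix:
    """Plaintext Hessian of the logistic loss: X^T diag(p_i (1 - p_i)) X."""
    d = len(X[0])
    H = [[0.0] * d for _ in range(d)]
    for row, p_i in zip(X, p):
        w = p_i * (1.0 - p_i)  # per-object weight from the predicted value
        for j in range(d):
            for k in range(d):
                H[j][k] += w * row[j] * row[k]
    return H


def invert(M: Matrix) -> Matrix:
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(M)
    # Augment M with the identity matrix.
    A = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        A[col], A[pivot] = A[pivot], A[col]
        pv = A[col][col]
        A[col] = [v / pv for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [row[n:] for row in A]


if __name__ == "__main__":
    # Three objects, two feature items (first column is an intercept).
    X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
    p = [0.5, 0.5, 0.5]  # reconstructed predicted values
    covariance = invert(hessian(X, p))
    print(covariance)
```

The diagonal entries of the resulting matrix are the variances of the individual model parameters, which is the part of the correlation data a significance test would typically consume.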

In one embodiment, the determining submodule 431 is specifically configured to:

the determining submodule 431, when dividing the Hessian matrix expression obtained based on the functional relation in the business prediction model into a plurality of blocks, includes:

the determining submodule 431, when enabling the plurality of participants to respectively determine the data fragments of the plurality of blocks, and to respectively determine the corresponding Hessian matrix fragments based on the data fragments of the blocks, includes:

the determining submodule 431, when dividing the Hessian matrix expression obtained based on the functional relation in the business prediction model into a plurality of blocks, includes: dividing the Hessian matrix expression into a plurality of blocks respectively associated with the data of the plurality of participants;

In one embodiment, the calculation submodule 432 is specifically configured to:

In one embodiment, the verification module 440 is specifically configured to:

In one embodiment, the apparatus 400 further comprises a determining module (not shown in the figures) configured to:

In one embodiment, the apparatus 400 further comprises a removal module (not shown) configured to:

The above apparatus embodiments correspond to the method embodiments and are derived from them. They have the same technical effects as the corresponding method embodiments; for specific details, reference may be made to the descriptions of the method embodiments, which are not repeated here.
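As an illustration of the kind of computation the verification module 440 performs, the following plaintext sketch applies a Wald z-test to each model parameter. This is an assumption made for illustration: the passage above only states that a significance test method is used, and in the apparatus the inputs would be held as fragments and combined through secure interaction rather than in the clear. The function names `wald_p_value` and `significant_features`, and the threshold `alpha`, are hypothetical.

```python
import math
from typing import List


def wald_p_value(weight: float, variance: float) -> float:
    """Two-sided p-value of the Wald z-test for a single model parameter."""
    if variance <= 0.0:
        raise ValueError("variance must be positive")
    z = weight / math.sqrt(variance)
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def significant_features(weights: List[float],
                         covariance: List[List[float]],
                         alpha: float = 0.05) -> List[int]:
    """Indices of feature items whose parameter passes the test at level alpha."""
    return [j for j, w in enumerate(weights)
            if wald_p_value(w, covariance[j][j]) < alpha]


if __name__ == "__main__":
    weights = [3.0, 0.1]                     # model parameters
    covariance = [[1.0, 0.0], [0.0, 1.0]]    # covariance matrix of parameters
    print(significant_features(weights, covariance))
```

Under this reading, a small p-value suggests that the corresponding feature item contributes to the prediction effect, while a large p-value marks the feature item as a candidate for removal.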

Embodiments of the present specification also provide a computer-readable storage medium having a computer program stored thereon, which, when executed in a computer, causes the computer to perform the method of any one of Fig. 1 to Fig. 3.

The embodiment of the present specification further provides a computing device, which includes a memory and a processor, where the memory stores executable code, and the processor executes the executable code to implement the method described in any one of fig. 1 to 3.

The embodiments in the present specification are described in a progressive manner: identical or similar parts of the embodiments may be cross-referenced, and each embodiment focuses on its differences from the others. In particular, because the storage medium and computing device embodiments are substantially similar to the method embodiments, they are described briefly, and reference may be made to the corresponding descriptions of the method embodiments for the relevant points.

Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in connection with the embodiments of the invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on a computer-readable medium, or transmitted over it, as one or more instructions or code.

The above-mentioned embodiments further describe the objects, technical solutions and advantages of the embodiments of the present invention in detail. It should be understood that the above description is only exemplary of the embodiments of the present invention, and is not intended to limit the scope of the present invention, and any modification, equivalent replacement, or improvement made on the basis of the technical solutions of the present invention should be included in the scope of the present invention.

Claims

1. A method for determining effective values of characteristics of service data for controlling traffic, wherein the service data are distributed in a plurality of participants, the service data of each of the participants form joint data under the condition of splicing, and the joint data comprise the characteristic values of a plurality of objects for a plurality of characteristic items; the method is performed by any first participant device, comprising:

2. The method of claim 1, wherein the step of obtaining the federated data segment of the first party comprises:

3. The method according to claim 1, wherein the business prediction model is obtained by performing secure joint training based on the respective joint data fragments of the plurality of participants; the business prediction model is used for performing business prediction on the object.

4. The method according to claim 3, wherein the step of obtaining the predicted value slices corresponding to the plurality of objects and the model parameter slices corresponding to the plurality of feature items comprises:

5. The method of claim 1, the selected party comprising a party possessing tag data; the step of reconstructing complete predictive value data comprises:

6. The method of claim 1, the correlation data comprising covariance matrix data, the correlation data patches comprising covariance matrix patches;

7. The method according to claim 6, wherein the step of determining the intermediate matrix slices respectively corresponding to the plurality of participants comprises:

8. The method of claim 7, wherein the business data of any one participant comprises feature values of part of feature items of all objects;

9. The method of claim 7, wherein the business data of any one participant comprises feature values of all feature items of a part of the objects;

10. The method of claim 6, wherein the step of calculating inverse partitions of the intermediate matrix corresponding to the participants respectively based on the partitions of the intermediate matrix corresponding to the participants to obtain covariance matrix partitions corresponding to the participants respectively comprises:

11. The method according to claim 6, wherein the step of determining the effective value of the feature item corresponding to the model parameter in improving the effect of the business prediction model comprises:

12. The method of claim 11, further comprising:

13. The method of claim 1, further comprising:

14. The method of claim 1, the object comprising one of a user, a good, an event; the characteristic items include at least one of: basic attribute information, incidence relation information, interaction information and historical behavior information; the business prediction model is used for conducting business prediction on the object.

15. The method of claim 1, wherein the business prediction model is derived based on a logistic regression model.

16. An apparatus for determining effective values of characteristics of service data for controlling traffic, wherein the service data is distributed among a plurality of participants, the service data of each of the plurality of participants forms joint data under the condition of splicing, and the joint data comprises the characteristic values of a plurality of objects for a plurality of characteristic items; the apparatus is deployed in any first participant device, and comprises:

17. The apparatus of claim 16, the obtaining module, when obtaining the federated data segment of the first party, comprises:

18. The apparatus of claim 16, the selected party comprising a party possessing tag data; the reconstruction module is specifically configured to:

19. The apparatus of claim 16, the correlation data comprising covariance matrix data, the correlation data tile comprising a covariance matrix tile;

the interaction module comprises:

20. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-15.

21. A computing device comprising a memory having executable code stored therein and a processor that, when executing the executable code, implements the method of any of claims 1-15.