WO2022227644A1 - Data processing method and apparatus, device, storage medium and program product - Google Patents


Info

Publication number
WO2022227644A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
data
matrix
sample
participant
Prior art date
Application number
PCT/CN2021/140955
Other languages
English (en)
Chinese (zh)
Inventor
魏文斌
范涛
陈天健
Original Assignee
深圳前海微众银行股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司 filed Critical 深圳前海微众银行股份有限公司
Publication of WO2022227644A1

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 — Machine learning
    • G06N20/20 — Ensemble learning

Definitions

  • the present application relates to the technical field of artificial intelligence, and relates to, but is not limited to, a data processing method, apparatus, device, storage medium and program product.
  • Machine learning is a science that studies how to use computers to simulate or realize human learning activities. It is one of the most intelligent and cutting-edge research fields in artificial intelligence.
  • the research of machine learning is mainly divided into two directions: the first is research on traditional machine learning, which mainly studies the learning mechanism, focusing on exploring and simulating the human learning mechanism; the second is research on machine learning in the big data environment, which mainly studies how to use information effectively, focusing on obtaining hidden, effective and understandable knowledge from huge amounts of data.
  • Federated learning technology is an emerging privacy protection technology that can effectively combine data from all parties for model training without the need for local data.
  • Many business problems in the field of big data can be solved by corresponding machine learning models.
  • Removing collinear data is the key to training a good model.
  • however, related technologies cannot quantify the collinearity of multi-party data in federated learning and cannot efficiently screen out and eliminate collinear training data, resulting in low accuracy and poor stability of the trained model.
  • the embodiments of the present application provide a data processing method, apparatus, device, computer-readable storage medium, and computer program product, which can eliminate collinear data in linear federated modeling, improve the accuracy and stability of the federated model, and improve the modeling effect of the model.
  • the embodiment of the present application provides a data processing method, which is applied to the first participant of federated learning, and the method includes:
  • constructing a virtual feature correlation matrix based on first sample feature data held by the first participant and a pre-trained secure computing model, the secure computing model being pre-trained by the first participant and the other participants in federated learning based on secure multi-party computation;
  • wherein the feature data of the target feature has a linear relationship with the feature data of at least one feature among other features, and the other features include the features other than the target feature held by the first participant and the features held by the other participants.
  • An embodiment of the present application provides a data processing apparatus, which is applied to a first participant of federated learning, and the apparatus includes:
  • the building module is configured to construct a virtual feature correlation matrix based on the first sample feature data held by the first participant and a pre-trained secure computing model, the secure computing model being pre-trained by the first participant and the other participants in federated learning based on secure multi-party computation;
  • a first determining module configured to determine, based on the feature correlation matrix, a collinear quantization factor of each feature corresponding to the first sample feature data
  • a second determining module configured to determine a target feature from each feature corresponding to the first sample feature data based on the collinearity quantization factor
  • a deletion module configured to delete the feature data of the target feature from the first sample feature data, to obtain the first training data for the joint training of the first participant and the other participants;
  • wherein the feature data of the target feature has a linear relationship with the feature data of at least one feature among other features, and the other features include the features other than the target feature held by the first participant and the features held by the other participants.
  • An embodiment of the present application provides a data processing device, and the device includes:
  • a memory configured to store executable instructions
  • a processor configured to implement the method provided by the embodiments of the present application when executing the executable instructions stored in the memory.
  • Embodiments of the present application provide a computer-readable storage medium, on which executable instructions are stored, the executable instructions being configured to cause a processor to implement the method provided by the embodiments of the present application.
  • the embodiments of the present application provide a computer program product, including a computer program, which implements the methods provided by the embodiments of the present application when the computer program is executed by a processor.
  • the embodiments of the present application have the following beneficial effects: during data processing, the first participant and the other participants of federated learning first pre-train a secure computing model based on secure multi-party computation; the first participant then obtains the first sample feature data it holds and constructs a virtual feature correlation matrix based on the first sample feature data and the pre-trained secure computing model; based on the feature correlation matrix, it determines the collinearity quantization factor of each feature corresponding to the first sample feature data, and from these factors it determines the target features whose feature data have a linear relationship with the feature data of other features, where the other features include not only the features held by the first participant other than the target feature but also all features held by the other participants; after determining the target features, the feature data of the target features is deleted from the first sample feature data to obtain the first training data for the joint training of the first participant and the other participants. In this way, under the premise of protecting data privacy, the collinear data in the feature data held by each participant can be screened out and eliminated, yielding training data without linear relationships.
  • Using training data without linear relationship for joint training can improve the accuracy and stability of the federated model and improve the modeling effect of the federated model.
  • FIG. 1 is a schematic diagram of a network architecture of a data processing method provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of the composition and structure of a data processing device provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a realization flow of the data processing method provided by the embodiment of the present application.
  • FIG. 4 is a schematic flowchart of another implementation of the data processing method provided by the embodiment of the present application.
  • FIG. 5 is a schematic flowchart of the calculation flow of the variance inflation factor under the vertical federation situation provided by the embodiment of the present application;
  • FIG. 6 is a schematic flowchart of a calculation flow of a determinant of a correlation matrix provided by an embodiment of the present application.
  • "first/second/third" is used only to distinguish similar objects and does not denote a specific ordering of objects. It is to be understood that, where permitted, the specific order or sequence of "first/second/third" may be interchanged, so that the embodiments of the application described herein can be practiced in sequences other than those illustrated or described herein.
  • VIF (Variance Inflation Factor), also referred to below as the variance expansion factor or variance expansion coefficient.
  • Homomorphic Encryption is a cryptographic technique based on the computational complexity theory of mathematical problems: processing homomorphically encrypted data produces an output that, when decrypted, is identical to the output obtained by processing the unencrypted raw data in the same way.
  • the following describes an exemplary application of the apparatus for implementing the embodiment of the present application, and the apparatus provided by the embodiment of the present application may be implemented as a terminal device.
  • exemplary applications covering terminal devices when the apparatus is implemented as a terminal device will be described.
  • FIG. 1 is a schematic diagram of a network architecture of a data processing method provided by an embodiment of the present application.
  • the network architecture at least includes a first participant 100 , a second participant 200 , and a network 300 .
  • the first participant 100 and the second participant 200 may be the participants in the vertical federated learning that jointly train the machine learning model.
  • the first participant 100 and the second participant 200 may be clients, for example, various banks or hospitals and other participant devices that store user characteristic data, and the clients may be laptops, tablet computers, desktop computers, special training equipment and other devices with model training capabilities.
  • the first participant 100 is connected to the second participant 200 through a network 300.
  • the network 300 may be a wide area network or a local area network, or a combination of the two, using wireless or wired links to realize data transmission.
  • the first participant 100 first obtains first sample feature data from its own data, and processes the first sample feature data to obtain processed first sample feature data.
  • the second participant 200 obtains second sample feature data from its own data, and processes the second sample feature data to obtain processed second sample feature data.
  • the first sample feature data and the second sample feature data have the same identifiers; that is, the first sample feature data and the second sample feature data are data of different features of the same batch of samples, held by the first participant 100 and the second participant 200 respectively.
  • the first participant 100 and the second participant 200 use the processed first sample feature data and the processed second sample feature data to determine the first matrix E based on secure multi-party computation, and each of the first participant 100 and the second participant 200 holds the first matrix E.
  • the first participant 100 can only obtain the first matrix E and cannot learn the processed second sample feature data held by the second participant 200; similarly, the second participant 200 can only obtain the first matrix E and cannot obtain the processed first sample feature data held by the first participant 100.
  • the first participant 100 constructs a virtual feature correlation matrix according to the first matrix E and the processed first sample feature data; it then calculates the determinant of the feature correlation matrix and the determinant of each cofactor corresponding to the matrix. Based on the determinant of the feature correlation matrix and the determinant of each cofactor, it determines the collinearity quantization factor of each feature corresponding to the first sample feature data. The collinearity quantization factor may be the variance expansion coefficient, which quantifies the collinearity of each feature with all other features. All other features here include not only the other features held by the first participant 100 besides the feature currently being quantified, but also all features held by the second participant 200.
  • the first participant 100 determines which features have collinearity between the feature data according to the collinearity quantification factor of each feature, determines these features as target features, and finally deletes the feature data of the target feature from the first sample feature data , to obtain the first training data for the joint training of the first participant 100 and the second participant 200 .
  • the second participant 200 obtains second training data for joint training. Therefore, during joint training, the first participant 100 and the second participant 200 use the first training data and the second training data without collinearity to perform joint training, so that a federated model with high accuracy and good stability can be obtained.
  • the data with collinearity in the feature data held by each participant can be screened and eliminated, and the training data without linear relationship can be obtained.
  • each participant uses the training data without linear relationship for joint training, which can improve the accuracy and stability of the federated model and improve the modeling effect of the federated model.
  • the apparatuses provided in the embodiments of the present application may be implemented in a manner of hardware or a combination of software and hardware, and various exemplary implementations of the apparatuses provided in the embodiments of the present application are described below.
  • the data processing device 10 is shown by taking a device applied to the first participant of federated learning as an example; other exemplary structures of the data processing device 10 are foreseeable, so the structure described here should not be regarded as a limitation. For example, some components described below may be omitted, or components not described below may be added to meet the special requirements of certain applications.
  • the data processing device 10 shown in FIG. 2 includes: at least one processor 110, a memory 140, at least one network interface 120 and a user interface 130. Each component in data processing device 10 is coupled together by bus system 150 . It can be understood that the bus system 150 is used to implement the connection communication between these components. In addition to the data bus, the bus system 150 also includes a power bus, a control bus, and a status signal bus. However, for clarity of illustration, the various buses are labeled as bus system 150 in FIG. 2 .
  • User interface 130 may include a display, keyboard, mouse, touch pad, touch screen, and the like.
  • Memory 140 may be volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory.
  • the non-volatile memory may be a read-only memory (ROM, Read Only Memory).
  • the volatile memory may be random access memory (RAM, Random Access Memory).
  • the memory 140 in the embodiment of the present application can store data to support the operation of the data processing device 10 .
  • Examples of such data include: any computer programs used to operate on data processing device 10, such as operating systems and applications.
  • the operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and processing hardware-based tasks.
  • Applications can contain various applications.
  • the method provided by the embodiment of the present application may be directly embodied as a combination of software modules executed by the processor 110, and the software module may be located in a storage medium, and the storage medium is located in the memory 140,
  • the processor 110 reads the executable instructions included in the software module in the memory 140, and combines necessary hardware (for example, including the processor 110 and other components connected to the bus 150) to complete the method provided by the embodiments of the present application.
  • the processor 110 may be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor, a digital signal processor (DSP, Digital Signal Processor), or other programmable logic devices, discrete gates or transistor logic devices , discrete hardware components, etc., where a general-purpose processor may be a microprocessor or any conventional processor, or the like.
  • FIG. 3 is a schematic diagram of an implementation flow of the data processing method provided by the embodiment of the present application, which is applied to the first participant of the network architecture shown in FIG. 1 , and will be described with reference to the steps shown in FIG. 3 .
  • Step S301 construct a virtual feature correlation matrix based on the first sample feature data held by the first participant and the pre-trained security computing model.
  • the feature data of different features may have a linear relationship. If a collinear data training model is used, the trained model has low accuracy and poor stability.
  • the first participant constructs a virtual feature correlation matrix, denoted as H, according to the first sample feature data and the pre-trained security computing model.
  • the secure computing model is pre-trained by the first participant and other participants of federated learning based on secure multi-party computation, and can perform matrix addition and subtraction and matrix multiplication under privacy.
  • the data processing method is described by taking two participants performing federated learning as an example. In practical applications, the data processing method can also be applied to three or more participants for federated learning.
  • the first participant and the second participant conduct training based on privacy protection technology to obtain a trained security computing model.
  • the trained security computing model is owned by each participant.
  • the training data needs to come from the same users to perform joint training; that is, the first sample feature data used by the first participant for joint training and the second sample feature data used by the second participant for joint training correspond to the same user identifiers (ID, Identity Document). Therefore, before step S301, the method further includes: the first participant acquires the first sample feature data.
  • acquiring the first sample feature data may be implemented as: acquiring the common training samples held by the first participant and the other participants; screening the training samples for this round from the common training samples; and determining the feature data of these training samples as the first sample feature data.
  • the common training samples are the users common to all participants. Each participant obtains the identifiers of the data it holds and, based on encrypted computation over these identifiers, the common users of the participants are determined. Then, the target users participating in this training are selected from these common users as the training samples, and the feature data of these target users is used as the first sample feature data.
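The sample-alignment step above can be sketched in plain Python. In the scheme described here the intersection is obtained via encrypted computation (e.g. private set intersection); plain sets are used below only to show the data flow, and all IDs and feature values are illustrative:

```python
# Toy illustration of screening common training samples by user ID.
party_a_ids = {"u1", "u2", "u3", "u4"}
party_b_ids = {"u2", "u3", "u5"}

common_ids = party_a_ids & party_b_ids       # users held by both parties
target_ids = sorted(common_ids)              # target users for this training

# The first participant keeps only the feature rows of the target users.
party_a_data = {"u1": [30, 70], "u2": [25, 60], "u3": [40, 80], "u4": [22, 55]}
first_sample_feature_data = [party_a_data[i] for i in target_ids]
print(first_sample_feature_data)  # [[25, 60], [40, 80]]
```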
  • Step S302 based on the feature correlation matrix, determine the collinearity quantization factor of each feature corresponding to the feature data of the first sample.
  • methods for determining the collinearity quantization factor among the feature data of multiple features include the variance inflation factor method, the eigenvalue (characteristic root) analysis method, and the condition number method.
  • there is also an intuitive judgment method that can qualitatively analyze the degree of collinearity between the feature data of multiple features; the intuitive judgment method is generally used for preliminary judgment and cannot provide quantitative analysis.
  • the first sample feature data held by the first participant and the second sample feature data held by the second participant correspond to data in different columns of the feature correlation matrix. By determining whether there is collinearity between the column data, it can be determined whether the feature data of each feature in the first sample feature data is collinear with the feature data of other features.
  • the existence of collinearity between the feature data of different features includes the following possibilities: the feature data of one feature is a multiple of the feature data of another feature; the feature data of one feature is equal to the feature data of another feature plus a constant term; There is a feature whose feature data is equal to the sum of the feature data of the other two features.
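The three collinearity patterns listed above can be checked on toy data: after removing the means, each pattern collapses the feature columns onto a single direction, which is what the correlation-matrix machinery below detects. The values are illustrative:

```python
import numpy as np

# The three collinearity patterns described above, on illustrative data.
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = 2.0 * x1      # feature data is a multiple of another feature
x3 = x1 + 5.0      # feature data equals another feature plus a constant term
x4 = x1 + x2       # feature data equals the sum of two other features

X = np.column_stack([x1, x2, x3, x4])
# After centering, every column is a scalar multiple of the centered x1,
# so the centered matrix has rank 1 instead of 4.
rank = np.linalg.matrix_rank(X - X.mean(axis=0))
print(rank)  # 1
```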
  • the collinearity quantization factor of each feature corresponding to the first sample feature data is determined, and according to these factors it can be determined which features have feature data that is collinear with the feature data of other features.
  • Step S303 based on the collinearity quantization factor, determine the target feature from the features corresponding to the first sample feature data.
  • the target features with collinearity are determined.
  • in the absence of multicollinearity, the variance inflation factor is close to 1; the stronger the multicollinearity, the larger the variance inflation factor. In practice there is always some degree of multicollinearity between the data, so using a variance inflation factor equal to 1 as the criterion for evaluating collinearity is unrealistic.
  • a boundary value can be preset according to the actual application scenario. In this embodiment of the present application, the preset boundary value may be 10: determine whether the variance inflation factor of each feature is greater than 10; when the variance inflation factor of a feature is greater than 10, the feature data of this feature is considered to have strong collinearity with the feature data of other features, and the feature is determined as a target feature.
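A minimal sketch of this screening rule, using the standard identity that the VIF of feature j is the j-th diagonal entry of the inverse correlation matrix (the data, helper name, and threshold variable below are illustrative; only the boundary value 10 comes from the text):

```python
import numpy as np

def vif_factors(X):
    # Standardize columns, form the correlation matrix, and read the
    # variance inflation factors off the diagonal of its inverse.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    R = (Xs.T @ Xs) / X.shape[0]          # correlation matrix
    return np.diag(np.linalg.inv(R))

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.01 * rng.normal(size=200)       # nearly collinear with a

vif = vif_factors(np.column_stack([a, b, c]))
threshold = 10.0                           # preset boundary value from the text
target_features = [j for j, v in enumerate(vif) if v > threshold]
print(target_features)  # the two nearly collinear features are flagged
```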
  • Step S304 delete the feature data of the target feature from the first sample feature data to obtain the first training data for the joint training of the first participant and other participants.
  • the other features here include features held by the first participant other than the target feature and features held by other participants.
  • the variance expansion factor of each feature in the first sample feature data of the first participant may be calculated based on the feature correlation matrix, the target features whose variance expansion factor is greater than the preset boundary value are determined, and the feature data of the target features is then deleted from the first sample feature data, yielding feature data without collinearity as the first training data. Therefore, when performing joint training, the first participant uses the first training data without collinearity, and a federated model with high accuracy and good stability can be obtained.
  • the data processing method provided by the embodiments of the present application is applied to the first participant of federated learning and includes: constructing a virtual feature correlation matrix based on the first sample feature data held by the first participant and a pre-trained secure computing model, the secure computing model being pre-trained by the first participant and the other participants in federated learning based on secure multi-party computation; determining, based on the feature correlation matrix, the collinearity quantization factor of each feature corresponding to the first sample feature data; determining, based on the collinearity quantization factors, the target features from the features corresponding to the first sample feature data; and deleting the feature data of the target features from the first sample feature data to obtain the first training data for the joint training of the first participant and the other participants; wherein the feature data of a target feature has a linear relationship with the feature data of at least one of the other features, and the other features include the features held by the first participant other than the target feature and the features held by the other participants.
  • the data with collinearity in the feature data held by each participant can be screened and eliminated, and the training data without linear relationship can be obtained.
  • the participants use the training data without linear relationship for joint training, which can improve the accuracy and stability of the federated model and improve the modeling effect of the federated model.
  • the first participant before step S301 in the embodiment shown in FIG. 3 , the first participant first obtains the first sample feature data from the feature data held by itself.
  • the training data needs to come from the same users for joint training; that is, the first sample feature data used by the first participant for joint training corresponds to the same user IDs as the second sample feature data used by the second participant for joint training.
  • the first participant and the second participant each hold feature data of different features of a batch of users; for example, the first participant has feature data of a batch of users' birthday AA, age BB, weight CC, and deposit DD, while the second participant has feature data of a batch of users' age BB, consumption ability EE, and hobby FF.
  • the database table of the first participant and the database table of the second participant are shown in Table 1 and Table 2 below:
  • the feature data held by the first participant is obtained according to the database table of the first participant, and the feature data held by the second participant is obtained according to the database table of the second participant.
  • the first participant determines the target users participating in this training, for example by randomly selecting users from the common users as the target users, and then obtains the feature data of these target users from the data it stores as the first sample feature data.
  • the first participant sends the IDs of the screened target users to the second participant, and the second participant obtains the characteristic data of these target users from the data stored by itself as the second sample characteristic data.
  • the second participant may also determine the target user participating in the training, and send the ID of the determined target user to the first participant.
  • the second participant selects the characteristic data of the target user S1 from the characteristic data held by the second participant to obtain the second sample characteristic data
  • the first participant obtains the first sample feature data
  • the second participant obtains the second sample feature data
  • step S301 "construct a virtual feature correlation matrix based on the first sample feature data held by the first participant and the pre-trained secure computing model" can be implemented through the following steps:
  • Step S3011 based on the first sample feature data, determine the feature data of each feature corresponding to the first sample feature data and the number of samples corresponding to the first sample feature data.
  • the corresponding features of the first sample feature data are: AA, BB, CC, and DD
  • the feature data of feature AA is ⁇ a1, a3, a4 ⁇
  • the feature data of feature BB The feature data is ⁇ b1,b3,b4 ⁇
  • the feature data of feature CC is ⁇ c1,c3,c4 ⁇
  • the feature data of feature DD is ⁇ d1,d3,d4 ⁇
  • the samples corresponding to the first sample feature data are: ⁇ a1,b1,c1,d1 ⁇ , ⁇ a3,b3,c3,d3 ⁇ and ⁇ a4,b4,c4,d4 ⁇
  • Step S3012 Calculate the mean and standard deviation corresponding to the feature data of each feature respectively.
  • Step S3013 Determine the processed first sample feature data based on the feature data of each feature, the mean value corresponding to the feature data of each feature, the standard deviation corresponding to the feature data of each feature, and the number of samples;
  • the mean value corresponding to the feature data of each feature, the standard deviation σx corresponding to the feature data of each feature, and the number of samples n are used to update the feature data x of each feature to obtain the processed feature data.
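The exact update formula is not spelled out above; a plausible realization of steps S3012/S3013, chosen here because it makes CᵀC equal the feature correlation matrix used later, standardizes each column by its mean, standard deviation, and the sample count n (the function name and data are illustrative):

```python
import numpy as np

def preprocess(X):
    # Assumed standardization: subtract each feature's mean, then divide
    # by sqrt(n) times its standard deviation, so that C.T @ C becomes
    # the correlation matrix of the features.
    n = X.shape[0]                 # number of samples
    mean = X.mean(axis=0)          # mean of each feature's data
    std = X.std(axis=0)            # standard deviation of each feature's data
    return (X - mean) / (np.sqrt(n) * std)

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])
C = preprocess(X)
print(np.round(C.T @ C, 6))        # correlation matrix of the two features
```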
  • Step S3014 Input the processed first sample characteristic data into the security calculation model to obtain a first matrix.
  • each participant performs training based on a secure multi-party computing protocol, such as the SPDZ protocol or MASCOT, to obtain a trained secure computing model.
  • the trained model is capable of matrix addition and subtraction and matrix multiplication under privacy.
  • research on Secure Multi-Party Computation (MPC, Secure Multi-Party Computation) is mainly aimed at the problem of how to securely compute an agreed function in the absence of a trusted third party.
  • the m participants all hold the secure computing model, and each participant inputs its own private value to obtain the corresponding output value; for example, the i-th participant inputs x i and outputs y i , where 1 ≤ i ≤ m.
  • since the first matrix E is determined based on a secure multi-party computation protocol, no participant can learn the private data of the other participants, ensuring the privacy and security of each participant's data.
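Why no participant learns another's raw data can be illustrated with toy additive secret sharing: each input is split into random shares, linear operations run on the shares, and only the final result is reconstructed. Real protocols such as SPDZ work over a finite field and use Beaver triples for multiplication; this real-valued sketch is a simplification, not the protocol itself:

```python
import numpy as np

rng = np.random.default_rng(1)

def share(x):
    r = rng.normal(size=x.shape)   # one share is pure randomness
    return r, x - r                # the two shares sum back to the secret

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])
a0, a1 = share(A)
b0, b1 = share(B)

# Matrix addition can be done locally on the shares:
s0, s1 = a0 + b0, a1 + b1
result = s0 + s1                   # reconstruction of A + B
print(np.allclose(result, A + B))  # True
```

A single share (e.g. `a0`) is statistically independent of `A`, which is the privacy argument in miniature.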
  • Step S3015 construct a virtual feature correlation matrix according to the processed first sample feature data and the first matrix.
  • Step S30151 Determine a first symmetric matrix according to the processed first sample feature data.
  • a symmetric matrix is a square matrix whose elements are symmetric about the main diagonal, i.e., the element in row i, column j equals the element in row j, column i.
  • the first symmetric matrix F is constructed from the processed first sample feature data C: the transposed matrix of the processed first sample data C may be multiplied with the processed first sample data C.
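A sketch of step S30151, assuming (as the block structure of H suggests) that the first symmetric matrix is F = CᵀC, where C is the processed first sample feature data; the values of C below are illustrative:

```python
import numpy as np

# Assumed construction of the first symmetric matrix: F = C.T @ C.
C = np.array([[ 0.5, -0.2],
              [-0.1,  0.4],
              [-0.4, -0.2]])
F = C.T @ C
# F is symmetric by construction, matching the definition above.
print(np.allclose(F, F.T))  # True
```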
  • Step S30152 generate an empty matrix whose number of rows and columns are equal to the number of columns of the obtained first matrix.
  • since the virtual feature correlation matrix is constructed by the first participant, the data associated with the feature data of the second participant's features is empty.
  • the first matrix E is a 4*3 matrix and its number of columns is 3, so the generated empty matrix is a 3*3 matrix, denoted as G'.
  • the second symmetric matrix G corresponding to the empty matrix G' is determined by the second participant according to the second sample feature data B.
  • Step S30153 construct a virtual feature correlation matrix according to the first symmetric matrix, the first matrix, the transposed matrix of the first matrix, and the empty matrix.
  • According to the first symmetric matrix F, the first matrix E, the transposed matrix E^T of the first matrix, and the empty matrix G', a virtual feature correlation matrix H is constructed, which may be expressed in block form as H = [[F, E], [E^T, G']].
  • the dimension of the first symmetric matrix F is 4*4
  • the dimension of the first matrix E is 4*3
  • the dimension of the transposed matrix E T of the first matrix is 3*4
  • the dimension of the empty matrix G' is 3*3, so the dimension of the feature correlation matrix H is 7*7.
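The block layout of H can be sketched as follows (random stand-ins for F and E; the dimensions follow the example above):

```python
import numpy as np

rng = np.random.default_rng(0)
m1, m2 = 4, 3                          # feature counts of the two participants

X = rng.standard_normal((8, m1))
F = X.T @ X                            # stand-in first symmetric matrix (4*4)
E = rng.standard_normal((m1, m2))      # stand-in first matrix (4*3)
G_empty = np.zeros((m2, m2))           # empty matrix G' (3*3)

# Virtual feature correlation matrix H = [[F, E], [E^T, G']]
H = np.block([[F, E],
              [E.T, G_empty]])
assert H.shape == (7, 7)
assert np.allclose(H[:m1, :m1], F) and np.allclose(H[m1:, m1:], G_empty)
```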
  • The first participant obtains a secure computing model trained with the other participants based on the secure multi-party computation protocol. On the premise that the data held by each participant does not leave its local environment and data privacy is guaranteed, the otherwise unquantifiable problem of collinearity of multi-party data in federated learning is converted into a tractable block-matrix determinant problem, so that the collinearity factor of multi-party data in federated learning can be quantified based on the feature correlation matrix, providing a basis for eliminating the collinearity of training data and obtaining a federated model with accuracy and stability.
  • step S302 "based on the feature correlation matrix, determine the collinearity quantization factor of each feature corresponding to the first sample feature data" in the embodiment shown in FIG. 3 can be implemented by the following steps:
  • Step S3021 determine the determinant of the feature correlation matrix.
  • By the block-matrix (Schur complement) identity, the determinant of the feature correlation matrix H may be expressed as |H| = |F| · |G' − E^T F^{−1} E|.
  • determining the determinant of the feature correlation matrix can be implemented through steps S30211 to S30214:
  • Step S30211 generating a first random matrix whose determinant is a preset value and has the same dimension as the empty matrix.
  • the first random matrix generated by the first participant satisfies: a matrix whose determinant is a preset value and has the same dimension as the empty matrix G′.
  • the preset value can be set to 1.
  • the empty matrix G' is a matrix of 3*3, so the generated first random matrix is a matrix with a determinant of 1 and a dimension of 3*3, Denoted as R 1 .
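One way (an illustrative choice, not necessarily the construction used in this application) to generate a random matrix whose determinant is exactly 1 is to multiply unit-triangular factors; multiplying any matrix by such an R1 obscures its entries while preserving the determinant:

```python
import numpy as np

def random_unimodular(n, rng):
    """Random n*n matrix with determinant exactly 1 (product of unit-triangular factors)."""
    L = np.tril(rng.standard_normal((n, n)), k=-1) + np.eye(n)
    U = np.triu(rng.standard_normal((n, n)), k=1) + np.eye(n)
    return L @ U                       # det(L) = det(U) = 1, so det(L @ U) = 1

rng = np.random.default_rng(42)
R1 = random_unimodular(3, rng)
assert np.isclose(np.linalg.det(R1), 1.0)

# Confusing a matrix M with R1 leaves its determinant unchanged
M = rng.standard_normal((3, 3))
assert np.isclose(np.linalg.det(M @ R1), np.linalg.det(M))
```

This invariance is exactly why the confusion step does not change the value that the parties jointly compute.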
  • Step S30212 the feature correlation matrix and the first random matrix are input into the security calculation model to obtain the second matrix.
  • the second matrix J is determined based on the secure multi-party computation protocol, and each participant uses its generated random matrix to confuse its own data, so that no participant can learn the private data of the others, which ensures the privacy and security of each party's data.
  • Step S30213 Calculate the determinants of the first symmetric matrix and the second matrix respectively.
  • With the first symmetric matrix F obtained in step S30151 and the second matrix J obtained in step S30212, calculate the determinants of F and J respectively, obtaining |F| and |J|.
  • Step S30214 Multiply the determinant of the first symmetric matrix and the determinant of the second matrix to obtain the determinant of the feature correlation matrix.
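The factorization behind steps S30211 to S30214 is the block-determinant (Schur complement) identity; a plaintext numpy check under assumed random data:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m1, m2 = 10, 4, 3
C = rng.standard_normal((n, m1))       # processed data of the first participant
D = rng.standard_normal((n, m2))       # processed data of the second participant

F = C.T @ C                            # first symmetric matrix
E = C.T @ D                            # first matrix
G = D.T @ D                            # second symmetric matrix (fills the empty G')

H = np.block([[F, E], [E.T, G]])

# |H| = |F| * |G - E^T F^{-1} E|
schur = G - E.T @ np.linalg.solve(F, E)
assert np.isclose(np.linalg.det(H), np.linalg.det(F) * np.linalg.det(schur))
```

In the protocol itself the Schur-complement factor is never revealed directly; it reaches each party only after confusion with a determinant-1 random matrix.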
  • each participant calculates the determinant of the virtual feature correlation matrix based on the security computing model, and the entire calculation process does not need to acquire the private data of other participants, which can ensure the data security of each participant.
  • Step S3022 delete the data in the i-th row and the i-th column of the feature correlation matrix to obtain each cofactor corresponding to the feature correlation matrix.
  • where i = 1, 2, ..., m1, and m1 is the number of features corresponding to the first sample feature data.
  • Step S3023 determine the determinant of each cofactor.
  • the calculation method for determining the determinant of each cofactor may be the same as the method for determining the determinant of the feature correlation matrix in step S3021.
  • Determining the determinant of the cofactor corresponding to the i-th feature can be implemented as follows: generate a random matrix whose determinant is a preset value and whose dimension is the same as that of the empty matrix; input the i-th cofactor and the random matrix into the security computing model to obtain the matrix corresponding to the i-th cofactor; calculate the determinants of the first symmetric matrix and of the matrix corresponding to the i-th cofactor respectively; and multiply the determinant of the first symmetric matrix by the determinant of the matrix corresponding to the i-th cofactor to obtain the determinant of the i-th cofactor.
  • Step S3024 based on the determinant of the feature correlation matrix and the determinant of each cofactor, determine the collinear quantization factor of each feature corresponding to the first sample feature data.
  • Taking the variance inflation factor as the collinearity quantization factor as an example, the variance inflation factor VIF_i of the i-th feature can be determined from the determinants as VIF_i = |H_ii| / |H|, where H_ii is the cofactor obtained by deleting the i-th row and i-th column of H.
  • the obtained VIF_1 is the collinear quantization factor of the feature AA, VIF_2 that of the feature BB, VIF_3 that of the feature CC, and VIF_4 that of the feature DD.
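The VIF computation described above can be checked in plaintext (an illustrative sketch on synthetic data, with collinearity deliberately injected between the first and last features):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.standard_normal((50, 4))
X[:, 3] = X[:, 0] + 0.1 * rng.standard_normal(50)   # make DD collinear with AA

H = np.corrcoef(X, rowvar=False)                    # feature correlation matrix

def vif(H, i):
    """VIF_i = |H_ii| / |H|: minor after deleting row i and column i, over |H|."""
    minor = np.delete(np.delete(H, i, axis=0), i, axis=1)
    return np.linalg.det(minor) / np.linalg.det(H)

vifs = [vif(H, i) for i in range(4)]
# Equivalent to the diagonal of the inverse correlation matrix
assert np.allclose(vifs, np.diag(np.linalg.inv(H)))
assert vifs[0] > 10 and vifs[3] > 10                # the collinear pair is flagged
```

The cross-check against the diagonal of H^{-1} confirms that the minor-over-determinant formula is the standard variance inflation factor.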
  • The first participant calculates the determinant of the feature correlation matrix and the determinant of the cofactor of each feature corresponding to the first sample feature data based on the secure multi-party computation protocol, so as to obtain the variance inflation factor of each feature corresponding to the first sample feature data, providing a basis for eliminating collinear training data and obtaining a federated model with accuracy and stability.
  • step S303 in the above-mentioned embodiment shown in FIG. 3 "based on the collinearity quantization factor, determine the target feature from each feature corresponding to the first sample feature data", can be realized by the following steps:
  • Step S3031 judging whether the collinear quantization factor of each feature corresponding to the first sample feature data is greater than a preset boundary value.
  • the collinear quantization factor of each feature corresponding to the first sample feature data is compared with the preset boundary value, that is, VIF_1, VIF_2, VIF_3, and VIF_4 are each compared with the preset boundary value.
  • step S3032 the feature with the collinear quantization factor greater than the preset boundary value is determined as the target feature.
  • Taking a preset boundary value of 10 as an example: if VIF_1 and VIF_2 are determined to be greater than 10, the feature AA corresponding to VIF_1 and the feature BB corresponding to VIF_2 are determined as target features.
  • the target feature AA of the first participant is birthday
  • the target feature BB of the first participant is age
  • the feature BB of the second participant is age
  • step S304 is then executed to delete the feature data of the target features in the first sample feature data, that is, to delete the columns corresponding to the features AA and BB from the first sample feature data, obtaining the first training data A'.
  • Similarly, the target feature among the features corresponding to the second sample feature data held by the second participant is determined to be the feature BB. The second participant deletes the feature data of the feature BB from the second sample feature data, that is, deletes the first column of data (the feature data of the feature BB), obtaining the second training data B'.
  • The first participant and the second participant perform joint training using the first training data A' and the second training data B'. Since the collinear features have been deleted from A' and B', the participants train on data without linear dependence, and the resulting federated model has higher accuracy and higher stability, improving the modeling effect of the federated model.
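Steps S3031 through S304 amount to thresholding and column deletion; an illustrative sketch (the VIF values and matrix contents are made up):

```python
import numpy as np

features = ["AA", "BB", "CC", "DD"]
vif = np.array([86.0, 54.0, 2.3, 1.7])    # hypothetical collinearity factors
threshold = 10.0                           # preset boundary value

# Features whose VIF exceeds the boundary value become target features
target = [f for f, v in zip(features, vif) if v > threshold]

A = np.arange(20.0).reshape(5, 4)          # stand-in first sample feature data
keep = np.where(vif <= threshold)[0]
A_train = A[:, keep]                       # first training data A' (AA, BB removed)

assert target == ["AA", "BB"]
assert A_train.shape == (5, 2)
```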
  • FIG. 4 is a schematic flowchart of another implementation of the data processing method provided by the embodiment of the present application, which is applied to the network architecture shown in FIG. 1 , as shown in FIG. 4, the data processing method includes the following steps:
  • Step S401 the first participant and the second participant determine a common user held by the first participant and the second participant based on the secure multi-party computing protocol.
  • For joint training, the training data needs to come from the same users; that is, the first sample feature data used by the first participant and the second sample feature data used by the second participant for joint training must correspond to the same user IDs.
  • The first participant and the second participant determine the common users based on privacy protection technology, which can be implemented as follows: each participant obtains the identities of the users it holds; then, based on the privacy protection technology, the intersection of the user identities held by the first participant and those held by the second participant is computed, and the result is the set of common users.
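As a plaintext stand-in for this step (a real deployment would use a private set intersection protocol, e.g. Diffie-Hellman-based PSI, so that neither side learns the other's non-shared IDs; the IDs below are invented):

```python
alice_ids = {"u001", "u002", "u003", "u005"}   # IDs held by the first participant
bob_ids = {"u002", "u003", "u004", "u006"}     # IDs held by the second participant

common_users = sorted(alice_ids & bob_ids)     # intersection = common users
assert common_users == ["u002", "u003"]
```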
  • Step S402 the first participant determines the target users participating in this training.
  • The first participant determines the target users participating in this training, for example, by randomly selecting users from the common users as the target users.
  • Step S403 the first participant sends the identifier of the target user to the second participant.
  • the second participant can also determine the target user participating in this training, and in this case, steps S402 and S403 can be replaced with:
  • Step S402' the second participant determines the target users participating in this training.
  • Step S403' the second participant sends the identifier of the target user to the first participant.
  • Step S404 the first participant acquires the first sample feature data held by itself.
  • the first participant acquires the feature data of the target user from the data stored by itself as the first sample feature data.
  • Step S405 the first participant constructs a virtual first feature correlation matrix based on the first sample feature data and the pre-trained security computing model.
  • The construction of the first feature correlation matrix by the first participant may be implemented as: based on the first sample feature data, determining the feature data of each feature corresponding to the first sample feature data and the number of samples corresponding to the first sample feature data; calculating the mean and standard deviation corresponding to the feature data of each feature; determining the processed first sample feature data based on the feature data of each feature, the corresponding means and standard deviations, and the number of samples; inputting the processed first sample feature data into the security computing model to obtain a first matrix; determining a first symmetric matrix according to the processed first sample feature data; generating a first empty matrix whose number of rows and columns are equal to the number of columns of the first matrix; and constructing a virtual first feature correlation matrix according to the first symmetric matrix, the first matrix, the transposed matrix of the first matrix, and the first empty matrix.
  • Step S406 the first participant determines, based on the first feature correlation matrix, the collinear quantization factor of each feature corresponding to the first sample feature data.
  • the collinearity quantification factor may be a variance expansion factor, a regression factor, or a correlation coefficient.
  • Determining the collinearity quantization factor of each feature corresponding to the first sample feature data can be implemented as: calculating the variance inflation factor of each feature according to the determinant of each cofactor and the determinant of the feature correlation matrix; and determining the variance inflation factor of each feature as the collinear quantization factor of each feature corresponding to the first sample feature data.
  • Determining the determinant of the first feature correlation matrix can be implemented as: generating a first random matrix whose determinant is a preset value and whose dimension is the same as that of the first empty matrix; inputting the feature correlation matrix and the first random matrix into the security calculation model to obtain a second matrix; calculating the determinants of the first symmetric matrix and the second matrix respectively; and multiplying the determinant of the first symmetric matrix by the determinant of the second matrix to obtain the determinant of the first feature correlation matrix.
  • Determining the determinant of the i-th cofactor corresponding to the i-th feature can be implemented as: generating a random matrix whose determinant is a preset value and whose dimension is the same as that of the first empty matrix; inputting the i-th cofactor and the random matrix into the security calculation model to obtain the matrix corresponding to the i-th cofactor; calculating the determinants of the first symmetric matrix and of the matrix corresponding to the i-th cofactor respectively; and multiplying these two determinants to obtain the determinant of the i-th cofactor.
  • Step S407 the first participant determines the target feature from the features corresponding to the first sample feature data based on the collinearity quantization factor.
  • determining the target feature by the first participant according to the collinearity quantization factor may be implemented as: judging whether the collinearity quantization factor of each feature corresponding to the first sample feature data is greater than a preset boundary value; Features larger than the preset boundary value are determined as target features.
  • Step S408 the first participant deletes the feature data of the target feature from the first sample feature data to obtain the first training data for the joint training of the first participant and other participants.
  • Step S409 the second participant acquires the second sample feature data held by itself.
  • the second participant obtains the feature data of the target user from the data stored by itself as the second sample feature data.
  • Step S410 the second participant constructs a virtual second feature correlation matrix based on the second sample feature data and the pre-trained security computing model.
  • The construction of the second feature correlation matrix by the second participant may be implemented as: determining, based on the second sample feature data, the feature data of each feature corresponding to the second sample feature data and the number of samples corresponding to the second sample feature data; calculating the mean and standard deviation corresponding to the feature data of each feature; determining the processed second sample feature data based on the feature data of each feature, the corresponding means and standard deviations, and the number of samples; inputting the processed second sample feature data into the security computing model to obtain the first matrix; determining a second symmetric matrix according to the processed second sample feature data; generating a second empty matrix whose number of rows and columns are equal to the number of columns of the first matrix; and constructing a virtual second feature correlation matrix according to the second symmetric matrix, the first matrix, the transposed matrix of the first matrix, and the second empty matrix.
  • the second empty matrix generated by the second participant is F'; the second feature correlation matrix constructed by the second participant may accordingly be expressed in block form as H' = [[F', E], [E^T, G]].
  • Step S411 the second participant determines the collinear quantization factor of each feature corresponding to the feature data of the second sample based on the feature correlation matrix.
  • Determining the collinear quantization factor of each feature corresponding to the second sample feature data by the second participant may be implemented as: determining the determinant of the second feature correlation matrix; deleting the (i+m1)-th row and the (i+m1)-th column of the second feature correlation matrix to obtain each cofactor (the offset m1 is needed because the first m1 rows and columns of the second feature correlation matrix H' correspond to the features of the first sample feature data); determining the determinant of each cofactor; and determining the collinear quantization factor of each feature corresponding to the second sample feature data based on these determinants.
  • Determining the determinant of the second feature correlation matrix can be implemented as: generating a second random matrix whose determinant is a preset value and whose dimension is the same as that of the second empty matrix; inputting the feature correlation matrix and the second random matrix into the security calculation model to obtain the second matrix; calculating the determinants of the second symmetric matrix and the second matrix respectively; and multiplying the determinant of the second symmetric matrix by the determinant of the second matrix to obtain the determinant of the second feature correlation matrix.
  • Determining the determinant of the i-th cofactor corresponding to the i-th feature can be implemented as: generating a random matrix whose determinant is a preset value and whose dimension is the same as that of the second empty matrix; inputting the i-th cofactor and the random matrix into the security calculation model to obtain the matrix corresponding to the i-th cofactor; calculating the determinants of the second symmetric matrix and of the matrix corresponding to the i-th cofactor respectively; and multiplying these two determinants to obtain the determinant of the i-th cofactor.
  • Step S412 the second participant determines the target feature from the features corresponding to the second sample feature data based on the collinearity quantization factor.
  • determining the target feature by the second participant according to the collinearity quantization factor may be implemented as: judging whether the collinearity quantization factor of each feature corresponding to the second sample feature data is greater than a preset boundary value; The feature of the preset boundary value is determined as the target feature.
  • Step S413 the second participant deletes the feature data of the target feature from the second sample feature data to obtain second training data for the second participant to perform joint training with other participants.
  • The first participant and the second participant then use the first training data and the second training data to perform joint modeling based on the secure multi-party computation protocol. Since the first training data and the second training data are training data from which linearly dependent features have been removed, the accuracy and stability of the federated model obtained by training can be improved, thereby improving the modeling effect of the federated model.
  • In summary, the first participant and the second participant obtain a secure computing model pre-trained based on secure multi-party computation; the first participant and the second participant then obtain the first sample feature data and the second sample feature data respectively. The first participant constructs a virtual first feature correlation matrix based on the first sample feature data and the security computing model, and the second participant constructs a virtual second feature correlation matrix based on the second sample feature data and the security computing model. The first participant then determines the collinear quantization factor of each feature corresponding to the first sample feature data based on the first feature correlation matrix, and the second participant determines the collinear quantization factor of each feature corresponding to the second sample feature data based on the second feature correlation matrix. The first participant determines the target feature among the features corresponding to the first sample feature data by their collinearity quantization factors, and the second participant determines the target feature among the features corresponding to the second sample feature data by their collinearity quantization factors.
  • The feature data of these target features has a linear relationship with the feature data of at least one feature among the other features, where the other features include all features held by the first participant and the second participant except the target features. After determining the target features, the first participant deletes the feature data of the target feature from the first sample feature data to obtain the first training data, and the second participant deletes the feature data of the target feature from the second sample feature data to obtain the second training data.
  • In this way, the collinear data in the feature data held by the first participant and the second participant can be screened out and eliminated, yielding training data without linear dependence.
  • the first participant and the second participant perform joint training using training data that does not have a linear relationship, which can improve the accuracy and stability of the federated model and improve the modeling effect of the federated model.
  • In the related art, calculating the variance inflation factor (VIF) usually requires aggregating the data of all parties in one place.
  • However, the data of the various parties may involve personal privacy or business secrets, and directly exposing it to other parties would result in information leakage.
  • The feature screening schemes for linear federated modeling in the related art mainly target single-column data and lack a method, such as VIF, that can describe the collinearity of multi-column data.
  • Related technologies cannot combine data of multiple parties to calculate VIF while protecting data privacy.
  • For the scenario of two independent participants in the embodiment of the present application, it may be assumed that joint data modeling is performed between two companies.
  • the raw data of the respective companies cannot leave the respective companies for reasons such as compliance, privacy and commercial confidentiality.
  • the intermediate data exchanged during the modeling process also cannot derive or reveal unnecessary original data information.
  • the two companies each hold different feature data of the same batch of ID users, and there may be collinearity between these features, which may affect the modeling effect of the subsequent linear model.
  • the embodiments of the present application use the technical solutions of privacy and security of the two parties to calculate the variance inflation factor VIF, and perform feature screening based on this.
  • FIG. 5 is a schematic flowchart of the calculation flow of the variance inflation factor in the vertical federation situation provided by the embodiment of the present application
  • FIG. 6 is a schematic flowchart of the calculation of the determinant of the correlation matrix provided by the embodiment of the present application. The calculation method of the variance inflation factor in the vertical federation situation provided by the embodiments of the present application will be described in detail below.
  • The embodiment of this application involves two independent participants (referred to as Alice and Bob respectively), each of which holds data with different features for the same batch of IDs, recorded as matrices A and B respectively (where A and B have the same number of rows n, A has m1 columns, and B has m2 columns).
  • Alice and Bob respectively normalize the features (columns of the matrix) of A and B locally to obtain matrices C and D.
  • The normalization is performed as x' = (x − x̄) / σ_x, where x represents a column of features, x̄ represents the mean of the feature, and σ_x represents the standard deviation of the feature.
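A minimal sketch of this normalization (population standard deviation assumed, matching numpy's default; the column values are invented):

```python
import numpy as np

def standardize(col):
    """z-score one feature column: (x - mean) / std."""
    return (col - col.mean()) / col.std()

x = np.array([25.0, 32.0, 47.0, 51.0])
z = standardize(x)
assert np.isclose(z.mean(), 0.0) and np.isclose(z.std(), 1.0)
```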
  • H_ii is the cofactor of the matrix H (that is, the matrix left after deleting the i-th row and the i-th column of H), and the otherwise intractable calculation of the determinant of the original matrix H is converted into a determinant calculation problem over the corresponding blocks of the partitioned matrix (such as F_ii and E_i*).
  • Alice locally computes
  • Bob locally generates a random matrix R_2 whose determinant is 1 and has the same dimension as M_4.
  • a special random matrix is constructed to confuse the original matrix, and the determinant problem of a matrix involving original data information is transformed into a determinant calculation problem of a random matrix with the same determinant.
  • The method provided by the embodiment of the present application converts the originally intractable determinant calculation problem into a block-matrix determinant problem through matrix transformation; confuses the original matrix by constructing a special random matrix, transforming the determinant problem of a matrix involving original data information into the determinant calculation of a random matrix with the same determinant; and, through cooperation with the SPDZ protocol, the whole calculation process leaks no information beyond the result, achieving both high security and practical calculation efficiency.
  • The method provided by the embodiment of the present application makes it possible to safely calculate the variance inflation factor VIF, thereby enabling efficient feature screening and improving the overall effect of the subsequent linear model; no unnecessary data information is leaked, and the overall computing overhead is kept within a range that is practical for production.
  • The data processing apparatus 70 stored in the memory 140 is applied to jointly training a model; the software modules in the data processing apparatus 70 may include:
  • the building module 71 is configured to construct a virtual feature correlation matrix based on the first sample feature data held by the first participant and a pre-trained security computing model, and the security computing model is determined by the first participant and other participants of federated learning are pre-trained based on secure multi-party computation;
  • a first determination module 72 configured to determine, based on the feature correlation matrix, a collinear quantization factor of each feature corresponding to the first sample feature data
  • the second determination module 73 is configured to determine a target feature from each feature corresponding to the first sample feature data based on the collinearity quantization factor;
  • a deletion module 74 configured to delete the feature data of the target feature from the first sample feature data, to obtain the first training data for the joint training of the first participant and the other participants;
  • the feature data of the target feature has a linear relationship with the feature data of at least one feature among the other features, and the other features include the features other than the target feature held by the first participant and the features held by the other participants.
  • the building block 71 includes:
  • a first determination submodule configured to determine, based on the first sample feature data, feature data of each feature corresponding to the first sample feature data and the number of samples corresponding to the first sample feature data;
  • a calculation submodule configured to calculate the mean and standard deviation corresponding to the characteristic data of each characteristic
  • the second determination sub-module is configured to determine the processed first sample feature data based on the feature data of each feature, the mean value corresponding to the feature data of each feature, the standard deviation corresponding to the feature data of each feature, and the number of samples;
  • an input submodule configured to input the processed first sample characteristic data into the security computing model to obtain a first matrix
  • a construction submodule is configured to construct a virtual feature correlation matrix according to the processed first sample feature data and the first matrix.
  • the building blocks include:
  • a first determining unit configured to determine a first symmetric matrix according to the processed first sample feature data
  • a first generating unit configured to generate an empty matrix whose number of rows and columns are equal to the number of columns of the first matrix
  • a construction unit configured to construct a virtual feature correlation matrix according to the first symmetric matrix, the first matrix, the transposed matrix of the first matrix, and the empty matrix.
  • the first determining module 72 includes:
  • a third determination submodule configured to determine the determinant of the feature correlation matrix
  • a fourth determination submodule configured to determine the determinant of each cofactor;
  • the fifth determination submodule is configured to determine the collinear quantization factor of each feature corresponding to the first sample feature data based on the determinant of the feature correlation matrix and the determinant of each cofactor.
  • the third determination submodule includes:
  • a second generating unit configured to generate a first random matrix whose determinant is a preset value and has the same dimension as the empty matrix
  • an input unit configured to input the feature correlation matrix and the first random matrix into the security computing model to obtain a second matrix
  • a first calculation unit configured to calculate the determinants of the first symmetric matrix and the second matrix respectively
  • the second calculation unit is configured to multiply the determinant of the first symmetric matrix and the determinant of the second matrix to obtain the determinant of the feature correlation matrix.
  • the fifth determination submodule includes:
  • a third calculation unit configured to calculate the variance expansion factor of each feature according to the determinant of each cofactor and the determinant of the feature correlation matrix
  • the second determining unit is configured to determine the variance expansion factor of each feature as a collinear quantization factor of each feature corresponding to the first sample feature data.
  • the second determining module 73 includes:
  • a judging submodule configured to judge whether the collinear quantization factor of each feature corresponding to the first sample feature data is greater than a preset boundary value
  • the sixth determination sub-module is configured to determine a feature whose collinearity quantization factor is greater than a preset boundary value as a target feature.
  • the data processing apparatus 70 further includes:
  • an acquisition module configured to acquire common training samples held by the first participant and the other participants
  • a screening module configured to screen out the current training samples from the common training samples
  • a third determination module configured to determine the feature data of the current training samples as the first sample feature data.
  • the obtaining module includes:
  • an acquisition submodule configured to acquire the identifier of the data held by the first participant
  • a seventh determination submodule configured to determine, based on encrypted calculation, the training samples common to the first participant and the other participants by using the identifier of the data held by the first participant and the identifier of the data held by the other participants.
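The sample-alignment step above can be pictured with a plain hashed-ID intersection. This is only a sketch: the patent's encrypted calculation would use a private set intersection protocol (e.g., blinded-RSA or Diffie–Hellman style PSI) so that neither side learns identifiers outside the overlap, and the shared salt and identifiers below are invented for illustration:

```python
import hashlib

def hashed_index(ids, salt=b"shared-salt"):
    """Map a salted hash of each record identifier back to the identifier.
    A real deployment would replace this with a cryptographic PSI protocol."""
    return {hashlib.sha256(salt + i.encode()).hexdigest(): i for i in ids}

ids_a = ["u001", "u002", "u003", "u005"]  # held by the first participant
ids_b = ["u002", "u003", "u004"]          # held by another participant

index_a = hashed_index(ids_a)
index_b = hashed_index(ids_b)

# only hashes are exchanged; the overlap identifies the common training samples
common = sorted(index_a[h] for h in index_a.keys() & index_b.keys())
assert common == ["u002", "u003"]
```

The feature data of these common samples is then what each participant contributes to joint training.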
  • Embodiments of the present application provide a computer program product or computer program, where the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the data processing methods described above in the embodiments of the present application.
  • the embodiments of the present application provide a storage medium storing executable instructions; when the executable instructions are executed by a processor, they cause the processor to execute the methods provided by the embodiments of the present application, for example, the methods shown in FIG. 3 to FIG. 6.
  • the storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disc, or CD-ROM; it may also be any of various devices including one of, or any combination of, the above memories.
  • executable instructions may take the form of programs, software, software modules, scripts, or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • executable instructions may, but need not, correspond to files in a file system; they may be stored as part of a file that holds other programs or data, for example, in one or more scripts in a Hyper Text Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple cooperating files (e.g., files that store one or more modules, subprograms, or code sections).
  • executable instructions may be deployed to be executed on one computing device, on multiple computing devices located at one site, or on multiple computing devices distributed across multiple sites and interconnected by a communication network.
  • the embodiments of the present application provide a data processing method, apparatus, device, storage medium and program product. The method includes: constructing a virtual feature correlation matrix based on first sample feature data and a pre-trained security computing model; determining, based on the feature correlation matrix, the collinearity quantization factor of each feature corresponding to the first sample feature data; determining, based on the collinearity quantization factors, a target feature from the features corresponding to the first sample feature data; and deleting the feature data of the target feature from the first sample feature data to obtain first training data for joint training, wherein the feature data of the target feature has a linear relationship with the feature data of at least one other feature.
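Putting the pieces together in the clear (with no secure computation), the filtering logic the method describes — score each feature by a collinearity quantization factor and delete those above a boundary value — can be sketched as follows. The boundary of 10 is a common rule of thumb for variance inflation factors, not a value taken from the patent, and the data is synthetic:

```python
import numpy as np

def drop_collinear_features(x, boundary=10.0):
    """Iteratively delete the feature with the largest variance inflation
    factor until every remaining factor is at or below the boundary."""
    kept = list(range(x.shape[1]))
    while len(kept) > 1:
        corr = np.corrcoef(x[:, kept], rowvar=False)
        vifs = np.diag(np.linalg.inv(corr))  # VIF_i = [R^-1]_ii
        worst = int(np.argmax(vifs))
        if vifs[worst] <= boundary:
            break
        kept.pop(worst)  # delete the target feature's column
    return x[:, kept], kept

rng = np.random.default_rng(1)
a = rng.standard_normal(200)
b = rng.standard_normal(200)
# third column is (almost) a linear copy of the first
x = np.column_stack([a, b, a + 0.01 * rng.standard_normal(200)])

filtered, kept = drop_collinear_features(x)
assert filtered.shape == (200, 2)  # one of the collinear pair was deleted
assert 1 in kept                   # the independent feature survives
```

In the federated setting of the patent, the correlation matrix and the factors would be obtained through the security computing model rather than computed directly on pooled data.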

Abstract

Provided are a data processing method and apparatus, a device, a storage medium, and a program product. The data processing method comprises: constructing a virtual feature correlation matrix according to first sample feature data and a pre-trained security computing model (S301); determining, according to the feature correlation matrix, collinearity quantization factors of the features corresponding to the first sample feature data (S302); determining, according to the collinearity quantization factors, a target feature from among the features corresponding to the first sample feature data (S303); and deleting feature data of the target feature from the first sample feature data so as to obtain first training data for joint training (S304), wherein a linear relationship exists between the feature data of the target feature and the feature data of at least one other feature.
PCT/CN2021/140955 2021-04-26 2021-12-23 Data processing method and apparatus, device, storage medium and program product WO2022227644A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110454684.2 2021-04-26
CN202110454684.2A CN113095514A (zh) Data processing method, apparatus, device, storage medium and program product

Publications (1)

Publication Number Publication Date
WO2022227644A1 true WO2022227644A1 (fr) 2022-11-03

Family

ID=76679959

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/140955 WO2022227644A1 (fr) 2021-12-23 Data processing method and apparatus, device, storage medium and program product

Country Status (2)

Country Link
CN (1) CN113095514A (fr)
WO (1) WO2022227644A1 (fr)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113095514A (zh) * 2021-04-26 2021-07-09 深圳前海微众银行股份有限公司 Data processing method, apparatus, device, storage medium and program product
CN113345597B (zh) * 2021-07-15 2021-11-16 中国平安人寿保险股份有限公司 Federated learning method and apparatus for an infectious disease probability prediction model, and related device
CN114692201B (zh) * 2022-03-31 2023-03-31 北京九章云极科技有限公司 Multi-party secure computation method and system
CN115293252A (zh) * 2022-07-29 2022-11-04 脸萌有限公司 Information classification method, apparatus, device and medium
CN114996749B (zh) * 2022-08-05 2022-11-25 蓝象智联(杭州)科技有限公司 Feature filtering method for federated learning
CN115545216B (zh) * 2022-10-19 2023-06-30 上海零数众合信息科技有限公司 Service indicator prediction method, apparatus, device and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160283735A1 (en) * 2015-03-24 2016-09-29 International Business Machines Corporation Privacy and modeling preserved data sharing
CN110909216A (zh) * 2019-12-04 2020-03-24 支付宝(杭州)信息技术有限公司 Method and apparatus for detecting correlation between user attributes
CN111062487A (zh) * 2019-11-28 2020-04-24 支付宝(杭州)信息技术有限公司 Machine learning model feature screening method and apparatus based on data privacy protection
CN111160573A (zh) * 2020-04-01 2020-05-15 支付宝(杭州)信息技术有限公司 Method and apparatus for two-party joint training of a service prediction model while protecting data privacy
US20200285984A1 (en) * 2019-03-06 2020-09-10 Hcl Technologies Limited System and method for generating a predictive model
CN111654853A (zh) * 2020-08-04 2020-09-11 索信达(北京)数据技术有限公司 Data analysis method based on user information
CN111966473A (zh) * 2020-07-24 2020-11-20 支付宝(杭州)信息技术有限公司 Method and apparatus for running a linear regression task, and electronic device
CN112597540A (zh) * 2021-01-28 2021-04-02 支付宝(杭州)信息技术有限公司 Privacy-preserving multicollinearity detection method, apparatus and system
CN113095514A (zh) * 2021-04-26 2021-07-09 深圳前海微众银行股份有限公司 Data processing method, apparatus, device, storage medium and program product

Also Published As

Publication number Publication date
CN113095514A (zh) 2021-07-09

Similar Documents

Publication Publication Date Title
WO2022227644A1 (fr) Data processing method and apparatus, device, storage medium and program product
WO2021120676A1 (fr) Model training method for federated learning network, and related device
CN110189192B (zh) Method and apparatus for generating an information recommendation model
TWI689841B (zh) Data encryption and machine learning model training method, apparatus and electronic device
CN112085159B (zh) User tag data prediction system, method, apparatus and electronic device
WO2021204268A1 (fr) Method and system for performing model training based on privacy data
Shi et al. Secure multi-pArty computation grid LOgistic REgression (SMAC-GLORE)
Zhang et al. Secure distributed genome analysis for GWAS and sequence comparison computation
CN113159327 (zh) Model training method and apparatus based on a federated learning system, and electronic device
US11450439B2 (en) Realizing private and practical pharmacological collaboration using a neural network architecture configured for reduced computation overhead
US20160125141A1 (en) Method for privacy-preserving medical risk test
WO2021120855A1 (fr) Method and system for performing model training based on privacy data
Wu et al. MNSSp3: Medical big data privacy protection platform based on Internet of things
Singh Accurate and efficient approximations for generalized population balances incorporating coagulation and fragmentation
Wang et al. Differentially private SGD with non-smooth losses
Omer et al. Privacy-preserving of SVM over vertically partitioned with imputing missing data
Zhang et al. Joint intelligence ranking by federated multiplicative update
Chen et al. Privacy-preserving SVM on outsourced genomic data via secure multi-party computation
Song et al. Group decision making with hesitant fuzzy linguistic preference relations based on multiplicative DEA cross-efficiency and stochastic acceptability analysis
Wang et al. An intelligent blockchain-based access control framework with federated learning for genome-wide association studies
CN112949866A (zh) Training method and apparatus for a Poisson regression model, electronic device and storage medium
CN112016698A (zh) Factorization machine model construction method, device and readable storage medium
Shah et al. Maintaining privacy in medical imaging with federated learning, deep learning, differential privacy, and encrypted computation
Lv et al. On the sign consistency of the Lasso for the high-dimensional Cox model
Xu et al. Efficient and privacy-preserving similar electronic medical records query for large-scale ehealthcare systems

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21939103

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE