CN115545216A - Service index prediction method, device, equipment and storage medium - Google Patents

Service index prediction method, device, equipment and storage medium

Info

Publication number
CN115545216A
Authority
CN
China
Prior art keywords
initiator
partner
data
data set
correlation coefficient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211278879.7A
Other languages
Chinese (zh)
Other versions
CN115545216B (en
Inventor
孙银银
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Lingshuzhonghe Information Technology Co ltd
Original Assignee
Shanghai Lingshuzhonghe Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Lingshuzhonghe Information Technology Co ltd filed Critical Shanghai Lingshuzhonghe Information Technology Co ltd
Priority to CN202211278879.7A priority Critical patent/CN115545216B/en
Publication of CN115545216A publication Critical patent/CN115545216A/en
Priority to PCT/CN2023/079369 priority patent/WO2024082514A1/en
Application granted granted Critical
Publication of CN115545216B publication Critical patent/CN115545216B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/008 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols involving homomorphic encryption

Abstract

The invention discloses a business index prediction method, which comprises the following steps: determining an initiator data set of an initiator and a partner data set of a partner in longitudinal federated learning; encrypting the initiator data set through the initiator device and sending the encrypted initiator data set to the partner device; decrypting the ciphertext correlation coefficient matrix through the initiator device to determine a plaintext correlation coefficient matrix; determining an initiator correlation coefficient matrix and a partner correlation coefficient matrix according to the initiator data set and the partner data set; performing feature multiple collinearity analysis on the initiator and the partner according to the initiator correlation coefficient matrix, the partner correlation coefficient matrix and the plaintext correlation coefficient matrix; and determining model training data from the initiator data set and the partner data set according to the analysis result, and training the federated learning model with the model training data. The training efficiency of the federated learning model is improved, and the feature interpretability of the linear model is guaranteed.

Description

Service index prediction method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to the field of computers, in particular to a service index prediction method, a service index prediction device, service index prediction equipment and a storage medium.
Background
Longitudinal federated learning is typically used when one party has too few data dimensions to achieve the modeling target with its own data alone, so it is mostly applied to joint modeling across different industries. In longitudinal federated modeling, the task initiator and the partner share a common sample space but have different feature spaces. An encryption algorithm is needed to protect the data privacy of the data user and the data provider. At the same time, the multiple collinearity between each feature and the other features must be calculated, and features with high collinearity must be removed, in order to improve modeling efficiency and accuracy. Federated multiple collinearity calculation implemented with a linear model method requires many rounds of interactive iterative training to calculate the goodness of fit, which incurs high communication cost and computational complexity, and the result depends on the model hyperparameter settings. Therefore, how to improve the training efficiency of a federated learning model while ensuring its accuracy is a problem to be solved.
Disclosure of Invention
The invention provides a business index prediction method, a business index prediction device, business index prediction equipment and a storage medium, which can improve the calculation efficiency and calculation precision of a linear model in longitudinal federal learning and improve the prediction precision of a federal learning model, so that the prediction precision of user business indexes can be improved when the business indexes of users are predicted through the federal learning model.
According to an aspect of the present invention, a method for predicting a service index is provided, including:
determining an initiator data set and a partner data set according to initiator service data of an initiator and partner service data of a partner in longitudinal federal learning;
encrypting the initiator data set through initiator equipment and sending the encrypted initiator data set to partner equipment; acquiring a ciphertext correlation coefficient matrix between the initiator and the partner from the partner device, decrypting the ciphertext correlation coefficient matrix through the initiator device, and determining a plaintext correlation coefficient matrix according to a decryption result;
determining an initiator correlation coefficient matrix of the initiator and a partner correlation coefficient matrix of the partner according to the initiator data set and the partner data set;
performing characteristic multiple collinearity analysis on the initiator and the partner according to the initiator correlation coefficient matrix, the partner correlation coefficient matrix and the plaintext correlation coefficient matrix;
determining model training data from the initiator data set and the partner data set according to the result of the characteristic multiple collinearity analysis, and training a federal learning model by adopting the model training data; the federal learning model is used for predicting the business index value of the user.
According to another aspect of the present invention, there is provided a service index prediction apparatus, including:
the data set determining module is used for determining an initiator data set and a partner data set according to initiator business data of an initiator and partner business data of a partner in longitudinal federal learning;
the data set encryption transmission module is used for encrypting the initiator data set through initiator equipment and sending the encrypted initiator data set to partner equipment; acquiring a ciphertext correlation coefficient matrix between the initiator and the partner from the partner device, decrypting the ciphertext correlation coefficient matrix through the initiator device, and determining a plaintext correlation coefficient matrix according to a decryption result;
a correlation coefficient matrix determining module, configured to determine an initiator correlation coefficient matrix of the initiator and a partner correlation coefficient matrix of the partner according to the initiator data set and the partner data set;
the multiple collinearity analysis module is used for carrying out characteristic multiple collinearity analysis on the initiator and the partner according to the initiator correlation coefficient matrix, the partner correlation coefficient matrix and the plaintext correlation coefficient matrix;
the model training module is used for determining model training data from the initiator data set and the partner data set according to a characteristic multiple collinearity analysis result and training a federal learning model by adopting the model training data; the federal learning model is used for predicting the business index value of the user.
According to another aspect of the present invention, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor, the computer program being executed by the at least one processor to enable the at least one processor to perform the service index prediction method according to any of the embodiments of the present invention.
According to another aspect of the present invention, a computer-readable storage medium is provided, where computer instructions are stored, and the computer instructions are configured to enable a processor to implement the business index prediction method according to any embodiment of the present invention when executed.
According to the technical solution of the embodiments of the present invention, an initiator data set and a partner data set are determined according to initiator business data of an initiator and partner business data of a partner in longitudinal federated learning; the initiator data set is encrypted by the initiator device and the encrypted initiator data set is sent to the partner device; a ciphertext correlation coefficient matrix between the initiator and the partner is acquired from the partner device and decrypted by the initiator device, and a plaintext correlation coefficient matrix is determined according to the decryption result; an initiator correlation coefficient matrix of the initiator and a partner correlation coefficient matrix of the partner are determined according to the initiator data set and the partner data set; feature multiple collinearity analysis is performed on the initiator and the partner according to the initiator correlation coefficient matrix, the partner correlation coefficient matrix and the plaintext correlation coefficient matrix; and model training data is determined from the initiator data set and the partner data set according to the result of the feature multiple collinearity analysis, and the federated learning model is trained with the model training data. This solves the following problems that arise when a regression coefficient method is used to analyze the feature multiple collinearity of an initiator and a partner in longitudinal federated learning: many rounds of interactive iterative training are needed to calculate the goodness of fit, the communication cost and computational complexity are high, and the calculation result depends on the model hyperparameter settings.
In this solution, the feature multiple collinearity of the initiator and the partner in longitudinal federated learning is calculated with a correlation coefficient method. No parameters need to be tuned during the calculation, and the number of data interactions between the initiator and the partner is reduced. A homomorphic encryption algorithm and the correlation coefficient method are used to determine the training data of the federated learning model through distributed computation, and the federated learning model is then trained on that data. This improves the training efficiency of the federated learning model and preserves the interpretability of the linear model within it. Using the federated learning model to predict the business index value of a user can improve the prediction precision of the business index value.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present invention, nor do they necessarily limit the scope of the invention. Other features of the present invention will become apparent from the following description.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a flowchart of a service index prediction method according to a first embodiment of the present invention;
Fig. 2 is a flowchart of a service index prediction method according to a second embodiment of the present invention;
Fig. 3 is a flowchart of a service index prediction method according to a third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a service index prediction apparatus according to a fourth embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, shall fall within the protection scope of the present invention.
It is to be understood that the terms "target" and the like in the description and claims of the present invention and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Moreover, the terms "comprises," "comprising," and any other variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example one
Fig. 1 is a flowchart of a service index prediction method according to a first embodiment of the present invention. The method is applicable to training a federated learning model for predicting a service index according to initiator service data of an initiator and partner service data of a partner in longitudinal federated learning. It is particularly suitable for the case where multiple collinearity analysis is performed on the initiator service data and the partner service data by a correlation coefficient method, the training data of the federated learning model is determined according to the multiple collinearity analysis result, and the federated learning model used for predicting service indexes is trained on that data. The method may be performed by a service index prediction device, which may be implemented in hardware and/or software and may be configured in an electronic device. As shown in Fig. 1, the method includes:
S110, determining an initiator data set and a partner data set according to initiator business data of the initiator and partner business data of the partner in longitudinal federated learning.
The vertical federated learning is generally applicable to a federated learning scenario composed of participants with the same sample space and different feature spaces on a data set. A federated learning model may be cooperatively trained for different participants through longitudinal federated learning. In this embodiment, the participants are the initiator and the partner in the longitudinal federal learning. The initiator and the partner may be enterprises of different industries with cooperative needs. The initiator service data refers to sample data which can represent the service index of the user at the initiator and is acquired by the initiator under the condition of user permission; the partner business data refers to sample data which can represent business indexes of the user at the partner and is acquired by the partner under the condition of user permission. The business index refers to an index for measuring the behavior of a certain aspect of the user, for example, the business index can be a credit index or a performance index of the user at the initiator and the partner. The initiator dataset refers to a dataset that contains characteristic data of the initiator. A partner data set refers to a data set that contains characteristic data of a partner.
Specifically, under the condition of user permission corresponding to the initiator, initiator service data corresponding to the initiator in longitudinal federal learning is obtained, feature extraction is performed on the initiator service data, feature data of the initiator service data is determined, and an initiator data set is determined according to the feature data of the initiator service data. Under the condition of user permission corresponding to the partner, partner business data corresponding to the partner in longitudinal federal learning are obtained, feature extraction is carried out on the partner business data, feature data of the partner business data are determined, and a partner data set is determined according to the feature data of the partner business data.
For example, the method for determining the initiator data set and the partner data set may be: determining the same users of an initiator and a partner in longitudinal federal learning, and determining a data intersection between initiator service data of the initiator and partner service data of the partner based on the identity of the same users; and respectively processing the initiator service data and the partner service data according to the data intersection to determine an initiator data set and a partner data set.
The identity mark refers to data capable of representing the identity of a user, and the identity mark can comprise an identity card number or a mobile phone number.
Specifically, the same user may exist between the initiator and the partner of the longitudinal federal learning, so that when the federal learning model is trained, the same user of the initiator and the partner in the longitudinal federal learning needs to be determined, and under the condition of user permission, the identity of the same user is obtained. And determining a data intersection between the initiator service data of the initiator and the partner service data of the partner based on the identity of the same user. Integrating the data intersection and the characteristic data of the service data of the initiator to determine an initiator data set; and integrating the data intersection and the feature data of the partner service data to determine a partner data set.
It can be understood that the data intersection between the initiator service data and the partner service data is determined according to the users shared by the initiator and the partner in longitudinal federated learning; the initiator data set is determined according to the data intersection and the initiator service data, and the partner data set is determined according to the data intersection and the partner service data. This makes the association between the initiator data set and the partner data set more explicit and facilitates the subsequent calculation of the feature multiple collinearity of the initiator and the partner from the two data sets.
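The alignment described above can be sketched as follows. This is a minimal illustration only: the function name `align_by_identity` and the sample records are hypothetical, and the identity marks are compared in plaintext for readability, whereas the text does not specify how the intersection is computed between the two parties.

```python
def align_by_identity(initiator_rows, partner_rows):
    """Keep only users present on both sides, in one shared order.

    Both arguments map an identity mark (e.g. a phone number) to that
    party's feature values for the user. Names are hypothetical.
    """
    common_ids = sorted(set(initiator_rows) & set(partner_rows))
    initiator_set = [initiator_rows[uid] for uid in common_ids]
    partner_set = [partner_rows[uid] for uid in common_ids]
    return common_ids, initiator_set, partner_set


# Hypothetical records: the initiator holds two features, the partner one.
initiator_rows = {"u1": [0.2, 5.0], "u2": [0.4, 3.0], "u3": [0.9, 1.0]}
partner_rows = {"u2": [12.0], "u3": [7.0], "u4": [30.0]}

ids, initiator_set, partner_set = align_by_identity(initiator_rows, partner_rows)
```

After alignment, row i of both data sets refers to the same user, which is what allows sample-wise products between the two parties' features to be formed later.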
S120, encrypting the initiator data set through the initiator device, and sending the encrypted initiator data set to the partner device; and acquiring a ciphertext correlation coefficient matrix between the initiator and the partner from the partner device, decrypting the ciphertext correlation coefficient matrix through the initiator device, and determining a plaintext correlation coefficient matrix according to a decryption result.
The ciphertext correlation coefficient matrix is a ciphertext correlation coefficient matrix between the feature data in the initiator data set and the feature data in the partner data set. The plaintext correlation coefficient matrix is specifically a plaintext correlation coefficient matrix between the feature data in the initiator data set and the feature data in the partner data set. The initiator device refers to a terminal device corresponding to the initiator, and the partner device refers to a terminal device corresponding to the partner. The correlation coefficient is a statistical index used for reflecting the degree of closeness of the correlation between the variables.
Illustratively, encrypting the initiator data set and decrypting the ciphertext correlation coefficient matrix may be implemented by the following sub-steps:
S1201, generating a key pair by adopting a homomorphic encryption algorithm through initiator equipment, and encrypting an initiator data set by adopting a public key in the key pair.
A homomorphic encryption algorithm is an encryption algorithm whose ciphertexts support homomorphic operations: after data is homomorphically encrypted, a specific computation can be carried out on the ciphertext, and decrypting the result yields the same plaintext as performing that computation directly on the plaintext data. The data can therefore be computed on without being visible.
Specifically, a key pair is generated by the initiator device by adopting a homomorphic encryption algorithm, and the key pair comprises a public key and a private key. The initiator device encrypts the initiator dataset using the public key of the key pair.
S1202, the encrypted initiator data set is sent to the partner device, so that a ciphertext feature correlation coefficient between feature data in the initiator data set and feature data in the partner data set is calculated through the partner device according to the encrypted initiator data set and multiplication characteristics of a homomorphic encryption algorithm, and a ciphertext correlation coefficient matrix between the initiator and the partner is determined according to the ciphertext feature correlation coefficient.
The ciphertext feature correlation coefficient is a ciphertext correlation coefficient obtained by calculating a ciphertext between the feature data in the partner data set and the feature data in the encrypted initiator data set.
Specifically, the encrypted initiator data set is sent to the partner device through the initiator device, so that the partner device calculates ciphertext feature correlation coefficients between each feature data in the encrypted initiator data set and each feature data in the partner data set according to the multiplication characteristic of a homomorphic encryption algorithm and the encrypted initiator data set, integrates the calculation results, and determines a ciphertext correlation coefficient matrix between the feature data of the initiator and the feature data of the partner.
S1203, sending the ciphertext correlation coefficient matrix to the initiator device, so that the initiator device decrypts the ciphertext correlation coefficient matrix by using a private key in the key pair, and determining a plaintext correlation coefficient matrix between the feature data in the initiator data set and the feature data in the partner data set according to a decryption result.
Specifically, the ciphertext correlation coefficient matrix is sent to the initiator device through the partner device. And decrypting the ciphertext correlation coefficient matrix by the initiator device by adopting a private key in the key pair, and determining a plaintext correlation coefficient matrix between the characteristic data in the initiator data set and the characteristic data in the partner data set according to a decryption result.
It can be understood that, in the above scheme, the problem of calculating the feature correlation coefficient of the initiator and the partner in the longitudinal federal learning is converted into the problem of calculating the ciphertext correlation coefficient matrix between the initiator and the partner by using a homomorphic encryption algorithm, so that the data interaction times of a regression coefficient method in calculating the multiple collinearity of the features of the initiator and the partner are reduced, the calculation process is simplified, and the calculation efficiency is improved.
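Sub-steps S1201 to S1203 can be sketched end to end as below. The text does not name a specific homomorphic scheme, so this sketch assumes a textbook Paillier cryptosystem, in which raising a ciphertext to a plaintext power multiplies the hidden value by that plaintext. The key size, the fixed-point `SCALE`, the population-variance standardization and all function names are assumptions for illustration; the toy key is far too small to be secure.

```python
import math
import secrets

# Toy Paillier key from two Mersenne primes (ASSUMPTION: demo only, insecure).
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q                                           # public key
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # private key
MU = pow(LAM, -1, N)                                # valid for generator N + 1
SCALE = 10**6                                       # fixed-point scale (assumed)


def encode(x):
    """Map a real number to a residue mod N; negatives wrap around."""
    return round(x * SCALE) % N


def decode(v, scale):
    """Undo the encoding; residues above N/2 stand for negative numbers."""
    if v > N // 2:
        v -= N
    return v / scale


def encrypt(m):
    r = secrets.randbelow(N - 1) + 1
    return (1 + m * N) * pow(r, N, N2) % N2


def decrypt(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N


def standardize(column):
    """Z-score with population variance (assumed); fails on constant columns."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / std for v in column]


def partner_encrypted_dot(enc_column, plain_column):
    """Partner side: build Enc(sum x_i * y_i) from Enc(x_i) and plaintext y_i."""
    acc = 1  # 1 is a valid encryption of 0
    for c, y in zip(enc_column, plain_column):
        acc = acc * pow(c, encode(y), N2) % N2
    return acc


def cross_correlation(initiator_columns, partner_columns):
    n = len(initiator_columns[0])
    # S1201: the initiator standardizes, encodes and encrypts its features.
    enc_cols = [[encrypt(encode(v)) for v in standardize(col)]
                for col in initiator_columns]
    # S1202: the partner computes the ciphertext matrix from its own plaintext.
    part_std = [standardize(col) for col in partner_columns]
    cipher = [[partner_encrypted_dot(ec, pc) for pc in part_std]
              for ec in enc_cols]
    # S1203: the initiator decrypts and rescales to correlation coefficients.
    return [[decode(decrypt(c), SCALE * SCALE) / n for c in row]
            for row in cipher]


initiator_columns = [[1.0, 2.0, 3.0, 4.0, 5.0], [5.0, 3.0, 4.0, 1.0, 2.0]]
partner_columns = [[3.0, 5.0, 7.0, 9.0, 11.0], [-1.0, -2.0, -3.0, -4.0, -5.0]]
matrix = cross_correlation(initiator_columns, partner_columns)
```

In this sketch the partner never sees the initiator's plaintext features, and the initiator receives only the aggregated cross-correlation values. A single round trip suffices, in contrast to the iterative exchanges of a regression-based approach.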
S130, determining an initiator correlation coefficient matrix of the initiator and a partner correlation coefficient matrix of the partner according to the initiator data set and the partner data set.
The initiator correlation coefficient matrix is a matrix formed by correlation coefficients among the feature data in the initiator data set. The partner correlation coefficient matrix is a matrix formed by correlation coefficients among the feature data in the partner data set.
Specifically, the feature data in the initiator data set, that is, the feature data of the initiator, is determined, the correlation coefficient between the feature data of the initiator is calculated, and the initiator correlation coefficient matrix of the initiator is determined according to the correlation coefficient between the feature data of the initiator. Determining feature data in the partner data set, namely feature data of a partner, calculating a correlation coefficient between the feature data of the partner, and determining a partner correlation coefficient matrix of the partner according to the correlation coefficient between the feature data of the partner.
For example, the initiator correlation coefficient matrix of the initiator and the partner correlation coefficient matrix of the partner may be determined according to the following sub-steps:
S1301, carrying out standardization processing on the feature data in the initiator data set and the feature data in the partner data set, and determining initiator standardized data and partner standardized data.
Standardization converts the original data by a mathematical transformation so that the values fall on a common scale, for example within a small interval such as [0,1] or [-1,1]. This eliminates differences in nature, dimension and order of magnitude between variables and turns the data into dimensionless relative values, namely standardized values, so that all indexes are on the same level of magnitude and indexes with different units or magnitudes can be analyzed and compared together.
Specifically, the feature data in the initiator data set and the feature data in the partner data set may be normalized by a Z-score normalization method to determine the initiator normalized data and the partner normalized data.
For example, the initiator standardized data may be obtained as follows: determining the feature data in the initiator data set and the number of samples corresponding to the feature data; calculating the mean of the feature data in the initiator data set; determining the variance of the feature data according to that mean and the number of initiator samples; and determining the initiator standardized data according to the variance, the feature data in the initiator data set and the mean of the feature data in the initiator data set.
S1302, determining an initiator correlation coefficient matrix according to correlation coefficients among the characteristic data of the initiator standardized data, and determining a partner correlation coefficient matrix of a partner according to correlation coefficients among the characteristic data of the partner standardized data.
Each feature data of the initiator standardized data is a standardized result of the feature data in the initiator data set. Each feature data of the partner standardized data refers to a standardized result of feature data in the partner dataset.
Specifically, correlation coefficients among characteristic data of the initiator standardized data are calculated and integrated, and an initiator correlation coefficient matrix of the initiator is determined. And calculating and integrating correlation coefficients among the characteristic data of the partner standardized data, and determining a correlation coefficient matrix among the characteristics of the partner.
Because the initiator data set and the partner data set are standardized before the initiator correlation coefficient matrix and the partner correlation coefficient matrix are calculated, the problem of calculating the correlation coefficient matrix between the initiator feature data and the partner feature data can be converted into a homomorphic multiplication of the initiator feature data and the partner feature data, which protects the data privacy of both the initiator and the partner.
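The local part of S1301 and S1302 can be sketched as follows. It assumes Z-score standardization with the population variance (dividing by the number of samples), which the text does not pin down. Under that assumption, the correlation coefficient of two features is simply the mean of the products of their standardized values, which is why the cross-party case reduces to multiplications. Function names and sample data are hypothetical.

```python
import math


def standardize(column):
    """Z-score: subtract the mean, divide by the standard deviation.

    Population variance (divide by n) is an assumption of this sketch.
    A constant column has zero variance and would need separate handling.
    """
    n = len(column)
    mean = sum(column) / n
    variance = sum((v - mean) ** 2 for v in column) / n
    std = math.sqrt(variance)
    return [(v - mean) / std for v in column]


def correlation_matrix(columns):
    """Pearson correlation matrix of one party's features (one list per feature)."""
    z = [standardize(col) for col in columns]
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


# Hypothetical initiator features: the second is an exact multiple of the
# first, the third is uncorrelated with the first.
initiator_columns = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.0, -1.0, -1.0, 1.0],
]
r_initiator = correlation_matrix(initiator_columns)
```

Each party runs this on its own plaintext features, so no data leaves the party for the initiator matrix or the partner matrix.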
S140, performing feature multiple collinearity analysis on the initiator and the partner according to the initiator correlation coefficient matrix, the partner correlation coefficient matrix and the plaintext correlation coefficient matrix.
Specifically, according to the initiator correlation coefficient matrix, the partner correlation coefficient matrix, and the plaintext correlation coefficient matrix between the feature data of the initiator and the feature data of the partner, feature multiple collinearity analysis is performed on the initiator and the partner to obtain the multiple collinearity value of each feature data of the initiator and the multiple collinearity value of each feature data of the partner. The multiple collinearity value of each feature data of the initiator is the federated multiple collinearity value of that feature data in federated learning, and likewise the multiple collinearity value of each feature data of the partner is the federated multiple collinearity value of that feature data in federated learning.
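The text here does not spell out how a multiple collinearity value is derived from the three matrices. One standard route, assumed for this sketch, is to assemble them into a single block correlation matrix and take the variance inflation factor (VIF) of each feature from the diagonal of its inverse. The function names are hypothetical, and the small Gauss-Jordan inverse is included only to keep the example self-contained.

```python
def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def federated_vif(r_initiator, r_partner, r_cross):
    """VIF per feature, initiator features first, then partner features.

    r_cross has one row per initiator feature and one column per partner
    feature. Using the VIF as the collinearity value is an assumption.
    """
    top = [ri + rc for ri, rc in zip(r_initiator, r_cross)]
    cross_t = [list(col) for col in zip(*r_cross)]
    bottom = [ct + rp for ct, rp in zip(cross_t, r_partner)]
    inverse = invert(top + bottom)
    return [inverse[i][i] for i in range(len(inverse))]


# One feature per party with a cross-party correlation of 0.9.
vifs = federated_vif([[1.0]], [[1.0]], [[0.9]])
```

With two features correlated at 0.9, both VIFs equal 1 / (1 - 0.9²), about 5.26. A perfectly collinear pair makes the block matrix singular, which the sketch reports as an error rather than returning an infinite value.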
And S150, determining model training data from the initiator data set and the partner data set according to the characteristic multiple collinearity analysis result, and training the federal learning model by adopting the model training data.
The federated learning model is used for predicting the business index value of the user.
Specifically, the multiple collinearity value of each feature data of the initiator and the multiple collinearity value of each feature data of the partner are determined according to the feature multiple collinearity analysis result. Model training data is then determined from the feature data in the initiator data set and the feature data in the partner data set according to these multiple collinearity values and preset model training data screening conditions, and the federal learning model is trained by adopting the model training data. The service data of a user whose business index value needs to be predicted is used as the input data of the trained federal learning model, and the prediction result of the business index value of the user is determined according to the output data of the trained federal learning model.
Illustratively, the feature data in the initiator data set and the feature data in the partner data set can be screened according to the feature multiple collinearity analysis result and a multiple collinearity threshold. The feature data in the initiator data set and the partner data set whose multiple collinearity values are smaller than the multiple collinearity threshold are determined to be model training data, and the federal learning model is trained through the model training data.
It can be understood that feature data with a multiple collinearity value smaller than a multiple collinearity threshold value are screened out from the initiator data set and the partner data set and used as model training data of the federal learning model, and therefore model training efficiency and model reliability can be improved.
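A minimal sketch of the screening step is given below. The feature names, the multiple collinearity values and the threshold of 10.0 are all hypothetical illustrations; this embodiment does not fix a particular threshold.

```python
def select_features(collinearity_values, threshold):
    """Keep the features whose multiple collinearity value is below the threshold."""
    return [name for name, value in collinearity_values.items() if value < threshold]

# Hypothetical multiple collinearity values for initiator and partner features.
values = {
    "init_age": 1.3,
    "init_income": 12.4,
    "part_spend": 2.1,
    "part_balance": 15.0,
}

# The threshold is an assumed example value, not one prescribed by the scheme.
selected = select_features(values, threshold=10.0)
```

Only the columns named in `selected` would then be passed on as model training data.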
According to the technical scheme provided by the embodiment, an initiator data set and a partner data set are determined according to initiator service data of an initiator and partner service data of a partner in longitudinal federal learning; encrypting the initiator data set through the initiator device and sending the encrypted initiator data set to the partner device; acquiring a ciphertext correlation coefficient matrix between the initiator and the partner from the partner device, decrypting the ciphertext correlation coefficient matrix through the initiator device, and determining a plaintext correlation coefficient matrix according to a decryption result; determining an initiator correlation coefficient matrix of the initiator and a partner correlation coefficient matrix of the partner according to the initiator data set and the partner data set; performing characteristic multiple collinearity analysis on the initiator and the partner data set according to the initiator correlation coefficient matrix, the partner correlation coefficient matrix and the plaintext correlation coefficient matrix; and determining model training data from the initiator data set and the partner data set according to the characteristic multiple collinearity analysis result, and training the federal learning model by adopting the model training data. The method solves the problems that when a regression coefficient method is used for analyzing the multiple collinearity of the characteristics of an initiator and a partner in longitudinal federated learning, multiple times of interactive iterative training are needed to calculate the fitting degree, the communication consumption and the calculation complexity are high, and the calculation result depends on the setting of the model hyperparameters. 
According to the scheme, the feature multiple collinearity of the initiator and the partner in longitudinal federal learning is calculated based on a correlation coefficient method. No parameters need to be adjusted during the calculation, and the number of data interactions required to calculate the feature multiple collinearity of the initiator and the partner is reduced. By using a homomorphic encryption algorithm together with the correlation coefficient method, the training data of the federal learning model is determined through distributed calculation and the federal learning model is trained, which improves the training efficiency of the federal learning model and guarantees the interpretability of a linear model in the federal learning model. Predicting the business index value of the user with the federal learning model can improve the prediction precision of the business index value.
Example two
Fig. 2 is a flowchart of a service index prediction method provided in the second embodiment of the present invention, and this embodiment optimizes the service index prediction method based on the above embodiments and provides a preferred implementation manner of performing feature multiple collinearity analysis on the initiator and the partner according to the initiator correlation coefficient matrix, the partner correlation coefficient matrix, and the plaintext correlation coefficient matrix. Specifically, as shown in fig. 2, the method includes:
S210, determining an initiator data set and a partner data set according to initiator business data of the initiator and partner business data of the partner in longitudinal federal learning.
S220, encrypting the initiator data set through the initiator device, and sending the encrypted initiator data set to the partner device; and acquiring a ciphertext correlation coefficient matrix between the initiator and the partner from the partner device, decrypting the ciphertext correlation coefficient matrix through the initiator device, and determining a plaintext correlation coefficient matrix according to a decryption result.
S230, determining an initiator correlation coefficient matrix of the initiator and a partner correlation coefficient matrix of the partner according to the initiator data set and the partner data set.
S240, determining a correlation coefficient fusion matrix according to the correlation coefficient matrix of the initiator, the correlation coefficient matrix of the partner and the correlation coefficient matrix of the plaintext, and determining a complete matrix determinant value of the correlation coefficient fusion matrix.
The correlation coefficient fusion matrix is a matrix obtained by combining an initiator correlation coefficient matrix, a partner correlation coefficient matrix and a plaintext correlation coefficient matrix.
Specifically, the initiator correlation coefficient matrix, the partner correlation coefficient matrix and the plaintext correlation coefficient matrix are combined together to obtain a correlation coefficient fusion matrix. And calculating a determinant value of the correlation coefficient fusion matrix, and taking the determinant value of the correlation coefficient fusion matrix as a complete matrix determinant value of the correlation coefficient fusion matrix.
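One way to combine the three matrices is sketched below. It assumes a symmetric block layout in which the initiator and partner matrices sit on the diagonal and the plaintext (cross-party) matrix and its transpose sit off the diagonal; the source only states that the matrices are combined, so this layout, the function name `fuse` and the sample values are assumptions.

```python
import numpy as np

def fuse(initiator_corr, partner_corr, cross_corr):
    """Assemble a full correlation matrix from the three partial matrices."""
    return np.block([
        [initiator_corr, cross_corr],
        [cross_corr.T, partner_corr],
    ])

# Hypothetical values: two initiator features and one partner feature.
initiator_corr = np.array([[1.0, 0.8],
                           [0.8, 1.0]])
partner_corr = np.array([[1.0]])
cross_corr = np.array([[0.1],
                       [0.2]])   # rows: initiator features, columns: partner features

fusion = fuse(initiator_corr, partner_corr, cross_corr)
complete_det = np.linalg.det(fusion)   # complete matrix determinant value
```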
S250, determining a feature matrix determinant value after deleting, from the correlation coefficient fusion matrix, the feature row and the feature column where matrix feature data is located.
The matrix feature data refers to feature data contained in the correlation coefficient fusion matrix. The feature row refers to the row where the matrix feature data is located in the correlation coefficient fusion matrix. The feature column refers to the column where the matrix feature data is located in the correlation coefficient fusion matrix.
Specifically, each matrix feature data forming the correlation coefficient fusion matrix is determined. For each matrix feature data, its feature row and feature column are deleted from the correlation coefficient fusion matrix to obtain the feature matrix corresponding to that matrix feature data, and the feature matrix determinant value corresponding to that matrix feature data is determined.
S260, taking the ratio of the feature matrix determinant value to the complete matrix determinant value as the multiple collinearity value of the matrix feature data, integrating the multiple collinearity values of all the matrix feature data to determine the multiple collinearity values of the initiator and the partner, and taking the multiple collinearity values of the initiator and the partner as the feature multiple collinearity analysis result.
Specifically, the ratio of the determinant value of the feature matrix to the determinant value of the complete matrix is used as the multiple collinearity value of the matrix feature data corresponding to the determinant value of the feature matrix, and the multiple collinearity values corresponding to all the matrix feature data are integrated to obtain the multiple collinearity value of each feature data of the initiator and the multiple collinearity value of each feature data of the partner. And taking the multicollinearity value of each characteristic data of the initiator and the multicollinearity value of each characteristic data of the partner as the characteristic multicollinearity analysis result.
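Steps S250 and S260 can be sketched as follows. The fusion matrix used here is a hypothetical 3x3 correlation matrix and the function name is illustrative. By the cofactor formula for a matrix inverse, the ratio computed for feature i equals the i-th diagonal entry of the inverse of the fusion matrix, which is the quantity commonly called the variance inflation factor.

```python
import numpy as np

def multiple_collinearity_values(fusion):
    """Ratio det(fusion without row/column i) / det(fusion) for every feature i."""
    complete_det = np.linalg.det(fusion)
    values = []
    for i in range(fusion.shape[0]):
        # Delete the feature row and the feature column of feature i.
        feature_matrix = np.delete(np.delete(fusion, i, axis=0), i, axis=1)
        values.append(np.linalg.det(feature_matrix) / complete_det)
    return np.array(values)

# Hypothetical fusion matrix: features 0 and 1 belong to the initiator,
# feature 2 belongs to the partner.
fusion = np.array([[1.0, 0.8, 0.1],
                   [0.8, 1.0, 0.2],
                   [0.1, 0.2, 1.0]])

values = multiple_collinearity_values(fusion)
initiator_values, partner_values = values[:2], values[2:]
```

Computing one determinant per feature is simple to follow but costly for wide data sets; reading the diagonal of a single matrix inverse gives the same numbers with less work.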
S270, determining model training data from the initiator data set and the partner data set according to the feature multiple collinearity analysis result, and training the federal learning model by adopting the model training data.
The federated learning model is used for predicting the business index value of the user.
The technical scheme of the embodiment provides a calculation method for performing characteristic multiple collinearity analysis on an initiator and a partner to determine multiple collinearity values of the initiator and the partner. According to the scheme, the correlation coefficient fusion matrix is determined according to the correlation coefficient matrix of the initiator, the correlation coefficient matrix of the partner and the correlation coefficient matrix of the plaintext, and the determinant value of the characteristic matrix corresponding to each characteristic data in the correlation coefficient fusion matrix is calculated according to the correlation coefficient fusion matrix. According to the characteristic matrix determinant value corresponding to each characteristic data and the complete matrix determinant value of the correlation coefficient fusion matrix, the accurate multiple co-linearity value of each characteristic data can be obtained, and therefore the accuracy of the characteristic multiple co-linearity analysis results of the initiator and the partner is improved.
Example three
Fig. 3 is a flowchart of a service index prediction method provided in the third embodiment of the present invention, and this embodiment optimizes the service index prediction method based on the above embodiments, and provides a preferred embodiment of performing filtering processing on feature data in an initiator data set and feature data in a partner data set according to data contribution degrees of the feature data in the initiator data set and the feature data in the partner data set. Specifically, as shown in fig. 3, the method includes:
S310, determining an initiator data set and a partner data set according to initiator business data of the initiator and partner business data of the partner in longitudinal federal learning.
S320, determining the data contribution degree of the feature data in the initiator data set and the feature data in the partner data set.
The data contribution degree may be determined by the IV (Information Value) of the data. The IV indicates the degree to which feature data contributes to predicting the target, that is, the predictive power of a feature. In general, the higher the IV, the stronger the predictive power of the feature and the higher its data contribution degree.
Specifically, the IV values of the feature data in the initiator data set and the feature data in the partner data set may be calculated by performing a WOE (Weight of Evidence) weighted summation on the feature data in the initiator data set and the feature data in the partner data set, and the IV values are used as the data contribution degrees.
For example, WOE calculation is performed on the feature data in the initiator data set and the feature data in the partner data set, and the IV value of each feature data is calculated. Feature collinearity analysis is then performed on the WOE calculation result, and the data sets are screened through the IV values and a feature collinearity threshold. Comparing the contribution of the feature data before and after the feature collinearity screening shows that the contribution of some feature data changes from a negative number before screening to a positive number after screening. The feature collinearity analysis therefore has an influence on the interpretability of the model.
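The WOE weighted summation can be sketched for a single, already binned feature as follows. The formula used is the common definition of IV for a binary target, IV = sum over bins of (positive share - negative share) * WOE, with WOE = ln(positive share / negative share). The function name, the binning and the sample data are hypothetical, and the sketch assumes every bin contains both positive and negative samples so that no logarithm of zero occurs.

```python
import numpy as np

def information_value(bin_labels, target):
    """IV of one binned feature against a binary (0/1) target."""
    bin_labels = np.asarray(bin_labels)
    target = np.asarray(target)
    total_pos = target.sum()
    total_neg = target.size - total_pos
    iv = 0.0
    for label in np.unique(bin_labels):
        in_bin = bin_labels == label
        pos_share = target[in_bin].sum() / total_pos
        neg_share = (in_bin.sum() - target[in_bin].sum()) / total_neg
        woe = np.log(pos_share / neg_share)      # weight of evidence of this bin
        iv += (pos_share - neg_share) * woe      # WOE-weighted summation
    return float(iv)

# Hypothetical feature split into two bins, with a binary target.
informative = information_value([0, 0, 0, 0, 1, 1, 1, 1],
                                [1, 1, 1, 0, 0, 0, 0, 1])
uninformative = information_value([0, 0, 1, 1], [1, 0, 1, 0])
```

A feature whose bins all have the same positive and negative shares gets an IV of zero, which matches the reading that a low IV means a low data contribution degree.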
S330, filtering the feature data in the initiator data set and the feature data in the partner data set according to the data contribution degree.
Specifically, according to actual needs, a contribution threshold is set, feature data in the initiator data set and feature data in the partner data set are filtered according to the contribution threshold and the data contribution degree, and the feature data with the data contribution degree smaller than the contribution threshold is filtered and deleted from the feature data in the initiator data set and the feature data in the partner data set. For example, the contribution threshold may be 0.1.
It should be noted that when model training is performed on all the features of the initiator and the partner without filtering, every feature data in the initiator data set and in the partner data set receives a weight. When the feature data in the initiator data set and the feature data in the partner data set are filtered and model training is performed on the feature data with a high contribution degree, the feature data remaining after filtering also receive corresponding weights in the model training process. Comparing the weights of the same features in the two training runs shows that, for some feature data, the weight is negative before data filtering and positive after data filtering, which indicates that multiple collinearity analysis has an influence on the interpretability of the federal learning model. Screening the features through the federal multiple collinearity analysis and filtering out the feature data with smaller IV values in the initiator data set and the partner data set can also improve the model training speed.
S340, encrypting the initiator data set through the initiator device, and sending the encrypted initiator data set to the partner device; and acquiring a ciphertext correlation coefficient matrix between the initiator and the partner from the partner device, decrypting the ciphertext correlation coefficient matrix through the initiator device, and determining a plaintext correlation coefficient matrix according to a decryption result.
It should be noted that, in this step, the feature data in the initiator data set and the feature data in the partner data set are the feature data that remain after the filtering processing.
S350, determining an initiator correlation coefficient matrix of the initiator and a partner correlation coefficient matrix of the partner according to the initiator data set and the partner data set.
S360, performing feature multiple collinearity analysis on the initiator and the partner according to the initiator correlation coefficient matrix, the partner correlation coefficient matrix and the plaintext correlation coefficient matrix.
S370, determining model training data from the initiator data set and the partner data set according to the feature multiple collinearity analysis result, and training the federal learning model by adopting the model training data.
The federated learning model is used for predicting the business index value of the user.
According to the technical scheme of the embodiment, before model training data is determined according to an initiator data set and a partner data set and a federated learning model is trained by adopting the model training data, the feature data in the initiator data set and the feature data in the partner data set are filtered according to the contribution degree of the feature data in the initiator data set and the contribution degree of the feature data in the partner data set, so that the training data of the federated learning model is obtained. By the scheme, the reliability of the federal learning model can be guaranteed, and meanwhile, the model training speed is increased.
Example four
Fig. 4 is a schematic structural diagram of a service index prediction apparatus according to a fourth embodiment of the present invention. The embodiment can be applied to the situation of training the federal learning model for predicting the service index according to the initiator service data of the initiator and the partner service data of the partner in longitudinal federal learning. As shown in fig. 4, the service index prediction apparatus includes: a data set determination module 410, a data set encryption transmission module 420, a correlation coefficient matrix determination module 430, a multiple collinearity analysis module 440, and a model training module 450.
The data set determining module 410 is configured to determine an initiator data set and a partner data set according to initiator service data of an initiator and partner service data of a partner in longitudinal federal learning;
a data set encryption transmission module 420, configured to encrypt the initiator data set by the initiator device and send the encrypted initiator data set to the partner device; acquiring a ciphertext correlation coefficient matrix between the initiator and the partner from the partner device, decrypting the ciphertext correlation coefficient matrix through the initiator device, and determining a plaintext correlation coefficient matrix according to a decryption result;
a correlation coefficient matrix determining module 430, configured to determine an initiator correlation coefficient matrix of the initiator and a partner correlation coefficient matrix of the partner according to the initiator data set and the partner data set;
the multiple collinearity analysis module 440 is configured to perform multiple collinearity analysis on the characteristics of the initiator and the partner according to the initiator correlation coefficient matrix, the partner correlation coefficient matrix, and the plaintext correlation coefficient matrix;
the model training module 450 is used for determining model training data from the initiator data set and the partner data set according to the characteristic multiple collinearity analysis result, and training the federal learning model by adopting the model training data; the federal learning model is used for predicting the business index value of the user.
According to the technical scheme provided by the embodiment, an initiator data set and a partner data set are determined according to initiator service data of an initiator and partner service data of a partner in longitudinal federal learning; encrypting the initiator data set through the initiator device and sending the encrypted initiator data set to the partner device; acquiring a ciphertext correlation coefficient matrix between the initiator and the partner from the partner device, decrypting the ciphertext correlation coefficient matrix through the initiator device, and determining a plaintext correlation coefficient matrix according to a decryption result; determining an initiator correlation coefficient matrix of the initiator and a partner correlation coefficient matrix of the partner according to the initiator data set and the partner data set; performing characteristic multiple collinearity analysis on the initiator and the partner data set according to the initiator correlation coefficient matrix, the partner correlation coefficient matrix and the plaintext correlation coefficient matrix; and determining model training data from the initiator data set and the partner data set according to the characteristic multiple collinearity analysis result, and training the federal learning model by adopting the model training data. The method solves the problems that when a regression coefficient method is used for analyzing the multiple collinearity of the characteristics of an initiator and a partner in longitudinal federated learning, multiple times of interactive iterative training are needed to calculate the fitting degree, the communication consumption and the calculation complexity are high, and the calculation result depends on the setting of the model hyperparameters. 
According to the scheme, the feature multiple collinearity of the initiator and the partner in longitudinal federal learning is calculated based on a correlation coefficient method. No parameters need to be adjusted during the calculation, and the number of data interactions required to calculate the feature multiple collinearity of the initiator and the partner is reduced. By using a homomorphic encryption algorithm together with the correlation coefficient method, the training data of the federal learning model is determined through distributed calculation and the federal learning model is trained, which improves the training efficiency of the federal learning model and guarantees the interpretability of a linear model in the federal learning model. Predicting the business index value of the user with the federal learning model can improve the prediction precision of the business index value.
Illustratively, the data set encryption transmission module 420 is specifically configured to:
generating a key pair by adopting a homomorphic encryption algorithm through initiator equipment, and encrypting an initiator data set by adopting a public key in the key pair;
sending the encrypted initiator data set to partner equipment, calculating a ciphertext feature correlation coefficient between feature data in the initiator data set and feature data in the partner data set according to the encrypted initiator data set and the multiplication characteristic of a homomorphic encryption algorithm through the partner equipment, and determining a ciphertext correlation coefficient matrix between the initiator and the partner according to the ciphertext feature correlation coefficient;
and sending the ciphertext correlation coefficient matrix to initiator equipment so as to decrypt the ciphertext correlation coefficient matrix by the initiator equipment by adopting a private key in a key pair, and determining a plaintext correlation coefficient matrix between the characteristic data in the initiator data set and the characteristic data in the partner data set according to a decryption result.
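The three-step exchange above can be sketched with a toy additively homomorphic scheme. The sketch assumes the Paillier cryptosystem, which supports multiplying a ciphertext by a plaintext number and adding ciphertexts; this section does not name the homomorphic algorithm, so Paillier, the tiny fixed primes, the fixed-point `SCALE`, and all function names are assumptions made for illustration only and offer no real security. It needs Python 3.8 or later for the modular inverse via `pow`.

```python
import math
import random

# Toy parameters: two known Mersenne primes, far too small for real use.
P, Q = 2**31 - 1, 2**61 - 1
SCALE = 10**6  # hypothetical fixed-point scale for encoding real numbers

def keygen():
    n = P * Q
    phi = (P - 1) * (Q - 1)
    return n, (phi, pow(phi, -1, n))      # public key n, private key (phi, mu)

def encrypt(n, value):
    m = round(value * SCALE) % n          # negative values wrap around modulo n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (1 + m * n) * pow(r, n, n * n) % (n * n)

def decrypt(n, private_key, cipher, scale):
    phi, mu = private_key
    m = (pow(cipher, phi, n * n) - 1) // n * mu % n
    if m > n // 2:                        # map back to a signed value
        m -= n
    return m / scale

def encrypted_dot(n, cipher_column, plain_column):
    """Partner side: ciphertext of sum(a_k * b_k) from Enc(a_k) and plaintext b_k."""
    acc = 1                               # 1 is a valid ciphertext of zero
    for cipher, b in zip(cipher_column, plain_column):
        k = round(b * SCALE) % n
        # Raising a ciphertext to k multiplies its plaintext by k;
        # multiplying ciphertexts adds their plaintexts.
        acc = acc * pow(cipher, k, n * n) % (n * n)
    return acc

def unit_standardize(xs):
    mean = sum(xs) / len(xs)
    centered = [x - mean for x in xs]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]

a = unit_standardize([1.0, 2.0, 3.0, 4.0, 5.0])   # one initiator feature
b = unit_standardize([2.0, 1.0, 4.0, 3.0, 6.0])   # one partner feature

n, private_key = keygen()                          # initiator generates the key pair
cipher_a = [encrypt(n, x) for x in a]              # sent to the partner device
cipher_corr = encrypted_dot(n, cipher_a, b)        # computed by the partner device
# Both factors were scaled, so the decrypted product carries SCALE squared.
corr = decrypt(n, private_key, cipher_corr, SCALE ** 2)
```

Repeating `encrypted_dot` for every pair of initiator and partner features would fill in the ciphertext correlation coefficient matrix that the initiator then decrypts entry by entry.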
Illustratively, the multicollinearity analysis module 440 is specifically configured to:
determining a correlation coefficient fusion matrix according to the correlation coefficient matrix of the initiator, the correlation coefficient matrix of the partner and the correlation coefficient matrix of the plaintext, and determining a complete matrix determinant value of the correlation coefficient fusion matrix;
determining a characteristic matrix determinant value after deleting a characteristic row and a characteristic column where matrix characteristic data are located from the correlation coefficient fusion matrix;
and taking the ratio of the determinant value of the characteristic matrix and the determinant value of the complete matrix as a multiple co-linearity value of the characteristic data of the matrix, integrating the multiple co-linearity values of all the characteristic data of the matrix, determining the multiple co-linearity values of the initiator and the partner, and taking the multiple co-linearity values of the initiator and the partner as a characteristic multiple co-linearity analysis result.
Illustratively, the data set determination module 410 is specifically configured to:
determining the same users of an initiator and a partner in longitudinal federal learning, and determining a data intersection between initiator service data of the initiator and partner service data of the partner based on the identity of the same users;
and respectively processing the initiator service data and the partner service data according to the data intersection to determine an initiator data set and a partner data set.
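The alignment performed by the data set determination module can be sketched in plaintext as follows. The record layout (a dictionary keyed by user identifier), the identifiers and the function name are hypothetical. The sketch intersects identifiers in the clear purely for illustration; how the identifiers are matched without exposing them is outside what this passage specifies.

```python
def align_by_user(initiator_records, partner_records):
    """Return the shared user ids and both data sets restricted to them, in the same order."""
    shared_ids = sorted(initiator_records.keys() & partner_records.keys())
    initiator_set = [initiator_records[user] for user in shared_ids]
    partner_set = [partner_records[user] for user in shared_ids]
    return shared_ids, initiator_set, partner_set

# Hypothetical service data keyed by user identifier.
initiator_records = {"u1": [35, 5200.0], "u2": [41, 6100.0], "u4": [29, 3900.0]}
partner_records = {"u2": [12.5], "u3": [7.0], "u4": [20.1]}

shared_ids, initiator_set, partner_set = align_by_user(initiator_records, partner_records)
```

Keeping both data sets in the same user order is what allows the row-wise products of initiator and partner features to be formed later.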
Illustratively, the correlation coefficient matrix determination module 430 is configured to:
standardizing the characteristic data in the initiator data set and the characteristic data in the partner data set to determine initiator standardized data and partner standardized data;
and determining an initiator correlation coefficient matrix of the initiator according to the correlation coefficient among the characteristic data of the initiator standardized data, and determining a partner correlation coefficient matrix of the partner according to the correlation coefficient among the characteristic data of the partner standardized data.
Illustratively, the model training module 450 is specifically configured to:
and screening the feature data in the initiator data set and the feature data in the partner data set according to the feature multiple collinearity analysis result and the multiple collinearity threshold value, determining the feature data of which the multiple collinearity values are smaller than the multiple collinearity threshold value in the initiator data set and the partner data set as model training data, and training the federal learning model through the model training data.
Illustratively, the service index prediction apparatus further includes:
the data contribution degree determining module is used for determining the data contribution degrees of the characteristic data in the initiator data set and the characteristic data in the partner data set;
and the data filtering module is used for filtering the characteristic data in the initiator data set and the characteristic data in the partner data set according to the data contribution degree.
The service index prediction device provided by the embodiment can be applied to the service index prediction method provided by any embodiment, and has corresponding functions and beneficial effects.
Example five
FIG. 5 illustrates a schematic diagram of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.
As shown in fig. 5, the electronic device 10 includes at least one processor 11, and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, and the like, wherein the memory stores a computer program executable by the at least one processor, and the processor 11 can perform various suitable actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from a storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data necessary for the operation of the electronic apparatus 10 can also be stored. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, or the like. The processor 11 performs the various methods and processes described above, such as the service index prediction method.
In some embodiments, the service index prediction method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When loaded into RAM 13 and executed by processor 11, the computer program may perform one or more of the steps of the service index prediction method described above. Alternatively, in other embodiments, the processor 11 may be configured to perform the service index prediction method in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for implementing the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program can execute entirely on a machine, partly on a machine, as a stand-alone software package partly on a machine and partly on a remote machine or entirely on a remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the high management difficulty and weak service scalability of traditional physical hosts and VPS services.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solution of the present invention can be achieved.
The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A service index prediction method, characterized by comprising:
determining an initiator data set and a partner data set according to initiator service data of an initiator and partner service data of a partner in vertical federated learning;
encrypting the initiator data set by an initiator device and sending the encrypted initiator data set to a partner device; acquiring a ciphertext correlation coefficient matrix between the initiator and the partner from the partner device, decrypting the ciphertext correlation coefficient matrix by the initiator device, and determining a plaintext correlation coefficient matrix according to the decryption result;
determining an initiator correlation coefficient matrix of the initiator and a partner correlation coefficient matrix of the partner according to the initiator data set and the partner data set;
performing feature multicollinearity analysis on the initiator and the partner according to the initiator correlation coefficient matrix, the partner correlation coefficient matrix and the plaintext correlation coefficient matrix;
determining model training data from the initiator data set and the partner data set according to the feature multicollinearity analysis result, and training a federated learning model using the model training data, wherein the federated learning model is used for predicting a service index value of a user.
2. The method of claim 1, wherein encrypting the initiator data set by the initiator device and sending the encrypted initiator data set to the partner device; acquiring the ciphertext correlation coefficient matrix between the initiator and the partner from the partner device, decrypting the ciphertext correlation coefficient matrix by the initiator device, and determining the plaintext correlation coefficient matrix according to the decryption result, comprises:
generating, by the initiator device, a key pair using a homomorphic encryption algorithm, and encrypting the initiator data set with the public key of the key pair;
sending the encrypted initiator data set to the partner device, so that the partner device calculates ciphertext feature correlation coefficients between feature data in the initiator data set and feature data in the partner data set according to the encrypted initiator data set and the multiplicative property of the homomorphic encryption algorithm, and determines a ciphertext correlation coefficient matrix between the initiator and the partner according to the ciphertext feature correlation coefficients;
and sending the ciphertext correlation coefficient matrix to the initiator device, decrypting the ciphertext correlation coefficient matrix by the initiator device with the private key of the key pair, and determining a plaintext correlation coefficient matrix between the feature data in the initiator data set and the feature data in the partner data set according to the decryption result.
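The exchange in claim 2 can be sketched as follows. The claim does not name a specific homomorphic scheme, so this sketch assumes a textbook Paillier cryptosystem, in which raising a ciphertext to a plaintext power multiplies the hidden value and multiplying ciphertexts adds hidden values. The primes, the function names (`keygen`, `encrypted_dot`, `cross_matrix`) and the integer-valued inputs are all illustrative assumptions; the key size is far too small for real use, and real standardized features would first need fixed-point scaling to integers.

```python
import math
import random

# Toy Paillier parameters built from two known Mersenne primes.
# Assumption for illustration only: far too small for real security.
P = 2**31 - 1
Q = 2**61 - 1


def keygen(p=P, q=Q):
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)   # public key, private key


def encrypt(n, m):
    n2 = n * n
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    # (1 + n)^m = 1 + m*n (mod n^2); m % n maps negative values into [0, n)
    return (1 + (m % n) * n) * pow(r, n, n2) % n2


def decrypt(n, priv, c):
    lam, mu = priv
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m  # map back to a signed integer


def encrypted_dot(n, enc_col, plain_col):
    """Partner side: Enc(sum a_k * b_k) from Enc(a_k) and plaintext b_k."""
    n2 = n * n
    acc = encrypt(n, 0)
    for c, b in zip(enc_col, plain_col):
        # Enc(a)^b = Enc(a * b); multiplying ciphertexts adds the plaintexts
        acc = acc * pow(c, b % n, n2) % n2
    return acc


def cross_matrix(initiator_cols, partner_cols):
    """Inner products between every initiator column and every partner column."""
    n, priv = keygen()                                              # initiator
    enc_cols = [[encrypt(n, a) for a in col] for col in initiator_cols]
    cipher = [[encrypted_dot(n, ec, pc) for pc in partner_cols]
              for ec in enc_cols]                                   # partner
    return [[decrypt(n, priv, c) for c in row] for row in cipher]   # initiator
```

For standardized columns, dividing each decrypted inner product by the number of samples (and by the square of any fixed-point scale factor) would give the cross-party correlation coefficient. In this sketch the partner sees only ciphertexts, while the initiator sees only the aggregated products.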
3. The method of claim 1, wherein performing a feature multicollinearity analysis on the initiator and the partner according to the initiator correlation coefficient matrix, the partner correlation coefficient matrix, and the plaintext correlation coefficient matrix comprises:
determining a correlation coefficient fusion matrix according to the initiator correlation coefficient matrix, the partner correlation coefficient matrix and the plaintext correlation coefficient matrix, and determining a complete matrix determinant value of the correlation coefficient fusion matrix;
for each item of feature data in the matrix, determining a feature matrix determinant value, namely the determinant of the correlation coefficient fusion matrix after the row and the column in which that feature data is located are deleted;
and taking the ratio of the feature matrix determinant value to the complete matrix determinant value as the multicollinearity value of that feature data, combining the multicollinearity values of all feature data in the matrix to determine the multicollinearity values of the initiator and the partner, and taking the multicollinearity values of the initiator and the partner as the feature multicollinearity analysis result.
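The determinant ratio in claim 3 can be illustrated with a short sketch. By the cofactor formula for a matrix inverse, this ratio equals the corresponding diagonal entry of the inverse correlation matrix, which is the quantity commonly known as the variance inflation factor. The block layout of the fusion matrix and the helper names (`det`, `fuse`, `multicollinearity_values`) are assumptions made for illustration; the claim itself does not fix them.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, row)) for row in matrix]
    size, result = len(a), 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for row in range(col + 1, size):
            factor = a[row][col] / a[col][col]
            for k in range(col, size):
                a[row][k] -= factor * a[col][k]
    return result


def fuse(r_init, r_part, r_cross):
    """Assumed block layout: [[R_init, R_cross], [R_cross^T, R_part]]."""
    top = [ri + rc for ri, rc in zip(r_init, r_cross)]
    cross_t = [list(col) for col in zip(*r_cross)]
    bottom = [ct + rp for ct, rp in zip(cross_t, r_part)]
    return top + bottom


def multicollinearity_values(fused):
    """det(fused without row/column i) / det(fused), for each feature i."""
    full = det(fused)
    if full == 0.0:
        raise ValueError("fusion matrix is singular: features are exactly collinear")
    values = []
    for i in range(len(fused)):
        minor = [row[:i] + row[i + 1:] for j, row in enumerate(fused) if j != i]
        values.append(det(minor) / full)
    return values
```

With uncorrelated features the fusion matrix is the identity and every value is 1; values grow as a feature becomes more predictable from the others.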
4. The method of claim 1, wherein determining the initiator data set and the partner data set according to the initiator service data of the initiator and the partner service data of the partner in vertical federated learning comprises:
determining the users shared by the initiator and the partner in vertical federated learning, and determining a data intersection between the initiator service data of the initiator and the partner service data of the partner based on the identities of the shared users;
and processing the initiator service data and the partner service data respectively according to the data intersection to determine the initiator data set and the partner data set.
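The alignment step in claim 4 can be sketched in plaintext as below. The function name, the dictionary layout (user ID mapped to a list of feature values) and the sorted ordering are assumptions for illustration; a real deployment would presumably compute the intersection with a privacy-preserving protocol rather than by comparing raw identifiers.

```python
def align_by_shared_users(initiator_rows, partner_rows):
    """Keep only users present on both sides, in one common order.

    Each argument maps a user ID to that party's list of feature values.
    """
    shared = sorted(set(initiator_rows) & set(partner_rows))
    initiator_set = [initiator_rows[u] for u in shared]
    partner_set = [partner_rows[u] for u in shared]
    return shared, initiator_set, partner_set
```

Row k of both returned data sets then refers to the same user, which is what the later cross-party correlation step relies on.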
5. The method of claim 1, wherein determining an initiator correlation coefficient matrix for the initiator and a partner correlation coefficient matrix for the partner based on the initiator data set and the partner data set comprises:
standardizing the feature data in the initiator data set and the feature data in the partner data set to determine initiator standardized data and partner standardized data;
and determining the initiator correlation coefficient matrix of the initiator according to correlation coefficients among the feature data of the initiator standardized data, and determining the partner correlation coefficient matrix of the partner according to correlation coefficients among the feature data of the partner standardized data.
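A minimal sketch of the local computation in claim 5, assuming z-score standardization and the Pearson correlation coefficient (the claim says only "standardizing" and "correlation coefficients", so both choices are assumptions). Each party would run this on its own feature columns.

```python
import math


def standardize(column):
    """Z-score a feature column (population standard deviation)."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / len(column))
    if std == 0:
        raise ValueError("constant feature has no defined correlation")
    return [(x - mean) / std for x in column]


def correlation_matrix(columns):
    """Pearson correlation between every pair of feature columns."""
    z = [standardize(col) for col in columns]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]
```

After standardization, each correlation coefficient reduces to an inner product divided by the sample count, which is why the cross-party coefficients can be obtained from products and sums alone.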
6. The method of claim 1, wherein determining model training data from the initiator data set and the partner data set according to the feature multicollinearity analysis result and training the federated learning model using the model training data comprises:
screening the feature data in the initiator data set and the feature data in the partner data set according to the feature multicollinearity analysis result and a multicollinearity threshold, determining the feature data in the initiator data set and the partner data set whose multicollinearity values are smaller than the multicollinearity threshold as the model training data, and training the federated learning model with the model training data.
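The screening rule in claim 6 amounts to a simple filter. The sketch below uses hypothetical feature names and a hypothetical threshold; the claim does not state a threshold value.

```python
def select_features(names, values, threshold):
    """Keep the features whose multicollinearity value is below the threshold."""
    return [name for name, v in zip(names, values) if v < threshold]
```

For example, `select_features(["age", "income", "spend"], [1.2, 14.0, 3.5], 10.0)` keeps `age` and `spend` and drops `income`.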
7. The method of claim 1, further comprising, after determining the initiator data set and the partner data set according to the initiator service data of the initiator and the partner service data of the partner in vertical federated learning:
determining data contribution degrees of the feature data in the initiator data set and the feature data in the partner data set;
and filtering the feature data in the initiator data set and the feature data in the partner data set according to the data contribution degrees.
8. A service index prediction apparatus, comprising:
a data set determining module, configured to determine an initiator data set and a partner data set according to initiator service data of an initiator and partner service data of a partner in vertical federated learning;
a data set encryption transmission module, configured to encrypt the initiator data set by an initiator device and send the encrypted initiator data set to a partner device; acquire a ciphertext correlation coefficient matrix between the initiator and the partner from the partner device, decrypt the ciphertext correlation coefficient matrix by the initiator device, and determine a plaintext correlation coefficient matrix according to the decryption result;
a correlation coefficient matrix determining module, configured to determine an initiator correlation coefficient matrix of the initiator and a partner correlation coefficient matrix of the partner according to the initiator data set and the partner data set;
a multicollinearity analysis module, configured to perform feature multicollinearity analysis on the initiator and the partner according to the initiator correlation coefficient matrix, the partner correlation coefficient matrix, and the plaintext correlation coefficient matrix;
a model training module, configured to determine model training data from the initiator data set and the partner data set according to the feature multicollinearity analysis result and train a federated learning model using the model training data, wherein the federated learning model is used for predicting a service index value of a user.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the service index prediction method of any one of claims 1-7.
10. A computer-readable storage medium storing computer instructions which, when executed, cause a processor to implement the service index prediction method according to any one of claims 1-7.
CN202211278879.7A 2022-10-19 2022-10-19 Service index prediction method, device, equipment and storage medium Active CN115545216B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211278879.7A CN115545216B (en) 2022-10-19 2022-10-19 Service index prediction method, device, equipment and storage medium
PCT/CN2023/079369 WO2024082514A1 (en) 2022-10-19 2023-03-02 Service index prediction method and apparatus, and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211278879.7A CN115545216B (en) 2022-10-19 2022-10-19 Service index prediction method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115545216A (en) 2022-12-30
CN115545216B (en) 2023-06-30

Family

ID=84735765

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211278879.7A Active CN115545216B (en) 2022-10-19 2022-10-19 Service index prediction method, device, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN115545216B (en)
WO (1) WO2024082514A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116738196A (en) * 2023-06-19 2023-09-12 上海零数众合信息科技有限公司 Reputation evaluation method, device, equipment and storage medium
CN117252287A (en) * 2023-08-04 2023-12-19 上海零数众合信息科技有限公司 Index prediction method and system based on federal pearson correlation analysis

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021004551A1 (en) * 2019-09-26 2021-01-14 深圳前海微众银行股份有限公司 Method, apparatus, and device for optimization of vertically federated learning system, and a readable storage medium
CN112270597A (en) * 2020-11-10 2021-01-26 恒安嘉新(北京)科技股份公司 Business processing and credit evaluation model training method, device, equipment and medium
JP2021068456A (en) * 2020-10-15 2021-04-30 雅浩 白井 Calculation technique for eliminating "multicollinearity" or the like in regression analysis, and obtaining partial regression coefficient indicating contribution to proper objective variable of explanatory variable, as management material
WO2021103909A1 (en) * 2019-11-27 2021-06-03 支付宝(杭州)信息技术有限公司 Risk prediction method and apparatus, risk prediction model training method and apparatus, and electronic device
CN113095514A (en) * 2021-04-26 2021-07-09 深圳前海微众银行股份有限公司 Data processing method, device, equipment, storage medium and program product
US20210287111A1 (en) * 2020-03-12 2021-09-16 Capital One Services, Llc Aggregated feature importance for finding influential business metrics
WO2021249086A1 (en) * 2020-06-12 2021-12-16 深圳前海微众银行股份有限公司 Multi-party joint decision tree construction method, device and readable storage medium
CN114003939A (en) * 2021-11-16 2022-02-01 蓝象智联(杭州)科技有限公司 Multiple collinearity analysis method for longitudinal federal scene
CN114638274A (en) * 2020-12-15 2022-06-17 深圳前海微众银行股份有限公司 Feature selection method, device, readable storage medium and computer program product
CN114881247A (en) * 2022-06-10 2022-08-09 杭州博盾习言科技有限公司 Longitudinal federal feature derivation method, device and medium based on privacy computation
CN114936372A (en) * 2022-04-06 2022-08-23 湘潭大学 Model protection method based on three-party homomorphic encryption longitudinal federal learning
CN114996749A (en) * 2022-08-05 2022-09-02 蓝象智联(杭州)科技有限公司 Feature filtering method for federal learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DIXON VIMALAJEEWA 等: ""A service-based joint model used for distributed learning: Application for smart agriculture"", 《IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING》, vol. 10, no. 2, pages 838 *
FANGLAN ZHENG 等: ""A Vertical Federated Learning Method for Interpretable Scorecard and Its Application in Credit Scoring"", 《COMPUTER SCIENCE MACHINE LEARNING》, pages 1 - 12 *
李剑: ""基于联邦学习的个人信用智能评估系统的研究与实现"", 《中国优秀硕士学位论文全文数据库 (信息科技辑)》, vol. 2020, no. 1, pages 140 - 384 *

Also Published As

Publication number Publication date
CN115545216B (en) 2023-06-30
WO2024082514A1 (en) 2024-04-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant