CN115545216B - Service index prediction method, device, equipment and storage medium

Info

Publication number: CN115545216B
Application number: CN202211278879.7A
Authority: CN
Inventors: 孙银银
Original assignee: Shanghai Lingshuzhonghe Information Technology Co ltd
Current assignee: Shanghai Lingshuzhonghe Information Technology Co ltd
Priority date: 2022-10-19
Filing date: 2022-10-19
Publication date: 2023-06-30
Anticipated expiration: 2042-10-19
Also published as: CN115545216A

Abstract

The invention discloses a business index prediction method, which comprises the following steps: determining an initiator data set of an initiator and a partner data set of a partner in longitudinal federal learning; encrypting the initiator data set through the initiator device and sending the encrypted initiator data set to the partner device; decrypting the ciphertext correlation coefficient matrix by the initiator device to determine a plaintext correlation coefficient matrix; determining an initiator correlation coefficient matrix and a partner correlation coefficient matrix according to the initiator data set and the partner data set; performing characteristic multiple collinearity analysis on an initiator and a partner according to the initiator correlation coefficient matrix, the partner correlation coefficient matrix and the plaintext correlation coefficient matrix; and determining model training data from the initiator data set and the partner data set according to the analysis result, and training the federal learning model by adopting the model training data. The training efficiency of the federal learning model is improved, and the feature interpretability of the linear model is ensured.

Description

Service index prediction method, device, equipment and storage medium

Technical Field

The embodiment of the invention relates to the field of computers, in particular to a business index prediction method, a business index prediction device, business index prediction equipment and a storage medium.

Background

Longitudinal federal learning is typically used when the data held by one participant has too few feature dimensions for that data alone to meet the modeling target, so it is mostly applied to joint modeling across different industries. In longitudinal federal modeling, the data set of the task initiator and the data set of the partner share a common sample space but have different feature spaces. An encryption algorithm is required to protect the data privacy of both parties, and the multiple collinearity between each feature and the other features must be calculated so that features with high collinearity can be removed, which improves modeling efficiency and accuracy. Federal multiple collinearity calculation implemented with a linear model method requires multiple rounds of interactive iterative training to calculate the goodness of fit, incurs high communication cost and computational complexity, and produces a result that depends on the setting of the model hyperparameters. Therefore, how to improve the training efficiency of the federal learning model while ensuring its accuracy is a problem to be solved.

Disclosure of Invention

The invention provides a business index prediction method, a device, equipment and a storage medium, which can improve the calculation efficiency and calculation precision of a linear model in longitudinal federal learning and improve the prediction precision of a federal learning model, so that the prediction precision of the business index of a user can be improved when the business index of the user is predicted through the federal learning model.

According to an aspect of the present invention, there is provided a business index prediction method, including:

determining an initiator data set and a partner data set according to the initiator service data of the initiator and the partner service data of the partner in longitudinal federal learning;

encrypting the initiator data set through the initiator device and sending the encrypted initiator data set to the partner device; obtaining a ciphertext correlation coefficient matrix between an initiator and a partner from the partner device, decrypting the ciphertext correlation coefficient matrix through the initiator device, and determining a plaintext correlation coefficient matrix according to a decryption result;

determining an initiator correlation coefficient matrix of the initiator and a partner correlation coefficient matrix of the partner according to the initiator data set and the partner data set;

performing characteristic multiple collinearity analysis on the initiator and the partner according to the initiator correlation coefficient matrix, the partner correlation coefficient matrix and the plaintext correlation coefficient matrix;

determining model training data from the initiator data set and the partner data set according to the characteristic multiple collinearity analysis result, and training a federal learning model by adopting the model training data; the federal learning model is used for predicting a business index value of a user.

According to another aspect of the present invention, there is provided a business index prediction apparatus, comprising:

the data set determining module is used for determining an initiator data set and a partner data set according to the initiator service data of the initiator and the partner service data of the partner in longitudinal federal learning;

the data set encryption transmission module is used for encrypting the initiator data set through the initiator device and sending the encrypted initiator data set to the partner device; obtaining a ciphertext correlation coefficient matrix between an initiator and a partner from the partner device, decrypting the ciphertext correlation coefficient matrix through the initiator device, and determining a plaintext correlation coefficient matrix according to a decryption result;

a correlation coefficient matrix determining module, configured to determine an initiator correlation coefficient matrix of the initiator and a partner correlation coefficient matrix of the partner according to the initiator data set and the partner data set;

the multi-collinearity analysis module is used for carrying out characteristic multi-collinearity analysis on the initiator and the partner according to the initiator correlation coefficient matrix, the partner correlation coefficient matrix and the plaintext correlation coefficient matrix;

The model training module is used for determining model training data from the initiator data set and the partner data set according to the characteristic multiple collinearity analysis result and training a federal learning model by adopting the model training data; the federal learning model is used for predicting a business index value of a user.

According to another aspect of the present invention, there is provided an electronic apparatus including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the business index prediction method according to any one of the embodiments of the present invention.

According to another aspect of the present invention, there is provided a computer readable storage medium storing computer instructions for causing a processor to implement the business index prediction method according to any one of the embodiments of the present invention when executed.

According to the technical scheme of the embodiment of the invention, the initiator data set and the partner data set are determined according to the initiator service data of the initiator and the partner service data of the partner in longitudinal federal learning; the initiator data set is encrypted by the initiator device and sent to the partner device; a ciphertext correlation coefficient matrix between the initiator and the partner is obtained from the partner device and decrypted by the initiator device, and a plaintext correlation coefficient matrix is determined according to the decryption result; an initiator correlation coefficient matrix of the initiator and a partner correlation coefficient matrix of the partner are determined according to the initiator data set and the partner data set; characteristic multiple collinearity analysis is performed on the initiator and the partner according to the initiator correlation coefficient matrix, the partner correlation coefficient matrix and the plaintext correlation coefficient matrix; and model training data is determined from the initiator data set and the partner data set according to the characteristic multiple collinearity analysis result, and a federal learning model is trained with the model training data. This solves the problems that arise when the regression coefficient method is used to analyze the characteristic multiple collinearity of an initiator and a partner in longitudinal federal learning: multiple rounds of interactive iterative training are needed to calculate the goodness of fit, the communication cost and computational complexity are high, and the calculation result depends on the setting of the model hyperparameters. In this scheme, the characteristic multiple collinearity of the initiator and the partner in longitudinal federal learning is calculated based on the correlation coefficient method. No parameter tuning is needed during the calculation, and the number of data interactions required to calculate the characteristic multiple collinearity of the initiator and the partner is reduced. Using the homomorphic encryption algorithm and the correlation coefficient method, the training data of the federal learning model is determined through distributed calculation and the federal learning model is trained, which improves the training efficiency of the federal learning model while preserving the interpretability of the linear model in the federal learning model. Predicting the business index value of the user with the federal learning model can improve the prediction accuracy of the business index value.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart of a business index prediction method according to a first embodiment of the present invention;

Fig. 2 is a flowchart of a business index prediction method according to a second embodiment of the present invention;

Fig. 3 is a flowchart of a business index prediction method according to a third embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a business index prediction device according to a fourth embodiment of the present invention;

Fig. 5 is a schematic structural diagram of an electronic device according to a fifth embodiment of the present invention.

Detailed Description

In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.

It should be noted that the term "object" and the like in the description of the present invention and the claims and the above drawings are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "includes," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Example 1

Fig. 1 is a flowchart of a business index prediction method according to the first embodiment of the present invention. This embodiment is applicable to training a federal learning model for predicting business indexes according to the initiator service data of an initiator and the partner service data of a partner in longitudinal federal learning. It is particularly suitable for the case in which multiple collinearity analysis is performed on the initiator service data and the partner service data by a correlation coefficient method, training data of the federal learning model is determined according to the multiple collinearity analysis result, and the federal learning model for predicting business indexes is trained with that training data. The method may be performed by a business index prediction device, which may be implemented in hardware and/or software and may be configured in an electronic device. As shown in Fig. 1, the method includes:

S110, determining an initiator data set and a partner data set according to the initiator service data of the initiator and the partner service data of the partner in longitudinal federal learning.

Longitudinal federal learning is generally applicable to federal learning scenarios formed by participants whose data sets share the same sample space but have different feature spaces. Through longitudinal federal learning, different participants can cooperatively train a federal learning model. In this embodiment, the participants are the initiator and the partner in longitudinal federal learning. The initiator and the partner may be enterprises in different industries that need to cooperate. The initiator service data refers to sample data, obtained by the initiator with the user's permission, that can represent the business index of a user at the initiator; the partner service data refers to sample data, obtained by the partner with the user's permission, that can represent the business index of the user at the partner. A business index is an index that measures some aspect of a user's behavior; for example, it may be a credit index or a performance index of the user at the initiator and the partner. An initiator data set refers to a data set that contains characteristic data of the initiator. A partner data set refers to a data set that contains characteristic data of the partner.

Specifically, under the condition of user permission corresponding to the initiator, initiator service data corresponding to the initiator in longitudinal federal learning is acquired, feature extraction is performed on the initiator service data, feature data of the initiator service data are determined, and an initiator data set is determined according to the feature data of the initiator service data. Under the condition of user permission corresponding to the partner, the partner service data corresponding to the partner in longitudinal federal learning is acquired, the characteristic extraction is carried out on the partner service data, the characteristic data of the partner service data are determined, and a partner data set is determined according to the characteristic data of the partner service data.

By way of example, the initiator data set and the partner data set may be determined as follows: determining the users common to the initiator and the partner in longitudinal federal learning, and determining a data intersection between the initiator service data of the initiator and the partner service data of the partner based on the identity marks of the common users; and processing the initiator service data and the partner service data respectively according to the data intersection to determine the initiator data set and the partner data set.

The identity mark refers to data which can represent the identity of a user, and the identity mark can comprise an identity card number or a mobile phone number.

Specifically, the same user may exist between the initiator and the partner of the longitudinal federal learning, so that when the federal learning model is trained, the same user of the initiator and the partner in the longitudinal federal learning needs to be determined, and under the condition of user permission, the identity of the same user is obtained. Based on the identity of the same user, a data intersection between the initiator service data of the initiator and the partner service data of the partner is determined. Integrating the data intersection with characteristic data of the service data of the initiator to determine the data set of the initiator; and integrating the data intersection with the characteristic data of the business data of the partner to determine the data set of the partner.

It can be appreciated that the data intersection between the initiator service data and the partner service data is determined according to the users common to the initiator and the partner in longitudinal federal learning, the initiator data set is determined according to the data intersection and the initiator service data, and the partner data set is determined according to the data intersection and the partner service data. This makes the association between the initiator data set and the partner data set more explicit and makes it convenient to calculate the characteristic multiple collinearity of the initiator and the partner from the two data sets.
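The alignment step described above can be illustrated with a minimal Python sketch. It assumes each party holds its records as a dictionary keyed by an identity mark; the function name `align_by_identity`, the sample identifiers and the feature values are hypothetical. The text does not state how the intersection is computed privately, so the plain set intersection below is only an illustration of the resulting data sets, not of a privacy-preserving protocol.

```python
from typing import Dict, List, Tuple

Row = List[float]


def align_by_identity(
    initiator: Dict[str, Row], partner: Dict[str, Row]
) -> Tuple[List[str], List[Row], List[Row]]:
    """Keep only users present on both sides, in one shared order."""
    shared_ids = sorted(initiator.keys() & partner.keys())
    initiator_set = [initiator[uid] for uid in shared_ids]
    partner_set = [partner[uid] for uid in shared_ids]
    return shared_ids, initiator_set, partner_set


# Hypothetical records keyed by an identity mark (for example a phone number).
initiator_rows = {"u1": [0.2, 35.0], "u2": [0.9, 41.0], "u3": [0.4, 29.0]}
partner_rows = {"u2": [1200.0], "u3": [800.0], "u4": [150.0]}

ids, initiator_set, partner_set = align_by_identity(initiator_rows, partner_rows)
```

After alignment, row i of `initiator_set` and row i of `partner_set` describe the same user, which is what the later correlation calculations rely on.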

S120, encrypting the initiator data set through the initiator device and sending the encrypted initiator data set to the partner device; and acquiring a ciphertext correlation coefficient matrix between the initiator and the partner from the partner device, decrypting the ciphertext correlation coefficient matrix through the initiator device, and determining a plaintext correlation coefficient matrix according to a decryption result.

The ciphertext correlation coefficient matrix is specifically a ciphertext correlation coefficient matrix between the characteristic data in the initiator data set and the characteristic data in the partner data set. The plaintext correlation coefficient matrix is specifically a plaintext correlation coefficient matrix between the characteristic data in the initiator data set and the characteristic data in the partner data set. The initiator device refers to a terminal device corresponding to the initiator, and the partner device refers to a terminal device corresponding to the partner. The correlation coefficient is a statistical index for reflecting the degree of closeness of the correlation between the variables.

Illustratively, encrypting the initiator data set and decrypting the ciphertext correlation coefficient matrix may be accomplished by:

S1201, generating a key pair by the initiator device through a homomorphic encryption algorithm, and encrypting the initiator data set through a public key in the key pair.

The homomorphic encryption algorithm is an encryption algorithm meeting homomorphic operation property of ciphertext, namely, after data is homomorphic encrypted, specific calculation is carried out on the ciphertext, and plaintext obtained by corresponding homomorphic decryption of ciphertext calculation results is equivalent to directly carrying out the same calculation on plaintext data, so that "computable invisibility" of the data is realized.

Specifically, a secret key pair is generated by the initiator device through a homomorphic encryption algorithm, and the secret key pair comprises a public key and a private key. The initiator device encrypts the initiator data set using the public key of the key pair.

S1202, the encrypted initiator data set is sent to the partner device, so that a ciphertext feature correlation coefficient between feature data in the initiator data set and feature data in the partner data set is calculated according to the encrypted initiator data set and multiplication characteristics of a homomorphic encryption algorithm by the partner device, and a ciphertext correlation coefficient matrix between the initiator and the partner is determined according to the ciphertext feature correlation coefficient.

The ciphertext feature correlation coefficient is a ciphertext correlation coefficient obtained by ciphertext calculation between feature data in the partner data set and feature data in the encrypted initiator data set.

Specifically, the encrypted initiator data set is sent to the partner device through the initiator device, so that the partner device calculates ciphertext feature correlation coefficients between each feature data in the encrypted initiator data set and each feature data in the partner data set according to multiplication characteristics of homomorphic encryption algorithms and the encrypted initiator data set, integrates calculation results, and determines ciphertext correlation coefficient matrixes between the feature data of the initiator and the feature data of the partner.

S1203, sending the ciphertext correlation coefficient matrix to the initiator device, decrypting the ciphertext correlation coefficient matrix by the initiator device by adopting a private key in the key pair, and determining a plaintext correlation coefficient matrix between the characteristic data in the initiator data set and the characteristic data in the partner data set according to a decryption result.

Specifically, the ciphertext correlation coefficient matrix is sent to the initiator device by the partner device. Decrypting the ciphertext correlation coefficient matrix by the initiator device by adopting a private key in the key pair, and determining a plaintext correlation coefficient matrix between the characteristic data in the initiator data set and the characteristic data in the partner data set according to a decryption result.

It can be understood that the above scheme converts the problem of calculating the characteristic correlation coefficients of the initiator and the partner in the longitudinal federal learning into the use of homomorphic encryption algorithm to calculate the ciphertext correlation coefficient matrix between the initiator and the partner, thereby reducing the data interaction times of the regression coefficient method in calculating the characteristic multiple collinearity of the initiator and the partner, simplifying the calculation flow and improving the calculation efficiency.
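Steps S1201 to S1203 can be sketched in Python as follows. The text does not name a specific homomorphic encryption algorithm, so this sketch assumes a Paillier-style scheme, in which raising a ciphertext to a plaintext power multiplies the hidden value and multiplying ciphertexts adds hidden values. The primes, the fixed-point factor `SCALE`, and all function names are illustrative assumptions; the key size is far too small for real use.

```python
import math
import secrets

# Toy key material: two known Mersenne primes. Far too small for real use.
P, Q = 2**31 - 1, 2**61 - 1
SCALE = 10**4  # fixed-point factor: real value x is encoded as round(x * SCALE)


def keygen():
    """Initiator side: return the public modulus n and the private key."""
    n = P * Q
    lam = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid because the generator is n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (negative values are wrapped modulo n)."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + (m % n) * n) * pow(r, n, n2) % n2


def decrypt(n, private_key, c):
    """Decrypt and map the result back to a signed integer."""
    lam, mu = private_key
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m


def cipher_dot(n, enc_col, plain_col):
    """Partner side: build Enc(sum a_i * b_i) without seeing any a_i."""
    n2 = n * n
    acc = encrypt(n, 0)
    for c, b in zip(enc_col, plain_col):
        # Enc(a)^b = Enc(a * b); multiplying ciphertexts adds plaintexts.
        acc = acc * pow(c, b % n, n2) % n2
    return acc


def zscore(col):
    """Standardize one feature column (population standard deviation)."""
    mean = sum(col) / len(col)
    std = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
    return [(x - mean) / std for x in col]


def encode(x):
    return round(x * SCALE)


n, private_key = keygen()
init_feature = zscore([1.0, 2.0, 3.0, 4.0])  # held by the initiator
part_feature = zscore([2.0, 1.0, 4.0, 3.0])  # held by the partner

enc_init = [encrypt(n, encode(x)) for x in init_feature]                # S1201
enc_corr = cipher_dot(n, enc_init, [encode(x) for x in part_feature])   # S1202
corr = decrypt(n, private_key, enc_corr) / (SCALE**2 * len(init_feature))  # S1203
```

In this sketch the partner only ever handles ciphertexts of the initiator's values, and the initiator only learns the aggregated correlation coefficient. Repeating `cipher_dot` for every pair of initiator and partner features would fill the ciphertext correlation coefficient matrix.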

S130, determining an initiator correlation coefficient matrix of the initiator and a partner correlation coefficient matrix of the partner according to the initiator data set and the partner data set.

The initiator correlation coefficient matrix refers to a matrix formed by correlation coefficients among characteristic data in an initiator data set. The partner correlation coefficient matrix is a matrix formed by correlation coefficients among characteristic data in the partner data set.

Specifically, the characteristic data in the initiator data set, namely the characteristic data of the initiator, are determined, the correlation coefficient among the characteristic data of the initiator is calculated, and the initiator correlation coefficient matrix of the initiator is determined according to the correlation coefficient among the characteristic data of the initiator. And determining characteristic data in the partner data set, namely, the characteristic data of the partner, calculating correlation coefficients among the characteristic data of the partner, and determining a partner correlation coefficient matrix of the partner according to the correlation coefficients among the characteristic data of the partner.

For example, an initiator correlation coefficient matrix for an initiator, and a partner correlation coefficient matrix for a partner, may be determined according to the following substeps:

S1301, carrying out standardization processing on the characteristic data in the initiator data set and the characteristic data in the partner data set, and determining the initiator standardization data and the partner standardization data.

The standardization processing converts the original data in proportion by a mathematical transformation so that it falls into a small specific interval, for example [0,1] or [-1,1]. This eliminates differences among variables in characteristic properties such as nature, dimension and order of magnitude, and converts the data into dimensionless relative values, namely standardized values, so that the values of all indexes are on the same scale and indexes with different units or orders of magnitude can be comprehensively analyzed and compared.

Specifically, the feature data in the initiator data set and the feature data in the partner data set may be subjected to a normalization process by a Z-score normalization method, so as to determine the initiator normalization data and the partner normalization data.

For example, the initiator standardized data may be obtained as follows: determining the characteristic data in the initiator data set and the number of samples corresponding to the characteristic data; calculating the average value of the characteristic data in the initiator data set; determining the characteristic value variance according to that average value and the number of samples of the initiator; and determining the initiator standardized data according to the characteristic value variance, the characteristic data in the initiator data set and the average value of the characteristic data in the initiator data set.
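The Z-score procedure just described can be sketched in Python as below. The sketch assumes the variance is the population variance (divided by the number of samples), which the text does not state explicitly; the function names and sample values are hypothetical. Note that Z-score values are centered on zero with unit variance and are not strictly confined to a fixed interval.

```python
import math
from typing import List


def zscore_column(values: List[float]) -> List[float]:
    """Standardize one feature: subtract the mean, divide by the standard deviation."""
    n = len(values)
    mean = sum(values) / n
    # Assumption: population variance (divide by n, not n - 1).
    variance = sum((x - mean) ** 2 for x in values) / n
    std = math.sqrt(variance)
    if std == 0.0:
        raise ValueError("a constant feature cannot be standardized")
    return [(x - mean) / std for x in values]


def standardize(rows: List[List[float]]) -> List[List[float]]:
    """Standardize every feature column of a sample-by-feature data set."""
    columns = [zscore_column(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]


# Hypothetical initiator data set: three samples, two features.
initiator_standardized = standardize([[2.0, 10.0], [4.0, 30.0], [6.0, 20.0]])
```

The same function would be applied independently by the partner to its own data set, since standardization needs no data from the other party.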

S1302, determining an initiator correlation coefficient matrix of the initiator according to the correlation coefficient among the characteristic data of the initiator standardized data, and determining a partner correlation coefficient matrix of the partner according to the correlation coefficient among the characteristic data of the partner standardized data.

Wherein, each characteristic data of the initiator standardized data refers to the standardized result of the characteristic data in the initiator data set. Each feature data of the partner standardization data refers to a standardization result of feature data in the partner data set.

Specifically, the correlation coefficients among the characteristic data of the initiator standardized data are calculated and integrated to determine the initiator correlation coefficient matrix of the initiator. The correlation coefficients among the characteristic data of the partner standardized data are calculated and integrated to determine the partner correlation coefficient matrix of the partner.

Before the initiator correlation coefficient matrix and the partner correlation coefficient matrix are calculated, standardization processing is carried out on the initiator data set and the partner data set. Because the correlation coefficient of two standardized features reduces to the mean of their element-wise products, the problem of calculating the correlation coefficient matrix between the characteristic data of the initiator and the characteristic data of the partner can be converted into homomorphic multiplication of the characteristic data of the initiator and the characteristic data of the partner, which protects the data privacy of both the initiator and the partner.
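As a minimal sketch of S1302, the correlation coefficient matrix of standardized features can be computed as the mean of element-wise products for every pair of feature columns. The code assumes population-variance standardization; the function names and sample values are hypothetical.

```python
import math


def zscore(col):
    """Standardize one feature column (population standard deviation)."""
    mean = sum(col) / len(col)
    std = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
    return [(x - mean) / std for x in col]


def correlation(cols_x, cols_y):
    """Mean element-wise product for every pair of standardized columns."""
    n = len(cols_x[0])
    return [
        [sum(a * b for a, b in zip(cx, cy)) / n for cy in cols_y]
        for cx in cols_x
    ]


# Hypothetical standardized data: two initiator features, one partner feature.
initiator_cols = [zscore(c) for c in ([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0])]
partner_cols = [zscore(c) for c in ([2.0, 1.0, 4.0, 3.0],)]

r_initiator = correlation(initiator_cols, initiator_cols)  # computed locally
r_partner = correlation(partner_cols, partner_cols)        # computed locally
```

Each party can compute its own matrix in plaintext because only its own features are involved; only the cross-party matrix requires the encrypted exchange.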

S140, performing characteristic multiple collinearity analysis on the initiator and the partner according to the initiator correlation coefficient matrix, the partner correlation coefficient matrix and the plaintext correlation coefficient matrix.

Specifically, according to the initiator correlation coefficient matrix, the partner correlation coefficient matrix and the plaintext correlation coefficient matrix between the characteristic data of the initiator and the characteristic data of the partner, characteristic multiple collinearity analysis is performed on all the characteristics of the initiator and the partner to obtain a multiple collinearity value for each characteristic data of the initiator and for each characteristic data of the partner. The multiple collinearity value of each characteristic data of the initiator is the federal multiple collinearity value of that characteristic data in federal learning, and likewise the multiple collinearity value of each characteristic data of the partner is its federal multiple collinearity value.
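This section does not give the formula for the multiple collinearity value. One common choice, shown here purely as an assumption, is the variance inflation factor (VIF), which equals the corresponding diagonal entry of the inverse of the full correlation matrix. The sketch below assembles the full matrix from the three blocks and inverts it with Gauss-Jordan elimination; all function names and the sample values are hypothetical.

```python
def assemble(r_init, r_part, r_cross):
    """Build the full correlation matrix [[R_init, R_cross], [R_cross^T, R_part]]."""
    top = [row_a + row_c for row_a, row_c in zip(r_init, r_cross)]
    cross_t = [list(col) for col in zip(*r_cross)]
    bottom = [row_t + row_b for row_t, row_b in zip(cross_t, r_part)]
    return top + bottom


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: some features are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_values(r_init, r_part, r_cross):
    """Assumed collinearity measure: diagonal of the inverse correlation matrix."""
    inverse = invert(assemble(r_init, r_part, r_cross))
    return [inverse[i][i] for i in range(len(inverse))]


# Hypothetical matrices: two initiator features, one partner feature.
r_init = [[1.0, 0.5], [0.5, 1.0]]
r_part = [[1.0]]
r_cross = [[0.6], [0.3]]  # rows: initiator features, columns: partner features

vifs = vif_values(r_init, r_part, r_cross)  # initiator features first, then partner
```

Under this assumption a value of 1 means a feature is uncorrelated with all others, and larger values indicate stronger collinearity across both parties' features.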

S150, determining model training data from the initiator data set and the partner data set according to the characteristic multiple collinearity analysis result, and training a federal learning model by adopting the model training data.

The federal learning model is used for predicting the business index value of the user.

Specifically, the multiple collinearity values of the characteristic data of the initiator and of the characteristic data of the partner are determined according to the characteristic multiple collinearity analysis result. Model training data is then determined from the characteristic data in the initiator data set and the characteristic data in the partner data set according to these multiple collinearity values and the preset model training data screening conditions, and the federal learning model is trained with the model training data. The service data of a user whose business index value is to be predicted is used as the input data of the trained federal learning model, and the prediction result of the business index value of the user is determined according to the output data of the trained federal learning model.

For example, the feature data in the initiator data set and the feature data in the partner data set may be screened according to the feature multiple collinearity analysis result and the multiple collinearity threshold, and the feature data in the initiator data set and the partner data set, where the multiple collinearity value is smaller than the multiple collinearity threshold, may be determined to be model training data, and the federal learning model may be trained by the model training data.

It can be appreciated that the feature data with multiple collinearity values smaller than the multiple collinearity threshold values are screened from the initiator data set and the partner data set and used as the model training data of the federal learning model, so that the model training efficiency and the reliability of the model can be improved.
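The threshold screening described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `screen_features`, the feature names, the sample values and the threshold of 10.0 are all assumptions chosen for the example, since the text leaves the threshold to be set as needed.

```python
def screen_features(collinearity_by_feature, threshold=10.0):
    """Keep features whose multiple collinearity value is below the threshold."""
    return [name for name, value in collinearity_by_feature.items() if value < threshold]


# Hypothetical federal multiple collinearity values for both parties.
collinearity_values = {
    "init_age": 1.8,       # initiator feature
    "init_income": 12.4,   # initiator feature, strongly collinear
    "part_balance": 3.1,   # partner feature
    "part_spend": 15.0,    # partner feature, strongly collinear
}

selected = screen_features(collinearity_values, threshold=10.0)
print(selected)
```

Only the selected features would then be passed on as model training data; the two features above the threshold are dropped before training.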

According to the technical scheme provided by this embodiment, an initiator data set and a partner data set are determined according to the initiator service data of the initiator and the partner service data of the partner in longitudinal federal learning; the initiator data set is encrypted by the initiator device and the encrypted initiator data set is sent to the partner device; a ciphertext correlation coefficient matrix between the initiator and the partner is obtained from the partner device and decrypted by the initiator device, and a plaintext correlation coefficient matrix is determined according to the decryption result; an initiator correlation coefficient matrix of the initiator and a partner correlation coefficient matrix of the partner are determined according to the initiator data set and the partner data set; characteristic multiple collinearity analysis is performed on the initiator and the partner according to the initiator correlation coefficient matrix, the partner correlation coefficient matrix and the plaintext correlation coefficient matrix; and model training data is determined from the initiator data set and the partner data set according to the characteristic multiple collinearity analysis result, and a federal learning model is trained with the model training data. This solves the problems that arise when the regression coefficient method is used to analyze the characteristic multiple collinearity of an initiator and a partner in longitudinal federal learning: multiple rounds of interactive iterative training are needed to calculate the goodness of fit, communication consumption and computational complexity are high, and the calculation result depends on the setting of the model hyperparameters.
In this scheme, the characteristic multiple collinearity of the initiator and the partner in longitudinal federal learning is calculated based on the correlation coefficient method. No parameter tuning is needed in the calculation, and the number of data interactions required to calculate the characteristic multiple collinearity of the initiator and of the partner is reduced. Using the homomorphic encryption algorithm and the correlation coefficient method, the training data of the federal learning model is determined through distributed calculation and the federal learning model is trained, which improves the training efficiency of the federal learning model while preserving the interpretability of the linear model in the federal learning model. Predicting the business index value of the user with the federal learning model can improve the prediction accuracy of the business index value.

Example two

Fig. 2 is a flowchart of a business index prediction method provided by a second embodiment of the present invention, where the present embodiment is optimized based on the foregoing embodiment, and a preferred implementation manner of performing feature multiple co-linearity analysis on an initiator and a partner according to an initiator correlation coefficient matrix, a partner correlation coefficient matrix, and a plaintext correlation coefficient matrix is provided. Specifically, as shown in fig. 2, the method includes:

S210, determining an initiator data set and a partner data set according to the initiator service data of the initiator and the partner service data of the partner in longitudinal federal learning.

S220, encrypting the initiator data set through the initiator device and sending the encrypted initiator data set to the partner device; and acquiring a ciphertext correlation coefficient matrix between the initiator and the partner from the partner device, decrypting the ciphertext correlation coefficient matrix through the initiator device, and determining a plaintext correlation coefficient matrix according to a decryption result.

S230, determining an initiator correlation coefficient matrix of the initiator and a partner correlation coefficient matrix of the partner according to the initiator data set and the partner data set.

S240, determining a correlation coefficient fusion matrix according to the initiator correlation coefficient matrix, the partner correlation coefficient matrix and the plaintext correlation coefficient matrix, and determining a complete matrix determinant value of the correlation coefficient fusion matrix.

The correlation coefficient fusion matrix is a matrix obtained by combining an initiator correlation coefficient matrix, a partner correlation coefficient matrix and a plaintext correlation coefficient matrix.

Specifically, an initiator correlation coefficient matrix, a partner correlation coefficient matrix and a plaintext correlation coefficient matrix are combined together to obtain a correlation coefficient fusion matrix. And calculating determinant values of the correlation coefficient fusion matrix, wherein the determinant values of the correlation coefficient fusion matrix are used as complete matrix determinant values of the correlation coefficient fusion matrix.
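One plausible way to combine the three matrices is a symmetric block layout, with the initiator matrix and the partner matrix on the diagonal and the plaintext cross-party matrix (and its transpose) off the diagonal. The text does not spell out the layout, so the arrangement below, the function name `fuse_correlation_matrices` and the sample values are assumptions of this sketch.

```python
def fuse_correlation_matrices(r_init, r_part, r_cross):
    """Assemble [[R_A, R_AB], [R_AB^T, R_B]] as a list of rows.

    r_init  : initiator correlation matrix, shape (a, a)
    r_part  : partner correlation matrix, shape (b, b)
    r_cross : plaintext initiator-by-partner correlation matrix, shape (a, b)
    """
    top = [list(row_a) + list(row_ab) for row_a, row_ab in zip(r_init, r_cross)]
    cross_t = [list(col) for col in zip(*r_cross)]  # transpose of R_AB
    bottom = [row_ba + list(row_b) for row_ba, row_b in zip(cross_t, r_part)]
    return top + bottom


# Hypothetical example: two initiator features and one partner feature.
r_a = [[1.0, 0.3], [0.3, 1.0]]
r_b = [[1.0]]
r_ab = [[0.5], [0.2]]

fused = fuse_correlation_matrices(r_a, r_b, r_ab)
for row in fused:
    print(row)
```

The result is a symmetric matrix with ones on the diagonal, one row and one column per feature across both parties, which is the form needed for the determinant calculation.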

S250, determining a characteristic matrix determinant value after deleting characteristic rows and characteristic columns of matrix characteristic data from the correlation coefficient fusion matrix.

The matrix characteristic data refers to characteristic data contained in a correlation coefficient fusion matrix. The characteristic row refers to the row where the matrix characteristic data is located in the correlation coefficient fusion matrix. The characteristic columns refer to columns where matrix characteristic data are located in the correlation coefficient fusion matrix.

Specifically, each matrix characteristic data forming the correlation coefficient fusion matrix is determined. For each matrix characteristic data, the characteristic row and the characteristic column where it is located are deleted from the correlation coefficient fusion matrix to obtain the characteristic matrix corresponding to that matrix characteristic data, and the determinant value of that characteristic matrix is determined.

S260, taking the ratio between the determinant value of the feature matrix and the determinant value of the complete matrix as the multiple collinearity value of the matrix feature data, integrating the multiple collinearity values of all the matrix feature data, determining the multiple collinearity values of the initiator and the partner, and taking the multiple collinearity values of the initiator and the partner as the feature multiple collinearity analysis result.

Specifically, the ratio between the determinant value of the feature matrix and the determinant value of the complete matrix is used as the multiple collinearity value of the matrix feature data corresponding to the determinant of the feature matrix, and the multiple collinearity values corresponding to all the matrix feature data are integrated to obtain the multiple collinearity value of each feature data of the initiator and the multiple collinearity value of each feature data of the partner. And taking the multiple collinearity values of the characteristic data of the initiator and the multiple collinearity values of the characteristic data of the partner as characteristic multiple collinearity analysis results.
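The determinant ratio in S250 and S260 can be sketched as below. By the cofactor formula for a matrix inverse, the ratio for feature k equals the k-th diagonal entry of the inverse of the fusion matrix, which is the quantity usually called the variance inflation factor. The helper names and the 3x3 sample matrix are assumptions; the determinant is computed with plain Gaussian elimination to keep the sketch dependency-free.

```python
def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in row] for row in matrix]
    n, det = len(m), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def delete_row_and_column(matrix, k):
    """Feature matrix: the fusion matrix without row k and column k."""
    return [[v for j, v in enumerate(row) if j != k]
            for i, row in enumerate(matrix) if i != k]


def multiple_collinearity_values(fused):
    """Ratio det(feature matrix) / det(complete matrix) for every feature."""
    full_det = determinant(fused)
    if abs(full_det) < 1e-12:
        raise ValueError("fusion matrix is singular: some features are perfectly collinear")
    return [determinant(delete_row_and_column(fused, k)) / full_det
            for k in range(len(fused))]


# Hypothetical fusion matrix: two initiator features followed by one partner feature.
fused = [[1.0, 0.3, 0.5],
         [0.3, 1.0, 0.2],
         [0.5, 0.2, 1.0]]
values = multiple_collinearity_values(fused)
print([round(v, 4) for v in values])
```

The first entries of the result belong to the initiator features and the remaining entries to the partner features, following the row order of the fusion matrix.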

S270, determining model training data from the initiator data set and the partner data set according to the characteristic multiple collinearity analysis result, and training a federal learning model by adopting the model training data.

The technical scheme of the embodiment provides a calculation method for performing characteristic multiple collinearity analysis on an initiator and a partner to determine multiple collinearity values of the initiator and the partner. According to the scheme, the correlation coefficient fusion matrix is determined according to the correlation coefficient matrix of the initiator, the correlation coefficient matrix of the partner and the plaintext correlation coefficient matrix, so that the characteristic matrix determinant value corresponding to each characteristic data in the correlation coefficient fusion matrix is calculated according to the correlation coefficient fusion matrix. According to the characteristic matrix determinant value corresponding to each characteristic data and the complete matrix determinant value of the correlation coefficient fusion matrix, the accurate multiple collinearity value of each characteristic data can be obtained, so that the accuracy of the characteristic multiple collinearity analysis results of the initiator and the partner is improved.

Example III

Fig. 3 is a flowchart of a business index prediction method provided in a third embodiment of the present invention, where the present embodiment is optimized based on the foregoing embodiment, and provides a preferred implementation manner of filtering feature data in an initiator data set and feature data in a partner data set according to a data contribution of the feature data in the initiator data set and the feature data in the partner data set. Specifically, as shown in fig. 3, the method includes:

S310, determining an initiator data set and a partner data set according to the initiator service data of the initiator and the partner service data of the partner in longitudinal federal learning.

S320, determining the data contribution degree of the characteristic data in the initiator data set and the characteristic data in the partner data set.

Wherein the data contribution degree may be determined by the IV (Information Value) of the data. The IV value represents the contribution degree of the characteristic data to the target prediction, namely the prediction capability of the characteristic. In general, the higher the IV value, the stronger the prediction capability of the characteristic and the higher the information contribution degree.

Specifically, the IV values of the feature data in the initiator data set and the feature data in the partner data set can be calculated by performing WOE (Weight of Evidence, evidence weight) weighted summation on the feature data in the initiator data set and the feature data in the partner data set, and the IV values are used as the data contribution degree.

For example, WOE calculation is performed on the feature data in the initiator data set and the feature data in the partner data set, and the IV value of each feature data is calculated. Feature collinearity analysis is performed on the WOE calculation results, and feature collinearity screening is performed on the data set using the IV values and the feature collinearity threshold. Comparing the contribution degree of the feature data before and after feature collinearity screening shows that the contribution degree of some feature data changes from negative before screening to positive after screening, which indicates that feature collinearity analysis has an influence on the interpretability of the model.

S330, filtering the characteristic data in the initiator data set and the characteristic data in the partner data set according to the data contribution degree.

Specifically, a contribution threshold is set according to actual needs, feature data in the initiator data set and feature data in the partner data set are filtered according to the contribution threshold and the data contribution degree, and feature data with the data contribution degree smaller than the contribution threshold is filtered and deleted from the feature data in the initiator data set and the feature data in the partner data set. For example, the contribution threshold may be 0.1.
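The contribution-degree calculation of S320 and the threshold filtering of S330 can be sketched together. The sketch works on plaintext per-bin label counts held by one party; how the partner obtains label counts in the federated setting is not covered here. The binning, the smoothing constant `eps`, the feature names and the counts are assumptions of the example, while the threshold of 0.1 is the example value given in the text.

```python
import math


def information_value(bin_counts, eps=0.5):
    """Return (woe_per_bin, iv) for one feature.

    bin_counts: list of (positive_count, negative_count), one pair per bin.
    eps: additive smoothing so that empty bins do not produce log(0).
    """
    total_pos = sum(p for p, _ in bin_counts) + eps * len(bin_counts)
    total_neg = sum(n for _, n in bin_counts) + eps * len(bin_counts)
    woes, iv = [], 0.0
    for pos, neg in bin_counts:
        pos_share = (pos + eps) / total_pos
        neg_share = (neg + eps) / total_neg
        woe = math.log(pos_share / neg_share)
        woes.append(woe)
        iv += (pos_share - neg_share) * woe  # WOE-weighted sum
    return woes, iv


def filter_by_contribution(iv_by_feature, contribution_threshold=0.1):
    """Drop features whose IV is smaller than the contribution threshold."""
    return [name for name, iv in iv_by_feature.items() if iv >= contribution_threshold]


# Hypothetical binned label counts for one initiator and one partner feature.
binned = {
    "init_income_band": [(10, 90), (30, 70), (60, 40)],  # clearly predictive
    "part_region": [(10, 20), (20, 40), (30, 60)],       # same ratio in every bin
}
iv_by_feature = {name: information_value(bins)[1] for name, bins in binned.items()}
kept = filter_by_contribution(iv_by_feature, contribution_threshold=0.1)
print(kept)
```

Each term of the sum is non-negative, so a feature whose positive and negative shares are nearly equal in every bin ends up with an IV close to zero and is filtered out.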

It should be noted that when model training is performed on all the features of the initiator and the partner without filtering, every feature data in the initiator data set and in the partner data set receives a weight. After the feature data in the initiator data set and the feature data in the partner data set are filtered, model training is performed on the feature data with higher contribution degree, and each retained feature data likewise receives a weight in the training process. Comparing the weights of the same feature in the two training processes shows that, for some feature data, the weight is negative before data filtering and positive after data filtering, which illustrates the influence of multiple collinearity analysis on the interpretability of the federal learning model. Filtering out the feature data with smaller IV values in the initiator data set and the partner data set, together with federal multiple collinearity analysis and feature screening, can also improve the model training speed.

S340, encrypting the initiator data set through the initiator device and sending the encrypted initiator data set to the partner device; and acquiring a ciphertext correlation coefficient matrix between the initiator and the partner from the partner device, decrypting the ciphertext correlation coefficient matrix through the initiator device, and determining a plaintext correlation coefficient matrix according to a decryption result.

In this step, the feature data in the initiator data set and the feature data in the partner data set are the feature data in the initiator data set and the feature data in the partner data set after the filtering process, respectively.

S350, determining an initiator correlation coefficient matrix of the initiator and a partner correlation coefficient matrix of the partner according to the initiator data set and the partner data set.

S360, performing characteristic multiple collinearity analysis on the initiator and the partner according to the initiator correlation coefficient matrix, the partner correlation coefficient matrix and the plaintext correlation coefficient matrix.

S370, determining model training data from the initiator data set and the partner data set according to the characteristic multiple collinearity analysis result, and training a federal learning model by adopting the model training data.

According to the technical scheme of the embodiment, before model training data is determined according to the initiator data set and the partner data set and a federal learning model is trained by adopting the model training data, filtering processing is carried out on the characteristic data in the initiator data set and the characteristic data in the partner data set according to the contribution degree of the characteristic data in the initiator data set and the contribution degree of the characteristic data in the partner data set so as to obtain the training data of the federal learning model. By the scheme, the reliability of the federal learning model can be ensured, and meanwhile, the model training speed can be improved.

Example IV

Fig. 4 is a schematic structural diagram of a business index prediction device according to a fourth embodiment of the present invention. The embodiment can be applied to the situation in which a federal learning model for predicting a business index is trained according to the initiator service data of the initiator and the partner service data of the partner in longitudinal federal learning. As shown in fig. 4, the business index prediction device includes: a data set determination module 410, a data set encryption transmission module 420, a correlation coefficient matrix determination module 430, a multiple collinearity analysis module 440, and a model training module 450.

Wherein, the data set determining module 410 is configured to determine an initiator data set and a partner data set according to initiator service data of an initiator and partner service data of a partner in longitudinal federal learning;

the data set encryption transmission module 420 is configured to encrypt an initiator data set by an initiator device, and send the encrypted initiator data set to a partner device; obtaining a ciphertext correlation coefficient matrix between the initiator and the partner from the partner device, decrypting the ciphertext correlation coefficient matrix through the initiator device, and determining a plaintext correlation coefficient matrix according to a decryption result;

a correlation coefficient matrix determining module 430, configured to determine an initiator correlation coefficient matrix of the initiator and a partner correlation coefficient matrix of the partner according to the initiator data set and the partner data set;

the multiple collinearity analysis module 440 is configured to perform characteristic multiple collinearity analysis on the initiator and the partner according to the initiator correlation coefficient matrix, the partner correlation coefficient matrix, and the plaintext correlation coefficient matrix;

the model training module 450 is configured to determine model training data from the initiator data set and the partner data set according to the feature multiple collinearity analysis result, and train the federal learning model using the model training data; the federal learning model is used to predict business index values for users.

Illustratively, the data set encryption transmission module 420 is specifically configured to:

generating a key pair by the initiator device by adopting a homomorphic encryption algorithm, and encrypting an initiator data set by adopting a public key in the key pair;

sending the encrypted initiator data set to the partner device, so as to calculate ciphertext feature correlation coefficients between feature data in the initiator data set and feature data in the partner data set according to the encrypted initiator data set and multiplication characteristics of a homomorphic encryption algorithm by the partner device, and determining a ciphertext correlation coefficient matrix between the initiator and the partner according to the ciphertext feature correlation coefficients;

and sending the ciphertext correlation coefficient matrix to the initiator device, decrypting the ciphertext correlation coefficient matrix by the initiator device by adopting a private key in the key pair, and determining a plaintext correlation coefficient matrix between the characteristic data in the initiator data set and the characteristic data in the partner data set according to a decryption result.
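The three steps above can be illustrated with a toy additively homomorphic scheme. The text does not name the homomorphic encryption algorithm, so the use of Paillier here is an assumption, as are the tiny key size (two Mersenne primes, insecure by design), the fixed-point scale `SCALE`, and all function names. In Paillier, raising a ciphertext to a plaintext power multiplies the hidden value, which lets the partner accumulate the encrypted sum of products without seeing the initiator data. For brevity both parties live in one script; in practice only the public modulus would leave the initiator.

```python
import math
import secrets
import statistics

# Toy Paillier parameters (two Mersenne primes). Far too small for real use.
P, Q = 2**31 - 1, 2**61 - 1
N, N2 = P * Q, (P * Q) ** 2                        # public modulus and its square
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private: lcm(p-1, q-1)
MU = pow(LAM, -1, N)                               # private: valid because g = N + 1
SCALE = 1000                                       # fixed-point scale (assumption)


def encrypt(m):
    """Encrypt integer m with the public key (N, g = N + 1)."""
    r = secrets.randbelow(N - 1) + 1
    while math.gcd(r, N) != 1:
        r = secrets.randbelow(N - 1) + 1
    return (1 + (m % N) * N) * pow(r, N, N2) % N2


def decrypt(c):
    """Decrypt with the private key and map the result back to a signed integer."""
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m


def standardize(column):
    """Zero mean, unit (population) standard deviation."""
    mean, std = statistics.fmean(column), statistics.pstdev(column)
    return [(v - mean) / std for v in column]


def partner_ciphertext_dot(enc_column, partner_column):
    """Partner side: Enc(sum x_i * y_i) from Enc(x_i) and its own plaintext y_i."""
    acc = encrypt(0)
    for c, y in zip(enc_column, partner_column):
        acc = acc * pow(c, round(y * SCALE) % N, N2) % N2
    return acc


# Initiator: standardize, encode as fixed-point integers, encrypt, send.
x = standardize([1, 2, 3, 4, 5])
enc_x = [encrypt(round(v * SCALE)) for v in x]
# Partner: combine the ciphertexts with its own standardized feature.
y = standardize([2, 1, 4, 3, 5])
cipher = partner_ciphertext_dot(enc_x, y)
# Initiator: decrypt and rescale to a correlation coefficient.
corr = decrypt(cipher) / (SCALE * SCALE * len(x))
print(round(corr, 3))
```

Repeating the partner-side step for every pair of initiator and partner features fills in the ciphertext correlation coefficient matrix, which the initiator then decrypts entry by entry. The fixed-point rounding introduces a small error, so the decrypted value is close to, not exactly equal to, the plaintext Pearson coefficient.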

Illustratively, the multiple collinearity analysis module 440 is specifically configured to:

determining a correlation coefficient fusion matrix according to the initiator correlation coefficient matrix, the partner correlation coefficient matrix and the plaintext correlation coefficient matrix, and determining a complete matrix determinant value of the correlation coefficient fusion matrix;

determining a characteristic matrix determinant value after deleting characteristic rows and characteristic columns of matrix characteristic data from a correlation coefficient fusion matrix;

and taking the ratio between the determinant value of the characteristic matrix and the determinant value of the complete matrix as the multiple collinearity value of the characteristic data of the matrix, integrating the multiple collinearity values of all the characteristic data of the matrix, determining the multiple collinearity values of the initiator and the partner, and taking the multiple collinearity values of the initiator and the partner as the characteristic multiple collinearity analysis result.

Illustratively, the data set determination module 410 is specifically configured to:

determining the same user of an initiator and a partner in longitudinal federal learning, and determining a data intersection between the initiator service data of the initiator and the partner service data of the partner based on the identity of the same user;

and respectively processing the initiator service data and the partner service data according to the data intersection set to determine an initiator data set and a partner data set.
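A plaintext sketch of the alignment step is shown below. It only illustrates the result of the intersection: a real deployment would normally find the shared identities without revealing non-shared ones, which this passage does not detail. The function name, the user identifiers and the feature rows are assumptions.

```python
def align_on_shared_users(initiator_rows, partner_rows):
    """Return shared user IDs plus each party's rows restricted to them.

    Both inputs map a user identity to that party's feature row.
    Rows are returned in the same (sorted) user order for both parties.
    """
    shared_ids = sorted(initiator_rows.keys() & partner_rows.keys())
    initiator_set = [initiator_rows[uid] for uid in shared_ids]
    partner_set = [partner_rows[uid] for uid in shared_ids]
    return shared_ids, initiator_set, partner_set


# Hypothetical service data keyed by user identity.
initiator_rows = {"u1": [35, 5200.0], "u2": [41, 7300.0], "u4": [29, 3100.0]}
partner_rows = {"u2": [12], "u3": [7], "u4": [3]}

shared_ids, initiator_set, partner_set = align_on_shared_users(initiator_rows, partner_rows)
print(shared_ids)
```

Keeping both data sets in the same user order is what makes the later row-wise products between initiator and partner features meaningful.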

Illustratively, the correlation coefficient matrix determination module 430 is configured to:

carrying out standardization processing on the characteristic data in the initiator data set and the characteristic data in the partner data set, and determining initiator standardization data and partner standardization data;

and determining an initiator correlation coefficient matrix of the initiator according to the correlation coefficient among the characteristic data of the initiator standardized data, and determining a partner correlation coefficient matrix of the partner according to the correlation coefficient among the characteristic data of the partner standardized data.
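For one party, the standardization and the within-party correlation coefficient matrix can be sketched as follows. The sketch assumes z-score standardization with the population standard deviation, so that the Pearson coefficient reduces to the mean of the products of standardized values; the text does not fix the standardization method, and the function names and sample columns are illustrative.

```python
import statistics


def standardize(column):
    """Zero mean, unit (population) standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0.0:
        raise ValueError("constant feature: correlation is undefined")
    return [(v - mean) / std for v in column]


def correlation_matrix(columns):
    """Pearson correlation matrix of one party's feature columns."""
    z = [standardize(col) for col in columns]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


# Hypothetical initiator data set: three features observed on five shared users.
initiator_columns = [
    [1, 2, 3, 4, 5],
    [2, 1, 4, 3, 5],
    [5, 4, 3, 2, 1],
]
r_init = correlation_matrix(initiator_columns)
for row in r_init:
    print([round(v, 3) for v in row])
```

The partner runs the same calculation locally on its own columns, so neither within-party matrix requires any exchange of raw data.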

Illustratively, the model training module 450 is specifically configured to:

and screening the characteristic data in the initiator data set and the characteristic data in the partner data set according to the characteristic multiple collinearity analysis result and the multiple collinearity threshold value, determining the characteristic data with the multiple collinearity value smaller than the multiple collinearity threshold value in the initiator data set and the partner data set as model training data, and training the federal learning model through the model training data.

Illustratively, the business index prediction device further includes:

the data contribution degree determining module is used for determining the data contribution degree of the characteristic data in the initiator data set and the characteristic data in the partner data set;

and the data filtering module is used for filtering the characteristic data in the initiator data set and the characteristic data in the partner data set according to the data contribution degree.

The service index prediction device provided by the embodiment is applicable to the service index prediction method provided by any embodiment, and has corresponding functions and beneficial effects.

Example five

Fig. 5 shows a schematic diagram of the structure of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.

As shown in fig. 5, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.

Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.

The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the business index prediction method.

In some embodiments, the business index prediction method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the business index prediction method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the business index prediction method in any other suitable way (e.g. by means of firmware).

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include being implemented in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.

The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service are overcome.

It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.

The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims

1. A business index prediction method, comprising:

determining model training data from the initiator data set and the partner data set according to the feature multiple collinearity analysis result, and training a federated learning model using the model training data; the federated learning model is used for predicting a business index value of a user;

and performing feature multiple collinearity analysis on the initiator and the partner according to the initiator correlation coefficient matrix, the partner correlation coefficient matrix and the plaintext correlation coefficient matrix, wherein the feature multiple collinearity analysis comprises:

determining a feature matrix determinant value after deleting the feature row and the feature column of a piece of matrix feature data from the correlation coefficient fusion matrix;

and taking the ratio of the feature matrix determinant value to the complete matrix determinant value as the multiple collinearity value of the matrix feature data, integrating the multiple collinearity values of all the matrix feature data to determine the multiple collinearity values of the initiator and the partner, and taking the multiple collinearity values of the initiator and the partner as the feature multiple collinearity analysis result.
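For illustration only, the determinant-ratio computation recited in claim 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names (`fuse`, `collinearity_values`, `analyse`) are hypothetical, and the correlation coefficient fusion matrix is assumed to be the symmetric block matrix formed from the initiator matrix, the partner matrix, and the plaintext cross-party matrix. For an invertible correlation matrix, this ratio equals the corresponding diagonal entry of the inverse matrix, which is the variance inflation factor of that feature.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def determinant(m: Matrix) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]  # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def fuse(r_a: Matrix, r_ab: Matrix, r_b: Matrix) -> Matrix:
    """Assumed layout of the fusion matrix: [[R_A, R_AB], [R_AB^T, R_B]]."""
    top = [row_a + row_ab for row_a, row_ab in zip(r_a, r_ab)]
    bottom = [[r_ab[i][j] for i in range(len(r_a))] + row_b
              for j, row_b in enumerate(r_b)]
    return top + bottom


def delete_row_col(m: Matrix, i: int) -> Matrix:
    """Remove row i and column i (the row and column of one feature)."""
    return [[v for c, v in enumerate(row) if c != i]
            for r, row in enumerate(m) if r != i]


def collinearity_values(fusion: Matrix) -> List[float]:
    """Ratio det(matrix without feature i) / det(complete matrix), per feature."""
    full = determinant(fusion)
    if abs(full) < 1e-12:
        raise ValueError("fusion matrix is singular: features are perfectly collinear")
    return [determinant(delete_row_col(fusion, i)) / full
            for i in range(len(fusion))]


def analyse(r_a: Matrix, r_ab: Matrix, r_b: Matrix) -> Tuple[List[float], List[float]]:
    """Split the per-feature values back into initiator and partner parts."""
    values = collinearity_values(fuse(r_a, r_ab, r_b))
    return values[:len(r_a)], values[len(r_a):]
```

With two initiator features whose correlation is 0.9 and one uncorrelated partner feature, `analyse([[1.0, 0.9], [0.9, 1.0]], [[0.0], [0.0]], [[1.0]])` gives a value of about 5.26 for each initiator feature and 1.0 for the partner feature.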

2. The method of claim 1, wherein encrypting the initiator data set by an initiator device and sending the encrypted initiator data set to a partner device, obtaining a ciphertext correlation coefficient matrix between the initiator and the partner from the partner device, decrypting the ciphertext correlation coefficient matrix by the initiator device, and determining a plaintext correlation coefficient matrix according to the decryption result, comprises:

generating, by the initiator device, a key pair using a homomorphic encryption algorithm, and encrypting the initiator data set with the public key of the key pair;
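For illustration only, the key generation and encryption step of claim 2 can be sketched with a toy additively homomorphic scheme. The claim names no specific algorithm; Paillier is assumed here. The primes are far too small for real use, the fixed-point `SCALE` and all function names are hypothetical, and the way the partner accumulates an encrypted sum of products is likewise an assumption about how a ciphertext correlation coefficient could be formed from standardized data.

```python
import math
import random

SCALE = 1000  # illustrative fixed-point scale for encoding floats as integers


def generate_keypair(p: int = 104729, q: int = 1299709):
    """Toy Paillier key pair; p and q are small primes chosen for illustration."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)   # public key n, private key (lam, mu)


def encrypt(n: int, m: int) -> int:
    """Initiator side: encrypt integer m (negatives are mapped modulo n)."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return ((1 + (m % n) * n) * pow(r, n, n2)) % n2


def decrypt(n: int, private_key, c: int) -> int:
    """Initiator side: decrypt and map the result back to a signed integer."""
    lam, mu = private_key
    m = ((pow(c, lam, n * n) - 1) // n) * mu % n
    return m - n if m > n // 2 else m


def encrypted_dot(n: int, enc_x, y) -> int:
    """Partner side: ciphertext of sum(x_i * y_i) from Enc(x_i) and plaintext y_i."""
    n2 = n * n
    acc = encrypt(n, 0)
    for c, k in zip(enc_x, y):
        acc = (acc * pow(c, k % n, n2)) % n2  # Enc(x)^k = Enc(k*x); product adds
    return acc


def encode(values):
    """Fixed-point encoding of floats as integers."""
    return [round(v * SCALE) for v in values]
```

In this sketch the initiator would call `generate_keypair`, send `[encrypt(n, v) for v in encode(x)]` together with `n` to the partner, and later divide the decrypted result of `encrypted_dot` by `SCALE ** 2` and by the sample count to recover a correlation coefficient for standardized columns.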

3. The method of claim 1, wherein determining the initiator data set and the partner data set from the initiator business data of the initiator and the partner business data of the partner in vertical federated learning comprises:

4. The method of claim 1, wherein determining an initiator correlation coefficient matrix for the initiator and a partner correlation coefficient matrix for the partner from the initiator data set and the partner data set comprises:

5. The method of claim 1, wherein determining model training data from the initiator data set and the partner data set according to the feature multiple collinearity analysis result, and training a federated learning model using the model training data, comprises:

and screening the feature data in the initiator data set and the feature data in the partner data set according to the feature multiple collinearity analysis result and a multiple collinearity threshold, determining the feature data whose multiple collinearity value is smaller than the multiple collinearity threshold in the initiator data set and the partner data set as the model training data, and training the federated learning model with the model training data.
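For illustration only, the threshold screening of claim 5 can be sketched as below. The function name, the feature names, and the threshold value in the usage note are hypothetical; the claim does not fix a particular threshold.

```python
from typing import Dict, List, Tuple


def screen_features(
    initiator_values: Dict[str, float],
    partner_values: Dict[str, float],
    threshold: float,
) -> Tuple[List[str], List[str]]:
    """Keep features whose multiple collinearity value is below the threshold.

    Each dictionary maps a (hypothetical) feature name to its multiple
    collinearity value from the analysis result.
    """
    keep_initiator = [name for name, v in initiator_values.items() if v < threshold]
    keep_partner = [name for name, v in partner_values.items() if v < threshold]
    return keep_initiator, keep_partner
```

For example, `screen_features({"age": 1.3, "income": 12.4}, {"balance": 11.9, "tenure": 2.0}, 10.0)` returns `(["age"], ["tenure"])`; each party would then keep only the listed columns of its own data set as model training data.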

6. The method of claim 1, further comprising, after determining the initiator data set and the partner data set from the initiator business data of the initiator and the partner business data of the partner in vertical federated learning:

determining the data contribution degree of the feature data in the initiator data set and of the feature data in the partner data set;

and filtering the feature data in the initiator data set and the feature data in the partner data set according to the data contribution degree.

7. A business index prediction apparatus, comprising:

a model training module, configured to determine model training data from the initiator data set and the partner data set according to the feature multiple collinearity analysis result, and to train a federated learning model using the model training data; the federated learning model is used for predicting a business index value of a user;

wherein the multiple collinearity analysis module is specifically configured to:

8. An electronic device, the electronic device comprising:

at least one processor; and

the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the business index prediction method of any one of claims 1-6.

9. A computer-readable storage medium storing computer instructions that, when executed, cause a processor to implement the business index prediction method of any one of claims 1-6.