CN110210233B - Combined construction method and device of prediction model, storage medium and computer equipment - Google Patents
- Publication number
- CN110210233B (application CN201910319424.7A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
Abstract
The invention discloses a method and an apparatus for jointly constructing a prediction model, a storage medium, and a computer device. It relates to the field of information technology and mainly aims to prevent a third party from colluding with a data provider to leak the data of other data providers, thereby keeping the data secure while enterprises model jointly. The method comprises: acquiring sample feature data of each enterprise and the class labels corresponding to the sample feature data; constructing an encryption model for each enterprise according to the sample feature data and the class labels; inputting the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain the encrypted data of each enterprise; and constructing a prediction model according to the encrypted data of each enterprise and the corresponding class labels. The method is suitable for the joint construction of prediction models.
Description
Technical Field
The present invention relates to the field of information technologies, and in particular, to a method and apparatus for jointly constructing a prediction model, a storage medium, and a computer device.
Background
Prediction models play a key role in decision making, product recommendation, and similar tasks in the field of intelligent financial recommendation. To obtain a prediction model with higher accuracy, enterprises usually model jointly, particularly when the phenomenon under analysis is complex and a large amount of training data is required. Because real data cannot be shared among enterprises during joint modeling, each enterprise generally encrypts its own data before sharing it, which protects the privacy of the enterprise data. The prediction model is then constructed from the encrypted data shared by each enterprise.
At present, the commonly used prediction models are the linear regression model and the logistic regression model. The data encryption schemes for these two models generally require a third party to provide a random number or a public key to each enterprise; each enterprise encrypts its own data with the random number or public key provided by the third party and then shares the result with the other enterprises. This encryption process therefore depends on the existence of a third party and requires that third party to be sufficiently trustworthy. Otherwise, if the third party leaks the random number provided to one enterprise to other enterprises, those enterprises can recover that enterprise's data by back-calculation, and the enterprise's internal data is leaked. In addition, the current encryption scheme is determined by the selected prediction model. Both of the above prediction models involve only addition and multiplication, so the corresponding encryption schemes are not applicable to all prediction models.
Disclosure of Invention
The invention provides a method and an apparatus for jointly constructing a prediction model, a storage medium, and a computer device, which mainly aim to prevent a third party from colluding with a data provider to leak the data of other data providers, thereby keeping the data secure while enterprises model jointly.
According to a first aspect of the present invention, there is provided a joint construction method of a prediction model, including:
acquiring sample feature data of each enterprise and class labels corresponding to the sample feature data;
constructing an encryption model of each enterprise according to the sample feature data and the class labels;
inputting the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain encrypted data of each enterprise; and
constructing a prediction model according to the encrypted data of each enterprise and the corresponding class labels.
According to a second aspect of the present invention, there is provided a joint construction apparatus of a predictive model, comprising:
an acquisition unit, configured to acquire sample feature data of each enterprise and class labels corresponding to the sample feature data;
a first construction unit, configured to construct an encryption model of each enterprise according to the sample feature data and the class labels;
an encryption unit, configured to input the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain encrypted data of each enterprise; and
a second construction unit, configured to jointly construct a prediction model according to the encrypted data of each enterprise and the corresponding class labels.
According to a third aspect of the present invention, there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring sample feature data of each enterprise and class labels corresponding to the sample feature data;
constructing an encryption model of each enterprise according to the sample feature data and the class labels;
inputting the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain encrypted data of each enterprise; and
constructing a prediction model according to the encrypted data of each enterprise and the corresponding class labels.
According to a fourth aspect of the present invention there is provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of:
acquiring sample feature data of each enterprise and class labels corresponding to the sample feature data;
constructing an encryption model of each enterprise according to the sample feature data and the class labels;
inputting the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain encrypted data of each enterprise; and
constructing a prediction model according to the encrypted data of each enterprise and the corresponding class labels.
Compared with the current approach, in which a third party intervenes to encrypt enterprise data and joint modeling is performed on that encrypted data, the method and apparatus for jointly constructing a prediction model, the storage medium, and the computer device provided by the invention acquire sample feature data of each enterprise and the class labels corresponding to the sample feature data; construct an encryption model for each enterprise according to the sample feature data and the class labels; input the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain the encrypted data of each enterprise; and jointly construct the prediction model according to the encrypted data of each enterprise and the corresponding class labels. No third-party intervention is needed: each enterprise encrypts its internal data with its own encryption model, which prevents a third party from colluding with other enterprises to leak that data and improves the security of the enterprise's internal data. Moreover, encrypting enterprise data with an encryption model suits not only linear regression and logistic regression prediction models but also other prediction models.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 shows a flow chart of a joint construction method of a prediction model provided by an embodiment of the invention;
FIG. 2 is a flowchart of another method for jointly constructing a predictive model according to an embodiment of the invention;
FIG. 3 is a schematic structural diagram of a joint construction device of a prediction model according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a joint construction device of another prediction model according to an embodiment of the present invention;
FIG. 5 shows a schematic diagram of the physical structure of a computer device according to an embodiment of the present invention.
Detailed Description
The application will be described in detail hereinafter with reference to the drawings in conjunction with embodiments. It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other.
As described in the Background, the commonly used prediction models at present are the linear regression model and the logistic regression model, and the data encryption schemes for these two models generally require a third party to provide a random number or a public key to each enterprise. The encryption process therefore depends on the existence of a third party and requires that third party to be sufficiently trustworthy; otherwise, the third party may collude with other enterprises and leak an enterprise's internal data. In addition, the current encryption scheme is determined by the selected prediction model. Both of the above prediction models involve only addition and multiplication, so the corresponding encryption schemes are not applicable to all prediction models.
In order to solve the above problems, an embodiment of the present invention provides a method for jointly constructing a prediction model, as shown in fig. 1, where the method includes:
101. Acquire sample feature data of each enterprise and the class labels corresponding to the sample feature data.
The class label corresponding to the sample feature data is the true class to which the sample feature data belongs. When enterprises model jointly, each enterprise's internal data must be shared with the other enterprises. To prevent an enterprise's real data from being leaked to the others, an encryption model is built for each enterprise from its internal data, the internal data is encrypted by the encryption model, and only the encrypted data is shared. To build the encryption model of each enterprise, the sample feature data of that enterprise and the corresponding class labels are acquired first. For example, several enterprises jointly build a prediction model to predict a person's gender: the input of the prediction model is feature data and the output is the person's gender. The feature data in the training set includes the time spent online, the time period of going online, the online shopping amount, the favorite places to visit, and the favorite places to eat, but no single enterprise holds all of these features. The sample feature data held by the P1 enterprise includes the time spent online, the time period of going online, and the online shopping amount; the sample feature data held by the P2 enterprise includes the favorite places to visit and the favorite places to eat. The sample feature data of the P1 enterprise and of the P2 enterprise, together with the gender labels corresponding to each, are acquired, and the encryption models of the P1 and P2 enterprises are built from them respectively.
102. Construct an encryption model of each enterprise according to the sample feature data and the class labels.
For the embodiment of the invention, to improve the accuracy of the prediction model, each enterprise shares its internal data with the other enterprises during joint modeling. So that the real data is not leaked, an encryption model is built to encrypt the enterprise's internal data. Specifically, the encryption model may be a gradient descent tree encryption model: a preset gradient descent tree algorithm is trained on the acquired sample feature data of the enterprise and the corresponding class labels, and an encryption model is built for each enterprise separately. For example, the P1 enterprise has 100 groups of sample feature data, each comprising the time spent online, the time period of going online, and the online purchase amount, and each group corresponds to a unique gender label. Training the gradient descent tree algorithm on these 100 groups yields the gradient descent tree encryption model, with which the enterprise's internal data is encrypted and its privacy protected.
103. Input the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain the encrypted data of each enterprise.
For the embodiment of the invention, after each enterprise has built an encryption model from its own sample feature data and class labels, it inputs its sample feature data into that model, which converts the data into a sample feature vector whose elements are 0 or 1. The enterprise's internal data is thereby encrypted.
For example, the P1 enterprise builds a gradient descent tree encryption model from its own sample feature data. The model contains two trees with five leaf nodes in total. A group of sample feature data of the P1 enterprise is input into the model and falls on the second leaf node of the first tree and the first leaf node of the second tree. The number of leaf nodes is the dimension of the sample feature vector, and each leaf node corresponds to one component of that vector: if the sample feature data falls on a leaf node, the corresponding component is set to 1; otherwise it is set to 0. After encryption by the gradient descent tree encryption model, this group of sample feature data therefore becomes the five-dimensional vector Z1 = [0,1,0,1,0]. Encrypting enterprise sample feature data with the encryption model requires no third-party intervention, and other enterprises cannot recover the original data from the encrypted data, which keeps the enterprise's internal data secure.
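The leaf-to-component mapping in this example can be sketched in a few lines. The two hand-written trees, their split thresholds, and the feature names below are hypothetical stand-ins for a trained model; the only facts taken from the text are that there are two trees with five leaf nodes in total, and Z1 = [0,1,0,1,0] implies three leaves in the first tree and two in the second.

```python
# Hypothetical toy "encryption model": two hand-written trees with 3 + 2 = 5 leaves.
# Each tree maps a sample to the index of the leaf it falls into.

def tree_one(sample):
    """Return the leaf index (0, 1 or 2) reached in the first tree."""
    if sample["online_hours"] < 2.0:
        return 0
    if sample["online_purchase"] < 500.0:
        return 1
    return 2


def tree_two(sample):
    """Return the leaf index (0 or 1) reached in the second tree."""
    return 0 if sample["online_period"] == "evening" else 1


TREES = [(tree_one, 3), (tree_two, 2)]  # (tree function, number of leaves)


def encode(sample, trees=TREES):
    """Convert a sample into a 0-1 vector with one component per leaf node."""
    vector = []
    for tree, n_leaves in trees:
        reached = tree(sample)
        # Exactly one component per tree is 1: the leaf the sample falls on.
        vector.extend(1 if leaf == reached else 0 for leaf in range(n_leaves))
    return vector


sample = {"online_hours": 3.5, "online_purchase": 120.0, "online_period": "evening"}
z1 = encode(sample)
print(z1)  # [0, 1, 0, 1, 0]
```

Every sample that reaches the same pair of leaves produces the same vector, so the shared data carries only leaf membership and not the raw feature values.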
104. Construct a prediction model according to the encrypted data of each enterprise and the corresponding class labels.
For the embodiment of the invention, the encrypted data of each enterprise and the corresponding class labels, optionally together with an enterprise's own sample feature data, are combined into a prediction training set, and a prediction model is constructed from that training set. For example, the sample feature data X = [X1, X2] is split between the P1 enterprise, which owns X1, and the P2 enterprise, which owns X2. X1 is encrypted by the encryption model built by the P1 enterprise and converted into the sample feature vector Z1; X2 is encrypted by the encryption model built by the P2 enterprise and converted into the sample feature vector Z2. Each enterprise can then construct a prediction model from the prediction training set Z = [Z1, Z2]. To further improve the accuracy of the prediction model, an enterprise such as P1 may also build its prediction training set from the shared encrypted data combined with its own sample feature data.
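A minimal sketch of assembling the prediction training set Z = [Z1, Z2] follows. It assumes that both enterprises list the same individuals in the same order, which the text does not state explicitly; the function name `build_training_set` and the sample vectors are illustrative only.

```python
def build_training_set(shared_parts, labels):
    """Concatenate the per-enterprise 0-1 vectors of each sample into one row.

    shared_parts: one list of vectors per enterprise, aligned by sample index
                  (assumption: row i describes the same individual everywhere).
    labels:       one class label per sample.
    """
    n_samples = len(labels)
    for part in shared_parts:
        if len(part) != n_samples:
            raise ValueError("every enterprise must provide one vector per sample")
    rows = []
    for i in range(n_samples):
        row = []
        for part in shared_parts:
            row.extend(part[i])
        rows.append(row)
    return rows, list(labels)


# Encrypted vectors shared by P1 (five leaf nodes) and P2 (three leaf nodes).
z1 = [[0, 1, 0, 1, 0], [1, 0, 0, 0, 1]]
z2 = [[1, 0, 0], [0, 0, 1]]
labels = [1, 0]

training_rows, training_labels = build_training_set([z1, z2], labels)
print(training_rows[0])  # [0, 1, 0, 1, 0, 1, 0, 0]
```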
Compared with the current approach, in which a third party intervenes to encrypt enterprise data and joint modeling is performed on that encrypted data, the method for jointly constructing a prediction model provided by the embodiment of the invention acquires sample feature data of each enterprise and the class labels corresponding to the sample feature data; constructs an encryption model for each enterprise according to the sample feature data and the class labels; inputs the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain the encrypted data of each enterprise; and jointly constructs the prediction model according to the encrypted data of each enterprise and the corresponding class labels. No third-party intervention is needed: each enterprise encrypts its internal data with its own encryption model, which prevents a third party from colluding with other enterprises to leak that data and improves the security of the enterprise's internal data. Moreover, encrypting enterprise data with an encryption model suits not only linear regression and logistic regression prediction models but also other prediction models.
Further, in order to better illustrate the above process of encrypting the enterprise internal data, as a refinement and extension to the above embodiment, an embodiment of the present invention provides another method for jointly constructing a prediction model, as shown in fig. 2, where the method includes:
201. Acquire sample feature data of each enterprise and the class labels corresponding to the sample feature data.
For the embodiment of the invention, the sample feature data of each enterprise and the class label corresponding to the sample feature data are stored in the database of each enterprise in advance, and the sample feature data of the enterprise and the class label corresponding to the sample feature data are acquired from the database when the encryption model of each enterprise is constructed.
202. Train on the sample feature data and the class labels with a preset gradient descent tree algorithm to construct the gradient descent tree encryption model.
For the embodiment of the present invention, the encryption model is a gradient descent tree encryption model, and step 202 may specifically include: performing preliminary training on the sample feature data and the class labels with a preset decision tree algorithm to obtain a preliminary decision tree model; matching the class labels with the preliminary decision tree model to obtain the true probability values of the classes corresponding to the leaf nodes of the preliminary decision tree model to which the sample feature data belongs; inputting the sample feature data into the preliminary decision tree model for class prediction to obtain the predicted probability values of the classes corresponding to those leaf nodes; determining the residual gradient descent value of the preliminary iterative training from the difference between the true probability values and the predicted probability values; iteratively training the preliminary decision tree model according to the residual gradient descent value, the sample feature data, and the class labels, and repeating the step of calculating the residual gradient descent value; and, when the calculated residual gradient descent value reaches its minimum, determining the decision tree model trained at the iteration round corresponding to the minimum residual gradient descent value as the gradient descent tree encryption model.
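The iterative procedure of step 202 can be sketched as follows. This is a simplified illustration rather than the patented procedure: it assumes binary labels, uses depth-1 trees (stumps) fitted by least squares, and stops after a fixed number of rounds instead of searching for the minimum residual. All function names and the sample data are hypothetical.

```python
import math


def sigmoid(score):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


def fit_stump(rows, residuals):
    """Fit a two-leaf regression tree to the residuals by least squares.

    Returns (feature index, threshold, left leaf value, right leaf value).
    """
    best = None
    for feature in range(len(rows[0])):
        values = sorted(set(row[feature] for row in rows))
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2.0
            left = [r for row, r in zip(rows, residuals) if row[feature] <= threshold]
            right = [r for row, r in zip(rows, residuals) if row[feature] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = sum((r - left_mean) ** 2 for r in left)
            error += sum((r - right_mean) ** 2 for r in right)
            if best is None or error < best[0]:
                best = (error, feature, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("no feature has two distinct values to split on")
    return best[1:]


def stump_leaf(stump, row):
    """Return 0 if the row falls on the left leaf, 1 if on the right leaf."""
    feature, threshold, _, _ = stump
    return 0 if row[feature] <= threshold else 1


def fit_boosted_stumps(rows, labels, n_rounds=3, learning_rate=0.5):
    """Each round fits a stump to (true label - predicted probability)."""
    scores = [0.0] * len(rows)  # initial estimation function F(x) = 0
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - sigmoid(s) for y, s in zip(labels, scores)]
        stump = fit_stump(rows, residuals)
        stumps.append(stump)
        for i, row in enumerate(rows):
            leaf_value = stump[2 + stump_leaf(stump, row)]
            scores[i] += learning_rate * leaf_value
    return stumps, scores


# Hypothetical data: [hours online, online purchase amount] with a binary label.
rows = [[1.0, 100.0], [2.0, 300.0], [6.0, 50.0], [7.0, 400.0]]
labels = [0, 0, 1, 1]
stumps, scores = fit_boosted_stumps(rows, labels)
print([round(sigmoid(s), 3) for s in scores])
```

The leaf index returned by `stump_leaf` for each fitted tree is what the later encryption step turns into a 0-1 component.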
For example, the P1 enterprise has 100 groups of sample feature data, each comprising the time spent online, the time period of going online, and the online purchase amount, and each group corresponds to a unique gender label. The 100 groups are trained with the gradient descent tree algorithm to construct the gradient descent tree encryption model. Specifically, an initial estimation function F_k(x) is given; it may be set to F_k(x) = 0, k = 1, …, K, where K is the number of classes (for gender prediction, K = 2). The sample feature data is evaluated with the initial estimation function to obtain the estimates F_1(x), …, F_K(x), and a logistic transformation of these estimates gives the probability p_k(x) that the sample feature data belongs to each class k.
From the true probability values of the sample feature data and the probabilities estimated with the initial estimation function, the log-likelihood loss function is obtained as follows:
Here y_k is the true probability value of the sample feature data: y_k = 1 when the sample belongs to class k, and y_k = 0 otherwise. Substituting the probability p_k(x) of each class k into the loss function and taking the derivative gives the gradient of the loss function as follows:
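The expressions referred to in this part of the derivation do not appear in the text. A reconstruction, assuming the standard multi-class gradient boosting formulation (softmax probabilities with a log-likelihood loss), which is consistent with the definitions of F_k(x), p_k(x), and y_k given here, would be:

```latex
% Logistic (softmax) transformation of the K estimates into class probabilities
p_k(x) = \frac{\exp\bigl(F_k(x)\bigr)}{\sum_{l=1}^{K} \exp\bigl(F_l(x)\bigr)}

% Log-likelihood loss, with y_k = 1 if the sample belongs to class k and 0 otherwise
L\bigl(\{y_k, F_k(x)\}_{k=1}^{K}\bigr) = -\sum_{k=1}^{K} y_k \log p_k(x)

% Negative gradient with respect to F_k for sample i, evaluated after m-1 rounds
\tilde{y}_{ik}
  = -\left[\frac{\partial L}{\partial F_k(x_i)}\right]_{F = F_{m-1}}
  = y_{ik} - p_{k,m-1}(x_i)
```

The last line is the negative of the gradient, which is the quantity the next paragraph calls the gradient error.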
The gradient error of class k for the i-th sample feature data is therefore y_ik - p_{k,m-1}, where m-1 is the number of completed iterations, that is, the initial estimation function has been iterated for m-1 rounds. The gradient error is thus the difference between the true probability of class k for the sample feature data and the probability predicted after m-1 rounds. A decision tree model is then fitted to the sample feature data and the gradient errors, and the residual fitting value of each leaf node is calculated from the generated decision tree model as follows:
Here J is the number of leaf nodes of the decision tree model. Adding the residual fitting value of each leaf node to the estimation function of the previous iteration gives the estimation function of the current iteration as follows:
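The two expressions referred to here are likewise missing from the text. Assuming the leaf values follow the usual one-step Newton approximation of multi-class gradient boosting (an assumption; the source may use a different leaf estimate), and writing \tilde{y}_{ik} = y_{ik} - p_{k,m-1}(x_i) for the gradient error and R_{jkm} for the region of leaf j in the class-k tree of round m, they would read:

```latex
% Residual fitting value of leaf j (class k, round m)
\gamma_{jkm} = \frac{K-1}{K}\,
  \frac{\sum_{x_i \in R_{jkm}} \tilde{y}_{ik}}
       {\sum_{x_i \in R_{jkm}} \lvert \tilde{y}_{ik} \rvert \,\bigl(1 - \lvert \tilde{y}_{ik} \rvert\bigr)}

% Estimation function of the current round: previous estimate plus the leaf values
F_{k,m}(x) = F_{k,m-1}(x) + \sum_{j=1}^{J} \gamma_{jkm}\, \mathbf{1}\bigl(x \in R_{jkm}\bigr)
```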
Each iteration builds a decision tree from the gradient errors of the current sample feature data, so that the estimate moves in the direction opposite to the gradient of the loss function. After the preset number of iterations the gradient is minimized, and the final estimation function is taken as the gradient descent tree encryption model.
203. Input the sample feature data of each enterprise into the gradient descent tree encryption model for encryption to obtain the sample feature vectors corresponding to the sample feature data, and determine the sample feature vectors as the encrypted data of each enterprise.
For the embodiment of the present invention, the sample feature data inside an enterprise is input into the enterprise's encryption model and converted into a sample feature vector composed of 0-1 elements, which serves as the enterprise's encrypted data and can be shared with other enterprises. Specifically, step 203 further includes: inputting the sample feature data of each enterprise into the gradient descent tree encryption model for matching, to determine whether the sample feature data matches each leaf node of the gradient descent tree encryption model; determining each feature matching value of the sample feature data according to the matching result; determining the dimension of the sample feature vector according to the number of leaf nodes of the gradient descent tree encryption model; and determining the sample feature vector corresponding to the sample feature data according to the feature matching values and the dimension of the sample feature vector. Determining each feature matching value according to the matching result further includes: if the sample feature data matches a leaf node of the gradient descent tree encryption model, the feature matching value for that leaf node is determined to be 1; if it does not match, the feature matching value is determined to be 0. The sample feature data is thereby converted into a sample feature vector. This encryption scheme needs no third-party intervention, and other enterprises cannot back-calculate the original data from the shared encrypted data, which ensures the security of the enterprise's internal data.
204. Train on the encrypted data of each enterprise and the corresponding class labels with a preset logistic regression algorithm to construct the logistic regression prediction model.
For the embodiment of the present invention, the prediction model is a logistic regression prediction model, and step 204 specifically further includes: training on the encrypted data of each enterprise and the corresponding class labels with a maximum likelihood estimation algorithm to obtain a maximum likelihood estimation prediction model; and performing convergence calculation on the maximum likelihood estimation prediction model with a gradient descent algorithm to obtain the logistic regression prediction model. For example, the enterprises jointly construct a gender prediction model. The 100 groups of encrypted data Z1 of the P1 enterprise and the 100 groups of encrypted data Z2 of the P2 enterprise are obtained, each group corresponding to a unique gender label, and Z = [Z1, Z2] is taken as the prediction training set. The logistic regression prediction model is constructed from this training set. A prediction function is first constructed as follows:
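The prediction function itself is not reproduced in the text. Assuming the standard logistic (sigmoid) form, which matches the later statement that h_θ(x) is the probability of class 1, it would be:

```latex
h_\theta(x) = \frac{1}{1 + e^{-\theta^{\mathsf{T}} x}}
```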
The prediction function hθ(x) represents the probability that the prediction result is 1. For input feature data to be predicted, the probabilities that the classification result is class 1 and class 0 are respectively:
p(y=1|x;θ)=hθ(x)
p(y=0|x;θ)=1-hθ(x)
Here y = 1 means that the classification result is male and y = 0 means that it is female. A loss function is then constructed from the prediction function by maximum likelihood as follows:
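The loss function is also missing from the text. Assuming the usual negative average log-likelihood of logistic regression, with i indexing the samples and m the number of samples as defined just below, it would be:

```latex
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m}
  \Bigl[\, y^{(i)} \log h_\theta\bigl(x^{(i)}\bigr)
        + \bigl(1 - y^{(i)}\bigr) \log\bigl(1 - h_\theta\bigl(x^{(i)}\bigr)\bigr) \Bigr]
```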
Here i denotes the i-th sample and m is the number of samples. The minimum of the maximum likelihood loss function is found with a gradient descent algorithm; the parameters θ obtained at that minimum are the optimal parameters, and the final prediction function determined by the optimal θ is taken as the logistic regression prediction model.
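Step 204 can be sketched as plain batch gradient descent on the log-likelihood loss, applied to 0-1 vectors like those produced by the encryption step. The learning rate, the number of steps, the bias term stored in `theta[0]`, and the four training rows are all illustrative assumptions.

```python
import math


def predict_proba(theta, row):
    """h_theta(x): probability that the label is 1 (theta[0] is a bias term)."""
    score = theta[0] + sum(w * x for w, x in zip(theta[1:], row))
    return 1.0 / (1.0 + math.exp(-score))


def log_loss(theta, rows, labels):
    """Average negative log-likelihood over the m samples."""
    total = 0.0
    for row, y in zip(rows, labels):
        p = predict_proba(theta, row)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(rows)


def fit_logistic_regression(rows, labels, learning_rate=0.5, n_steps=200):
    """Minimise the log-likelihood loss by batch gradient descent."""
    theta = [0.0] * (len(rows[0]) + 1)
    m = len(rows)
    for _ in range(n_steps):
        gradient = [0.0] * len(theta)
        for row, y in zip(rows, labels):
            error = predict_proba(theta, row) - y
            gradient[0] += error
            for j, x in enumerate(row):
                gradient[j + 1] += error * x
        theta = [w - learning_rate * g / m for w, g in zip(theta, gradient)]
    return theta


# Hypothetical encrypted training rows (0-1 vectors) with gender labels.
rows = [[0, 1, 0, 1, 0], [0, 1, 0, 0, 1], [1, 0, 0, 1, 0], [0, 0, 1, 0, 1]]
labels = [1, 1, 0, 0]

initial_loss = log_loss([0.0] * 6, rows, labels)
theta = fit_logistic_regression(rows, labels)
final_loss = log_loss(theta, rows, labels)
print(round(initial_loss, 4), round(final_loss, 4))
```

Because the inputs are only leaf-membership indicators, the learned weights can be read as a per-leaf contribution to the log-odds of class 1.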
Compared with the current approach, in which a third party intervenes to encrypt enterprise data and joint modeling is performed on that encrypted data, the method for jointly constructing a prediction model provided by the embodiment of the invention acquires sample feature data of each enterprise and the class labels corresponding to the sample feature data; constructs an encryption model for each enterprise according to the sample feature data and the class labels; inputs the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain the encrypted data of each enterprise; and jointly constructs the prediction model according to the encrypted data of each enterprise and the corresponding class labels. No third-party intervention is needed: each enterprise encrypts its internal data with its own encryption model, which prevents a third party from colluding with other enterprises to leak that data and improves the security of the enterprise's internal data. Moreover, encrypting enterprise data with an encryption model suits not only linear regression and logistic regression prediction models but also other prediction models.
Further, as a specific implementation of fig. 1, an embodiment of the present invention provides a joint construction apparatus of a prediction model, as shown in fig. 3, where the apparatus includes: an acquisition unit 31, a first construction unit 32, an encryption unit 33 and a second construction unit 34.
The acquisition unit 31 may be configured to acquire sample feature data of each enterprise and the class labels corresponding to the sample feature data. The acquisition unit 31 is the main functional module of the apparatus for acquiring this data.
The first construction unit 32 may be configured to construct an encryption model for each enterprise according to the sample feature data and the class labels. The first construction unit 32 is a main functional module, and also a core module, of the apparatus for constructing the encryption models.
The encryption unit 33 may be configured to input the sample feature data of each enterprise into the corresponding encryption model for encryption, so as to obtain encrypted data of each enterprise. The encryption unit 33 is a main functional module, and also a core module, of the apparatus for producing the encrypted data.
The second construction unit 34 may be configured to jointly construct a prediction model according to the encrypted data of each enterprise and the class labels corresponding to the encrypted data. The second construction unit 34 is the main functional module of the apparatus for jointly constructing the prediction model.
For the embodiment of the present invention, the encryption model is a gradient descent tree encryption model, and the first construction unit 32 may specifically be configured to train on the sample feature data and the class labels by using a preset gradient descent tree algorithm to construct the gradient descent tree encryption model.
Furthermore, the first construction unit 32 includes: a preliminary training module 321, a matching module 322, a prediction module 323, a determining module 324, and an iterative training module 325.
The preliminary training module 321 may be configured to perform preliminary training on the sample feature data and the class labels by using a preset decision tree algorithm, so as to obtain a preliminary decision tree model.
The matching module 322 may be configured to match the class labels against the preliminary decision tree model to obtain, for each leaf node of the preliminary decision tree model, the true probability value that the sample feature data belongs to the class corresponding to that leaf node.
The prediction module 323 may be configured to input the sample feature data into the preliminary decision tree model for class prediction to obtain, for each leaf node of the preliminary decision tree model, the predicted probability value that the sample feature data belongs to the class corresponding to that leaf node.
The determining module 324 may be configured to determine a residual gradient descent value of the preliminary iterative training according to the difference between the true probability value and the predicted probability value.
The iterative training module 325 may be configured to perform iterative training on the preliminary decision tree model according to the residual gradient descent value, the sample feature data, and the class labels, and to repeat the step of calculating the residual gradient descent value.
The determining module 324 may be further configured to, when the calculated residual gradient descent value is the minimum residual gradient descent value, determine the decision tree model obtained at the iteration round corresponding to the minimum residual gradient descent value as the gradient descent tree encryption model.
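The iterative training loop handled by modules 321 to 325 resembles what is commonly called gradient boosting. The sketch below is one possible reading of it under stated assumptions, not the patented procedure: each tree is reduced to a depth-1 regression stump on a single numeric feature, the "residual gradient descent value" is taken to be the mean absolute difference between the label and the predicted probability, and the names `fit_stump`, `boost`, `rounds`, and `lr` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, residuals):
    # Fit a depth-1 regression tree: one threshold, one value per leaf,
    # chosen to minimise the squared error against the residuals.
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lv, rv)
    return best[1], best[2], best[3]


def stump_predict(stump, x):
    thr, lv, rv = stump
    return lv if x <= thr else rv


def boost(xs, ys, rounds=20, lr=0.5):
    scores = [0.0] * len(xs)   # additive model output before the sigmoid
    stumps, history = [], []
    for _ in range(rounds):
        # Residual = true probability (the label) minus predicted probability.
        residuals = [y - sigmoid(s) for y, s in zip(ys, scores)]
        history.append(sum(abs(r) for r in residuals) / len(residuals))
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        scores = [s + lr * stump_predict(stump, x) for s, x in zip(scores, xs)]
    return stumps, history


# Toy data (hypothetical): one numeric feature, binary class label.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 1, 0, 1, 1]
stumps, history = boost(xs, ys)

# history[k] is measured on the model made of the first k stumps, so the
# model kept is the one at the round with the smallest residual measure.
best_round = history.index(min(history))
selected_model = stumps[:best_round]
```

Selecting `stumps[:best_round]` mirrors the final step above, where the model from the iteration round with the minimum residual value is kept.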
For the embodiment of the present invention, the encryption unit 33 includes: an encryption module 331 and a determination module 332.
The encryption module 331 may be configured to input sample feature data of each enterprise to the gradient descent tree encryption model for encryption, so as to obtain a sample feature vector corresponding to the sample feature data.
The determining module 332 may be configured to determine the sample feature vector as encrypted data of the respective enterprises.
In addition, for the specific process of converting the sample feature data into the sample feature vector, the encryption module 331 further includes: a matching submodule 3311 and a determining submodule 3312.
The matching submodule 3311 may be used to input sample feature data of the respective enterprises to the gradient descent tree encryption model for matching to determine whether the sample feature data matches leaf nodes of the gradient descent tree encryption model.
The determining submodule 3312 may be configured to determine respective feature matching values of the sample feature data based on the matching result.
The determining submodule 3312 may also be used to determine the dimension of the sample feature vector based on the number of leaf nodes of the gradient descent tree encryption model.
The determining submodule 3312 may be further configured to determine a sample feature vector corresponding to the sample feature data according to each feature matching value of the sample feature data and the dimension of the sample feature vector.
In addition, for the process of determining each feature matching value of the sample feature data, the determining submodule 3312 may be specifically configured to determine the feature matching value of the sample feature data as 1 if the sample feature data matches a leaf node of the gradient descent tree encryption model, and to determine the feature matching value of the sample feature data as 0 if the sample feature data does not match the leaf node of the gradient descent tree encryption model.
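The leaf-matching step performed by submodules 3311 and 3312 can be illustrated as follows. This is a sketch under assumptions: trees are represented as nested tuples, leaf ids are local to each tree and numbered from 0, and the helper names `leaf_index`, `count_leaves`, and `encode` are hypothetical. The vector dimension equals the total number of leaf nodes, and each entry is 1 when the sample reaches that leaf and 0 otherwise.

```python
def leaf_index(tree, x):
    # Internal node: (feature_index, threshold, left_subtree, right_subtree)
    # Leaf node:     ("leaf", leaf_id)
    node = tree
    while node[0] != "leaf":
        feature, threshold, left, right = node
        node = left if x[feature] <= threshold else right
    return node[1]


def count_leaves(tree):
    if tree[0] == "leaf":
        return 1
    return count_leaves(tree[2]) + count_leaves(tree[3])


def encode(trees, x):
    # Dimension of the sample feature vector = total number of leaf nodes.
    dim = sum(count_leaves(t) for t in trees)
    vec = [0] * dim
    offset = 0
    for t in trees:
        # Feature matching value 1 for the leaf the sample reaches, 0 elsewhere.
        vec[offset + leaf_index(t, x)] = 1
        offset += count_leaves(t)
    return vec


# Two hypothetical trees with 3 and 2 leaves respectively.
tree_a = (0, 30.0, ("leaf", 0), (1, 5.0, ("leaf", 1), ("leaf", 2)))
tree_b = (1, 2.0, ("leaf", 0), ("leaf", 1))

print(encode([tree_a, tree_b], [45.0, 3.0]))  # [0, 1, 0, 0, 1]
print(encode([tree_a, tree_b], [20.0, 1.0]))  # [1, 0, 0, 1, 0]
```

Only the 0/1 vector leaves the enterprise in this reading; the raw feature values (45.0 and 3.0 in the first call) do not appear in it.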
For the embodiment of the present invention, the second construction unit 34 may specifically be configured to combine the encrypted data of each enterprise and the class label corresponding to the encrypted data of each enterprise with the sample feature data of each enterprise into a prediction training set, and construct a prediction model according to the prediction training set.
In addition, the prediction model is a logistic regression prediction model, and the second construction unit 34 may be specifically configured to train on the encrypted data of each enterprise and the class labels corresponding to the encrypted data by using a preset logistic regression algorithm to construct the logistic regression prediction model.
Further, for the specific construction process of the logistic regression prediction model, the second construction unit 34 further includes: training module 341 and calculation module 342.
The training module 341 may be configured to train on the encrypted data of each enterprise and the class labels corresponding to the encrypted data by using a maximum likelihood estimation algorithm, so as to obtain a maximum likelihood estimation prediction model.
The calculation module 342 may be configured to perform convergence calculation on the maximum likelihood estimation prediction model by using a gradient descent algorithm, so as to obtain the logistic regression prediction model.
It should be noted that, for other descriptions of the functional modules of the joint construction apparatus of the prediction model provided by the embodiment of the present invention, reference may be made to the corresponding descriptions of the method shown in fig. 1, which are not repeated here.
Based on the method shown in fig. 1, the embodiment of the present invention correspondingly further provides a computer readable storage medium on which a computer program is stored, and the program, when executed by a processor, implements the following steps: acquiring sample feature data of each enterprise and class labels corresponding to the sample feature data; constructing an encryption model for each enterprise according to the sample feature data and the class labels; inputting the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain encrypted data of each enterprise; and constructing a prediction model according to the encrypted data of each enterprise and the corresponding class labels.
Based on the embodiment of the method shown in fig. 1 and the apparatus shown in fig. 3, the embodiment of the present invention further provides a physical structure diagram of a computer device, as shown in fig. 5. The computer device includes: a processor 41, a memory 42, and a computer program stored on the memory 42 and executable on the processor 41, wherein the memory 42 and the processor 41 are both arranged on a bus 43, and the processor 41 performs the following steps when executing the program: acquiring sample feature data of each enterprise and class labels corresponding to the sample feature data; constructing an encryption model for each enterprise according to the sample feature data and the class labels; inputting the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain encrypted data of each enterprise; and constructing a prediction model according to the encrypted data of each enterprise and the corresponding class labels.
According to the technical scheme of the present invention, sample feature data of each enterprise and the class labels corresponding to the sample feature data can be acquired; an encryption model is constructed for each enterprise according to the sample feature data and the class labels; the sample feature data of each enterprise is input into the corresponding encryption model for encryption to obtain encrypted data of each enterprise; and the prediction model is jointly constructed according to the encrypted data of each enterprise and the corresponding class labels. No third-party intervention is therefore needed: each enterprise encrypts its internal data with its own encryption model, which prevents a third party from colluding with other enterprises and leaking the enterprise's internal data, and thus improves the security of that data. In addition, encrypting enterprise data through an encryption model is suitable not only for linear regression and logistic regression prediction models but also for other prediction models.
It will be appreciated by those skilled in the art that the modules or steps of the invention described above may be implemented on a general-purpose computing device. They may be concentrated on a single computing device or distributed across a network of computing devices. Alternatively, they may be implemented as program code executable by computing devices, so that they can be stored in a storage device and executed by the computing devices; in some cases, the steps shown or described may be performed in an order different from the one given here. They may also be fabricated as individual integrated circuit modules, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description covers only the preferred embodiments of the present invention and is not intended to limit the present invention; those skilled in the art may make various modifications and variations to it. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.
Claims (9)
1. A joint construction method for a prediction model, characterized by comprising:
acquiring sample feature data of each enterprise and class labels corresponding to the sample feature data;
constructing an encryption model for each enterprise according to the sample feature data and the class labels;
inputting the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain encrypted data of each enterprise;
constructing a prediction model according to the encrypted data of each enterprise and the corresponding class labels;
wherein the encryption model is a gradient descent tree encryption model, and inputting the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain the encrypted data of each enterprise comprises:
inputting the sample feature data of each enterprise into the gradient descent tree encryption model for matching, so as to determine whether the sample feature data matches leaf nodes of the gradient descent tree encryption model;
determining each feature matching value of the sample feature data according to the matching result;
determining the dimension of a sample feature vector according to the number of leaf nodes of the gradient descent tree encryption model;
determining the sample feature vector corresponding to the sample feature data according to each feature matching value of the sample feature data and the dimension of the sample feature vector; and
determining the sample feature vector as the encrypted data of each enterprise.
2. The method of claim 1, wherein constructing an encryption model for each enterprise according to the sample feature data and the class labels comprises:
training on the sample feature data and the class labels by using a preset gradient descent tree algorithm to construct the gradient descent tree encryption model.
3. The method of claim 2, wherein training on the sample feature data and the class labels by using a preset gradient descent tree algorithm to construct the gradient descent tree encryption model comprises:
performing preliminary training on the sample feature data and the class labels by using a preset decision tree algorithm to obtain a preliminary decision tree model;
matching the class labels against the preliminary decision tree model to obtain, for each leaf node of the preliminary decision tree model, a true probability value that the sample feature data belongs to the class corresponding to that leaf node;
inputting the sample feature data into the preliminary decision tree model for class prediction to obtain, for each leaf node of the preliminary decision tree model, a predicted probability value that the sample feature data belongs to the class corresponding to that leaf node;
determining a residual gradient descent value of the preliminary iterative training according to the difference between the true probability value and the predicted probability value;
performing iterative training on the preliminary decision tree model according to the residual gradient descent value, the sample feature data and the class labels, and repeating the step of calculating the residual gradient descent value; and
when the calculated residual gradient descent value is the minimum residual gradient descent value, determining the decision tree model obtained at the iteration round corresponding to the minimum residual gradient descent value as the gradient descent tree encryption model.
4. The method of claim 1, wherein determining each feature matching value of the sample feature data according to the matching result comprises:
if the sample feature data matches a leaf node of the gradient descent tree encryption model, determining the feature matching value of the sample feature data as 1; and
if the sample feature data does not match the leaf node of the gradient descent tree encryption model, determining the feature matching value of the sample feature data as 0.
5. The method of claim 1, wherein constructing a prediction model according to the encrypted data of each enterprise and the corresponding class labels comprises:
combining the encrypted data of each enterprise, the corresponding class labels and the sample feature data of the enterprise into a prediction training set, and constructing the prediction model according to the prediction training set.
6. The method of any one of claims 1-5, wherein the prediction model is a logistic regression prediction model, and constructing the prediction model according to the encrypted data of each enterprise and the corresponding class labels comprises:
training on the encrypted data of each enterprise and the corresponding class labels by using a preset logistic regression algorithm to construct the logistic regression prediction model.
7. A joint construction apparatus for a prediction model, comprising:
an acquisition unit, configured to acquire sample feature data of each enterprise and class labels corresponding to the sample feature data;
a first construction unit, configured to construct an encryption model for each enterprise according to the sample feature data and the class labels, wherein the encryption model is a gradient descent tree encryption model;
an encryption unit, configured to input the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain encrypted data of each enterprise;
a second construction unit, configured to jointly construct a prediction model according to the encrypted data of each enterprise and the corresponding class labels;
wherein the encryption unit is specifically configured to: input the sample feature data of each enterprise into the gradient descent tree encryption model for matching, so as to determine whether the sample feature data matches leaf nodes of the gradient descent tree encryption model; determine each feature matching value of the sample feature data according to the matching result; determine the dimension of a sample feature vector according to the number of leaf nodes of the gradient descent tree encryption model; determine the sample feature vector corresponding to the sample feature data according to each feature matching value of the sample feature data and the dimension of the sample feature vector; and determine the sample feature vector as the encrypted data of each enterprise.
8. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the computer program when executed by the processor implements the steps of the method according to any one of claims 1 to 6.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910319424.7A CN110210233B (en) | 2019-04-19 | 2019-04-19 | Combined construction method and device of prediction model, storage medium and computer equipment |
PCT/CN2019/102911 WO2020211240A1 (en) | 2019-04-19 | 2019-08-27 | Joint construction method and apparatus for prediction model, and computer device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910319424.7A CN110210233B (en) | 2019-04-19 | 2019-04-19 | Combined construction method and device of prediction model, storage medium and computer equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110210233A (en) | 2019-09-06 |
CN110210233B (en) | 2024-05-24 |
Family
ID=67786051
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910319424.7A Active CN110210233B (en) | 2019-04-19 | 2019-04-19 | Combined construction method and device of prediction model, storage medium and computer equipment |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110210233B (en) |
WO (1) | WO2020211240A1 (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110728375B (en) * | 2019-10-16 | 2021-03-19 | 支付宝(杭州)信息技术有限公司 | Method and device for training logistic regression model by combining multiple computing units |
CN112668016B (en) * | 2020-01-02 | 2023-12-08 | 华控清交信息科技(北京)有限公司 | Model training method and device and electronic equipment |
CN111428887B (en) * | 2020-03-19 | 2023-05-12 | 腾讯云计算(北京)有限责任公司 | Model training control method, device and system based on multiple computing nodes |
CN111738441B (en) * | 2020-07-31 | 2020-11-17 | 支付宝(杭州)信息技术有限公司 | Prediction model training method and device considering prediction precision and privacy protection |
CN112199706B (en) * | 2020-10-26 | 2022-11-22 | 支付宝(杭州)信息技术有限公司 | Tree model training method and business prediction method based on multi-party safety calculation |
CN112288101A (en) * | 2020-10-29 | 2021-01-29 | 平安科技(深圳)有限公司 | GBDT and LR fusion method, device, equipment and storage medium based on federal learning |
CN112816898B (en) * | 2021-01-26 | 2022-03-01 | 三一重工股份有限公司 | Battery failure prediction method and device, electronic equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101727462A (en) * | 2008-10-17 | 2010-06-09 | 北京大学 | Method and device for generating Chinese comparative sentence sorter model and identifying Chinese comparative sentences |
CN108615044A (en) * | 2016-12-12 | 2018-10-02 | 腾讯科技(深圳)有限公司 | A kind of method of disaggregated model training, the method and device of data classification |
CN109002861A (en) * | 2018-08-10 | 2018-12-14 | 深圳前海微众银行股份有限公司 | Federal modeling method, equipment and storage medium |
CN109299728A (en) * | 2018-08-10 | 2019-02-01 | 深圳前海微众银行股份有限公司 | Federal learning method, system and readable storage medium storing program for executing |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2516493A (en) * | 2013-07-25 | 2015-01-28 | Ibm | Parallel tree based prediction |
EP3203679A1 (en) * | 2016-02-04 | 2017-08-09 | ABB Schweiz AG | Machine learning based on homomorphic encryption |
CN109308418B (en) * | 2017-07-28 | 2021-09-24 | 创新先进技术有限公司 | Model training method and device based on shared data |
CN108520181B (en) * | 2018-03-26 | 2022-04-22 | 联想(北京)有限公司 | Data model training method and device |
CN109033854B (en) * | 2018-07-17 | 2020-06-09 | 阿里巴巴集团控股有限公司 | Model-based prediction method and device |
CN109635462A (en) * | 2018-12-17 | 2019-04-16 | 深圳前海微众银行股份有限公司 | Model parameter training method, device, equipment and medium based on federation's study |
Non-Patent Citations (1)
Title |
---|
Multiple Encrypted Random Forests using Compressed Sensing for Private Classification;Mohamed Waleed Fakhr;2018 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT);20181120;1-7 * |
Also Published As
Publication number | Publication date |
---|---|
WO2020211240A1 (en) | 2020-10-22 |
CN110210233A (en) | 2019-09-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||