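WO2020211240A1 - Method and apparatus for jointly constructing a prediction model, and computer device - Google Patents
Method and apparatus for jointly constructing a prediction model, and computer device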
- Publication number
- WO2020211240A1 (PCT/CN2019/102911)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- model
- enterprise
- data
- gradient descent
- sample feature
- Prior art date
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
Definitions
- Prediction models in the field of financial intelligent recommendation play a key role in decision-making, product recommendation, and similar tasks.
- To obtain a more accurate prediction model, enterprises usually model jointly, especially when the phenomenon under analysis is very complex and training requires a large amount of data.
- During joint modeling, enterprises do not share their real data with each other.
- Before sharing data, each enterprise usually encrypts its own data to protect its privacy; a prediction model is then constructed from the encrypted data shared by the enterprises.
- The commonly used prediction models are linear regression models and logistic regression models.
- For these models, a third party is usually required to provide random numbers or public keys to the enterprises. Each enterprise encrypts its own data with the random number or public key provided by the third party and then shares the result with the other enterprises.
- The data encryption process for both models therefore requires a third party, and that third party must be sufficiently honest. Otherwise it may leak the random number provided to one enterprise to other enterprises, which can then work backwards to recover that enterprise's data, leaking the enterprise's internal data.
- In addition, current encryption methods are determined by the selected prediction model.
- The two prediction models above involve only addition and multiplication, so their corresponding encryption methods are not applicable to all prediction models.
- This application provides a method, apparatus, and computer device for jointly constructing a prediction model, mainly to prevent a third party from colluding with a data provider to leak other data providers' data, and to ensure data security while enterprises model jointly.
- A method for jointly constructing a prediction model, including:
- jointly constructing a prediction model based on the encrypted data of each enterprise and the corresponding category labels.
- An apparatus for jointly constructing a prediction model, including:
- an obtaining unit, used to obtain the sample feature data of each enterprise and the category label corresponding to the sample feature data;
- a first construction unit, used to construct an encryption model for each enterprise according to the sample feature data and the category label;
- an encryption unit, used to input the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain the encrypted data of each enterprise;
- a second construction unit, used to jointly construct a prediction model based on the encrypted data of each enterprise and the corresponding category labels.
- A computer non-volatile readable storage medium on which computer-readable instructions are stored; when the computer-readable instructions are executed by a processor, the following steps are implemented:
- jointly constructing a prediction model based on the encrypted data of each enterprise and the corresponding category labels.
- A computer device including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor.
- When the processor executes the computer-readable instructions, the following steps are implemented:
- jointly constructing a prediction model based on the encrypted data of each enterprise and the corresponding category labels.
- The method, apparatus, and computer device provided by this application are contrasted with the current approach, in which a third party must intervene to encrypt enterprise data and joint modeling is based on that encrypted data.
- This application obtains the sample feature data of each enterprise and the corresponding category labels; constructs an encryption model for each enterprise from the sample feature data and category labels; inputs the sample feature data of each enterprise into the corresponding encryption model to obtain that enterprise's encrypted data; and jointly constructs the prediction model from the encrypted data of each enterprise and the corresponding category labels, so that no third-party intervention is required.
- Each enterprise can encrypt its internal data through its own encryption model, which prevents a third party from colluding with other enterprises to leak that data and improves the security of the enterprise's internal data.
- Encrypting enterprise data through an encryption model is suitable not only for linear regression and logistic regression prediction models but also for other prediction models.
- FIG. 1 shows a flowchart of a method for jointly constructing a prediction model provided by an embodiment of the present application
- FIG. 2 shows a flowchart of another method for jointly constructing a prediction model provided by an embodiment of the present application
- FIG. 3 shows a schematic structural diagram of an apparatus for jointly constructing a prediction model provided by an embodiment of the present application
- FIG. 4 shows a schematic structural diagram of another device for jointly constructing a prediction model provided by an embodiment of the present application
- Fig. 5 shows a schematic diagram of the physical structure of a computer device provided by an embodiment of the present application.
- the embodiment of the present application provides a joint construction method of a prediction model. As shown in FIG. 1, the method includes:
- the category label corresponding to the sample feature data is the true category to which the sample feature data belongs.
- During joint modeling, each enterprise's internal data must be shared with the other enterprises.
- So that real data is not leaked to other enterprises, an encryption model is established for each enterprise from its internal data; the internal data is encrypted with that model, and only the encrypted data is shared.
- To construct the encryption model of each enterprise, the sample feature data of each enterprise and the corresponding category labels are obtained first.
- For example, several enterprises jointly build a prediction model to predict a person's gender.
- the input of the prediction model is the feature data
- the output of the prediction model is the gender of the person.
- The feature data in the training set includes the length of time spent online, the time of day spent online, the amount spent on online shopping, favorite places to go, and favorite foods.
- These feature data are not held in common by all enterprises.
- The sample feature data held by enterprise P1 includes the length of time spent online, the time of day spent online, and the amount spent on online shopping.
- The sample feature data held by enterprise P2 includes favorite places to go and favorite foods.
- P1 and P2 each know the gender label for every group of their own sample feature data. The sample feature data and corresponding gender labels of P1 and P2 are obtained separately, and the encryption models of P1 and P2 are established from them respectively.
- To improve the accuracy of the prediction model, each enterprise shares its internal data with the others during joint modeling; so that real data is not leaked, an encryption model is built to encrypt that internal data.
- the encryption model can be a gradient descent tree encryption model.
- A preset gradient descent tree algorithm is trained on the acquired sample feature data and the corresponding category labels to build each enterprise's encryption model separately. For example, enterprise P1 has 100 groups of sample feature data covering the length of time spent online, the time of day spent online, and the amount spent on online shopping, and each group corresponds to a unique gender label.
- The gradient descent tree algorithm is trained on these 100 groups of P1's sample feature data to construct an encryption model, which is then used to encrypt the enterprise's internal data and protect its privacy.
- Each enterprise's sample feature data is input into the corresponding encryption model and converted into a sample feature vector composed of 0/1 elements. In this way, the internal data of the enterprise is encrypted.
- the P1 company built an encryption model based on its own sample feature data.
- the encryption model is a gradient descent tree encryption model.
- the model includes two trees with a total of 5 leaf nodes.
- A certain group of P1's sample feature data is input to the gradient descent tree encryption model.
- This group of sample feature data falls on the second leaf node of the first tree and the first leaf node of the second tree.
- The number of leaf nodes gives the dimension of the sample feature vector, and different leaf nodes represent different components of the vector. If the sample feature data falls on a leaf node, the component corresponding to that leaf node is set to 1; otherwise it is set to 0.
- Enterprise P1 has sample characteristic data X1
- enterprise P2 has sample characteristic data X2
- sample characteristic data X1 is encrypted by the encryption model constructed by P1 enterprise and converted into sample characteristic vector Z1
- the sample feature data X2 is encrypted by the encryption model constructed by the P2 enterprise and converted into the sample feature vector Z2.
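The two-tree, five-leaf example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `encode_leaves` is hypothetical, and the split of the five leaves into three for the first tree and two for the second is assumed, since the text gives only the total.

```python
from typing import List


def encode_leaves(leaves_per_tree: List[int], hit_leaf: List[int]) -> List[int]:
    """Build the 0/1 sample feature vector from the leaf reached in each tree.

    leaves_per_tree[t] is the number of leaf nodes in tree t.
    hit_leaf[t] is the 1-based index of the leaf the sample falls on in tree t.
    """
    vector: List[int] = []
    for n_leaves, hit in zip(leaves_per_tree, hit_leaf):
        if not 1 <= hit <= n_leaves:
            raise ValueError("leaf index out of range")
        # One component per leaf: 1 for the leaf the sample reached, 0 otherwise.
        vector.extend(1 if leaf == hit else 0 for leaf in range(1, n_leaves + 1))
    return vector


# Assumed split of the five leaves: three in the first tree, two in the second.
# The sample reaches leaf 2 of tree 1 and leaf 1 of tree 2.
z1 = encode_leaves(leaves_per_tree=[3, 2], hit_leaf=[2, 1])
print(z1)  # [0, 1, 0, 1, 0]
```

The vector has one component per leaf, so its dimension equals the total number of leaf nodes, and the raw feature values themselves never appear in it.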
- The method for jointly constructing a prediction model is contrasted with the current approach, in which a third party must intervene to encrypt enterprise data and enterprises model jointly on that encrypted data.
- This application obtains the sample feature data of each enterprise and the corresponding category labels; constructs an encryption model for each enterprise from them; inputs the sample feature data of each enterprise into the corresponding encryption model to obtain that enterprise's encrypted data; and jointly constructs the prediction model from the encrypted data of each enterprise and the corresponding category labels, so that no third-party intervention is required.
- Each enterprise can encrypt its internal data through its encryption model, which prevents a third party from colluding with other enterprises to leak that data and improves the security of the enterprise's internal data.
- Encrypting enterprise data through an encryption model is suitable not only for linear regression and logistic regression prediction models but also for other prediction models.
- An embodiment of the present application provides another method for jointly constructing a prediction model. As shown in FIG. 2, the method includes:
- The sample feature data of each enterprise and the corresponding category labels are pre-stored in each enterprise's database.
- When the encryption model of each enterprise is constructed, the sample feature data and the corresponding category labels are obtained from that database.
- the encryption model is a gradient descent tree encryption model
- Step 202 may specifically include: using a preset decision tree algorithm to perform preliminary training on the sample feature data and the category labels to obtain a preliminary decision tree model; matching the category labels against the preliminary decision tree model to obtain the true probability value that the sample feature data belongs to the category corresponding to each leaf node of the preliminary decision tree model; inputting the sample feature data into the preliminary decision tree model for category prediction to obtain the predicted probability value that the sample feature data belongs to the category corresponding to each leaf node; determining the residual gradient descent value of the preliminary iterative training from the difference between the true probability value and the predicted probability value; iteratively training the preliminary decision tree model according to the residual gradient descent value, the sample feature data, and the category labels, repeating the calculation of the residual gradient descent value; and, when the calculated residual gradient descent value is the minimum residual gradient descent value, determining the iteratively trained decision tree model corresponding to that minimum value to be the gradient descent tree encryption model.
- For example, enterprise P1 has 100 groups of sample feature data covering the length of time spent online, the time of day spent online, and the amount spent on online shopping.
- Each group of feature data corresponds to a unique gender label.
- The gradient descent tree algorithm is trained on these 100 groups of P1's sample feature data to construct a gradient descent tree encryption model.
- The estimated values for the sample feature data are F_1(x), ..., F_K(x).
- A logistic transformation of these values gives the probability p_k(x) that the sample feature data belongs to each category k.
- The log-likelihood loss function is obtained as:
- The gradient error of the i-th sample feature data for category k is y_ik - p_{k,m-1}, where m-1 is the number of iterations the initial estimation function has passed through. The gradient error is therefore the difference between the true probability that sample feature data i belongs to category k and the probability predicted after m-1 rounds of iteration.
- A decision tree model is obtained by fitting these gradient errors. According to the generated decision tree model, the residual fitting value of each leaf node is calculated:
- J represents the number of leaf nodes of the decision tree model.
- The residual fitting value of each leaf node is added to the estimation function of the previous iteration, and the estimation function of this iteration is obtained as:
- Each iteration builds a decision tree from the gradient error of the current sample feature data, moving against the gradient of the loss function. After the preset number of iterations the gradient is minimized, and the final estimation function is determined to be the gradient descent tree encryption model.
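The gradient error described above can be illustrated with a short Python sketch. It assumes that the transformation from the scores F_1(x), ..., F_K(x) to the probabilities p_k(x) is the usual softmax, which the text does not state explicitly; the function names are hypothetical.

```python
import math
from typing import List


def class_probabilities(scores: List[float]) -> List[float]:
    """Turn per-class scores F_1(x)..F_K(x) into probabilities p_1(x)..p_K(x).

    Assumption: the transformation is a softmax over the K scores.
    """
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gradient_errors(one_hot_label: List[int], scores: List[float]) -> List[float]:
    """Return y_ik - p_k for every class k of one sample."""
    probs = class_probabilities(scores)
    return [y - p for y, p in zip(one_hot_label, probs)]


# Two classes with all scores initialised to 0, so each class has probability 0.5.
# The sample truly belongs to the first class.
residuals = gradient_errors(one_hot_label=[1, 0], scores=[0.0, 0.0])
print(residuals)  # [0.5, -0.5]
```

In each boosting round, a new tree would be fitted to residuals like these, so the targets shrink as the predicted probabilities approach the true labels.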
- The sample feature data inside the enterprise is input into the enterprise's encryption model for encryption and converted into a sample feature vector composed of 0/1 elements; this vector serves as the enterprise's encrypted data and can be shared with other enterprises.
- Step 203 further includes: inputting the sample feature data of each enterprise into the gradient descent tree encryption model for matching, to determine whether the sample feature data matches each leaf node of the gradient descent tree encryption model; determining each feature matching value of the sample feature data according to the matching result; determining the dimension of the sample feature vector according to the number of leaf nodes of the gradient descent tree encryption model; and determining the sample feature vector corresponding to the sample feature data from the feature matching values and the dimension of the sample feature vector. Determining each feature matching value according to the matching result includes: if the sample feature data matches a leaf node of the gradient descent tree encryption model, the feature matching value is determined to be 1; if it does not match, the feature matching value is determined to be 0. The sample feature data is thereby converted into a sample feature vector.
- This encryption method does not require the intervention of a third party, and other companies are
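The matching in step 203 can be sketched as a tree traversal followed by a 0/1 encoding. Everything concrete here is an assumption made for illustration: the `Node` class, the feature names `hours_online` and `shopping_spend`, the thresholds, and the global numbering of leaves are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """A decision tree node; leaf nodes carry leaf_id, inner nodes carry a split."""
    feature: Optional[str] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    leaf_id: Optional[int] = None  # 0-based index, unique across all trees


def reached_leaf(node: Node, sample: Dict[str, float]) -> int:
    """Walk from the root to the leaf that the sample matches."""
    while node.leaf_id is None:
        node = node.left if sample[node.feature] <= node.threshold else node.right
    return node.leaf_id


def to_feature_vector(trees: List[Node], n_leaves: int, sample: Dict[str, float]) -> List[int]:
    """Feature matching value 1 for each matched leaf, 0 for every other leaf."""
    vector = [0] * n_leaves  # dimension = total number of leaf nodes
    for tree in trees:
        vector[reached_leaf(tree, sample)] = 1
    return vector


# Hypothetical model: tree 1 has leaves 0..2, tree 2 has leaves 3..4.
tree1 = Node("hours_online", 2.0,
             left=Node(leaf_id=0),
             right=Node("shopping_spend", 100.0,
                        left=Node(leaf_id=1), right=Node(leaf_id=2)))
tree2 = Node("shopping_spend", 50.0, left=Node(leaf_id=3), right=Node(leaf_id=4))

sample = {"hours_online": 5.0, "shopping_spend": 30.0}
print(to_feature_vector([tree1, tree2], 5, sample))  # [0, 1, 0, 1, 0]
```

Only the leaf memberships leave the enterprise in this sketch; the thresholds and raw feature values stay with the party that trained the trees.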
- the prediction model is a logistic regression prediction model
- Step 204 specifically includes: using a maximum likelihood estimation algorithm to train on the encrypted data of each enterprise and the corresponding category labels to obtain a maximum likelihood estimation prediction model; and using a gradient descent algorithm to perform a convergence calculation on the maximum likelihood estimation prediction model to obtain the logistic regression prediction model.
- The prediction function is constructed as follows:
- The prediction function h_θ(x) represents the probability that the prediction result is 1. For input feature data to be predicted, the probabilities that the classification result is category 1 and category 0 are:
- The loss function is constructed using the maximum likelihood algorithm as follows:
- The parameter θ at the minimum of the loss function is the optimal parameter.
- The final prediction function is determined to be the logistic regression prediction model. Because the encrypted data of the different enterprises is combined into the prediction training set when this model is constructed, joint modeling can further improve the accuracy of the prediction model.
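A minimal sketch of step 204 follows, assuming the usual sigmoid form for h_θ(x) and the mean negative log-likelihood as the loss, neither of which survives in the text above. Concatenating the 0/1 vectors of two enterprises per sample is also an assumption about how the joint training set is formed; the data values and function names are made up for illustration.

```python
import math
from typing import List


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def predict_proba(theta: List[float], x: List[int]) -> float:
    """h_theta(x): assumed probability that the label is 1."""
    return sigmoid(sum(t * v for t, v in zip(theta, x)))


def train_logistic(rows: List[List[int]], labels: List[int],
                   lr: float = 0.5, epochs: int = 500) -> List[float]:
    """Minimise the mean negative log-likelihood by batch gradient descent."""
    n = len(rows)
    theta = [0.0] * len(rows[0])
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for x, y in zip(rows, labels):
            err = predict_proba(theta, x) - y  # derivative of the loss w.r.t. the logit
            for j, v in enumerate(x):
                grad[j] += err * v
        theta = [t - lr * g / n for t, g in zip(theta, grad)]
    return theta


# Hypothetical encrypted vectors for the same four samples at two enterprises.
z1 = [[0, 1, 0, 1, 0], [1, 0, 0, 0, 1], [0, 1, 0, 1, 0], [1, 0, 0, 0, 1]]
z2 = [[1, 0], [0, 1], [1, 0], [0, 1]]
labels = [1, 0, 1, 0]

rows = [a + b for a, b in zip(z1, z2)]  # joint training set built from encrypted data only
theta = train_logistic(rows, labels)
print(round(predict_proba(theta, rows[0]), 3), round(predict_proba(theta, rows[1]), 3))
```

The model is trained purely on leaf-membership vectors, which is why the same encrypted data could in principle feed a different downstream model as well.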
- This other method for jointly constructing a prediction model is contrasted with the current approach, in which a third party must intervene to encrypt enterprise data and joint modeling is based on that encrypted data.
- This application obtains the sample feature data of each enterprise and the corresponding category labels; constructs an encryption model for each enterprise from them; inputs the sample feature data of each enterprise into the corresponding encryption model to obtain that enterprise's encrypted data; and jointly constructs the prediction model from the encrypted data of each enterprise and the corresponding category labels, so that no third-party intervention is required.
- Each enterprise can encrypt its internal data through its encryption model, which prevents a third party from colluding with other enterprises to leak that data and improves the security of the enterprise's internal data.
- Encrypting enterprise data through an encryption model is suitable not only for linear regression and logistic regression prediction models but also for other prediction models.
- an embodiment of the present application provides a joint construction device for a prediction model.
- the device includes: an acquisition unit 31, a first construction unit 32, an encryption unit 33, and a second construction unit 34.
- the acquiring unit 31 may be used to acquire the sample characteristic data of each enterprise and the category label corresponding to the sample characteristic data.
- the acquiring unit 31 is a main functional module of the device for acquiring sample feature data of each enterprise and the category label corresponding to the sample feature data.
- the first construction unit 32 may be used to construct an encryption model of each enterprise based on the sample feature data and the category label.
- the first construction unit 32 is the main functional module of the device to construct the encryption model of each enterprise according to the sample feature data and the category label, and is also a core module.
- the encryption unit 33 may be used to input the sample characteristic data of each enterprise into a corresponding encryption model for encryption to obtain encrypted data of each enterprise.
- the encryption unit 33 is the main functional module of the device that inputs the sample characteristic data of each enterprise into the corresponding encryption model for encryption, and obtains the encrypted data of each enterprise, and is also a core module.
- the second construction unit 34 may be used to jointly construct a prediction model based on the encrypted data of each enterprise and the corresponding category label.
- the second construction unit 34 is the main functional module of the device to jointly construct a prediction model based on the encrypted data of each enterprise and the corresponding category label.
- the encryption model is a gradient descent tree encryption model
- the first construction unit 32 may be specifically configured to use a preset gradient descent tree algorithm to train on the sample feature data and the category label, so as to construct the gradient descent tree encryption model.
- the first construction unit 32 further includes: a preliminary training module 321, a matching module 322, a prediction module 323, a determination module 324, and an iterative training module 325.
- the preliminary training module 321 may be used to perform preliminary training on the sample feature data and the category labels by using a preset decision tree algorithm to obtain a preliminary decision tree model.
- the matching module 322 may be used to match the category label with the preliminary decision tree model to obtain the true probability value of the sample feature data belonging to the corresponding category of each leaf node of the preliminary decision tree model.
- the prediction module 323 may be used to input the sample feature data into the preliminary decision tree model for category prediction, and obtain the predicted probability of the sample feature data belonging to the category corresponding to each leaf node of the preliminary decision tree model value.
- the determining module 324 may be configured to determine the residual gradient descent value of the preliminary iterative training according to the difference between the true probability value and the predicted probability value.
- the iterative training module 325 may be used to iteratively train the preliminary decision tree model according to the residual gradient descent value, the sample feature data, and the category label, and to repeat the step of calculating the residual gradient descent value.
- the determining module 324 can also be used, when the calculated residual gradient descent value is the minimum residual gradient descent value, to determine the iteratively trained decision tree model corresponding to that minimum value to be the gradient descent tree encryption model.
- the encryption unit 33 includes: an encryption module 331 and a determination module 332.
- the encryption module 331 may be used to input the sample feature data of each enterprise into the gradient descent tree encryption model for encryption, and obtain the sample feature vector corresponding to the sample feature data.
- the determining module 332 may be used to determine the sample feature vector as the encrypted data of each enterprise.
- the encryption module 331 further includes a matching sub-module 3311 and a determination sub-module 3312.
- the matching sub-module 3311 may be used to input the sample feature data of each enterprise into the gradient descent tree encryption model for matching, so as to determine whether the sample feature data matches the leaf node of the gradient descent tree encryption model.
- the determining sub-module 3312 may be used to determine each feature matching value of the sample feature data according to the matching result.
- the determining submodule 3312 may also be used to determine the dimension of the sample feature vector according to the number of leaf nodes of the gradient descent tree encryption model.
- the determining submodule 3312 may also be used to determine the sample feature vector corresponding to the sample feature data according to each feature matching value of the sample feature data and the dimension of the sample feature vector.
- the determining sub-module 3312 may be specifically used to determine the feature matching value of the sample feature data to be 1 if the sample feature data matches a leaf node of the gradient descent tree encryption model, and to be 0 if the sample feature data does not match the leaf nodes of the gradient descent tree encryption model.
- the second construction unit 34 may be specifically used to combine the encrypted data of each enterprise and its corresponding category label, and the sample feature data of the enterprise into a prediction training set, and according to the prediction The training set builds a predictive model.
- the prediction model is a logistic regression prediction model
- the second construction unit 34 can be specifically used to train on the encrypted data of each enterprise and its corresponding category labels by using a preset logistic regression algorithm to construct the logistic regression prediction model.
- the second construction unit 34 further includes a training module 341 and a calculation module 342.
- the training module 341 can be used to train the encrypted data of each enterprise and its corresponding category labels by using a maximum likelihood estimation algorithm to obtain a maximum likelihood estimation prediction model.
- the calculation module 342 may be used to perform a convergence calculation on the maximum likelihood estimation prediction model by using a gradient descent algorithm to obtain the logistic regression prediction model.
- An embodiment of the present application also provides a computer non-volatile readable storage medium on which computer-readable instructions are stored. When the computer-readable instructions are executed by a processor, the following steps are implemented: obtaining the sample feature data of each enterprise and the category label corresponding to the sample feature data; constructing an encryption model for each enterprise according to the sample feature data and the category label; inputting the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain the encrypted data of each enterprise; and jointly constructing a prediction model according to the encrypted data of each enterprise and the corresponding category label.
- The computer device includes a processor 41, a memory 42, and computer-readable instructions stored on the memory 42 and executable on the processor, where the memory 42 and the processor 41 are both arranged on a bus 43. When the processor 41 executes the computer-readable instructions, the following steps are implemented: obtaining the sample feature data of each enterprise and the category label corresponding to the sample feature data; constructing an encryption model for each enterprise according to the sample feature data and the category label; inputting the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain the encrypted data of each enterprise; and jointly constructing a prediction model according to the encrypted data of each enterprise and its corresponding category labels.
- This application obtains the sample feature data of each enterprise and the corresponding category labels; constructs an encryption model for each enterprise from them; inputs the sample feature data of each enterprise into the corresponding encryption model to obtain that enterprise's encrypted data; and jointly constructs the prediction model from the encrypted data of each enterprise and the corresponding category labels, so that no third-party intervention is required.
- Each enterprise encrypts its internal data through its encryption model, which prevents a third party from colluding with other enterprises to leak that data and improves the security of the enterprise's internal data.
- Encrypting enterprise data through an encryption model is suitable not only for linear regression and logistic regression prediction models but also for other prediction models.
- modules or steps of this application can be implemented by a general computing device, and they can be concentrated on a single computing device or distributed in a network composed of multiple computing devices.
- they can be implemented with program codes executable by the computing device, so that they can be stored in the storage device for execution by the computing device, and in some cases, can be executed in a different order than here.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
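A method, apparatus, storage medium, and computer device for jointly constructing a prediction model, relating to the field of information technology, mainly intended to prevent a third party from colluding with a data provider to leak other data providers' data, and to ensure data security while enterprises model jointly. The method includes: obtaining the sample feature data of each enterprise and the category label corresponding to the sample feature data (101); constructing an encryption model for each enterprise according to the sample feature data and the category label (102); inputting the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain the encrypted data of each enterprise (103); and jointly constructing a prediction model according to the encrypted data of each enterprise and its corresponding category label (104). The method is applicable to the joint construction of prediction models.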
Description
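This application claims priority to the Chinese patent application filed with the Chinese Patent Office on April 19, 2019, with application number 201910319424.7 and titled "Method, apparatus, storage medium, and computer device for jointly constructing a prediction model", the entire contents of which are incorporated into this application by reference.
Prediction models in the field of financial intelligent recommendation play a key role in decision-making, product recommendation, and similar tasks. To obtain a more accurate prediction model, enterprises usually model jointly, especially when the phenomenon under analysis is very complex and training requires a large amount of data. During joint modeling, enterprises do not share their real data with each other. Before sharing data, each enterprise usually encrypts its own data to protect its privacy, and a prediction model is then constructed from the encrypted data shared by the enterprises.
At present, the commonly used prediction models are linear regression models and logistic regression models. The data encryption methods for these models usually require a third party to provide random numbers or public keys to the enterprises; each enterprise encrypts its own data with the random number or public key provided by the third party and then shares it with the other enterprises. However, the data encryption process for both models requires a third party, and that third party must be sufficiently honest. Otherwise it may leak the random number provided to one enterprise to other enterprises, which can then work backwards to recover that enterprise's data, leaking the enterprise's internal data. In addition, current encryption methods are determined by the selected prediction model; both models above involve only addition and multiplication, so their corresponding encryption methods are not applicable to all prediction models.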
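Summary of the Application
This application provides a method, apparatus, and computer device for jointly constructing a prediction model, mainly intended to prevent a third party from colluding with a data provider to leak other data providers' data, and to ensure data security while enterprises model jointly.
According to a first aspect of this application, a method for jointly constructing a prediction model is provided, including:
obtaining the sample feature data of each enterprise and the category label corresponding to the sample feature data;
constructing an encryption model for each enterprise according to the sample feature data and the category label;
inputting the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain the encrypted data of each enterprise;
jointly constructing a prediction model according to the encrypted data of each enterprise and its corresponding category label.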
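According to a second aspect of this application, an apparatus for jointly constructing a prediction model is provided, comprising:
an acquiring unit, configured to acquire sample feature data of each enterprise and category labels corresponding to the sample feature data;
a first constructing unit, configured to construct an encryption model for each enterprise according to the sample feature data and the category labels;
an encrypting unit, configured to input the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain encrypted data of each enterprise;
a second constructing unit, configured to jointly construct a prediction model according to the encrypted data of each enterprise and the corresponding category labels.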
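According to a third aspect of this application, a non-volatile computer-readable storage medium is provided, on which computer-readable instructions are stored. When executed by a processor, the computer-readable instructions implement the following steps:
acquiring sample feature data of each enterprise and category labels corresponding to the sample feature data;
constructing an encryption model for each enterprise according to the sample feature data and the category labels;
inputting the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain encrypted data of each enterprise;
jointly constructing a prediction model according to the encrypted data of each enterprise and the corresponding category labels.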
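According to a fourth aspect of this application, a computer device is provided, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When executing the computer-readable instructions, the processor implements the following steps:
acquiring sample feature data of each enterprise and category labels corresponding to the sample feature data;
constructing an encryption model for each enterprise according to the sample feature data and the category labels;
inputting the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain encrypted data of each enterprise;
jointly constructing a prediction model according to the encrypted data of each enterprise and the corresponding category labels.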
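Compared with the current approach, in which a third party must be involved to encrypt enterprise data before joint modeling on the encrypted data, the method, apparatus, and computer device for jointly constructing a prediction model provided by this application acquire the sample feature data of each enterprise and the corresponding category labels; construct an encryption model for each enterprise according to the sample feature data and the category labels; input the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain the encrypted data of each enterprise; and jointly construct a prediction model according to the encrypted data of each enterprise and the corresponding category labels. No third party is therefore required: each enterprise encrypts its internal data with its own encryption model, which prevents a third party from colluding with other enterprises to leak the enterprise's internal data and improves the security of that data. Moreover, encrypting enterprise data through an encryption model is suitable not only for linear regression and logistic regression prediction models but also for other prediction models.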
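The accompanying drawings described here provide a further understanding of this application and form a part of it. The illustrative embodiments of this application and their descriptions are used to explain this application and do not unduly limit it. In the drawings:
Fig. 1 shows a flowchart of a method for jointly constructing a prediction model provided by an embodiment of this application;
Fig. 2 shows a flowchart of another method for jointly constructing a prediction model provided by an embodiment of this application;
Fig. 3 shows a schematic structural diagram of an apparatus for jointly constructing a prediction model provided by an embodiment of this application;
Fig. 4 shows a schematic structural diagram of another apparatus for jointly constructing a prediction model provided by an embodiment of this application;
Fig. 5 shows a schematic diagram of the physical structure of a computer device provided by an embodiment of this application.
This application is described in detail below with reference to the drawings and in combination with the embodiments. It should be noted that, where no conflict arises, the embodiments in this application and the features in the embodiments may be combined with one another.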
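As described in the background, the commonly used prediction models at present are linear regression models and logistic regression models, and the data encryption schemes for these models usually require a third party to provide each enterprise with a random number or a public key. However, the data encryption process for both models depends on the existence of a third party and requires that third party to be sufficiently honest; otherwise, collusion between the third party and other enterprises leaks an enterprise's internal data. In addition, current encryption schemes are determined by the chosen prediction model. The two models above involve only addition and multiplication, so the corresponding encryption schemes are not applicable to all prediction models.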
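To solve the above problems, an embodiment of this application provides a method for jointly constructing a prediction model. As shown in Fig. 1, the method comprises:
101. Acquire the sample feature data of each enterprise and the category labels corresponding to the sample feature data.
Here, the category label corresponding to the sample feature data is the true category to which the sample feature data belongs. When enterprises model jointly, each must share its internal data with the others. To avoid leaking real data, an encryption model is built for each enterprise from its internal data, the internal data is encrypted with that model, and the encrypted data is then shared with the other enterprises. To construct each enterprise's encryption model, the enterprise's sample feature data and the corresponding category labels must first be acquired. For example, suppose the enterprises jointly construct a prediction model that predicts a person's gender: the model's input is feature data and its output is the person's gender. The feature data in the training set includes time spent online, the time periods spent online, the amount spent on online shopping, favorite places to go, and favorite foods, but no single enterprise holds all of these features. Enterprise P1 holds the time spent online, the time periods spent online, and the amount spent on online shopping, while enterprise P2 holds the favorite places and favorite foods. P1 and P2 each know the gender label corresponding to each group of their own sample feature data. The sample feature data of P1 and P2 and the corresponding gender labels are acquired separately, and the encryption models of P1 and P2 are built from them respectively.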
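102. Construct an encryption model for each enterprise according to the sample feature data and the category labels.
In this embodiment, to improve the accuracy of the prediction model, each enterprise shares its internal data with the other enterprises during joint modeling; to avoid leaking real data, an encryption model must be constructed to encrypt that internal data. Specifically, the encryption model may be a gradient descent tree encryption model: a preset gradient descent tree algorithm is trained on the acquired sample feature data and the corresponding category labels to construct each enterprise's encryption model. For example, P1 has 100 groups of sample feature data, each comprising time spent online, time periods spent online, and the amount spent on online shopping, and each group corresponds to a unique gender label. The gradient descent tree algorithm is trained on these 100 groups to construct the encryption model, which is then applied to encrypt the enterprise's internal data and keep it private.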
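103. Input the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain the encrypted data of each enterprise.
In this embodiment, after each enterprise builds an encryption model from its own sample feature data and category labels, it inputs its sample feature data into that model, which converts the sample feature data into a sample feature vector whose components are 0 or 1, thereby encrypting the enterprise's internal data.
For example, P1 constructs an encryption model from its own sample feature data. The model is a gradient descent tree encryption model comprising two trees with five leaf nodes in total. A group of P1's sample feature data is input into the model and falls on the second leaf node of the first tree and the first leaf node of the second tree. The number of leaf nodes gives the dimension of the sample feature vector, and each leaf node corresponds to one component of the vector: if the sample feature data falls on a leaf node, the corresponding component is set to 1; otherwise it is set to 0. With three leaf nodes in the first tree and two in the second, the group of sample feature data is thus converted into the five-dimensional vector Z1=[0,1,0,1,0]. Encrypting an enterprise's sample feature data through the encryption model therefore requires no third party, and other enterprises cannot work back from the shared encrypted data to the original data, which keeps the enterprise's internal data secure.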
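The leaf-to-vector conversion in this example can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `encode_leaves` and its two arguments are assumptions introduced here, not part of the application, and the leaf positions are taken as given rather than produced by a trained model.

```python
def encode_leaves(leaves_per_tree, hit_leaf_per_tree):
    """Build a 0/1 vector with one component per leaf node.

    leaves_per_tree:   number of leaf nodes in each tree (hypothetical input)
    hit_leaf_per_tree: 0-based index of the leaf the sample falls on, per tree
    """
    vector = []
    for n_leaves, hit in zip(leaves_per_tree, hit_leaf_per_tree):
        # The leaf the sample falls on gets 1; every other leaf of the tree gets 0.
        vector.extend(1 if i == hit else 0 for i in range(n_leaves))
    return vector


# Two trees with 3 + 2 = 5 leaves; the sample falls on the second leaf of
# tree 1 (index 1) and the first leaf of tree 2 (index 0).
z1 = encode_leaves([3, 2], [1, 0])
print(z1)  # [0, 1, 0, 1, 0]
```

The vector length always equals the total number of leaf nodes, and exactly one component per tree is 1.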
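104. Jointly construct a prediction model according to the encrypted data of each enterprise and the corresponding category labels.
In this embodiment, the encrypted data of each enterprise and the corresponding category labels, together with the enterprise's own sample feature data, are combined into a prediction training set, and the prediction model is constructed from that training set. For example, the sample feature data X=[X1,X2] is held separately by enterprise P1 and enterprise P2: P1 holds X1 and P2 holds X2. X1 is encrypted by the encryption model constructed by P1 and converted into the sample feature vector Z1; X2 is encrypted by the encryption model constructed by P2 and converted into the sample feature vector Z2. Z=[Z1,Z2] can then be used as the prediction training set. To further improve the accuracy of the prediction model, each enterprise can build the prediction model not only from Z=[Z1,Z2]: P1 can also use Z=[X1,Z1,Z2] as the prediction training set and construct the prediction model from it, and P2 can likewise use Z=[X2,Z1,Z2].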
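As a minimal sketch of how one row of the prediction training set might be assembled, the following Python joins the shared encrypted vectors and, optionally, the enterprise's own raw features. The helper name `build_training_row` and the sample values are hypothetical and only illustrate the concatenations Z=[Z1,Z2] and Z=[X1,Z1,Z2] described above.

```python
def build_training_row(encoded_parts, own_raw=None):
    """Concatenate encrypted vectors, optionally prefixed by own raw features."""
    row = list(own_raw) if own_raw is not None else []
    for part in encoded_parts:
        row.extend(part)
    return row


z1 = [0, 1, 0, 1, 0]      # P1's encrypted vector for one sample
z2 = [1, 0, 0]            # P2's encrypted vector for the same sample (made up)
x1 = [2.5, 21.0, 300.0]   # P1's raw features for that sample (made up)

shared_row = build_training_row([z1, z2])            # Z = [Z1, Z2]
p1_row = build_training_row([z1, z2], own_raw=x1)    # Z = [X1, Z1, Z2]
```

Only the 0/1 vectors cross enterprise boundaries in this sketch; the raw features `x1` stay on P1's side.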
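Compared with the current approach, in which a third party must be involved to encrypt enterprise data before joint modeling on the encrypted data, the method for jointly constructing a prediction model provided by this embodiment acquires the sample feature data of each enterprise and the corresponding category labels; constructs an encryption model for each enterprise according to the sample feature data and the category labels; inputs the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain the encrypted data of each enterprise; and jointly constructs a prediction model according to the encrypted data of each enterprise and the corresponding category labels. No third party is therefore required: each enterprise encrypts its internal data with its own encryption model, which prevents a third party from colluding with other enterprises to leak the enterprise's internal data and improves the security of that data. Moreover, encrypting enterprise data through an encryption model is suitable not only for linear regression and logistic regression prediction models but also for other prediction models.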
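Further, to better illustrate the above process of encrypting enterprise internal data, and as a refinement and extension of the above embodiment, an embodiment of this application provides another method for jointly constructing a prediction model. As shown in Fig. 2, the method comprises:
201. Acquire the sample feature data of each enterprise and the category labels corresponding to the sample feature data.
In this embodiment, the sample feature data of each enterprise and the corresponding category labels are stored in advance in each enterprise's database. When an enterprise's encryption model is constructed, the enterprise's sample feature data and the corresponding category labels are retrieved from the database.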
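202. Train on the sample feature data and the category labels using a preset gradient descent tree algorithm to construct the gradient descent tree encryption model.
In this embodiment, the encryption model is a gradient descent tree encryption model, and step 202 may specifically include: performing preliminary training on the sample feature data and the category labels using a preset decision tree algorithm to obtain a preliminary decision tree model; matching the category labels against the preliminary decision tree model to obtain the true probability values that the sample feature data belongs to the categories corresponding to the leaf nodes of the preliminary decision tree model; inputting the sample feature data into the preliminary decision tree model for category prediction to obtain the predicted probability values that the sample feature data belongs to the categories corresponding to the leaf nodes of the preliminary decision tree model; determining the residual gradient descent value of the preliminary iterative training from the difference between the true probability values and the predicted probability values; iteratively training the preliminary decision tree model according to the residual gradient descent value, the sample feature data, and the category labels, and repeating the step of computing the residual gradient descent value; and, when the computed residual gradient descent value is the minimum residual gradient descent value, determining the decision tree model trained at the iteration level corresponding to that minimum as the gradient descent tree encryption model.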
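The iterative training loop of step 202 can be illustrated with a deliberately small Python sketch. It makes several simplifying assumptions that are not in the application: a single numeric feature, two classes handled with one score per sample, depth-one trees (stumps) fitted to the residuals by least squares, a fixed number of rounds, and leaf values equal to the mean residual. All function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, residuals):
    """Pick the split threshold that best fits the residuals (squared error)."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue  # a split must leave samples on both sides
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return t, lm, rm


def boost(xs, ys, rounds=5):
    """Each round fits one stump to (true probability - predicted probability)."""
    scores = [0.0] * len(xs)  # initial estimate F(x) = 0
    stumps = []
    for _ in range(rounds):
        residuals = [y - sigmoid(f) for y, f in zip(ys, scores)]
        t, lm, rm = fit_stump(xs, residuals)
        stumps.append((t, lm, rm))
        scores = [f + (lm if x <= t else rm) for x, f in zip(xs, scores)]
    return stumps, scores


xs = [1, 2, 3, 6, 7, 8]   # toy feature values (made up)
ys = [0, 0, 0, 1, 1, 1]   # toy labels (made up)
stumps, scores = boost(xs, ys)
```

On this toy data every round splits at the same threshold, and the scores of the two classes move further apart as the residuals shrink.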
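For example, P1 has 100 groups of sample feature data, each comprising time spent online, time periods spent online, and the amount spent on online shopping, and each group corresponds to a unique gender label. The gradient descent tree algorithm is trained on these 100 groups to construct the gradient descent tree encryption model. Specifically, initial estimation functions $F_k(x)$ are given; they may also be set to $F_k(x)=0$, $k=1,\dots,K$, where $K$ is the number of classes (for gender prediction, $K=2$). The initial estimation functions are used to estimate the sample feature data, giving the estimates $F_1(x),\dots,F_K(x)$. A logistic transformation is then applied to these estimates to obtain the probability $p_k(x)$ that the sample feature data belongs to each class $k$:

$$p_k(x)=\frac{\exp\big(F_k(x)\big)}{\sum_{l=1}^{K}\exp\big(F_l(x)\big)}$$

From the true probability values of the sample feature data and the probability values estimated by the initial estimation functions, the log-likelihood loss function is obtained as:

$$L=-\sum_{k=1}^{K}y_k\log p_k(x)$$

where $y_k$ is the true probability value of the sample feature data: when a sample belongs to class $k$, $y_k=1$; otherwise $y_k=0$. Substituting the probability $p_k(x)$ into the loss function and differentiating gives the negative gradient of the loss function:

$$-\frac{\partial L}{\partial F_k(x_i)}\bigg|_{F=F_{m-1}}=y_{ik}-p_{k,m-1}(x_i)$$

The gradient error of the $i$-th sample for class $k$ is therefore $y_{ik}-p_{k,m-1}(x_i)$, where $m-1$ is the number of iterations, that is, the initial estimation function has gone through $m-1$ rounds of iteration. The gradient error is thus the difference between the true probability that sample $i$ belongs to class $k$ and the probability predicted after $m-1$ rounds of iteration. A decision tree model is then obtained from the sample feature data and the gradient errors, and the residual fitted value $\gamma_{jkm}$ of each leaf node $j=1,\dots,J$ is computed from the generated tree, where $J$ is the number of leaf nodes of the decision tree model. Adding the residual fitted values of the leaf nodes to the estimation function of the previous round gives the estimation function of the current round:

$$F_{k,m}(x)=F_{k,m-1}(x)+\sum_{j=1}^{J}\gamma_{jkm}\,\mathbf{1}\big(x\in R_{jkm}\big)$$

where $R_{jkm}$ denotes the set of samples falling on leaf node $j$. Each iteration thus builds one decision tree from the gradient errors of the current sample feature data, moving in the direction opposite to the gradient of the loss function. After the preset number of iterations the gradient is minimized, and the final estimation function is determined as the gradient descent tree encryption model.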
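The gradient error that drives each iteration can be computed directly from the class scores. The following Python sketch assumes the usual softmax form of the logistic transformation for $K$ classes; the function names `softmax` and `gradient_errors` are illustrative and do not come from the application.

```python
import math


def softmax(scores):
    """Turn K class scores F_1(x), ..., F_K(x) into probabilities p_k(x)."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gradient_errors(scores, label):
    """Return y_k - p_k(x) for every class k, with y_k = 1 only for the true class."""
    probs = softmax(scores)
    return [(1.0 if k == label else 0.0) - p for k, p in enumerate(probs)]


# With the initial estimate F_k(x) = 0 and K = 2, every class starts at p = 0.5.
errors = gradient_errors([0.0, 0.0], label=1)
print(errors)  # [-0.5, 0.5]
```

The errors for one sample always sum to zero, because both the true indicators and the predicted probabilities sum to one.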
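203. Input the sample feature data of each enterprise into the gradient descent tree encryption model for encryption to obtain the sample feature vectors corresponding to the sample feature data; determine the sample feature vectors as the encrypted data of each enterprise.
In this embodiment, the enterprise's internal sample feature data is input into the enterprise's encryption model for encryption, which converts the sample feature data into a sample feature vector whose components are 0 or 1. This 0-1 vector is used as the enterprise's encrypted data and can be shared with other enterprises. Specifically, step 203 further includes: inputting the sample feature data of each enterprise into the gradient descent tree encryption model for matching, to determine whether the sample feature data matches the leaf nodes of the gradient descent tree encryption model; determining each feature matching value of the sample feature data according to the matching result; determining the dimension of the sample feature vector according to the number of leaf nodes of the gradient descent tree encryption model; and determining the sample feature vector corresponding to the sample feature data according to the feature matching values and the dimension of the sample feature vector. Further, determining each feature matching value of the sample feature data according to the matching result includes: if the sample feature data matches a leaf node of the gradient descent tree encryption model, the feature matching value is set to 1; if the sample feature data does not match the leaf node, the feature matching value is set to 0. The sample feature data is thereby converted into a sample feature vector. This encryption scheme requires no third party, and other enterprises cannot work back from the shared encrypted data to the original data, which keeps the enterprise's internal data secure.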
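The matching procedure of step 203 can be sketched as a tree traversal followed by a per-leaf comparison. The `Node` structure, the feature names `hours` and `spend`, and the thresholds below are all hypothetical; the trees are hand-built for illustration rather than trained.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    feature: Optional[str] = None   # None marks a leaf node
    threshold: float = 0.0
    left: Optional["Node"] = None   # taken when sample[feature] <= threshold
    right: Optional["Node"] = None


def leaves(node: Node) -> List[Node]:
    """List the leaf nodes of a tree from left to right."""
    if node.feature is None:
        return [node]
    return leaves(node.left) + leaves(node.right)


def matched_leaf(node: Node, sample: Dict[str, float]) -> Node:
    """Walk from the root to the leaf node that the sample falls on."""
    while node.feature is not None:
        node = node.left if sample[node.feature] <= node.threshold else node.right
    return node


def encrypt(trees: List[Node], sample: Dict[str, float]) -> List[int]:
    """Feature matching value is 1 for the matched leaf of each tree, else 0."""
    vector: List[int] = []
    for tree in trees:
        hit = matched_leaf(tree, sample)
        # Compare by identity: all leaves hold equal field values.
        vector.extend(1 if leaf is hit else 0 for leaf in leaves(tree))
    return vector


tree1 = Node("hours", 2.0, Node(), Node("spend", 100.0, Node(), Node()))  # 3 leaves
tree2 = Node("spend", 500.0, Node(), Node())                              # 2 leaves
z1 = encrypt([tree1, tree2], {"hours": 3.0, "spend": 80.0})
print(z1)  # [0, 1, 0, 1, 0]
```

The vector dimension equals the total number of leaf nodes, and the recipient sees only which leaves matched, not the feature values or thresholds.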
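204. Train on the encrypted data of each enterprise and the corresponding category labels using a preset logistic regression algorithm to construct the logistic regression prediction model.
In this embodiment, the prediction model is a logistic regression prediction model, and step 204 specifically further includes: training on the encrypted data of each enterprise and the corresponding category labels using a maximum likelihood estimation algorithm to obtain a maximum likelihood estimation prediction model; and performing convergence computation on the maximum likelihood estimation prediction model using a gradient descent algorithm to obtain the logistic regression prediction model. For example, the enterprises jointly construct a gender prediction model: 100 groups of encrypted data Z1 from P1 and 100 groups of encrypted data Z2 from P2 are acquired, each corresponding to a unique gender label, and Z=[Z1,Z2] is used as the prediction training set from which the logistic regression prediction model is constructed. First, the prediction function is constructed as follows:

$$h_\theta(x)=\frac{1}{1+e^{-\theta^{T}x}}$$

where the prediction function $h_\theta(x)$ gives the probability that the prediction result is 1. For input feature data to be predicted, the probabilities that the classification result is class 1 and class 0 are respectively:

$$p(y=1\mid x;\theta)=h_\theta(x)$$

$$p(y=0\mid x;\theta)=1-h_\theta(x)$$

where $y=1$ means the classification result is male and $y=0$ means it is female. Then, from the prediction function, the loss function is constructed using the maximum likelihood algorithm:

$$J(\theta)=-\sum_{i=1}^{n}\Big[y_i\log h_\theta(x_i)+(1-y_i)\log\big(1-h_\theta(x_i)\big)\Big]$$

where $n$ is the number of training samples. The gradient descent algorithm is then used to solve for the parameter $\theta$ at which the loss function attains its minimum; the solved $\theta$ is the optimal parameter, and the final prediction function determined by the optimal $\theta$ is the logistic regression prediction model. Because the encrypted data of different enterprises is combined into the prediction training set when the logistic regression prediction model is constructed, the accuracy of the prediction model can be further improved.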
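Step 204 can be sketched as plain batch gradient descent on the negative log-likelihood of a logistic model over the 0/1 vectors. The learning rate, epoch count, function names, and the four training rows below are assumptions chosen for illustration; no bias term is used, matching a prediction function that depends only on $\theta^{T}x$.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(theta, x):
    """h_theta(x): probability that the label is 1."""
    return sigmoid(sum(t * v for t, v in zip(theta, x)))


def train_logistic(rows, labels, lr=0.5, epochs=200):
    """Minimise the negative log-likelihood by batch gradient descent."""
    theta = [0.0] * len(rows[0])
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for x, y in zip(rows, labels):
            err = predict_proba(theta, x) - y   # derivative of the loss w.r.t. theta^T x
            for j, v in enumerate(x):
                grad[j] += err * v
        theta = [t - lr * g / len(rows) for t, g in zip(theta, grad)]
    return theta


# Toy joint training set of concatenated 0/1 vectors (made-up values).
rows = [[0, 1, 0, 1, 0], [1, 0, 0, 1, 0], [0, 0, 1, 0, 1], [0, 1, 0, 0, 1]]
labels = [1, 1, 0, 0]
theta = train_logistic(rows, labels)
```

After training on this separable toy set, `predict_proba(theta, rows[0])` is above 0.5 and `predict_proba(theta, rows[2])` is below it.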
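Compared with the current approach, in which a third party must be involved to encrypt enterprise data before joint modeling on the encrypted data, the other method for jointly constructing a prediction model provided by this embodiment acquires the sample feature data of each enterprise and the corresponding category labels; constructs an encryption model for each enterprise according to the sample feature data and the category labels; inputs the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain the encrypted data of each enterprise; and jointly constructs a prediction model according to the encrypted data of each enterprise and the corresponding category labels. No third party is therefore required: each enterprise encrypts its internal data with its own encryption model, which prevents a third party from colluding with other enterprises to leak the enterprise's internal data and improves the security of that data. Moreover, encrypting enterprise data through an encryption model is suitable not only for linear regression and logistic regression prediction models but also for other prediction models.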
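Further, as a specific implementation of Fig. 1, an embodiment of this application provides an apparatus for jointly constructing a prediction model. As shown in Fig. 3, the apparatus comprises an acquiring unit 31, a first constructing unit 32, an encrypting unit 33, and a second constructing unit 34.
The acquiring unit 31 may be used to acquire the sample feature data of each enterprise and the category labels corresponding to the sample feature data. The acquiring unit 31 is the main functional module of the apparatus for this acquisition.
The first constructing unit 32 may be used to construct an encryption model for each enterprise according to the sample feature data and the category labels. The first constructing unit 32 is the main functional module of the apparatus for constructing the encryption models, and is also a core module.
The encrypting unit 33 may be used to input the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain the encrypted data of each enterprise. The encrypting unit 33 is the main functional module of the apparatus for this encryption, and is also a core module.
The second constructing unit 34 may be used to jointly construct a prediction model according to the encrypted data of each enterprise and the corresponding category labels. The second constructing unit 34 is the main functional module of the apparatus for jointly constructing the prediction model.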
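In this embodiment, the encryption model is a gradient descent tree encryption model, and the first constructing unit 32 may specifically be used to train on the sample feature data and the category labels using a preset gradient descent tree algorithm to construct the gradient descent tree encryption model.
In addition, the first constructing unit 32 further comprises a preliminary training module 321, a matching module 322, a prediction module 323, a determining module 324, and an iterative training module 325.
The preliminary training module 321 may be used to perform preliminary training on the sample feature data and the category labels using a preset decision tree algorithm to obtain a preliminary decision tree model.
The matching module 322 may be used to match the category labels against the preliminary decision tree model to obtain the true probability values that the sample feature data belongs to the categories corresponding to the leaf nodes of the preliminary decision tree model.
The prediction module 323 may be used to input the sample feature data into the preliminary decision tree model for category prediction to obtain the predicted probability values that the sample feature data belongs to the categories corresponding to the leaf nodes of the preliminary decision tree model.
The determining module 324 may be used to determine the residual gradient descent value of the preliminary iterative training according to the difference between the true probability values and the predicted probability values.
The iterative training module 325 may be used to iteratively train the preliminary decision tree model according to the residual gradient descent value, the sample feature data, and the category labels, and to repeat the step of computing the residual gradient descent value.
The determining module 324 may further be used to determine, when the computed residual gradient descent value is the minimum residual gradient descent value, the decision tree model trained at the iteration level corresponding to that minimum as the gradient descent tree encryption model.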
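In this embodiment, the encrypting unit 33 comprises an encrypting module 331 and a determining module 332.
The encrypting module 331 may be used to input the sample feature data of each enterprise into the gradient descent tree encryption model for encryption to obtain the sample feature vectors corresponding to the sample feature data.
The determining module 332 may be used to determine the sample feature vectors as the encrypted data of each enterprise.
In addition, for the specific process of converting sample feature data into sample feature vectors, the encrypting module 331 further comprises a matching submodule 3311 and a determining submodule 3312.
The matching submodule 3311 may be used to input the sample feature data of each enterprise into the gradient descent tree encryption model for matching, to determine whether the sample feature data matches the leaf nodes of the gradient descent tree encryption model.
The determining submodule 3312 may be used to determine each feature matching value of the sample feature data according to the matching result.
The determining submodule 3312 may further be used to determine the dimension of the sample feature vector according to the number of leaf nodes of the gradient descent tree encryption model.
The determining submodule 3312 may further be used to determine the sample feature vector corresponding to the sample feature data according to the feature matching values of the sample feature data and the dimension of the sample feature vector.
In addition, for the process of determining the feature matching values of the sample feature data, the determining submodule 3312 may specifically be used to set the feature matching value of the sample feature data to 1 if the sample feature data matches a leaf node of the gradient descent tree encryption model, and to 0 if it does not.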
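In this embodiment, the second constructing unit 34 may specifically be used to combine the encrypted data of each enterprise and the corresponding category labels, together with the enterprise's sample feature data, into a prediction training set, and to construct the prediction model from that training set.
In addition, the prediction model is a logistic regression prediction model, and the second constructing unit 34 may specifically further be used to train on the encrypted data of each enterprise and the corresponding category labels using a preset logistic regression algorithm to construct the logistic regression prediction model.
Further, for the specific process of constructing the logistic regression prediction model, the second constructing unit 34 further comprises a training module 341 and a computing module 342.
The training module 341 may be used to train on the encrypted data of each enterprise and the corresponding category labels using a maximum likelihood estimation algorithm to obtain a maximum likelihood estimation prediction model.
The computing module 342 may be used to perform convergence computation on the maximum likelihood estimation prediction model using a gradient descent algorithm to obtain the logistic regression prediction model.
It should be noted that, for other descriptions of the functional modules of the apparatus for jointly constructing a prediction model provided by this embodiment, reference may be made to the corresponding descriptions of the method shown in Fig. 1, which are not repeated here.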
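Based on the method shown in Fig. 1, an embodiment of this application correspondingly further provides a non-volatile computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the following steps: acquiring sample feature data of each enterprise and category labels corresponding to the sample feature data; constructing an encryption model for each enterprise according to the sample feature data and the category labels; inputting the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain encrypted data of each enterprise; and jointly constructing a prediction model according to the encrypted data of each enterprise and the corresponding category labels.
Based on the embodiments of the method shown in Fig. 1 and the apparatus shown in Fig. 3, an embodiment of this application further provides a physical structure diagram of a computer device. As shown in Fig. 5, the computer device comprises a processor 41, a memory 42, and computer-readable instructions stored in the memory 42 and executable on the processor, where the memory 42 and the processor 41 are both arranged on a bus 43. When executing the computer-readable instructions, the processor 41 implements the following steps: acquiring sample feature data of each enterprise and category labels corresponding to the sample feature data; constructing an encryption model for each enterprise according to the sample feature data and the category labels; inputting the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain encrypted data of each enterprise; and jointly constructing a prediction model according to the encrypted data of each enterprise and the corresponding category labels.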
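With the technical solution of this application, the sample feature data of each enterprise and the corresponding category labels are acquired; an encryption model is constructed for each enterprise according to the sample feature data and the category labels; the sample feature data of each enterprise is input into the corresponding encryption model for encryption to obtain the encrypted data of each enterprise; and a prediction model is jointly constructed according to the encrypted data of each enterprise and the corresponding category labels. No third party is therefore required: each enterprise encrypts its internal data with its own encryption model, which prevents a third party from colluding with other enterprises to leak the enterprise's internal data and improves the security of that data. Moreover, encrypting enterprise data through an encryption model is suitable not only for linear regression and logistic regression prediction models but also for other prediction models.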
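Obviously, those skilled in the art should understand that the modules or steps of this application described above can be implemented by a general-purpose computing device. They may be concentrated on a single computing device or distributed across a network formed by multiple computing devices. Optionally, they may be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases the steps shown or described may be performed in an order different from that given here, or the modules may each be made into individual integrated circuit modules, or several of the modules or steps may be made into a single integrated circuit module. This application is thus not limited to any specific combination of hardware and software.
The above are merely preferred embodiments of this application and are not intended to limit it. For those skilled in the art, this application may have various modifications and changes. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of this application shall fall within its scope of protection.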
Claims (20)
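- A method for jointly constructing a prediction model, characterized by comprising: acquiring sample feature data of each enterprise and category labels corresponding to the sample feature data; constructing an encryption model for each enterprise according to the sample feature data and the category labels; inputting the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain encrypted data of each enterprise; and jointly constructing a prediction model according to the encrypted data of each enterprise and the corresponding category labels.
- The method according to claim 1, characterized in that the encryption model is a gradient descent tree encryption model, and constructing the encryption model of each enterprise according to the sample feature data and the category labels comprises: training on the sample feature data and the category labels using a preset gradient descent tree algorithm to construct the gradient descent tree encryption model; and inputting the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain the encrypted data of each enterprise comprises: inputting the sample feature data of each enterprise into the gradient descent tree encryption model for encryption to obtain sample feature vectors corresponding to the sample feature data; and determining the sample feature vectors as the encrypted data of each enterprise.
- The method according to claim 2, characterized in that training on the sample feature data and the category labels using the preset gradient descent tree algorithm to construct the gradient descent tree encryption model comprises: performing preliminary training on the sample feature data and the category labels using a preset decision tree algorithm to obtain a preliminary decision tree model; matching the category labels against the preliminary decision tree model to obtain true probability values that the sample feature data belongs to the categories corresponding to the leaf nodes of the preliminary decision tree model; inputting the sample feature data into the preliminary decision tree model for category prediction to obtain predicted probability values that the sample feature data belongs to the categories corresponding to the leaf nodes of the preliminary decision tree model; determining a residual gradient descent value of the preliminary iterative training according to the difference between the true probability values and the predicted probability values; iteratively training the preliminary decision tree model according to the residual gradient descent value, the sample feature data, and the category labels, and repeating the step of computing the residual gradient descent value; and, when the computed residual gradient descent value is the minimum residual gradient descent value, determining the decision tree model trained at the iteration level corresponding to the minimum residual gradient descent value as the gradient descent tree encryption model.
- The method according to claim 2, characterized in that inputting the sample feature data of each enterprise into the gradient descent tree encryption model for encryption to obtain the sample feature vectors corresponding to the sample feature data comprises: inputting the sample feature data of each enterprise into the gradient descent tree encryption model for matching, to determine whether the sample feature data matches the leaf nodes of the gradient descent tree encryption model; determining each feature matching value of the sample feature data according to the matching result; determining the dimension of the sample feature vector according to the number of leaf nodes of the gradient descent tree encryption model; and determining the sample feature vector corresponding to the sample feature data according to the feature matching values of the sample feature data and the dimension of the sample feature vector.
- The method according to claim 4, characterized in that determining each feature matching value of the sample feature data according to the matching result comprises: if the sample feature data matches a leaf node of the gradient descent tree encryption model, determining the feature matching value of the sample feature data as 1; and if the sample feature data does not match the leaf node of the gradient descent tree encryption model, determining the feature matching value of the sample feature data as 0.
- The method according to claim 1, characterized in that jointly constructing the prediction model according to the encrypted data of each enterprise and the corresponding category labels comprises: combining the encrypted data of each enterprise and the corresponding category labels, together with the enterprise's sample feature data, into a prediction training set, and constructing the prediction model according to the prediction training set.
- The method according to any one of claims 1 to 6, characterized in that the prediction model is a logistic regression prediction model, and jointly constructing the prediction model according to the encrypted data of each enterprise and the corresponding category labels comprises: training on the encrypted data of each enterprise and the corresponding category labels using a preset logistic regression algorithm to construct the logistic regression prediction model.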
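- An apparatus for jointly constructing a prediction model, characterized by comprising: an acquiring unit, configured to acquire sample feature data of each enterprise and category labels corresponding to the sample feature data; a first constructing unit, configured to construct an encryption model for each enterprise according to the sample feature data and the category labels; an encrypting unit, configured to input the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain encrypted data of each enterprise; and a second constructing unit, configured to jointly construct a prediction model according to the encrypted data of each enterprise and the corresponding category labels.
- The apparatus according to claim 8, characterized in that the encryption model is a gradient descent tree encryption model; the first constructing unit is specifically configured to train on the sample feature data and the category labels using a preset gradient descent tree algorithm to construct the gradient descent tree encryption model; and the encrypting unit comprises: an encrypting module, configured to input the sample feature data of each enterprise into the gradient descent tree encryption model for encryption to obtain sample feature vectors corresponding to the sample feature data; and a determining module, configured to determine the sample feature vectors as the encrypted data of each enterprise.
- The apparatus according to claim 9, characterized in that the first constructing unit comprises: a preliminary training module, configured to perform preliminary training on the sample feature data and the category labels using a preset decision tree algorithm to obtain a preliminary decision tree model; a matching module, configured to match the category labels against the preliminary decision tree model to obtain true probability values that the sample feature data belongs to the categories corresponding to the leaf nodes of the preliminary decision tree model; a prediction module, configured to input the sample feature data into the preliminary decision tree model for category prediction to obtain predicted probability values that the sample feature data belongs to the categories corresponding to the leaf nodes of the preliminary decision tree model; a determining module, configured to determine a residual gradient descent value of the preliminary iterative training according to the difference between the true probability values and the predicted probability values; and an iterative training module, configured to iteratively train the preliminary decision tree model according to the residual gradient descent value, the sample feature data, and the category labels, and to repeat the step of computing the residual gradient descent value; the determining module being further configured to determine, when the computed residual gradient descent value is the minimum residual gradient descent value, the decision tree model trained at the iteration level corresponding to the minimum residual gradient descent value as the gradient descent tree encryption model.
- The apparatus according to claim 9, characterized in that the encrypting module comprises: a matching submodule, configured to input the sample feature data of each enterprise into the gradient descent tree encryption model for matching, to determine whether the sample feature data matches the leaf nodes of the gradient descent tree encryption model; and a determining submodule, configured to determine each feature matching value of the sample feature data according to the matching result; the determining submodule being further configured to determine the dimension of the sample feature vector according to the number of leaf nodes of the gradient descent tree encryption model, and further configured to determine the sample feature vector corresponding to the sample feature data according to the feature matching values of the sample feature data and the dimension of the sample feature vector.
- The apparatus according to claim 11, characterized in that the determining submodule is specifically configured to determine the feature matching value of the sample feature data as 1 if the sample feature data matches a leaf node of the gradient descent tree encryption model, and as 0 if the sample feature data does not match the leaf node of the gradient descent tree encryption model.
- The apparatus according to claim 8, characterized in that the second constructing unit is specifically configured to combine the encrypted data of each enterprise and the corresponding category labels, together with the enterprise's sample feature data, into a prediction training set, and to construct the prediction model according to the prediction training set.
- The apparatus according to any one of claims 8 to 13, characterized in that the prediction model is a logistic regression prediction model, and the second constructing unit is specifically further configured to train on the encrypted data of each enterprise and the corresponding category labels using a preset logistic regression algorithm to construct the logistic regression prediction model.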
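- A non-volatile computer-readable storage medium on which computer-readable instructions are stored, characterized in that the computer-readable instructions, when executed by a processor, implement a method for jointly constructing a prediction model, comprising: acquiring sample feature data of each enterprise and category labels corresponding to the sample feature data; constructing an encryption model for each enterprise according to the sample feature data and the category labels; inputting the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain encrypted data of each enterprise; and jointly constructing a prediction model according to the encrypted data of each enterprise and the corresponding category labels.
- The non-volatile computer-readable storage medium according to claim 15, characterized in that the encryption model is a gradient descent tree encryption model, and the computer-readable instructions, when executed by the processor, implement constructing the encryption model of each enterprise according to the sample feature data and the category labels, comprising: training on the sample feature data and the category labels using a preset gradient descent tree algorithm to construct the gradient descent tree encryption model; and inputting the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain the encrypted data of each enterprise comprises: inputting the sample feature data of each enterprise into the gradient descent tree encryption model for encryption to obtain sample feature vectors corresponding to the sample feature data; and determining the sample feature vectors as the encrypted data of each enterprise.
- The non-volatile computer-readable storage medium according to claim 16, characterized in that the computer-readable instructions, when executed by the processor, implement training on the sample feature data and the category labels using the preset gradient descent tree algorithm to construct the gradient descent tree encryption model, comprising: performing preliminary training on the sample feature data and the category labels using a preset decision tree algorithm to obtain a preliminary decision tree model; matching the category labels against the preliminary decision tree model to obtain true probability values that the sample feature data belongs to the categories corresponding to the leaf nodes of the preliminary decision tree model; inputting the sample feature data into the preliminary decision tree model for category prediction to obtain predicted probability values that the sample feature data belongs to the categories corresponding to the leaf nodes of the preliminary decision tree model; determining a residual gradient descent value of the preliminary iterative training according to the difference between the true probability values and the predicted probability values; iteratively training the preliminary decision tree model according to the residual gradient descent value, the sample feature data, and the category labels, and repeating the step of computing the residual gradient descent value; and, when the computed residual gradient descent value is the minimum residual gradient descent value, determining the decision tree model trained at the iteration level corresponding to the minimum residual gradient descent value as the gradient descent tree encryption model.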
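- A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, characterized in that the processor, when executing the computer-readable instructions, implements a method for jointly constructing a prediction model, comprising: acquiring sample feature data of each enterprise and category labels corresponding to the sample feature data; constructing an encryption model for each enterprise according to the sample feature data and the category labels; inputting the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain encrypted data of each enterprise; and jointly constructing a prediction model according to the encrypted data of each enterprise and the corresponding category labels.
- The computer device according to claim 18, characterized in that the encryption model is a gradient descent tree encryption model, and the computer-readable instructions, when executed by the processor, implement constructing the encryption model of each enterprise according to the sample feature data and the category labels, comprising: training on the sample feature data and the category labels using a preset gradient descent tree algorithm to construct the gradient descent tree encryption model; and inputting the sample feature data of each enterprise into the corresponding encryption model for encryption to obtain the encrypted data of each enterprise comprises: inputting the sample feature data of each enterprise into the gradient descent tree encryption model for encryption to obtain sample feature vectors corresponding to the sample feature data; and determining the sample feature vectors as the encrypted data of each enterprise.
- The computer device according to claim 19, characterized in that the computer-readable instructions, when executed by the processor, implement training on the sample feature data and the category labels using the preset gradient descent tree algorithm to construct the gradient descent tree encryption model, comprising: performing preliminary training on the sample feature data and the category labels using a preset decision tree algorithm to obtain a preliminary decision tree model; matching the category labels against the preliminary decision tree model to obtain true probability values that the sample feature data belongs to the categories corresponding to the leaf nodes of the preliminary decision tree model; inputting the sample feature data into the preliminary decision tree model for category prediction to obtain predicted probability values that the sample feature data belongs to the categories corresponding to the leaf nodes of the preliminary decision tree model; determining a residual gradient descent value of the preliminary iterative training according to the difference between the true probability values and the predicted probability values; iteratively training the preliminary decision tree model according to the residual gradient descent value, the sample feature data, and the category labels, and repeating the step of computing the residual gradient descent value; and, when the computed residual gradient descent value is the minimum residual gradient descent value, determining the decision tree model trained at the iteration level corresponding to the minimum residual gradient descent value as the gradient descent tree encryption model.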
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910319424.7A CN110210233B (zh) | 2019-04-19 | 2019-04-19 | Joint construction method, apparatus, storage medium, and computer device for a prediction model |
CN201910319424.7 | 2019-04-19 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020211240A1 (zh) | 2020-10-22 |
Family
ID=67786051
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/102911 WO2020211240A1 (zh) | Joint construction method, apparatus, and computer device for a prediction model | 2019-04-19 | 2019-08-27 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110210233B (zh) |
WO (1) | WO2020211240A1 (zh) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
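CN110728375B (zh) * | 2019-10-16 | 2021-03-19 | 支付宝(杭州)信息技术有限公司 | Method and apparatus for jointly training a logistic regression model by multiple computing units |
CN112668016B (zh) * | 2020-01-02 | 2023-12-08 | 华控清交信息科技(北京)有限公司 | Model training method, apparatus, and electronic device |
CN111428887B (zh) * | 2020-03-19 | 2023-05-12 | 腾讯云计算(北京)有限责任公司 | Model training control method, apparatus, and system based on multiple computing nodes |
CN111738441B (zh) * | 2020-07-31 | 2020-11-17 | 支付宝(杭州)信息技术有限公司 | Prediction model training method and apparatus balancing prediction accuracy and privacy protection |
CN112199706B (zh) * | 2020-10-26 | 2022-11-22 | 支付宝(杭州)信息技术有限公司 | Training method for a tree model based on secure multi-party computation, and service prediction method |
CN112288101A (zh) * | 2020-10-29 | 2021-01-29 | 平安科技(深圳)有限公司 | Federated-learning-based GBDT and LR fusion method, apparatus, device, and storage medium |
CN112816898B (zh) * | 2021-01-26 | 2022-03-01 | 三一重工股份有限公司 | Battery fault prediction method, apparatus, electronic device, and storage medium |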
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108520181A (zh) * | 2018-03-26 | 2018-09-11 | 联想(北京)有限公司 | Data model training method and apparatus |
CN109002861A (zh) * | 2018-08-10 | 2018-12-14 | 深圳前海微众银行股份有限公司 | Federated modeling method, device, and storage medium |
CN109033854A (zh) * | 2018-07-17 | 2018-12-18 | 阿里巴巴集团控股有限公司 | Model-based prediction method and apparatus |
CN109299728A (zh) * | 2018-08-10 | 2019-02-01 | 深圳前海微众银行股份有限公司 | Federated learning method, system, and readable storage medium |
CN109308418A (zh) * | 2017-07-28 | 2019-02-05 | 阿里巴巴集团控股有限公司 | Model training method and apparatus based on shared data |
CN109635462A (zh) * | 2018-12-17 | 2019-04-16 | 深圳前海微众银行股份有限公司 | Federated-learning-based model parameter training method, apparatus, device, and medium |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101727462B (zh) * | 2008-10-17 | 2012-04-25 | 北京大学 | Chinese comparative sentence classifier model generation and Chinese comparative sentence recognition method and apparatus |
GB2516493A (en) * | 2013-07-25 | 2015-01-28 | Ibm | Parallel tree based prediction |
EP3203679A1 (en) * | 2016-02-04 | 2017-08-09 | ABB Schweiz AG | Machine learning based on homomorphic encryption |
CN108615044A (zh) * | 2016-12-12 | 2018-10-02 | 腾讯科技(深圳)有限公司 | Classification model training method, and data classification method and apparatus |
-
2019
- 2019-04-19 CN CN201910319424.7A patent/CN110210233B/zh active Active
- 2019-08-27 WO PCT/CN2019/102911 patent/WO2020211240A1/zh active Application Filing
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109308418A (zh) * | 2017-07-28 | 2019-02-05 | 阿里巴巴集团控股有限公司 | Model training method and apparatus based on shared data |
CN108520181A (zh) * | 2018-03-26 | 2018-09-11 | 联想(北京)有限公司 | Data model training method and apparatus |
CN109033854A (zh) * | 2018-07-17 | 2018-12-18 | 阿里巴巴集团控股有限公司 | Model-based prediction method and apparatus |
CN109002861A (zh) * | 2018-08-10 | 2018-12-14 | 深圳前海微众银行股份有限公司 | Federated modeling method, device, and storage medium |
CN109299728A (zh) * | 2018-08-10 | 2019-02-01 | 深圳前海微众银行股份有限公司 | Federated learning method, system, and readable storage medium |
CN109635462A (zh) * | 2018-12-17 | 2019-04-16 | 深圳前海微众银行股份有限公司 | Federated-learning-based model parameter training method, apparatus, device, and medium |
Also Published As
Publication number | Publication date |
---|---|
CN110210233A (zh) | 2019-09-06 |
CN110210233B (zh) | 2024-05-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
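WO2020211240A1 (zh) | Joint construction method, apparatus, and computer device for a prediction model | |
CN112085159B (zh) | User label data prediction system, method, apparatus, and electronic device | |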
Rückel et al. | Fairness, integrity, and privacy in a scalable blockchain-based federated learning system | |
WO2022089256A1 (zh) | Training method, apparatus, device, computer program product, and computer-readable storage medium for a federated neural network model | |
US20210174264A1 (en) | Training tree-based machine-learning modeling algorithms for predicting outputs and generating explanatory data | |
WO2022206510A1 (zh) | Federated learning model training method, apparatus, device, and storage medium | |
US20190073608A1 (en) | Multi-party computation system for learning a classifier | |
Chen et al. | Evfl: An explainable vertical federated learning for data-oriented artificial intelligence systems | |
US11176469B2 (en) | Model training methods, apparatuses, and systems | |
WO2020011200A1 (zh) | Cross-domain data fusion method, system, and storage medium | |
EP3863003B1 (en) | Hidden sigmoid function calculation system, hidden logistic regression calculation system, hidden sigmoid function calculation device, hidden logistic regression calculation device, hidden sigmoid function calculation method, hidden logistic regression calculation method, and program | |
Meslem et al. | Using set invariance to design robust interval observers for discrete‐time linear systems | |
CN114547643A (zh) | Linear regression vertical federated learning method based on homomorphic encryption | |
Groeneboom et al. | Estimation in monotone single‐index models | |
Boualem | Insensitive bounds for the stationary distribution of a single server retrial queue with server subject to active breakdowns | |
CN112101609B (zh) | Prediction system, method, apparatus, and electronic device for user repayment timeliness | |
Faizi et al. | A Multicriteria Decision‐Making Approach Based on Fuzzy AHP with Intuitionistic 2‐Tuple Linguistic Sets | |
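WO2024139666A1 (zh) | Training method and apparatus for a dual-target-domain recommendation model | |
CN117521102A (zh) | Model training method and apparatus based on federated learning | |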
Deng et al. | Non-interactive and privacy-preserving neural network learning using functional encryption | |
US20190324606A1 (en) | Online training of segmentation model via interactions with interactive computing environment | |
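WO2024051456A1 (zh) | Multi-party collaborative model training method, apparatus, device, and medium | |
CN110175283B (zh) | Method and apparatus for generating a recommendation model | |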
US20230325718A1 (en) | Method and apparatus for joint training logistic regression model | |
CN115130568A (zh) | Vertical federated Softmax regression method and system supporting multiple participants |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19925008 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 19925008 Country of ref document: EP Kind code of ref document: A1 |