WO2019062697A1 - Virtual resource allocation, model establishment, and data prediction methods and apparatuses - Google Patents

Virtual resource allocation, model establishment, and data prediction methods and apparatuses

Info

Publication number
WO2019062697A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
data
model
data provider
training
Prior art date
Application number
PCT/CN2018/107261
Other languages
English (en)
French (fr)
Inventor
周俊
李小龙
Original Assignee
阿里巴巴集团控股有限公司
周俊
李小龙
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司, 周俊, 李小龙
Priority to EP18861936.5A (patent EP3617983A4)
Publication of WO2019062697A1
Priority to US16/697,913 (patent US10691494B2)
Priority to US16/907,637 (patent US10891161B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08 Insurance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02 Banking, e.g. interest calculation or account maintenance

Definitions

  • The present specification relates to the field of computer applications, and in particular to methods and apparatuses for virtual resource allocation, model establishment, and data prediction.
  • Data features of several dimensions can be extracted from a large amount of user data, training samples can be constructed from the extracted features, and a user risk assessment model can be trained with a specific machine learning algorithm. The user risk assessment model is then used to assess the user's risk, the assessment result determines whether the user is a risk user, and on that basis it is decided whether to issue a loan to the user.
  • This specification proposes a virtual resource allocation method, including:
  • each training sample includes the evaluation results of the same user from each data provider; wherein each training sample is labeled according to the user's actual performance of the service;
  • the model is trained based on the plurality of training samples and the labels of the respective training samples, the coefficients of the variables in the trained model are used as the contributions of the respective data providers, and virtual resources are allocated to each data provider based on its contribution.
  • the trained model is a linear model.
  • the number of the virtual resources allocated for each data provider is proportional to the contribution of each data provider.
  • it also includes:
  • the virtual resource is a user data usage fund issued to each data provider.
  • the evaluation model is a user risk assessment model; the evaluation result is a risk score; and the label indicates whether the user is a risk user.
  • the present specification also provides a virtual resource allocation apparatus, including:
  • the receiving module receives the evaluation results of a plurality of users uploaded by a plurality of data providers; wherein the evaluation results are obtained by each data provider separately evaluating the users based on its own evaluation model;
  • the training module constructs a plurality of training samples by using the evaluation results uploaded by each data provider as training data, each training sample including the evaluation results of the same user from each data provider; wherein each training sample is labeled according to the user's actual performance of the service;
  • the allocation module trains the model based on the plurality of training samples and the labels of the respective training samples, uses the coefficients of the variables in the trained model as the contributions of the respective data providers, and allocates virtual resources to each data provider based on its contribution.
  • the trained model is a linear model.
  • the number of the virtual resources allocated for each data provider is proportional to the contribution of each data provider.
  • it also includes:
  • the evaluation module receives an evaluation result uploaded by a plurality of data providers for a certain user, and inputs the evaluation result into the trained model to obtain a final evaluation result of the user.
  • the virtual resource is a user data usage fund issued to each data provider.
  • the evaluation model is a user risk assessment model; the evaluation result is a risk score; and the label indicates whether the user is a risk user.
  • the specification further provides a method for establishing a model, including:
  • each training sample includes the evaluation results of the same user from each data provider; wherein each training sample is labeled according to the user's actual performance of the service;
  • the model is trained based on the plurality of training samples and the labels of the respective training samples to obtain a trained model.
  • the trained model is a linear model.
  • the evaluation model is a user risk assessment model; the evaluation result is a risk score; and the label indicates whether the user is a risk user.
  • This specification also proposes a method of data prediction, including:
  • each training sample includes the evaluation results of the same user from each data provider; wherein each training sample is labeled according to the user's actual performance of the service;
  • the present specification also proposes a virtual resource allocation system, including:
  • the plurality of data provider servers upload the evaluation results of the plurality of users to the risk assessor server; wherein the evaluation results are obtained after the data providers respectively evaluate the users based on their own evaluation models;
  • the risk assessor server uses the evaluation results uploaded by each data provider as training data to construct a plurality of training samples, each training sample including the evaluation results of the same user from each data provider, wherein each training sample is labeled according to the user's actual performance of the service; trains the model based on the plurality of training samples and the labels of the respective training samples; uses the coefficients of the variables in the trained model as the contributions of the respective data providers; and allocates virtual resources to each data provider based on its contribution.
  • the present specification also proposes an electronic device comprising:
  • a memory for storing machine executable instructions
  • the processor is caused to:
  • each training sample includes the evaluation results of the same user from each data provider; wherein each training sample is labeled according to the user's actual performance of the service;
  • the model is trained based on the plurality of training samples and the labels of the respective training samples, the coefficients of the variables in the trained model are used as the contributions of the respective data providers, and virtual resources are allocated to each data provider based on its contribution.
  • multiple data providers can upload to the risk assessor the evaluation results obtained by evaluating several users with their own evaluation models; the risk assessor can use the evaluation results uploaded by each data provider as training data to construct several training samples and train the model, use the coefficients corresponding to the variables in the trained model as the contributions of the respective data providers, and then allocate virtual resources to each data provider based on its contribution.
  • when the risk assessor trains the model based on the user data maintained by each data provider, each data provider only needs to transmit to the risk assessor the evaluation results obtained from its preliminary evaluation of several users; the data provider therefore no longer needs to transmit its locally maintained raw user data to the risk assessor, which can significantly reduce the risk of user privacy leakage;
  • FIG. 1 is a flowchart of a virtual resource allocation method according to an embodiment of the present disclosure
  • FIG. 2 is a schematic diagram of a risk assessor training a model based on evaluation results uploaded by multiple data providers, according to an embodiment of the present specification
  • FIG. 3 is a flow chart showing a method for establishing a model according to an embodiment of the present specification
  • FIG. 5 is a hardware structural diagram of an electronic device carrying a virtual resource allocation apparatus according to an embodiment of the present disclosure
  • FIG. 6 is a logic block diagram of the virtual resource allocation apparatus according to an embodiment of the present disclosure.
  • each data provider can train a user evaluation model on its locally maintained user data using a machine learning algorithm, use the user evaluation model to evaluate several sample users, and then upload the evaluation results to the risk assessor.
  • the risk assessor can use the evaluation results uploaded by each data provider as training data to construct a number of training samples; wherein each training sample contains the evaluation results of the same user in each data provider.
  • the evaluation result of a certain user uploaded by each data provider can be used as a modeling feature to construct a feature vector as a training sample.
  • each constructed training sample may be labeled according to the user's actual performance of the service; for example, in a credit-granting business scenario, the label of each training sample may be a user label, assigned based on the user's actual repayment behavior, that indicates whether the user is a risk user.
  • the risk assessor can train the model based on the constructed training samples and their labels, use the coefficient of each variable in the trained model as the contribution of the corresponding data provider to the model, and then allocate virtual resources to each data provider based on its contribution.
  • when the risk assessor trains the model based on the user data maintained by each data provider, each data provider only needs to transmit to the risk assessor the evaluation results obtained from its preliminary evaluation of several users; the data provider therefore no longer needs to transmit its locally maintained raw user data to the risk assessor, which can significantly reduce the risk of user privacy leakage;
  • the user evaluation model may be a user risk assessment model for determining whether the user is a risk user, and the evaluation result may be a risk score output after the user risk assessment model is used to assess the user's risk.
  • each data provider can construct a user risk assessment model based on its own user data; when the risk assessor (for example, the party that issued the loan) needs to share the user data of each data provider to train the user risk
  • the evaluation results uploaded by each data provider can be used as training data to construct a number of training samples, and based on the actual repayment situation of the user, each training sample is labeled with a label indicating whether the user is a risk user.
  • the coefficients of the variables in the trained model can be used as the contributions of the respective data providers to the model, and virtual resources are allocated to each data provider based on its contribution. Thus, throughout the process, no data provider needs to provide raw user data to the risk assessor in order to complete the "data sharing."
  • FIG. 1 is a schematic diagram of a virtual resource allocation method according to an embodiment of the present disclosure, which is applied to a server of a risk assessment party, and performs the following steps:
  • Step 102: Receive the evaluation results of a plurality of users uploaded by a plurality of data providers, where the evaluation results are obtained by each data provider separately evaluating the users based on its own evaluation model;
  • Step 104: Using the evaluation results uploaded by each data provider as training data, construct a plurality of training samples, each training sample including the evaluation results of the same user from each data provider; wherein each training sample is labeled according to the user's actual performance of the service;
  • Step 106: Train the model based on the plurality of training samples and the labels of the respective training samples, use the coefficients of the variables in the trained model as the contributions of the respective data providers, and allocate virtual resources to each data provider based on its contribution.
  • the above data provider may specifically include a party having a cooperative relationship with the above risk assessor.
  • the data provider and the risk assessor may respectively correspond to different operators; for example, the risk assessor (the modeling party) may be the data operation platform of company A, and the data providers may be service platforms connected to company A's data operation platform, such as e-commerce platforms, third-party banks, express delivery companies, other financial institutions, and telecom operators.
  • the user evaluation model described above may specifically include any type of machine learning model for evaluating a user
  • the user evaluation model described above may specifically be a user risk assessment model trained based on a specific machine learning algorithm (e.g., a linear logistic regression model or a scorecard model for assessing a user's risk); correspondingly, the evaluation result output by the user evaluation model after evaluating a user may be a risk score that represents the user's risk level; in practical applications, the risk score is usually a floating-point value between 0 and 1; for example, the risk score may specifically be a probability value representing the user's risk level
  • the above evaluation result may also be other forms of the score other than the risk score, such as a credit score.
  • each data provider no longer needs to transmit its locally maintained raw user data to the risk assessor; instead, each data provider models separately using its locally maintained raw user data.
  • the server of each data provider can collect, in the background, the user data generated by its users, select several pieces of user data from the collected data as data samples, and generate an initial data sample set from the selected data samples.
  • the specific form of the foregoing user data depends on the specific service scenario and modeling requirements; it may cover any type of user data from which modeling features for training a user evaluation model can be extracted, and is not specifically limited in this specification;
  • the user data may include the user's transaction data, shopping records, repayment records, consumption records, wealth management product purchase records, and other user data from which modeling features for training the risk assessment model can be extracted.
  • the data provider server may further preprocess the data samples in the data sample set.
  • the preprocessing of the data samples in the data sample set generally includes performing data cleaning, supplementing default values, normalization processing, or other forms of preprocessing on the data samples in the data sample set.
  • the collected data samples can be converted into standardized data samples suitable for model training.
  • the data provider server may extract data features of several dimensions (i.e., the modeling features that ultimately participate in modeling) from each data sample in the data sample set.
  • the number of data features of the extracted plurality of dimensions is not specifically limited in the present specification, and those skilled in the art can select based on actual modeling requirements.
  • the specific type of the extracted data features is not particularly limited in the present specification, and those skilled in the art can manually select from the information actually included in the data samples based on actual modeling requirements.
  • the data provider server may generate a data feature vector for each data sample from the data feature values corresponding to the extracted data features of the several dimensions, and then construct a target matrix from the data feature vectors of the data samples; for example, if data features of M dimensions are extracted from each of N data samples, the target matrix may be an N*M matrix.
  • once constructed, the target matrix is the training sample set on which model training is finally performed; each data provider server can use the target matrix as the original training sample set and apply a specific machine learning algorithm to it to train a user evaluation model.
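The preprocessing and matrix construction described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `build_target_matrix`, the feature names, the mean-fill default, and the min-max normalization are all assumptions chosen for the example.

```python
from typing import Dict, List, Optional


def build_target_matrix(
    samples: List[Dict[str, Optional[float]]],
    feature_names: List[str],
) -> List[List[float]]:
    """Build an N x M matrix from N data samples and M feature names.

    Assumed preprocessing: missing values are filled with the per-feature
    mean, then each feature is min-max normalized to [0, 1].
    """
    columns = []
    for name in feature_names:
        observed = [s[name] for s in samples if s.get(name) is not None]
        default = sum(observed) / len(observed) if observed else 0.0
        col = [s[name] if s.get(name) is not None else default for s in samples]
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column carries no information; map it to 0.0.
        columns.append([(v - lo) / span if span else 0.0 for v in col])
    # Transpose the M feature columns into N row vectors.
    return [list(row) for row in zip(*columns)]


# Hypothetical data samples with M = 2 features and N = 3 users.
samples = [
    {"monthly_spend": 100.0, "late_payments": 0.0},
    {"monthly_spend": 300.0, "late_payments": None},  # missing value
    {"monthly_spend": 200.0, "late_payments": 2.0},
]
matrix = build_target_matrix(samples, ["monthly_spend", "late_payments"])
```

Each row of `matrix` is one data feature vector, so the result has N rows and M columns as in the N*M example above.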
  • machine learning algorithms used by each data provider in training the user evaluation model may be the same or different, and are not particularly limited in the present specification.
  • the machine learning model may be a supervised machine learning model; for example, the machine learning model may be an LR (Logistic Regression) model.
  • each of the data samples in the training sample set may carry a pre-calibrated sample tag.
  • the specific form of the sample tag usually depends on specific business scenarios and modeling requirements, and is not specifically limited in this specification;
  • the sample tag may be a user tag indicating whether the user is a risk user; the user tag may specifically be calibrated and provided by the risk assessor.
  • each of the data feature vectors in the target matrix may correspond to one sample tag.
  • a loss function (Loss Function) can usually be used to estimate the fitting error between the training sample and the corresponding sample label.
  • the training samples and the corresponding sample labels can be supplied as input values to the loss function, and iterative calculation by gradient descent is repeated until convergence; the model parameters (i.e., the optimal weight value of each modeling feature in the training samples, which can be used to characterize the contribution of each modeling feature to the model output) can then be solved, and the solved parameter values are taken as the optimal parameters to construct the above logistic regression model.
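As a hedged illustration of the training loop described above, the sketch below fits a logistic regression model by batch gradient descent on the log loss. The learning rate, iteration cap, convergence tolerance, and toy data are assumptions made for the example; the specification does not fix any of them.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(row: Sequence[float], w: Sequence[float], b: float) -> float:
    """Predicted probability that the sample's label is 1."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, row)))


def log_loss(X, y, w, b) -> float:
    """Mean logistic loss between predictions and sample labels."""
    eps = 1e-12
    total = 0.0
    for row, label in zip(X, y):
        p = predict(row, w, b)
        total -= label * math.log(p + eps) + (1 - label) * math.log(1.0 - p + eps)
    return total / len(X)


def train_logistic_regression(
    X: List[List[float]],
    y: List[int],
    learning_rate: float = 0.5,   # assumed hyperparameter
    max_iters: int = 5000,        # assumed hyperparameter
    tol: float = 1e-9,            # assumed convergence tolerance
) -> Tuple[List[float], float]:
    n, m = len(X), len(X[0])
    w, b = [0.0] * m, 0.0
    previous = log_loss(X, y, w, b)
    for _ in range(max_iters):
        errors = [predict(row, w, b) - label for row, label in zip(X, y)]
        grad_w = [sum(e * row[j] for e, row in zip(errors, X)) / n for j in range(m)]
        grad_b = sum(errors) / n
        w = [wj - learning_rate * g for wj, g in zip(w, grad_w)]
        b -= learning_rate * grad_b
        current = log_loss(X, y, w, b)
        if abs(previous - current) < tol:  # treat a tiny loss change as convergence
            break
        previous = current
    return w, b


# Toy target matrix (already normalized) and sample labels (1 = risk user).
X = [[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8]]
y = [0, 0, 1, 1]
w, b = train_logistic_regression(X, y)
```

The learned `w` holds one weight per modeling feature, which is the quantity the text describes as characterizing each feature's contribution to the model output.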
  • FIG. 2 is a schematic diagram of a risk assessor training a model based on evaluation results uploaded by multiple data providers.
  • the risk assessor may prepare a number of sample users in advance and notify the data providers of the user IDs of the sample users; for example, in implementation, the user IDs of the sample users may be distributed to each data provider in the form of a list.
  • each data provider can use its own user evaluation model to evaluate each sample user, and then upload the evaluation results to the risk assessor for modeling by the risk assessor.
  • the risk assessor does not need to notify the data provider of the sample user's user ID.
  • the preliminary evaluation results that the data providers “share” with the risk assessor can be understood as a dimensionality reduction of the locally maintained user data; that is, the preliminary evaluation result “shared” by each data provider can be regarded as a data feature obtained by reducing the locally maintained user data to one dimension.
  • “sharing” the preliminary evaluation results with the risk assessor is equivalent to sharing with the risk assessor the data value that has been extracted, through machine learning, from the locally maintained user data.
  • although each data provider does not “share” the raw user data with the risk assessor, data sharing can still be achieved by “sharing” the value of the data.
  • the risk assessor may use the evaluation results uploaded by the data providers as training data to create a corresponding training sample for each sample user.
  • each constructed training sample includes the evaluation results obtained by the data providers, each using its trained user evaluation model, from a preliminary evaluation of the sample user corresponding to that training sample. The evaluation result of each data provider corresponds to one feature in the training sample.
  • each training sample will include several feature fields, and each feature field will respectively correspond to an evaluation result uploaded by one data provider.
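The construction of training samples from uploaded evaluation results can be sketched as below. The data layout (a dictionary of per-provider scores keyed by user ID) and the choice to drop users who are missing a score or a label are assumptions made for illustration; the specification does not say how incomplete records are handled.

```python
from typing import Dict, List, Tuple


def build_training_samples(
    uploads: Dict[str, Dict[str, float]],  # provider name -> {user ID -> risk score}
    labels: Dict[str, int],                # user ID -> 1 (risk user) or 0
) -> Tuple[List[str], List[str], List[List[float]], List[int]]:
    """Return (providers, user_ids, X, y) with one feature field per provider."""
    providers = sorted(uploads)
    # Assumption: keep only users that every provider scored and that have a label.
    common = set(labels)
    for provider in providers:
        common &= set(uploads[provider])
    user_ids = sorted(common)
    X = [[uploads[p][u] for p in providers] for u in user_ids]
    y = [labels[u] for u in user_ids]
    return providers, user_ids, X, y


# Hypothetical uploads from two data providers for three sample users.
uploads = {
    "bank": {"u1": 0.2, "u2": 0.9, "u3": 0.4},
    "ecommerce": {"u1": 0.1, "u2": 0.7},
}
labels = {"u1": 0, "u2": 1, "u3": 0}
providers, user_ids, X, y = build_training_samples(uploads, labels)
```

Column `j` of `X` always holds the scores from `providers[j]`, so a coefficient learned for that column can later be attributed to that data provider.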
  • a training sample set may also be generated from the created training samples, and each training sample may be given a corresponding label based on the actual business performance of the corresponding sample user; for example, in a credit-granting business scenario, the label of each training sample may be a user label, assigned based on the user's actual repayment behavior, that indicates whether the user is a risk user.
  • the risk assessor can assign the user label to each sample user based on whether the sample user ultimately defaults on repayment; for example, if a loan is issued to a sample user and the user subsequently defaults on repayment, the training sample corresponding to that sample user will be given a label in the training sample set indicating that the user is a risk user.
  • the risk evaluator server can train the preset machine learning model based on the constructed training sample set and the tags corresponding to the training samples.
  • a certain linear relationship may generally hold between the evaluation results of the same user uploaded by the data providers and that user's user tag (i.e., the final user evaluation result);
  • the result of the evaluation of the same user uploaded by each data provider may be multiplied by the corresponding coefficient, and then the calculation result is used as the final evaluation result for the user.
  • the machine learning model trained by the risk assessor side may specifically be a linear model; for example, in practical applications, the machine learning model trained by the risk assessor side may be a linear logistic regression model.
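Assuming the linear logistic regression variant mentioned above, the risk assessor's model can be written in the following form. The notation is introduced here for illustration only and does not appear in the original: $K$ is the number of data providers, $s_i(u)$ is the evaluation result uploaded by provider $i$ for user $u$, $w_i$ is the coefficient of that variable, and $w_0$ is an intercept.

```latex
\hat{p}(u) = \sigma\!\Big(w_0 + \sum_{i=1}^{K} w_i\, s_i(u)\Big),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}
```

Under this reading, each coefficient $w_i$ multiplies exactly one provider's evaluation result, which is why the trained coefficients can be interpreted as per-provider contributions.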
  • the risk assessor trains the linear model based on the constructed training sample set and the label corresponding to each training sample; that is, the evaluation results uploaded by each data provider are used as independent variables and the corresponding user labels as the dependent variable, these are substituted into the expression of the linear model for linear fitting, and the coefficients corresponding to the respective variables are solved. The specific implementation process is not described in detail in this specification; those skilled in the art can refer to descriptions in the related art when implementing the technical solution.
  • the model is trained at this time.
  • a virtual resource wherein the number of virtual resources allocated for each data provider may be proportional to a weight value (ie, a coefficient) of each data provider;
  • the virtual resource allocated for each data provider may be a user data usage fund issued by the risk assessor to each data provider.
  • the risk assessor may distribute the user data usage funds available for allocation among the data providers according to each data provider's contribution to the trained model.
  • the contribution of each data provider to the trained model may be specifically characterized by the trained coefficient corresponding to each variable in the training samples.
  • the trained coefficients corresponding to the variables can be used as the contributions of the respective data providers, and the funds are then distributed to each data provider based on the size of the coefficient corresponding to each variable;
  • the risk assessor can treat the coefficient of each variable as that provider's contribution to the model, convert the coefficients into corresponding allocation ratios, and then distribute the total amount of user data usage funds available for allocation among the data providers according to the converted ratios.
  • the data provider with a higher contribution to the model will be able to get more data usage funds.
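The conversion from coefficients to allocation ratios can be sketched as follows. The specification only states that allocation is proportional to contribution; clipping negative coefficients to zero and falling back to an even split when no provider has a positive coefficient are assumptions added so the example is well defined.

```python
from typing import Dict


def allocate_funds(coefficients: Dict[str, float], total: float) -> Dict[str, float]:
    """Split `total` among data providers in proportion to their coefficients."""
    # Assumption: a negative coefficient is treated as zero contribution.
    contributions = {name: max(value, 0.0) for name, value in coefficients.items()}
    weight_sum = sum(contributions.values())
    if weight_sum == 0.0:
        # Assumption: with no positive contribution, split the total evenly.
        share = total / len(contributions)
        return {name: share for name in contributions}
    return {name: total * value / weight_sum for name, value in contributions.items()}


# Hypothetical trained coefficients for three data providers.
trained = allocate_funds({"bank": 3.0, "ecommerce": 2.0, "telecom": 1.0}, 600.0)

# Identical initial coefficients give every provider the same share.
initial = allocate_funds({"bank": 1.0, "ecommerce": 1.0, "telecom": 1.0}, 600.0)
```

With the hypothetical coefficients 3:2:1, the provider with the largest coefficient receives half of the total, matching the proportional rule described above.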
  • high-quality data providers can benefit more, forcing individual data providers to continuously improve their data quality.
  • an initial coefficient can be set for each variable in the model, and the initial coefficient can be used to characterize the initial contribution of each data provider to the model.
  • the strategy for setting the initial contribution is not particularly limited in the present specification, and those skilled in the art can set it based on actual needs when implementing the technical solution of the present specification;
  • a weighted average manner may be used to set an identical initial coefficient for each variable in the model, and, with this initial coefficient as the initial contribution of each data provider, the virtual resources are evenly distributed among the data providers.
  • taking as an example the case where the virtual resource allocated by the risk assessor is a user data usage fund issued by the risk assessor to each data provider, the risk assessor may, based on the initial contribution of each data provider, evenly distribute the total amount of user data usage funds available for allocation among the data providers.
  • the trained model can be used to conduct a risk assessment for a target user.
  • the above-mentioned target user may specifically be a user on whom the risk assessor needs to perform a risk assessment; taking a credit-granting business scenario as an example, the risk assessor may specifically be the party issuing the loan, and the target user may specifically be a user who has initiated a loan application and whom the risk assessor must assess before deciding whether to issue the loan.
  • the plurality of data providers may search for the evaluation result that has been evaluated by using the user evaluation model based on the user ID, and then upload the evaluation result to the risk assessor.
  • the risk assessor can use the evaluation results uploaded by each data provider to create a corresponding prediction sample for the target user, input the prediction sample into the trained model for prediction calculation to obtain the final evaluation result of the user, and make the corresponding business decision based on the final evaluation result.
  • the business scenario of credit is still taken as an example.
  • the final evaluation result may still be a risk score; when the risk assessor decides whether to issue a loan to the user based on the risk score, the risk score can be compared with a preset risk threshold. On the one hand, if the risk score is higher than or equal to the risk threshold, the target user is a risk user; the user may be given a user label indicating a risk user, and the loan application initiated by the user is terminated.
  • On the other hand, if the risk score is lower than the risk threshold, the target user is a low-risk user; the user may be given a user label indicating a low-risk user, and the user's request is responded to normally.
  • the user tag that has been calibrated for the user can be maintained and updated based on whether the target user finally defaults on the repayment; for example, the target user is assumed to be a non-risk user, and finally After the loan is issued to the user, if the user has a default payment, the user tag that has been calibrated can be updated immediately, and the user is re-calibrated as a risk user.
  • the risk assessor trains the model based on the user data maintained by each data provider, the data provider only needs to transmit the initial evaluation of the user to the risk assessor risk assessor. As a result of the evaluation, it is no longer necessary for the data provider to transmit the locally maintained raw user data to the risk assessor, which can significantly reduce the risk of user privacy leakage;
  • This specification further provides a model establishing method, applied to the risk assessor's server, which performs the following steps:
  • Step 302: receive evaluation results for a number of users uploaded by multiple data providers, where the evaluation results are obtained by each data provider evaluating the users with its own evaluation model;
  • Step 304: use the evaluation results uploaded by the data providers as training data to construct a number of training samples, each training sample containing the evaluation results of the same user at each data provider, where the training samples are labeled according to the users' actual performance of the service;
  • Step 306: train a model based on the training samples and the label of each training sample to obtain a trained model.
  • The trained model may be a linear model.
  • For example, in practice it may be a linear logistic regression model.
  • The evaluation model may be a user risk assessment model; the evaluation result may be a risk score (or a credit score); the label indicates whether the user is a risk user.
  • This specification further provides a data prediction method, applied to the risk assessor's server, which performs the following steps:
  • Step 402: receive evaluation results for a number of users uploaded by multiple data providers, where the evaluation results are obtained by each data provider evaluating the users with its own evaluation model;
  • Step 404: use the evaluation results uploaded by the data providers as training data to construct a number of training samples, each training sample containing the evaluation results of the same user at each data provider, where the training samples are labeled according to the users' actual performance of the service;
  • Step 406: train a model based on the training samples and the label of each training sample to obtain a trained model;
  • Step 408: receive evaluation results uploaded by multiple data providers for a particular user, and input those evaluation results into the trained model to obtain a final evaluation result for that user.
  • Corresponding to the method embodiments above, this specification also provides an embodiment of a virtual resource allocation device.
  • Embodiments of the virtual resource allocation device can be applied to an electronic device.
  • The device embodiment may be implemented in software, in hardware, or in a combination of the two.
  • Taking a software implementation as an example, the device in the logical sense is formed when the processor of the electronic device in which it resides reads the corresponding computer program instructions from non-volatile memory into memory and runs them.
  • At the hardware level, FIG. 5 is a hardware structure diagram of the electronic device in which the virtual resource allocation device resides. Besides the processor, memory, network interface, and non-volatile memory shown in FIG. 5, the electronic device may also include other hardware depending on its actual function, which is not described further here.
  • FIG. 6 is a block diagram of a virtual resource allocation device according to an exemplary embodiment of this specification.
  • The virtual resource allocation device 60 can be applied to the electronic device shown in FIG. 5 and includes a receiving module 601, a training module 602, and an allocation module 603.
  • The receiving module 601 receives evaluation results for a number of users uploaded by multiple data providers, where the evaluation results are obtained by each data provider evaluating the users with its own evaluation model.
  • The training module 602 uses the evaluation results uploaded by the data providers as training data to construct a number of training samples, each training sample containing the evaluation results of the same user at each data provider, where the training samples are labeled according to the users' actual performance of the service.
  • The allocation module 603 trains a model based on the training samples and the label of each training sample, takes the coefficient of each variable in the trained model as the corresponding data provider's contribution, and allocates virtual resources to each data provider based on its contribution.
  • In this embodiment, the trained model is a linear model.
  • In this embodiment, the quantity of virtual resources allocated to each data provider is proportional to that data provider's contribution.
  • In this embodiment, the device further includes:
  • An evaluation module 604 (not shown in FIG. 6), which receives evaluation results uploaded by multiple data providers for a particular user and inputs those evaluation results into the trained model to obtain a final evaluation result for that user.
  • In this embodiment, the virtual resources are user-data usage funds issued to the data providers.
  • In this embodiment, the evaluation model is a user risk assessment model; the evaluation result is a risk score; and the label indicates whether the user is a risk user.
  • Because the device embodiment basically corresponds to the method embodiment, the relevant parts of the method embodiment's description apply to it.
  • The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this specification. Those of ordinary skill in the art can understand and implement it without creative effort.
  • The systems, devices, modules, or units described in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function.
  • A typical implementation device is a computer, which may take the form of a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email transceiver, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
  • Corresponding to the method embodiments above, this specification also provides an embodiment of a virtual resource allocation system.
  • The virtual resource allocation system may include multiple data provider servers and a risk assessor server.
  • The data provider servers upload evaluation results for a number of users to the risk assessor server, where the evaluation results are obtained by each data provider evaluating the users with its own evaluation model.
  • The risk assessor server uses the evaluation results uploaded by the data providers as training data to construct a number of training samples, each training sample containing the evaluation results of the same user at each data provider, where the training samples are labeled according to the users' actual performance of the service. It then trains a model based on the training samples and the label of each training sample, takes the coefficient of each variable in the trained model as the corresponding data provider's contribution, and allocates virtual resources to each data provider based on its contribution.
  • Corresponding to the method embodiments above, this specification also provides an embodiment of an electronic device.
  • The electronic device includes a processor and a memory for storing machine-executable instructions; the processor and the memory are typically interconnected by an internal bus.
  • In other possible implementations, the device may also include an external interface to enable communication with other devices or components.
  • In this embodiment, by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic of the virtual resource allocation described above, the processor is caused to:
  • receive evaluation results for a number of users uploaded by multiple data providers, where the evaluation results are obtained by each data provider evaluating the users with its own evaluation model;
  • use the evaluation results uploaded by the data providers as training data to construct a number of training samples, each training sample containing the evaluation results of the same user at each data provider, where the training samples are labeled according to the users' actual performance of the service;
  • train a model based on the training samples and the label of each training sample, take the coefficient of each variable in the trained model as the corresponding data provider's contribution, and allocate virtual resources to each data provider based on its contribution.
  • In this embodiment, the trained model is a linear model.
  • In this embodiment, the quantity of virtual resources allocated to each data provider is proportional to that data provider's contribution.
  • In this embodiment, by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic of the virtual resource allocation described above, the processor is further caused to:
  • receive evaluation results uploaded by multiple data providers for a particular user, and input those evaluation results into the trained model to obtain a final evaluation result for that user.
  • In this embodiment, the virtual resources are user-data usage funds issued to the data providers.
  • In this embodiment, the evaluation model is a user risk assessment model; the evaluation result is a risk score; and the label indicates whether the user is a risk user.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Computing Systems (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Technology Law (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Algebra (AREA)
  • Medical Informatics (AREA)
  • Game Theory and Decision Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

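Disclosed is a virtual resource allocation method, comprising: receiving evaluation results for a number of users uploaded by multiple data providers, wherein the evaluation results are obtained by each data provider evaluating the users with its own evaluation model; using the evaluation results uploaded by the data providers as training data to construct a number of training samples, each training sample containing the evaluation results of the same user at each of the data providers, wherein the training samples are labeled according to the users' actual performance of the service; and training a model based on the training samples and the label of each training sample, taking the coefficient of each variable in the trained model as the contribution of the corresponding data provider, and allocating virtual resources to each data provider based on its contribution.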

Description

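Method and Device for Virtual Resource Allocation, Modeling, and Data Prediction
Technical Field
This specification relates to the field of computer applications, and in particular to a method and device for virtual resource allocation, model establishment, and data prediction.
Background
With the rapid development of Internet technology, the networking and transparency of users' personal data has become an unstoppable trend. Service platforms that provide Internet services to users can accumulate massive amounts of user data by collecting the service data users generate every day. For the operator of a service platform, this user data is a very valuable "resource": through data mining and machine learning, the operator can build a user evaluation model on it and use that model to make evaluation decisions about users.
For example, in a credit-issuing scenario, data features of several dimensions can be extracted from the massive user data, training samples can be built from those features, and a user risk assessment model can be trained with a specific machine learning algorithm. The model is then used to assess a user's risk, and the result determines whether the user is a risk user and hence whether a loan should be issued.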
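Summary
This specification proposes a virtual resource allocation method, comprising:
receiving evaluation results for a number of users uploaded by multiple data providers, wherein the evaluation results are obtained by each data provider evaluating the users with its own evaluation model;
using the evaluation results uploaded by the data providers as training data to construct a number of training samples, each training sample containing the evaluation results of the same user at each of the data providers, wherein the training samples are labeled according to the users' actual performance of the service;
training a model based on the training samples and the label of each training sample, taking the coefficient of each variable in the trained model as the contribution of the corresponding data provider, and allocating virtual resources to each data provider based on its contribution.
Optionally, the trained model is a linear model.
Optionally, the quantity of virtual resources allocated to each data provider is proportional to that data provider's contribution.
Optionally, the method further comprises:
receiving evaluation results uploaded by multiple data providers for a particular user, and inputting those evaluation results into the trained model to obtain a final evaluation result for that user.
Optionally, the virtual resources are user-data usage funds issued to the data providers.
Optionally, the evaluation model is a user risk assessment model; the evaluation result is a risk score; and the label indicates whether the user is a risk user.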
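This specification further proposes a virtual resource allocation device, comprising:
a receiving module that receives evaluation results for a number of users uploaded by multiple data providers, wherein the evaluation results are obtained by each data provider evaluating the users with its own evaluation model;
a training module that uses the evaluation results uploaded by the data providers as training data to construct a number of training samples, each training sample containing the evaluation results of the same user at each of the data providers, wherein the training samples are labeled according to the users' actual performance of the service;
an allocation module that trains a model based on the training samples and the label of each training sample, takes the coefficient of each variable in the trained model as the contribution of the corresponding data provider, and allocates virtual resources to each data provider based on its contribution.
Optionally, the trained model is a linear model.
Optionally, the quantity of virtual resources allocated to each data provider is proportional to that data provider's contribution.
Optionally, the device further comprises:
an evaluation module that receives evaluation results uploaded by multiple data providers for a particular user and inputs those evaluation results into the trained model to obtain a final evaluation result for that user.
Optionally, the virtual resources are user-data usage funds issued to the data providers.
Optionally, the evaluation model is a user risk assessment model; the evaluation result is a risk score; and the label indicates whether the user is a risk user.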
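Optionally, this specification further proposes a model establishing method, comprising:
receiving evaluation results for a number of users uploaded by multiple data providers, wherein the evaluation results are obtained by each data provider evaluating the users with its own evaluation model;
using the evaluation results uploaded by the data providers as training data to construct a number of training samples, each training sample containing the evaluation results of the same user at each of the data providers, wherein the training samples are labeled according to the users' actual performance of the service;
training a model based on the training samples and the label of each training sample to obtain a trained model.
Optionally, the trained model is a linear model.
Optionally, the evaluation model is a user risk assessment model; the evaluation result is a risk score; and the label indicates whether the user is a risk user.
This specification further proposes a data prediction method, comprising:
receiving evaluation results for a number of users uploaded by multiple data providers, wherein the evaluation results are obtained by each data provider evaluating the users with its own evaluation model;
using the evaluation results uploaded by the data providers as training data to construct a number of training samples, each training sample containing the evaluation results of the same user at each of the data providers, wherein the training samples are labeled according to the users' actual performance of the service;
training a model based on the training samples and the label of each training sample to obtain a trained model;
receiving evaluation results uploaded by multiple data providers for a particular user, and inputting those evaluation results into the trained model to obtain a final evaluation result for that user.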
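This specification further proposes a virtual resource allocation system, comprising:
multiple data provider servers, which upload evaluation results for a number of users to a risk assessor server, wherein the evaluation results are obtained by each data provider evaluating the users with its own evaluation model;
a risk assessor server, which uses the evaluation results uploaded by the data providers as training data to construct a number of training samples, each training sample containing the evaluation results of the same user at each of the data providers, wherein the training samples are labeled according to the users' actual performance of the service; and which trains a model based on the training samples and the label of each training sample, takes the coefficient of each variable in the trained model as the contribution of the corresponding data provider, and allocates virtual resources to each data provider based on its contribution.
This specification further proposes an electronic device, comprising:
a processor;
a memory for storing machine-executable instructions;
wherein, by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic of virtual resource allocation, the processor is caused to:
receive evaluation results for a number of users uploaded by multiple data providers, wherein the evaluation results are obtained by each data provider evaluating the users with its own evaluation model;
use the evaluation results uploaded by the data providers as training data to construct a number of training samples, each training sample containing the evaluation results of the same user at each of the data providers, wherein the training samples are labeled according to the users' actual performance of the service;
train a model based on the training samples and the label of each training sample, take the coefficient of each variable in the trained model as the contribution of the corresponding data provider, and allocate virtual resources to each data provider based on its contribution.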
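In this specification, multiple data providers can each evaluate a number of users with their own evaluation models and upload the resulting evaluation results to the risk assessor. The risk assessor can use the uploaded evaluation results as training data to construct a number of training samples and train a model, take the coefficient corresponding to each variable in the trained model as the contribution of the corresponding data provider, and then allocate virtual resources to the data providers based on those contributions.
On the one hand, when the risk assessor trains the model on the user data maintained by the data providers, each data provider only needs to transmit to the risk assessor the evaluation results obtained from its preliminary evaluation of the users. The data providers therefore no longer need to transmit their locally maintained raw user data to the risk assessor, which can significantly reduce the risk of user privacy leakage.
On the other hand, because the coefficients of the variables in the trained model can faithfully reflect each data provider's contribution to the trained model, allocating virtual resources to the data providers on the basis of those contributions allows the virtual resources to be allocated reasonably.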
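Brief Description of the Drawings
FIG. 1 is a flowchart of a virtual resource allocation method according to an embodiment of this specification;
FIG. 2 is a schematic diagram, according to an embodiment of this specification, of a risk assessor training a model on the evaluation results uploaded by multiple data providers;
FIG. 3 is a flowchart of a model establishing method according to an embodiment of this specification;
FIG. 4 is a flowchart of a data prediction method according to an embodiment of this specification;
FIG. 5 is a hardware structure diagram of an electronic device that hosts the virtual resource allocation device, according to an embodiment of this specification;
FIG. 6 is a logical block diagram of the virtual resource allocation device according to an embodiment of this specification.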
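Detailed Description
In the era of big data, mining massive data can yield useful information in many forms, so the importance of data is self-evident. Different institutions each hold their own data, but the data mining results any single institution can achieve are limited by the quantity and variety of the data it holds. A direct solution is for multiple institutions to cooperate and share data, achieving better data mining results and a win-win outcome.
For a data owner, however, data is itself an asset of great value, and out of the need to protect privacy and prevent leaks, data owners are often unwilling to hand their data over directly. This makes "data sharing" hard to put into practice. How to share data while fully guaranteeing data security has therefore become a closely watched problem in the industry.
This specification proposes a technical solution in which, when a risk assessor "shares" the user data maintained by multiple data providers in order to train a model, the data providers no longer need to transmit their raw user data to the risk assessor to accomplish the "data sharing".
In implementation, each data provider can train on its locally maintained user data with a machine learning algorithm to build a user evaluation model, use that model to evaluate a number of sample users, and then upload the evaluation results to the risk assessor.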
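The risk assessor can use the evaluation results uploaded by the data providers as training data to construct a number of training samples, where each training sample contains the evaluation results of the same user at each data provider. For example, the evaluation results uploaded by the data providers for a given user can each be used as a modeling feature to build a feature vector that serves as the training sample.
Each constructed training sample can be given a corresponding label according to the user's actual performance of the service. For example, in a credit-issuing scenario, the label may be a user label, assigned on the basis of the user's real repayment behavior, that indicates whether the user is a risk user.
Finally, the risk assessor can train a model on the constructed training samples and the labels corresponding to them, take the coefficient of each variable in the trained model as the corresponding data provider's contribution to the model, and allocate virtual resources to each data provider based on its contribution.
On the one hand, when the risk assessor trains the model on the user data maintained by the data providers, each data provider only needs to transmit to the risk assessor the evaluation results obtained from its preliminary evaluation of the users. The data providers therefore no longer need to transmit their locally maintained raw user data to the risk assessor, which can significantly reduce the risk of user privacy leakage.
On the other hand, because the coefficients of the variables in the trained model can faithfully reflect each data provider's contribution to the trained model, allocating virtual resources to the data providers on the basis of those contributions allows the virtual resources to be allocated reasonably.
For example, in the credit-issuing scenario, the user evaluation model may be a user risk assessment model used to decide whether a user is a risk user, and the evaluation result may be a risk score output after the user's risk is assessed with that model.
In this scenario, each data provider can build a user risk assessment model from its own user data. When the risk assessor (for example, the party issuing loans) needs to share the data providers' user data to train a user risk assessment model, it can use the evaluation results uploaded by the data providers as training data to construct a number of training samples; label each training sample, based on the user's real repayment behavior, to indicate whether the user is a risk user; train a model on the constructed training samples and their labels; take the coefficient of each variable in the trained model as the corresponding data provider's contribution to the model; and allocate virtual resources to each data provider based on its contribution. Throughout this process, the data providers accomplish "data sharing" without providing raw user data to the risk assessor.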
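The solution is described in detail below through specific embodiments and in connection with specific application scenarios.
Referring to FIG. 1, FIG. 1 shows a virtual resource allocation method provided by an embodiment of this specification. The method is applied to the risk assessor's server and performs the following steps:
Step 102: receive evaluation results for a number of users uploaded by multiple data providers, where the evaluation results are obtained by each data provider evaluating the users with its own evaluation model;
Step 104: use the evaluation results uploaded by the data providers as training data to construct a number of training samples, each training sample containing the evaluation results of the same user at each data provider, where the training samples are labeled according to the users' actual performance of the service;
Step 106: train a model based on the training samples and the label of each training sample, take the coefficient of each variable in the trained model as the corresponding data provider's contribution, and allocate virtual resources to each data provider based on its contribution.
The data providers may include parties that have a cooperative relationship with the risk assessor. In practice, the data providers and the risk assessor may correspond to different operators. For example, the risk assessor (the modeling party) may be Company A's data operation platform, and the data providers may be service platforms connected to it, such as e-commerce platforms, third-party banks, courier companies, other financial institutions, and telecom operators.
The user evaluation model may be any type of machine learning model used to evaluate users.
For example, in one illustrated implementation, the user evaluation model may be a user risk assessment model trained with a specific machine learning algorithm (for example, a linear logistic regression model or a scorecard model used to assess user risk). Correspondingly, the evaluation result output by the model may be a risk score representing the user's risk level. In practice, the risk score is usually a floating-point value between 0 and 1 (for example, a probability value representing the user's risk level). Alternatively, the evaluation result may be a score of another kind, such as a credit score.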
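In this specification, to reduce the risk of user privacy leakage caused by transmitting raw user data to the risk assessor for modeling, the data providers no longer need to transmit their locally maintained raw user data to the risk assessor; instead, each builds its own model from that data.
In implementation, each data provider's server can collect, in the background, the user data that users generate every day, sample a number of user data records from it as data samples, and generate an initial data sample set from those samples.
The number of data samples collected is not specifically limited in this specification; those skilled in the art can set it according to actual needs.
The specific form of the user data depends on the business scenario and the modeling requirements. It may be any type of user data from which modeling features for training the user evaluation model can be extracted, and it is likewise not specifically limited in this specification.
For example, in practice, to create a scorecard model for assessing the risk of a user's loan application or payment transaction, the user data may include the user's transaction data, shopping records, repayment records, consumption records, records of wealth-management product purchases, and other user data from which modeling features for training a risk assessment model can be extracted.
After the data sample set has been generated from the collected data samples, the data provider's server can also preprocess the data samples in the set.
Preprocessing usually includes data cleaning, filling in default (missing) values, normalization, or other forms of preprocessing. It converts the collected data samples into standardized data samples suitable for model training.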
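As an illustration of the preprocessing step just described, the following is a minimal Python sketch rather than the specification's own procedure. It assumes that missing values are marked as `None`, that they are filled with the column mean, and that normalization means min-max scaling to [0, 1]; the specification names the steps but prescribes none of these choices, and the function name `preprocess` is hypothetical.

```python
from typing import List, Optional


def preprocess(rows: List[List[Optional[float]]]) -> List[List[float]]:
    """Fill missing values with the column mean, then min-max scale each column to [0, 1]."""
    if not rows:
        return []
    n_cols = len(rows[0])
    cleaned = [list(row) for row in rows]
    for j in range(n_cols):
        present = [row[j] for row in rows if row[j] is not None]
        # Assumption: a column with no observed values is filled with 0.0.
        mean = sum(present) / len(present) if present else 0.0
        column = [row[j] if row[j] is not None else mean for row in rows]
        lo, hi = min(column), max(column)
        span = hi - lo
        for i, value in enumerate(column):
            # A constant column carries no information; map it to 0.0.
            cleaned[i][j] = (value - lo) / span if span > 0 else 0.0
    return cleaned
```

Other fill strategies (a fixed default, the median) or other scalings would fit the same description equally well.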
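After preprocessing, the data provider's server can extract data features of several dimensions (the modeling features that finally take part in modeling) from each data sample in the set. The number of feature dimensions extracted is not specifically limited in this specification; those skilled in the art can choose it according to the actual modeling requirements.
The specific types of data features extracted are also not specifically limited in this specification; those skilled in the art can select them manually, according to the actual modeling requirements, from the information the data samples actually contain.
After extracting the data features of several dimensions from the data samples, the data provider's server can generate a data feature vector for each data sample from the values of those features and then build a target matrix from the data feature vectors of all the data samples. For example, if M dimensions of data features are extracted from each of N data samples, the target matrix may be an N*M matrix.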
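The N*M target matrix can be pictured with a short Python sketch. The representation of a data sample as a dictionary and the function name `build_target_matrix` are assumptions made here for illustration; the specification does not fix a data layout.

```python
from typing import Dict, List, Sequence


def build_target_matrix(samples: Sequence[Dict[str, float]],
                        feature_names: Sequence[str]) -> List[List[float]]:
    """Return an N x M matrix: one row (feature vector) per data sample,
    one column per extracted feature dimension."""
    return [[float(sample[name]) for name in feature_names] for sample in samples]
```

Called with N samples and M feature names (for instance hypothetical fields such as `monthly_spend` and `late_repayments`), the function returns N rows of M values each, in the order given by `feature_names`.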
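The target matrix constructed in this way is the training sample set finally used for model training. Each data provider's server can use a specific machine learning algorithm, with the target matrix as the original training sample set, to train its own user evaluation model.
It should be noted that the machine learning algorithms the data providers use to train their user evaluation models may be the same or different; this is not specifically limited in this specification.
In this specification, the machine learning model may be a supervised machine learning model; for example, it may be an LR (Logistic Regression) model.
In this case, each data sample in the training sample set can carry a pre-assigned sample label. The specific form of the sample label usually also depends on the business scenario and the modeling requirements and is not specifically limited in this specification.
For example, in practice, to create a model that decides whether a loan can be issued to a user, the sample label may be a user label indicating whether the user is a risk user; the user label may be assigned and provided by the risk assessor. In this case, each data feature vector in the target matrix corresponds to one sample label.
The specific process by which each data provider trains a user evaluation model with a supervised machine learning algorithm is not detailed in this specification; those skilled in the art can refer to the related art when implementing the technical solution described here.
For example, taking the LR algorithm as the supervised machine learning algorithm, a loss function is usually used when training a logistic regression model to measure the fitting error between the training samples and their sample labels. In implementation, the training samples and their sample labels are fed into the loss function, and gradient descent is iterated until convergence to solve for the values of the model parameters (the optimal weight of each modeling feature in the training samples, which represents that feature's contribution to the model output). The solved values are then used as the optimal parameters to build the logistic regression model.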
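The gradient-descent training of a logistic regression model outlined above can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the specification's implementation: the loss is taken to be the mean log-loss, the update is full-batch gradient descent, and the learning rate, iteration cap, and convergence tolerance are arbitrary example values.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    learning_rate: float = 0.5,   # assumed example value
    max_iters: int = 5000,        # assumed example value
    tol: float = 1e-6,            # assumed convergence tolerance
) -> Tuple[List[float], float]:
    """Minimise the mean log-loss with batch gradient descent.

    Returns (weights, bias); weights[j] is the fitted weight of feature j.
    """
    n, m = len(X), len(X[0])
    weights, bias = [0.0] * m, 0.0
    for _ in range(max_iters):
        grad_w, grad_b = [0.0] * m, 0.0
        for row, label in zip(X, y):
            # Prediction error of the current model on this sample.
            error = sigmoid(sum(w * x for w, x in zip(weights, row)) + bias) - label
            for j in range(m):
                grad_w[j] += error * row[j] / n
            grad_b += error / n
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
        # Stop once the gradient is small enough to be treated as converged.
        if max(abs(g) for g in grad_w + [grad_b]) < tol:
            break
    return weights, bias
```

On a small separable data set the fitted weight for a feature that rises with the label comes out positive, and the predicted probability crosses 0.5 between the two classes.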
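Referring to FIG. 2, FIG. 2 is a schematic diagram, shown in this specification, of a risk assessor training a model on the evaluation results uploaded by multiple data providers.
In the initial state, the risk assessor can prepare a number of sample users in advance and notify each data provider of their user IDs; for example, the user IDs of the sample users can be sent to each data provider as a list.
After receiving the user IDs of the sample users, each data provider can use its own user evaluation model to evaluate each sample user and then upload the evaluation results to the risk assessor, which performs the modeling.
Of course, if the same user already has the same ID in the evaluation results that the data providers send to the risk assessor, the risk assessor does not need to notify the data providers of the sample users' user IDs.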
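Because the uploaded results are keyed by a shared user ID, the risk assessor can line them up per user. The Python sketch below shows one way this could look; the dictionary layout, the function name `build_samples`, and the rule that only users scored by every provider produce a sample are assumptions, not details given in the specification.

```python
from typing import Dict, List, Sequence


def build_samples(uploads: Dict[str, Dict[str, float]],
                  provider_order: Sequence[str]) -> Dict[str, List[float]]:
    """Join per-provider scores on user ID into one feature vector per user.

    uploads maps provider name -> {user ID -> evaluation result}.
    """
    if not provider_order:
        return {}
    # Assumption: only users scored by every provider yield a sample.
    common_ids = set.intersection(*(set(uploads[p]) for p in provider_order))
    return {uid: [uploads[p][uid] for p in provider_order]
            for uid in sorted(common_ids)}
```

Each resulting vector has one entry per data provider, in the fixed order given by `provider_order`, so that a column always refers to the same provider.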
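In this way, the data providers no longer need to "share" their locally maintained raw user data with the risk assessor; they only need to "share" a preliminary evaluation result for each user.
On the one hand, the preliminary evaluation result that a data provider "shares" with the risk assessor can be understood as a dimensionality reduction of its locally maintained user data: it can be regarded as a single data feature, of dimension 1, to which the locally maintained user data has been reduced.
On the other hand, because the preliminary evaluation results are obtained by each data provider through machine learning modeling on its locally maintained user data, "sharing" them with the risk assessor amounts to sharing the data value that machine learning has extracted from that user data. Although the data providers do not "share" the raw user data with the risk assessor, sharing the data value still achieves the purpose of data sharing.
In this specification, after receiving the evaluation results uploaded by the data providers for the sample users, the risk assessor can use them as training data and create one corresponding training sample for each sample user.
Each constructed training sample contains the evaluation results obtained when the data providers, using their trained user evaluation models, performed a preliminary evaluation of the sample user corresponding to that training sample. Each data provider's evaluation result corresponds to one feature variable in the training sample.
A feature variable here means a feature field of the training sample. In this specification, each training sample contains several feature fields, and each feature field corresponds to the evaluation result uploaded by one data provider.
After a training sample has been created for each sample user, a training sample set can be generated from the training samples, and each training sample can be labeled according to the sample user's actual performance of the service. For example, in a credit-issuing scenario, the label may be a user label, assigned on the basis of the user's real repayment behavior, that indicates whether the user is a risk user. In this scenario, the risk assessor can label each sample user according to whether the user eventually defaults on repayment. For example, if a sample user defaults after a loan is issued, the training sample corresponding to that user is marked in the training sample set with a label indicating that the user is a risk user.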
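The labeling rule in the credit example can be sketched as follows. The encoding (1 for a risk user who defaulted, 0 otherwise) and the function name `label_samples` are assumptions chosen for illustration; the specification only requires that the label indicate whether the user is a risk user.

```python
from typing import Dict, List, Set, Tuple


def label_samples(samples: Dict[str, List[float]],
                  defaulted_users: Set[str]) -> Tuple[List[List[float]], List[int]]:
    """Return (X, y) for training.

    X holds one feature vector per sample user (one provider score per column);
    y holds the labels: 1 marks a risk user (defaulted), 0 a non-risk user.
    """
    X: List[List[float]] = []
    y: List[int] = []
    for user_id, features in samples.items():
        X.append(features)
        y.append(1 if user_id in defaulted_users else 0)
    return X, y
```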
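After the training samples in the training sample set have been labeled, the risk assessor's server can train a preset machine learning model on the training sample set and the labels corresponding to the training samples.
In one illustrated implementation, the evaluation results that the data providers upload for the same user may have a certain linear relationship with that user's user label (that is, the final user evaluation result).
For example, the risk assessor can multiply each data provider's evaluation result for the same user by a corresponding coefficient, sum the products, and use the sum as the final evaluation result for that user.
In this embodiment, therefore, the machine learning model trained on the risk assessor's side may be a linear model; for example, in practice it may be a linear logistic regression model.
Training the linear model on the training sample set and the corresponding labels amounts to taking the evaluation results uploaded by the data providers as independent variables and the corresponding user labels as dependent variables, substituting them into the expression of the linear model for linear fitting, and solving for the coefficient corresponding to each independent variable. The specific implementation is not detailed in this specification; those skilled in the art can refer to the related art when implementing the technical solution.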
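The linear fit described above can be written out as follows. The notation is introduced here for illustration and does not appear in the specification: $s_{i,k}$ is the evaluation result uploaded by data provider $k$ for sample user $i$, $y_i$ is that user's label, $w_k$ is the coefficient fitted for provider $k$, $w_0$ is an intercept (an assumption; the specification does not mention one), and $\sigma$ is the logistic function that applies when the linear model is a logistic regression.

```latex
\hat{y}_i = \sigma\!\Big(w_0 + \sum_{k=1}^{K} w_k\, s_{i,k}\Big),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}
```

Training chooses $w_0, \dots, w_K$ so that $\hat{y}_i$ matches $y_i$ as closely as possible over the training samples; each fitted $w_k$ is then read as the coefficient of the variable belonging to provider $k$.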
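In this specification, once the risk assessor has, through the above training process, obtained the coefficients corresponding to the variables in the training samples (that is, the evaluation results uploaded by the data providers), model training is complete.
After the risk assessor has trained the model using the preliminary evaluation results uploaded by the data providers for the users, it can also allocate a certain amount of virtual resources to each data provider based on that provider's contribution to the trained model. The quantity of virtual resources allocated to each data provider may be proportional to that provider's weight (that is, its coefficient).
In one illustrated implementation, the virtual resources allocated to the data providers may be user-data usage funds issued by the risk assessor to the data providers. In this case, the risk assessor can distribute the user-data usage funds available to the data providers according to each provider's contribution to the trained model.
In one illustrated implementation, each data provider's contribution to the trained model may be represented by the trained coefficient corresponding to each variable in the training samples. In this case, after the risk assessor has obtained those coefficients through the model training process shown above, it can take the coefficient of each variable as the corresponding data provider's contribution and distribute the benefits among the data providers according to the size of the coefficients.
For example, the risk assessor can take the coefficient of each variable as the contribution to the model, convert it into an allocation ratio, and then distribute the total amount of user-data usage funds available among the data providers according to those ratios. Data providers that contribute more to the model thus receive more data usage funds. In this way, high-quality data providers benefit more, which pushes every data provider to keep improving its data quality.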
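The conversion from coefficients to allocation ratios can be sketched in Python as below. The function name `allocate_funds` is hypothetical, and the sketch assumes non-negative coefficients; the specification does not say how a negative coefficient should be treated, so this sketch simply rejects that case.

```python
from typing import Dict


def allocate_funds(total: float, coefficients: Dict[str, float]) -> Dict[str, float]:
    """Split `total` among providers in proportion to their model coefficients."""
    # Assumption: coefficients are non-negative; the specification does not
    # say how a negative coefficient should be handled.
    if any(c < 0 for c in coefficients.values()):
        raise ValueError("negative coefficients are not handled in this sketch")
    weight_sum = sum(coefficients.values())
    if weight_sum == 0:
        raise ValueError("at least one coefficient must be positive")
    # Allocation ratio of a provider = its coefficient / sum of all coefficients.
    return {provider: total * c / weight_sum for provider, c in coefficients.items()}
```

With coefficients 0.5, 0.25, and 0.25 and a total of 1000, the three providers would receive 500, 250, and 250 respectively.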
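Of course, in practice, if in the initial state the risk assessor cannot collect enough training samples to train the model, it can set an initial coefficient for each variable in the model and use the initial coefficients to represent each data provider's initial contribution to the model.
The strategy for setting the initial contribution is not specifically limited in this specification; those skilled in the art can set it according to actual needs when implementing the technical solution.
For example, in one implementation, when the risk assessor's server cold-starts (that is, when the device is powered on and run for the first time), it can use a weighted-average approach to set the same initial coefficient for every variable in the model, take that initial coefficient as each data provider's initial contribution, and allocate the virtual resources evenly among the data providers.
For example, where the virtual resources that the risk assessor allocates to the data providers are user-data usage funds issued by the risk assessor to those providers, the risk assessor may, based on each data provider's initial contribution, distribute the total amount of user-data usage funds available for allocation evenly among the data providers.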
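The cold-start case can be sketched in a few lines of Python. Taking the identical initial coefficient to be 1/K for K providers is an assumption; the specification only requires that the initial coefficients be the same. The function names are hypothetical.

```python
from typing import Dict, Sequence


def initial_coefficients(providers: Sequence[str]) -> Dict[str, float]:
    """Give every provider the same starting coefficient (assumed here to be 1/K)."""
    if not providers:
        raise ValueError("at least one data provider is required")
    return {p: 1.0 / len(providers) for p in providers}


def cold_start_shares(total: float, providers: Sequence[str]) -> Dict[str, float]:
    """With equal initial contributions, every provider receives the same share."""
    coefficients = initial_coefficients(providers)
    return {p: total / len(coefficients) for p in coefficients}
```

Once enough labeled samples are available, the trained coefficients would replace these initial values.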
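In this way, because the coefficients of the variables in the trained model can faithfully reflect each data provider's contribution to the trained model, allocating virtual resources to the data providers on the basis of those contributions allows the virtual resources to be allocated reasonably.
Referring again to FIG. 2, after the risk assessor has trained the model, it can subsequently use the trained model to assess the risk of a target user.
The target user may be any user for whom the risk assessor needs to perform a risk assessment. For example, in a credit-issuing scenario, the risk assessor may be the party issuing the loan, and the target user may be a user who has submitted a loan application and must be assessed by the risk assessor, which then decides whether to issue the loan.
After receiving the target user's user ID, the data providers can look up the evaluation result that their own user evaluation models have already produced for that user ID and upload it to the risk assessor.
After receiving the data providers' evaluation results for the target user, the risk assessor can use them to create a corresponding prediction sample for the target user, input the prediction sample into the trained model for prediction, obtain the user's final evaluation result, and make the corresponding business decision based on it.
For example, again in the credit-issuing scenario, the final evaluation result may still be a risk score. When deciding whether to issue a loan to the user, the risk assessor can compare the risk score with a preset risk threshold. On the one hand, if the risk score is higher than or equal to the risk threshold, the target user is a risk user; the user can be given a user label indicating this, and the loan application initiated by the user is terminated.
On the other hand, if the risk score is lower than the risk threshold, the target user is a low-risk user; the user can be given a user label indicating this, the loan application is processed normally, and the loan is issued to the user.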
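The threshold comparison in the two cases above can be sketched as a small Python function. The label strings and the function name `decide_loan` are hypothetical; the comparison itself (greater than or equal to the threshold means risk user) follows the text.

```python
from typing import Tuple

RISK_USER = "risk_user"          # hypothetical label value
LOW_RISK_USER = "low_risk_user"  # hypothetical label value


def decide_loan(risk_score: float, risk_threshold: float) -> Tuple[str, bool]:
    """Return (user label, whether the loan application is approved)."""
    if risk_score >= risk_threshold:
        # Score at or above the threshold: label as risk user, terminate the application.
        return RISK_USER, False
    # Score below the threshold: label as low-risk user, issue the loan.
    return LOW_RISK_USER, True
```

A score exactly equal to the threshold is treated as a risk user, matching the "higher than or equal to" wording.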
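Of course, after a loan has been issued to the target user, the user label already assigned to that user can be maintained and updated according to whether the user eventually defaults on repayment. For example, if a target user labeled as a non-risk user defaults after the loan is issued, the label can be updated immediately and the user re-labeled as a risk user.
Finally, it should be added that, in this specification, the data providers that have a cooperative relationship with the risk assessor may change dynamically.
As the data modeling party, the risk assessor can support any data provider withdrawing from the "data sharing" at any time and any data provider joining it at any time. In other words, the risk assessor does not need to be concerned with the number or type of the data providers it cooperates with; it only needs to perform a weighted calculation on the preliminary evaluation results for the target user uploaded by the data providers that currently maintain a cooperative relationship with it. The risk assessor can therefore connect flexibly with data providers of different types.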
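One way to picture the weighted calculation over whichever providers are currently cooperating is the Python sketch below. Renormalizing the weights over the active providers is an assumption made here so that the result stays on the same scale when a provider joins or leaves; the specification only says that a weighted calculation is performed. The function name `weighted_score` is hypothetical.

```python
from typing import Dict


def weighted_score(scores: Dict[str, float], coefficients: Dict[str, float]) -> float:
    """Weighted average of the scores uploaded by currently active providers.

    scores maps provider -> preliminary evaluation result for the target user;
    coefficients maps provider -> weight in the trained model.
    """
    active = [p for p in scores if p in coefficients]
    weight_sum = sum(coefficients[p] for p in active)
    if weight_sum <= 0:
        raise ValueError("no active provider with a positive coefficient")
    # Assumption: weights are renormalised over the active providers so that
    # a provider joining or leaving does not change the scale of the result.
    return sum(coefficients[p] * scores[p] for p in active) / weight_sum
```

If provider C has withdrawn and only A and B upload scores, the calculation simply runs over A and B with their existing coefficients.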
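As the above embodiments show, on the one hand, when the risk assessor trains the model on the user data maintained by the data providers, each data provider only needs to transmit to the risk assessor the evaluation results obtained from its preliminary evaluation of the user. The data providers therefore no longer need to transmit their locally maintained raw user data to the risk assessor, which can significantly reduce the risk of user privacy leakage.
On the other hand, because the coefficients of the variables in the trained model can faithfully reflect each data provider's contribution to the trained model, allocating virtual resources to the data providers on the basis of those contributions allows the virtual resources to be allocated reasonably.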
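Referring to FIG. 3, and corresponding to the method embodiment above, this specification also provides a model establishing method, applied to the risk assessor's server, which performs the following steps:
Step 302: receive evaluation results for a number of users uploaded by multiple data providers, where the evaluation results are obtained by each data provider evaluating the users with its own evaluation model;
Step 304: use the evaluation results uploaded by the data providers as training data to construct a number of training samples, each training sample containing the evaluation results of the same user at each data provider, where the training samples are labeled according to the users' actual performance of the service;
Step 306: train a model based on the training samples and the label of each training sample to obtain a trained model.
In this embodiment, the trained model may be a linear model; for example, in practice it may be a linear logistic regression model. The evaluation model may be a user risk assessment model; the evaluation result may be a risk score (or a credit score); and the label indicates whether the user is a risk user.
The implementation details of the above steps are not repeated in this embodiment; those skilled in the art can refer to the description of the preceding embodiments.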
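Referring to FIG. 4, and corresponding to the method embodiment above, this specification also provides a data prediction method, applied to the risk assessor's server, which performs the following steps:
Step 402: receive evaluation results for a number of users uploaded by multiple data providers, where the evaluation results are obtained by each data provider evaluating the users with its own evaluation model;
Step 404: use the evaluation results uploaded by the data providers as training data to construct a number of training samples, each training sample containing the evaluation results of the same user at each data provider, where the training samples are labeled according to the users' actual performance of the service;
Step 406: train a model based on the training samples and the label of each training sample to obtain a trained model;
Step 408: receive evaluation results uploaded by multiple data providers for a particular user, and input those evaluation results into the trained model to obtain a final evaluation result for that user.
The implementation details of the above steps are not repeated in this embodiment; those skilled in the art can refer to the description of the preceding embodiments.
Corresponding to the method embodiments above, this specification also provides an embodiment of a virtual resource allocation device.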
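The embodiments of the virtual resource allocation device of this specification can be applied to an electronic device. The device embodiments may be implemented in software, in hardware, or in a combination of the two. Taking a software implementation as an example, the device in the logical sense is formed when the processor of the electronic device in which it resides reads the corresponding computer program instructions from non-volatile memory into memory and runs them. At the hardware level, FIG. 5 shows a hardware structure diagram of the electronic device in which the virtual resource allocation device resides; besides the processor, memory, network interface, and non-volatile memory shown in FIG. 5, the electronic device may also include other hardware according to its actual function, which is not described further here.
FIG. 6 is a block diagram of a virtual resource allocation device according to an exemplary embodiment of this specification.
Referring to FIG. 6, the virtual resource allocation device 60 can be applied to the electronic device shown in FIG. 5 and includes a receiving module 601, a training module 602, and an allocation module 603.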
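The receiving module 601 receives evaluation results for a number of users uploaded by multiple data providers, where the evaluation results are obtained by each data provider evaluating the users with its own evaluation model.
The training module 602 uses the evaluation results uploaded by the data providers as training data to construct a number of training samples, each training sample containing the evaluation results of the same user at each data provider, where the training samples are labeled according to the users' actual performance of the service.
The allocation module 603 trains a model based on the training samples and the label of each training sample, takes the coefficient of each variable in the trained model as the corresponding data provider's contribution, and allocates virtual resources to each data provider based on its contribution.
In this embodiment, the trained model is a linear model.
In this embodiment, the quantity of virtual resources allocated to each data provider is proportional to that data provider's contribution.
In this embodiment, the device further includes:
an evaluation module 604 (not shown in FIG. 6), which receives evaluation results uploaded by multiple data providers for a particular user and inputs those evaluation results into the trained model to obtain a final evaluation result for that user.
In this embodiment, the virtual resources are user-data usage funds issued to the data providers.
In this embodiment, the evaluation model is a user risk assessment model; the evaluation result is a risk score; and the label indicates whether the user is a risk user.
For the implementation of the functions and roles of the modules in the above device, see the implementation of the corresponding steps in the above method; it is not repeated here.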
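To make the division of work among modules 601 to 604 concrete, the following Python class is a hypothetical sketch, not the patented implementation. The class name, method names, and in-memory dictionaries are all assumptions. The training algorithm is injected as a `trainer` callable that returns one coefficient per provider; the coefficients are assumed to have a positive sum; and `evaluate` uses the plain weighted sum, whereas a logistic model would additionally apply the logistic function.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A trainer takes (X, y) and returns one coefficient per provider column.
Trainer = Callable[[List[List[float]], List[int]], List[float]]


class VirtualResourceAllocationDevice:
    """Hypothetical sketch of modules 601-604."""

    def __init__(self, providers: Sequence[str], trainer: Trainer) -> None:
        self.providers = list(providers)
        self.trainer = trainer
        self.uploads: Dict[str, Dict[str, float]] = {p: {} for p in self.providers}
        self.coefficients: Dict[str, float] = {}

    def receive(self, provider: str, results: Dict[str, float]) -> None:
        """Receiving module 601: store one provider's per-user evaluation results."""
        self.uploads[provider].update(results)

    def build_samples(self, labels: Dict[str, int]) -> Tuple[List[List[float]], List[int]]:
        """Training module 602: one labelled sample per user scored by every provider."""
        X: List[List[float]] = []
        y: List[int] = []
        for user_id, label in labels.items():
            if all(user_id in self.uploads[p] for p in self.providers):
                X.append([self.uploads[p][user_id] for p in self.providers])
                y.append(label)
        return X, y

    def train_and_allocate(self, labels: Dict[str, int], total: float) -> Dict[str, float]:
        """Allocation module 603: train, read coefficients as contributions, allocate."""
        X, y = self.build_samples(labels)
        self.coefficients = dict(zip(self.providers, self.trainer(X, y)))
        # Assumption: the coefficients are non-negative with a positive sum.
        weight_sum = sum(self.coefficients.values())
        return {p: total * c / weight_sum for p, c in self.coefficients.items()}

    def evaluate(self, scores: Dict[str, float]) -> float:
        """Evaluation module 604: combine one user's provider scores linearly."""
        return sum(self.coefficients[p] * scores[p] for p in self.providers)
```

Injecting the trainer keeps the sketch independent of any particular fitting routine; any function that maps training samples and labels to per-provider coefficients can be plugged in.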
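Because the device embodiment basically corresponds to the method embodiment, the relevant parts of the method embodiment's description apply to it. The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this specification. Those of ordinary skill in the art can understand and implement it without creative effort.
The systems, devices, modules, or units described in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementation device is a computer, which may take the form of a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email transceiver, a game console, a tablet computer, a wearable device, or a combination of any of these devices.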
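Corresponding to the method embodiments above, this specification also provides an embodiment of a virtual resource allocation system.
The virtual resource allocation system may include multiple data provider servers and a risk assessor server.
The data provider servers upload evaluation results for a number of users to the risk assessor server, where the evaluation results are obtained by each data provider evaluating the users with its own evaluation model.
The risk assessor server uses the evaluation results uploaded by the data providers as training data to construct a number of training samples, each training sample containing the evaluation results of the same user at each data provider, where the training samples are labeled according to the users' actual performance of the service. It then trains a model based on the training samples and the label of each training sample, takes the coefficient of each variable in the trained model as the corresponding data provider's contribution, and allocates virtual resources to each data provider based on its contribution.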
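Corresponding to the method embodiments above, this specification also provides an embodiment of an electronic device. The electronic device includes a processor and a memory for storing machine-executable instructions; the processor and the memory are typically interconnected by an internal bus. In other possible implementations, the device may also include an external interface to enable communication with other devices or components.
In this embodiment, by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic of the virtual resource allocation described above, the processor is caused to:
receive evaluation results for a number of users uploaded by multiple data providers, where the evaluation results are obtained by each data provider evaluating the users with its own evaluation model;
use the evaluation results uploaded by the data providers as training data to construct a number of training samples, each training sample containing the evaluation results of the same user at each data provider, where the training samples are labeled according to the users' actual performance of the service;
train a model based on the training samples and the label of each training sample, take the coefficient of each variable in the trained model as the corresponding data provider's contribution, and allocate virtual resources to each data provider based on its contribution.
In this embodiment, the trained model is a linear model.
In this embodiment, the quantity of virtual resources allocated to each data provider is proportional to that data provider's contribution.
In this embodiment, by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic of the virtual resource allocation described above, the processor is further caused to:
receive evaluation results uploaded by multiple data providers for a particular user, and input those evaluation results into the trained model to obtain a final evaluation result for that user.
In this embodiment, the virtual resources are user-data usage funds issued to the data providers.
In this embodiment, the evaluation model is a user risk assessment model; the evaluation result is a risk score; and the label indicates whether the user is a risk user.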
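Other embodiments of this specification will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed here. This specification is intended to cover any variations, uses, or adaptations of this specification that follow its general principles and include common knowledge or customary technical means in the art not disclosed in this specification. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of this specification indicated by the following claims.
It should be understood that this specification is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of this specification is limited only by the appended claims.
Specific embodiments of this specification have been described above. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.
The above are merely preferred embodiments of this application and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within its scope of protection.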

Claims (18)

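  1. A virtual resource allocation method, comprising:
    receiving evaluation results for a number of users uploaded by multiple data providers, wherein the evaluation results are obtained by each data provider evaluating the users with its own evaluation model;
    using the evaluation results uploaded by the data providers as training data to construct a number of training samples, each training sample containing the evaluation results of the same user at each of the data providers, wherein the training samples are labeled according to the users' actual performance of the service;
    training a model based on the training samples and the label of each training sample, taking the coefficient of each variable in the trained model as the contribution of the corresponding data provider, and allocating virtual resources to each data provider based on its contribution.
  2. The method according to claim 1, wherein the trained model is a linear model.
  3. The method according to claim 1, wherein the quantity of virtual resources allocated to each data provider is proportional to that data provider's contribution.
  4. The method according to claim 1, further comprising:
    receiving evaluation results uploaded by multiple data providers for a particular user, and inputting those evaluation results into the trained model to obtain a final evaluation result for that user.
  5. The method according to claim 3, wherein the virtual resources are user-data usage funds issued to the data providers.
  6. The method according to claim 1, wherein the evaluation model is a user risk assessment model; the evaluation result is a risk score; and the label indicates whether the user is a risk user.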
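  7. A virtual resource allocation device, comprising:
    a receiving module that receives evaluation results for a number of users uploaded by multiple data providers, wherein the evaluation results are obtained by each data provider evaluating the users with its own evaluation model;
    a training module that uses the evaluation results uploaded by the data providers as training data to construct a number of training samples, each training sample containing the evaluation results of the same user at each of the data providers, wherein the training samples are labeled according to the users' actual performance of the service;
    an allocation module that trains a model based on the training samples and the label of each training sample, takes the coefficient of each variable in the trained model as the contribution of the corresponding data provider, and allocates virtual resources to each data provider based on its contribution.
  8. The device according to claim 7, wherein the trained model is a linear model.
  9. The device according to claim 7, wherein the quantity of virtual resources allocated to each data provider is proportional to that data provider's contribution.
  10. The device according to claim 7, further comprising:
    an evaluation module that receives evaluation results uploaded by multiple data providers for a particular user and inputs those evaluation results into the trained model to obtain a final evaluation result for that user.
  11. The device according to claim 9, wherein the virtual resources are user-data usage funds issued to the data providers.
  12. The device according to claim 7, wherein the evaluation model is a user risk assessment model; the evaluation result is a risk score; and the label indicates whether the user is a risk user.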
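  13. A model establishing method, comprising:
    receiving evaluation results for a number of users uploaded by multiple data providers, wherein the evaluation results are obtained by each data provider evaluating the users with its own evaluation model;
    using the evaluation results uploaded by the data providers as training data to construct a number of training samples, each training sample containing the evaluation results of the same user at each of the data providers, wherein the training samples are labeled according to the users' actual performance of the service;
    training a model based on the training samples and the label of each training sample to obtain a trained model.
  14. The method according to claim 13, wherein the trained model is a linear model.
  15. The method according to claim 13, wherein the evaluation model is a user risk assessment model; the evaluation result is a risk score; and the label indicates whether the user is a risk user.
  16. A method of data prediction using a model established according to any one of claims 13 to 15, comprising: receiving evaluation results uploaded by multiple data providers for a particular user, and inputting those evaluation results into the trained model to obtain a final evaluation result for that user.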
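  17. A virtual resource allocation system, comprising:
    multiple data provider servers, which upload evaluation results for a number of users to a risk assessor server, wherein the evaluation results are obtained by each data provider evaluating the users with its own evaluation model;
    a risk assessor server, which uses the evaluation results uploaded by the data providers as training data to construct a number of training samples, each training sample containing the evaluation results of the same user at each of the data providers, wherein the training samples are labeled according to the users' actual performance of the service; and which trains a model based on the training samples and the label of each training sample, takes the coefficient of each variable in the trained model as the contribution of the corresponding data provider, and allocates virtual resources to each data provider based on its contribution.
  18. An electronic device, comprising:
    a processor;
    a memory for storing machine-executable instructions;
    wherein, by reading and executing the machine-executable instructions stored in the memory that correspond to the control logic of virtual resource allocation, the processor is caused to:
    receive evaluation results for a number of users uploaded by multiple data providers, wherein the evaluation results are obtained by each data provider evaluating the users with its own evaluation model;
    use the evaluation results uploaded by the data providers as training data to construct a number of training samples, each training sample containing the evaluation results of the same user at each of the data providers, wherein the training samples are labeled according to the users' actual performance of the service;
    train a model based on the training samples and the label of each training sample, take the coefficient of each variable in the trained model as the contribution of the corresponding data provider, and allocate virtual resources to each data provider based on its contribution.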
PCT/CN2018/107261 2017-09-27 2018-09-25 虚拟资源分配、模型建立、数据预测方法及装置 WO2019062697A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP18861936.5A EP3617983A4 (en) 2017-09-27 2018-09-25 METHOD AND DEVICE FOR ALLOCATING VIRTUAL RESOURCES, MODELING AND DATA PREDICTION
US16/697,913 US10691494B2 (en) 2017-09-27 2019-11-27 Method and device for virtual resource allocation, modeling, and data prediction
US16/907,637 US10891161B2 (en) 2017-09-27 2020-06-22 Method and device for virtual resource allocation, modeling, and data prediction

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710890033.1 2017-09-27
CN201710890033.1A CN109559214A (zh) 2017-09-27 2017-09-27 虚拟资源分配、模型建立、数据预测方法及装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/697,913 Continuation US10691494B2 (en) 2017-09-27 2019-11-27 Method and device for virtual resource allocation, modeling, and data prediction

Publications (1)

Publication Number Publication Date
WO2019062697A1 true WO2019062697A1 (zh) 2019-04-04

Family

ID=65863622

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/107261 WO2019062697A1 (zh) 2017-09-27 2018-09-25 虚拟资源分配、模型建立、数据预测方法及装置

Country Status (5)

Country Link
US (2) US10691494B2 (zh)
EP (1) EP3617983A4 (zh)
CN (1) CN109559214A (zh)
TW (1) TWI687876B (zh)
WO (1) WO2019062697A1 (zh)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2599816B (en) * 2017-06-16 2022-10-12 Soter Analytics Pty Ltd A system for monitoring core body movement
CN109559214A (zh) 2017-09-27 2019-04-02 阿里巴巴集团控股有限公司 虚拟资源分配、模型建立、数据预测方法及装置
EP3503012A1 (en) * 2017-12-20 2019-06-26 Accenture Global Solutions Limited Analytics engine for multiple blockchain nodes
CN110110970A (zh) * 2019-04-12 2019-08-09 平安信托有限责任公司 虚拟资源风险评级方法、系统、计算机设备和存储介质
CN110162995B (zh) * 2019-04-22 2023-01-10 创新先进技术有限公司 评估数据贡献程度的方法及其装置
CN110232403B (zh) * 2019-05-15 2024-02-27 腾讯科技(深圳)有限公司 一种标签预测方法、装置、电子设备及介质
CN110851482B (zh) * 2019-11-07 2022-02-18 支付宝(杭州)信息技术有限公司 为多个数据方提供数据模型的方法及装置
CN111401914B (zh) * 2020-04-02 2022-07-22 支付宝(杭州)信息技术有限公司 风险评估模型的训练、风险评估方法及装置
CN113626881A (zh) * 2020-05-07 2021-11-09 顺丰科技有限公司 对象评估方法、装置、电子设备及存储介质
CN111833179A (zh) * 2020-07-17 2020-10-27 浙江网商银行股份有限公司 资源分配平台、资源分配方法及装置
CN113762675A (zh) * 2020-10-27 2021-12-07 北京沃东天骏信息技术有限公司 信息生成方法、装置、服务器、系统和存储介质
CN113221989B (zh) * 2021-04-30 2022-09-02 浙江网商银行股份有限公司 基于分布式的评估模型训练方法、系统以及装置
US11704609B2 (en) 2021-06-10 2023-07-18 Bank Of America Corporation System for automatically balancing anticipated infrastructure demands
US11252036B1 (en) 2021-06-10 2022-02-15 Bank Of America Corporation System for evaluating and tuning resources for anticipated demands
WO2023097353A1 (en) * 2021-12-03 2023-06-08 Batnav Pty Ltd A method for curating information

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103051645A (zh) * 2011-10-11 2013-04-17 电子科技大学 Group-based incentive mechanism in P2P networks
CN104240016A (zh) * 2014-08-29 2014-12-24 广州华多网络科技有限公司 User management method and device for virtual venues
CN104866969A (zh) * 2015-05-25 2015-08-26 百度在线网络技术(北京)有限公司 Personal credit data processing method and device
CN105556552A (zh) * 2013-03-13 2016-05-04 加迪安分析有限公司 Fraud detection and analysis

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7444308B2 (en) * 2001-06-15 2008-10-28 Health Discovery Corporation Data mining platform for bioinformatics and other knowledge discovery
US7113932B2 (en) * 2001-02-07 2006-09-26 Mci, Llc Artificial intelligence trending system
US6963826B2 (en) * 2003-09-22 2005-11-08 C3I, Inc. Performance optimizer system and method
US8417715B1 (en) * 2007-12-19 2013-04-09 Tilmann Bruckhaus Platform independent plug-in methods and systems for data mining and analytics
US8655695B1 (en) * 2010-05-07 2014-02-18 Aol Advertising Inc. Systems and methods for generating expanded user segments
AU2011293350B2 (en) * 2010-08-24 2015-10-29 Solano Labs, Inc. Method and apparatus for clearing cloud compute demand
US20120209880A1 (en) * 2011-02-15 2012-08-16 General Electric Company Method of constructing a mixture model
US8630902B2 (en) * 2011-03-02 2014-01-14 Adobe Systems Incorporated Automatic classification of consumers into micro-segments
US8762299B1 (en) * 2011-06-27 2014-06-24 Google Inc. Customized predictive analytical model training
US10366335B2 (en) * 2012-08-31 2019-07-30 DataRobot, Inc. Systems and methods for symbolic analysis
US9436911B2 (en) * 2012-10-19 2016-09-06 Pearson Education, Inc. Neural networking system and methods
CN104954413B (zh) * 2014-03-31 2018-07-13 阿里巴巴集团控股有限公司 Method, system, client device, and server for providing Internet application services
US9672474B2 (en) * 2014-06-30 2017-06-06 Amazon Technologies, Inc. Concurrent binning of machine learning data
CN105225149B (zh) * 2015-09-07 2018-04-27 腾讯科技(深圳)有限公司 Credit score determination method and device
US10628826B2 (en) * 2015-11-24 2020-04-21 Vesta Corporation Training and selection of multiple fraud detection models
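CN107133628A (zh) * 2016-02-26 2017-09-05 阿里巴巴集团控股有限公司 Method and device for establishing a data recognition model
CN106127363B (zh) * 2016-06-12 2022-04-15 腾讯科技(深圳)有限公司 User credit evaluation method and device
CN106204033A (zh) 2016-07-04 2016-12-07 首都师范大学 Payment system based on face recognition and fingerprint recognition
CN106897918A (zh) * 2017-02-24 2017-06-27 上海易贷网金融信息服务有限公司 Method for constructing a hybrid machine learning credit scoring model
CN109559214A (zh) 2017-09-27 2019-04-02 阿里巴巴集团控股有限公司 Method and device for virtual resource allocation, modeling, and data prediction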

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103051645A (zh) * 2011-10-11 2013-04-17 电子科技大学 Group-based incentive mechanism in P2P networks
CN105556552A (zh) * 2013-03-13 2016-05-04 加迪安分析有限公司 Fraud detection and analysis
CN104240016A (zh) * 2014-08-29 2014-12-24 广州华多网络科技有限公司 User management method and device for virtual venues
CN104866969A (zh) * 2015-05-25 2015-08-26 百度在线网络技术(北京)有限公司 Personal credit data processing method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3617983A4 *

Also Published As

Publication number Publication date
US10691494B2 (en) 2020-06-23
TW201915847A (zh) 2019-04-16
EP3617983A1 (en) 2020-03-04
US20200319927A1 (en) 2020-10-08
US20200097329A1 (en) 2020-03-26
US10891161B2 (en) 2021-01-12
TWI687876B (zh) 2020-03-11
EP3617983A4 (en) 2020-05-06
CN109559214A (zh) 2019-04-02

Similar Documents

Publication Publication Date Title
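WO2019062697A1 (zh) Method and device for virtual resource allocation, modeling, and data prediction
TWI689841B (zh) Data encryption and machine learning model training method, device, and electronic device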
US9262493B1 (en) Data analytics lifecycle processes
JP2020522832A (ja) System and method for issuing loans to consumers determined to be creditworthy
US20190266528A1 (en) System for Discovering Hidden Correlation Relationships for Risk Analysis Using Graph-Based Machine Learning
US11693634B2 (en) Building segment-specific executable program code for modeling outputs
Keller et al. A Reference Model to Support Risk Identification in Cloud Networks.
JP2017535857A (ja) Learning with transformed data
US10726501B1 (en) Method to use transaction, account, and company similarity clusters derived from the historic transaction data to match new transactions to accounts
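WO2020000689A1 (zh) Transfer learning-based robo-advisor strategy generation method and device, electronic device, and storage medium
CN111783039B (zh) Risk determination method, device, computer system, and storage medium
CN111563267A (zh) Method and device for federated feature engineering data processing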
US11037236B1 (en) Algorithm and models for creditworthiness based on user entered data within financial management application
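CN112348659A (zh) Method, device, and electronic device for allocating user risk identification strategies
CN110858253A (zh) Method and system for performing machine learning under data privacy protection
CN115063233A (zh) Method, system, and device for implementing a banking service process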
Singh et al. Cloud computing adoption challenges in the banking industry
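CN115578027A (zh) Data quality evaluation method, device, electronic device, and storage medium
CN112948889B (zh) Method and system for performing machine learning under data privacy protection
CN110363394B (zh) Cloud platform-based risk control service method, device, and electronic device
CN116091242A (zh) Recommended product portfolio generation method and device, electronic device, and storage medium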
WO2023114637A1 (en) Computer-implemented system and method of facilitating artificial intelligence based lending strategies and business revenue management
Jarraya et al. Multiobjective optimization for the asset allocation of European nonlife insurance companies
CN111625572A (zh) Method and system for performing machine learning under data privacy protection
Malakani et al. Trading 4.0: An online peer-to-peer money lending platform

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18861936

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2018861936

Country of ref document: EP

Effective date: 20191127

NENP Non-entry into the national phase

Ref country code: DE