CN112288573B - Method, device and equipment for constructing risk assessment model - Google Patents

Info

Publication number
CN112288573B
Authority
CN
China
Prior art keywords
risk assessment
assessment model
model
updated
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011559506.8A
Other languages
Chinese (zh)
Other versions
CN112288573A (en
Inventor
刘思玥
吴云崇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202111171644.3A (patent CN113947471B)
Priority to CN202011559506.8A (patent CN112288573B)
Publication of CN112288573A
Application granted
Publication of CN112288573B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03: Credit; Loans; Processing thereof

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of this specification disclose a method, an apparatus, and a device for constructing a risk assessment model. The method comprises: obtaining a risk assessment model trained by a second server of a second institution, where the risk assessment model is trained on first sample data held by the second institution, and the first sample data may be used only on devices of the second institution; and updating a first parameter of the risk assessment model according to second sample data held by the first institution to obtain an updated risk assessment model, where the second sample data may be used only on devices of the first institution. The updated risk assessment model is used to assess risk for data of both the first institution and the second institution.

Description

Method, device and equipment for constructing risk assessment model
Technical Field
The present application relates to the field of risk compliance technologies, and in particular, to a method, an apparatus, and a device for constructing a risk assessment model.
Background
Compliance risk is present throughout the business and management of financial institutions. In the internet industry, compliance risk may refer to the risk that arises when an enterprise's operations or internal management fail to conform to national laws, regulations, policies, industry norms, or service level agreements.
Risk control, in the broad sense used here, may include risk management and risk control proper. Risk management refers to the process of minimizing risk in an environment where a project or enterprise is known to face risk. Risk control refers to the measures and methods a risk manager takes to eliminate or reduce the likelihood of a risk event, or to reduce the losses incurred when a risk event occurs. In the internet finance industry, risk control may cover all possible risk events, including personnel operational risk, business operational risk, technical operational risk, and risk posed by external events.
The traditional risk-control modeling method is the scorecard model. Modelers must collect the data samples of every institution participating in modeling and build the model locally after the samples are pooled, so modeling can be completed only if each institution exports its data. This practice is highly likely to create compliance risk.
Therefore, it is desirable to provide a solution for establishing a risk assessment model to solve the compliance risk caused by modeling.
Disclosure of Invention
The embodiments of this specification provide a method, an apparatus, and a device for constructing a risk assessment model, so as to solve the compliance violations that existing modeling methods cause by moving data out of its domain.
In order to solve the above technical problem, the embodiments of the present specification are implemented as follows:
The method for constructing a risk assessment model provided by the embodiments of this specification is applied to a first server of a first institution and comprises:
obtaining a risk assessment model trained by a second server of a second institution, where the risk assessment model is trained on first sample data held by the second institution, and the first sample data may be used only on devices of the second institution;
updating a first parameter of the risk assessment model according to second sample data held by the first institution to obtain an updated risk assessment model, where the second sample data may be used only on devices of the first institution, and the updated risk assessment model is used to assess risk for data of the first institution and the second institution.
The apparatus for constructing a risk assessment model provided by the embodiments of this specification is applied to a first server of a first institution and includes:
a risk assessment model acquisition module, configured to obtain a risk assessment model trained by a second server of a second institution, where the risk assessment model is trained on first sample data held by the second institution, and the first sample data may be used only on devices of the second institution;
a model updating module, configured to update a first parameter of the risk assessment model according to second sample data held by the first institution to obtain an updated risk assessment model, where the second sample data may be used only on devices of the first institution, and the updated risk assessment model is used to assess risk for data of the first institution and the second institution.
The device for constructing a risk assessment model provided by the embodiments of this specification comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor, wherein
the memory stores instructions executable by the at least one processor, and the instructions enable the at least one processor to:
obtain a risk assessment model trained by a second server of a second institution, where the risk assessment model is trained on first sample data held by the second institution, and the first sample data may be used only on devices of the second institution;
update a first parameter of the risk assessment model according to second sample data held by the first institution to obtain an updated risk assessment model, where the second sample data may be used only on devices of the first institution, and the updated risk assessment model is used to assess risk for data of the first institution and the second institution.
Embodiments of the present specification provide a computer readable medium having stored thereon computer readable instructions executable by a processor to implement a method of constructing a risk assessment model.
One embodiment of this specification achieves the following advantageous effects. A risk assessment model trained by a second server of a second institution is obtained; the model is trained on first sample data held by the second institution, and that data may be used only on devices of the second institution. A first parameter of the model is then updated according to second sample data held by the first institution to obtain an updated risk assessment model; the second sample data may be used only on devices of the first institution. The updated risk assessment model is used to assess risk for data of the first institution and the second institution. With this method, each single iteration of model construction relies only on data inside one institution, and no frequent data exchange over the network is needed, so the time overhead of model training can be greatly reduced. At the same time, incremental learning is applied across the institutions, so the risk assessment model is built without any data leaving its domain, which avoids the compliance risk that data export would otherwise create during modeling.
Drawings
In order to illustrate the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some of the embodiments described in the present application, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an overall scheme of a method for constructing a risk assessment model in an embodiment of the present specification;
FIG. 2 is a flow chart of a method for constructing a risk assessment model provided by an embodiment of the present disclosure;
FIG. 3 is a swim-lane diagram of a method for constructing a risk assessment model according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of an apparatus for constructing a risk assessment model according to an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions, and advantages of one or more embodiments of the present disclosure clearer, the technical solutions of one or more embodiments are described clearly and completely below with reference to specific embodiments and the accompanying drawings. The embodiments described are only some of the embodiments of the present specification, not all of them. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the scope of protection of one or more embodiments of the present specification.
"Compliance risk" refers to the risk that a bank may suffer legal sanctions, regulatory penalties, significant financial loss, or reputational loss because it fails to comply with laws and regulations, regulatory requirements, rules, relevant guidelines set by self-regulatory organizations, or codes of conduct that apply to the bank's own business activities. Compliance risk is present throughout the business and management of financial institutions. In the internet industry, compliance risk may refer to the risk that arises when an enterprise's operations or internal management fail to conform to national laws, regulations, policies, industry norms, or service level agreements.
Risk assessment refers to the work of quantitatively assessing the likely impact and loss that a risk event causes to people's lives, property, and other interests, carried out before the event occurs or after it occurs but before it has concluded. That is, risk assessment quantifies the likely degree of impact or loss associated with an event or thing.
From an information security perspective, risk assessment evaluates the threats an information asset faces (that is, the information held by an event or thing), the vulnerabilities it has, the impact that would result, and the likelihood of risk arising from the combined effect of the three. As the basis of risk management, risk assessment is an important way for an organization to determine its information security requirements, and it is part of the planning process of the organization's information security management system. Carrying out compliance risk assessment allows it to play its full role in risk management.
Compliance risk management refers to a cyclic process of actively avoiding the occurrence of violation events, actively finding and taking appropriate measures to correct the occurring violation events, and continuously improving related systems and corresponding methods.
Taking a credit risk-control scenario as an example of the prior art, a risk assessment model for that scenario must be built from samples of several credit institutions in the same scenario. For example, when a third-party risk-control company plans to develop a general application scoring model for banks' credit card customer groups, credit data relating to the credit cards of multiple banks is used during development. The modelers must pool the samples of the participating institutions and then carry out data cleaning, feature matching, model development, model calibration, model testing, and other work. The financial institutions therefore have to export data such as the identity information and loan information of their borrowers to an external modeling environment; that is, the data leaves the domain. Borrower information is then transferred between institutions, which leaks user information.
As regulators steadily strengthen the protection of citizens' personal privacy data, the existing modeling method clearly carries serious compliance risk. Many laws and regulations have been issued in quick succession, systematically improving China's legal framework for personal information and privacy protection from multiple angles. Under the strong constraints of these laws, a financial institution must not share users' private data or financial activity data with third-party institutions without direct authorization from the individual users. The traditional approach of pooling and integrating data inside a third-party company has therefore become unsustainable.
Some prior-art approaches develop models based on federated learning. The model training process is carried out entirely by a preset program, and model developers cannot intervene in it, which makes core tasks such as feature engineering and parameter tuning difficult and prevents the information in the modeling samples from being fully mined. A large volume of data is exchanged over the network during training, which places strict demands on network performance and makes the time overhead far greater than that of conventional training. A third-party company providing the federated learning platform service must also participate, so the economic and time costs of model development are high.
In addition, there are prior-art schemes in which a single institution incrementally trains a model on data that is continuously updated over the network. However, such a method usually obtains training data from only one institution, so data coverage is low, and the trained risk assessment model can assess risk only for that institution's data, not for the data of other institutions. In the prior art, training a general risk assessment model requires data from multiple institutions in the same application scenario, which means user data is exported to other institutions and compliance is violated.
On this basis, this solution provides a scheme for building a risk assessment model: using a specific algorithm that supports incremental learning, local training is carried out in turn on the samples of different institutions, and the final model fuses the sample information of every institution, so that both business and compliance requirements are met.
The technical solutions provided by the embodiments of the present description are described in detail below with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of the overall scheme of a method for constructing a risk assessment model in an embodiment of the present specification. As shown in FIG. 1, institution 1 initializes the model using its own sample data and trains it to obtain model 1. Institution 2 then updates the parameters of model 1 using its own sample data. The process continues in this way until institution M updates the parameters of the preceding model using its own sample data to obtain the updated model M. Note that FIG. 1 shows each institution continuing to train, locally and on its own data, the model produced by the preceding institution to obtain a model with updated parameters. The institutions do not each train a separate model. Instead, the same model is updated incrementally and iteratively, institution by institution, so that the final model fuses the sample data of every institution. For example, institution 1 trains the model on its own sample data to obtain model 1, in which the estimated value of parameter A is 0.5. Model 1 is then placed in institution 2, which updates its parameters using the sample data in institution 2 and further optimizes parameter A to obtain model 2, in which the estimated value A' is 0.55. By analogy, institution N uses its own user data to continue training model N-1 output by the preceding institution and obtains model N, which fuses the sample data of all institutions.
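The relay described for FIG. 1 can be illustrated with a minimal Python sketch. It rests on assumptions that are not in the source: the model is a plain logistic regression updated by stochastic gradient descent, the institution names and the synthetic samples are invented for illustration, and the function names are hypothetical. Only the weight vector is passed from one institution to the next; each institution's samples are read only inside its own call to local_update.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def local_update(weights, samples, lr=0.1, epochs=20):
    """Continue training the incoming model on one institution's own samples.

    Only the weights enter and leave this function; the samples are never returned.
    """
    w = list(weights)
    for _ in range(epochs):
        for features, label in samples:
            z = sum(wi * xi for wi, xi in zip(w, features))
            error = sigmoid(z) - label
            for j, xj in enumerate(features):
                w[j] -= lr * error * xj
    return w


def make_samples(rng, n):
    """Synthetic stand-in for institution-private data (assumption for this sketch)."""
    samples = []
    for _ in range(n):
        x1, x2 = rng.random(), rng.random()
        # The first feature is a constant bias term.
        samples.append(((1.0, x1, x2), 1 if x1 + x2 > 1.0 else 0))
    return samples


rng = random.Random(0)
institutions = {
    name: make_samples(rng, 200)
    for name in ("institution_1", "institution_2", "institution_3")
}

model = [0.0, 0.0, 0.0]  # model initialization at the first institution
history = []
for name, private_samples in institutions.items():
    # Each institution updates the model it received from the previous one.
    model = local_update(model, private_samples)
    history.append((name, list(model)))

for name, weights in history:
    print(name, [round(w, 3) for w in weights])
```

Each printed line corresponds to one model version in FIG. 1 (model 1, model 2, and so on); the change in the weights between lines plays the role of parameter A moving from one estimate to the next.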
Next, a method for constructing a risk assessment model provided in the embodiments of the specification will be described in detail with reference to the accompanying drawings:
FIG. 2 is a flowchart of a method for constructing a risk assessment model according to an embodiment of the present disclosure. From a program perspective, the flow may be executed by a program installed on an application server or an application client. In this embodiment, the execution subject may be a first server of a first institution. Note that "first" in "first server" and "first institution" only distinguishes different institutions or servers and carries no other specific meaning. In this embodiment, when the first server of the first institution constructs the model, it continues from the model constructed by the preceding institution.
As shown in fig. 2, the process may include the following steps:
Step 210: obtaining a risk assessment model trained by a second server of a second institution, where the risk assessment model is trained on first sample data held by the second institution, and the first sample data may be used only on devices of the second institution.
Note that the second institution may be any institution other than the first institution. Specifically, model training on the second institution's data is performed by a second server deployed in the second institution.
The risk assessment model in this step may refer to a model that the second server in the second institution trains on sample data held by the second institution. This risk assessment model may be used to assess risk for the second institution.
When the second institution trains the risk assessment model, it cannot take data from other institutions and trains only on its own data. A single iteration relies only on data inside the second institution, and no frequent data exchange over the network is required.
Step 220: updating a first parameter of the risk assessment model according to second sample data held by the first institution to obtain an updated risk assessment model, where the second sample data may be used only on devices of the first institution, and the updated risk assessment model is used to assess risk for data of the first institution and the second institution.
The server of the first institution does not take data from the second institution during model training; instead, it continues training from the risk assessment model trained by the second institution.
The first server continues training the risk assessment model obtained from the second server using the second sample data held by the first institution. The resulting updated risk assessment model can assess risk for data in both the first institution and the second institution.
For example, suppose there are institutions A, B, and C, whose sample data are data 1, data 2, and data 3 respectively. The server in institution A trains on data 1 to obtain risk assessment model 1. Institution B obtains risk assessment model 1, and its server continues training it on data 2; in effect, data 2 is used to update the parameters of risk assessment model 1, which yields the updated risk assessment model 2. Institution C obtains risk assessment model 2, and its server continues training it on data 3; in effect, data 3 is used to update the parameters of risk assessment model 2, which yields the updated risk assessment model 3. Risk assessment model 3 can assess risk for data in institutions A, B, and C.
The method above can be understood as follows: during training, the optimization objective of the risk assessment model is decomposed over individual samples or individual batches of samples, and different batches are contributed by different institutions. The traditional approach of "data moves, model stays" becomes "data stays, model moves", so the high-risk act of data leaving the domain is replaced by the risk-free act of the model leaving the domain.
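One way to write this decomposition down is sketched below. The notation is introduced here for illustration and does not appear in the source: M institutions, D_m for the samples held by institution m, a per-sample loss \ell, a learning rate \eta, and a gradient-style update standing in for whichever incremental algorithm is actually used.

```latex
% Objective decomposed over samples, grouped by the institution that holds them
J(\theta) = \sum_{m=1}^{M} \; \sum_{(x_i, y_i) \in D_m} \ell(\theta; x_i, y_i)

% Update carried out inside institution m, starting from the model of institution m-1
% and using only a batch B_m drawn from that institution's own data
\theta_m = \theta_{m-1} - \eta \sum_{(x_i, y_i) \in B_m} \nabla_\theta \, \ell(\theta_{m-1}; x_i, y_i),
\qquad B_m \subseteq D_m
```

Under these assumptions, only \theta_{m-1} and \theta_m cross institutional boundaries, while every term that depends on (x_i, y_i) is evaluated where the sample resides.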
In addition, the first institution in the above steps may represent a single institution or several institutions. For example, when the goal is to train a risk assessment model that generalizes across multiple banks, the first institution may be one of those banks, and its data may be processed only at that institution. When the goal is to train a risk assessment model that applies generally to financial institutions, the first institution may be a class of institutions; for example, the first institution may be several banks, in which case the data of each bank may be used only within that bank unless a special security agreement exists.
It should be understood that the order of some steps in the method described in one or more embodiments of the present disclosure may be interchanged according to actual needs, or some steps may be omitted or deleted.
With the method of FIG. 2, a risk assessment model trained by a second server of a second institution is obtained; the model is trained on first sample data held by the second institution, and that data may be used only on devices of the second institution. A first parameter of the model is then updated according to second sample data held by the first institution to obtain an updated risk assessment model; the second sample data may be used only on devices of the first institution. The updated risk assessment model is used to assess risk for data of the first institution and the second institution. With this method, each single iteration of model construction relies only on data inside one institution, and no frequent data exchange over the network is needed, so the time overhead of model training can be greatly reduced. At the same time, incremental learning is applied across the institutions, so the risk assessment model is built without any data leaving its domain, which avoids the compliance risk that data export would otherwise create during modeling.
Based on the method of fig. 2, the present specification also provides some specific embodiments of the method, which are described below.
Optionally, the method in fig. 2 may further include:
sending the updated risk assessment model to a third institution, where the third institution updates a second parameter of the updated risk assessment model according to third sample data held by that institution to obtain a re-updated risk assessment model.
The model trained by the second institution can be sent to a third institution, which continues the training. Note that, in practice, when a general-purpose risk assessment model for financial institutions is to be trained, the training samples can be chosen accordingly. For example, if the model is to be used across financial institutions generally, data from banks, securities institutions, insurance institutions, trust institutions, fund institutions, and the like can be selected for training. If the model only needs to work across banks, the data of the individual banks can be selected as training data.
In this embodiment, after the second institution has trained the updated risk assessment model, the third institution continues training that updated model on its own data. In the actual training process, the second parameter of the updated risk assessment model may be updated.
Note that training a model requires computing and solving for its parameters. The training process can be understood as continually updating the model's parameters with new data until the scoring accuracy of the re-updated risk assessment model meets the preset accuracy.
Constructing the risk assessment model is divided into a model training process and a model testing process. In theory, training should be carried out in every relevant institution to obtain the final trained model, but in practice it is not possible to traverse every institution. The institutions that take part in training can therefore be selected so as to keep data coverage as high as possible. Among the selected institutions, each one updates the model's parameters on the basis of the preceding institution's result, and the trained risk assessment model is finally obtained.
After the trained risk assessment model is obtained, it must be tested to determine its performance. Specifically, the updated risk assessment model can be sent to each institution for local testing: the trained model is applied in turn to each institution's test samples to evaluate its performance. When the model's performance on the samples of some institution falls short of expectations, the model is optimized at that institution, until its performance on the samples of all institutions meets expectations. The procedure specifically comprises the following steps:
for any institution, inputting data held by that institution into the re-updated risk assessment model to obtain a prediction result of the re-updated risk assessment model;
calculating the prediction accuracy of the re-updated risk assessment model from the prediction result;
determining whether the prediction accuracy of the re-updated risk assessment model reaches the preset accuracy corresponding to that institution, to obtain a determination result;
and, when the determination result indicates that the prediction accuracy of the re-updated risk assessment model reaches the preset accuracy corresponding to that institution, determining that the re-updated risk assessment model passes that institution's test.
Optionally, after determining whether the prediction accuracy of the re-updated risk assessment model reaches the preset accuracy corresponding to that institution and obtaining the determination result, the method may further include:
when the prediction accuracy of the re-updated risk assessment model does not reach the preset accuracy corresponding to that institution, training the re-updated risk assessment model with sample data held by that institution and optimizing its parameters until its prediction accuracy meets the preset accuracy corresponding to that institution, to obtain an optimized risk assessment model.
Note that each institution may set its own required prediction accuracy when training the risk assessment model on its own data. For example, the institutions participating in training are institution 1 through institution 5: institution 1 requires the model it trains to reach a prediction accuracy of 90%, institution 2 requires 95%, … …, and institution 5 requires 99%. When each institution trains the model on its own data, it keeps updating the model's parameters until the trained model meets that institution's preset accuracy.
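The local test-then-optimize procedure above can be sketched in Python as follows. This is a minimal illustration under assumptions not found in the source: the model is a logistic regression refined by stochastic gradient descent, the function names (check_and_refine and its helpers) are hypothetical, the ten samples are invented, and a cap on the number of refinement epochs is added so the loop always terminates.

```python
import math


def predict(weights, features):
    z = sum(w * x for w, x in zip(weights, features))
    return 1 if 1.0 / (1.0 + math.exp(-z)) >= 0.5 else 0


def accuracy(weights, samples):
    correct = sum(1 for features, label in samples if predict(weights, features) == label)
    return correct / len(samples)


def sgd_epoch(weights, samples, lr=0.5):
    """One pass of logistic-regression SGD over an institution's local samples."""
    w = list(weights)
    for features, label in samples:
        z = sum(wi * xi for wi, xi in zip(w, features))
        error = 1.0 / (1.0 + math.exp(-z)) - label
        w = [wi - lr * error * xi for wi, xi in zip(w, features)]
    return w


def check_and_refine(weights, local_samples, required_accuracy, max_epochs=500):
    """Test the incoming model locally and keep training it while it falls short.

    Returns (weights, passed). max_epochs is an added safeguard, not from the source.
    """
    for _ in range(max_epochs):
        if accuracy(weights, local_samples) >= required_accuracy:
            return weights, True
        weights = sgd_epoch(weights, local_samples)
    return weights, accuracy(weights, local_samples) >= required_accuracy


# Hypothetical local samples: (bias term, x) with label 1 when x is above 0.5.
local_samples = [((1.0, x), 0) for x in (0.0, 0.1, 0.2, 0.3, 0.4)] + [
    ((1.0, x), 1) for x in (0.6, 0.7, 0.8, 0.9, 1.0)
]
incoming_model = [0.0, 0.0]
refined_model, passed = check_and_refine(incoming_model, local_samples, required_accuracy=0.9)
print(passed, accuracy(refined_model, local_samples))
```

In a multi-institution setting, the same function would be called once per institution with that institution's own preset accuracy, such as 0.90 for institution 1 and 0.95 for institution 2 in the example above.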
By the above method, when the model is trained in each institution, the data is applied iteratively only within that institution and is not used for model training in other institutions, so the data never leaves its domain. Because the model does not contain any user information, no user data is output to other institutions during model construction, and the relevant regulations are not violated.
In an embodiment, the training process of the risk assessment model in the present solution may be implemented in combination with a blockchain, and the result of each organization after training the risk assessment model locally may be stored in a blockchain system in a privacy-protected manner.
In the embodiment of the present specification, after each institution performs local training with its own data, the trained risk assessment model can be stored in the blockchain system. When another institution needs to continue updating the risk assessment model with its own data, it can obtain the risk assessment model trained by the previous institution from the blockchain system. After updating the risk assessment model with its own data, the institution can encrypt the updated risk assessment model and upload it to the blockchain system for storage. To further ensure the security of the model, the risk assessment model stored in the blockchain system may be an encrypted model; after acquiring the model, other institutions need to decrypt it, which ensures the security of the data during operation.
In practical applications, to prove the validity of the training of the risk assessment model, each institution can, after completing its training, prove to the other institutions that the resulting model was trained by that institution using its own data. Specifically, this requirement can be met by generating a verifiable statement (Verifiable Claim, VC). The VC is also an important application of the distributed digital identity (DID), and VCs may be stored in a blockchain platform. In this scheme, multiple institutions each train locally on their own data to finally obtain the required risk assessment model. Because each institution trains a model using its own data, the model trained by each institution can carry version information. For example, the content of a VC may include the digital signature of the institution that trained the model and the corresponding version information of the model. Such a verifiable statement proves that the model with that version information was trained by that institution; confirming that the institution's own sample data was used additionally requires checking the corresponding data identifier. Thus, when any institution trains a model of a certain version, in addition to storing the verifiable statement in the blockchain system, it may also store there the data identifiers of the sample data used to train the model. Each institution may create a distributed digital identity and a document of the distributed digital identity (DID Doc) through a Distributed Identity Service (DIS) to manage the identity of the institution, and the DID Doc of each institution may be stored in the blockchain platform. Accordingly, in the embodiments of this specification, the data identifier of the sample data may be stored in the DID Doc.
When an institution updates the risk assessment model obtained by the previous institution, it obtains the risk assessment model from the previous institution and can verify, through the blockchain, the corresponding VC and the data identifier of the sample data used in training. Specifically, the institution may obtain the public key in the DID Doc from the blockchain and, when verifying the output of the previous institution, use that public key to verify the signature of the VC sent by the previous institution, thereby confirming that the VC was issued by the previous institution and is intact, that is, has not been tampered with. Thus, based on the tamper-resistance of the blockchain platform and the credibility of the signing institution, confidence in the authenticity and validity of the risk assessment model can be improved.
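The issue-and-verify flow for a verifiable statement can be sketched as below. The patent does not name a signature scheme; this sketch uses a Lamport one-time signature only because it is a genuine public-key scheme that needs nothing beyond the Python standard library. The function names, the claim layout (`issuer`, `model_version`) and the identifier strings are hypothetical, and the blockchain and DID Doc lookups are replaced by passing the public key directly.

```python
import hashlib
import json
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def lamport_keygen():
    """Return (private_key, public_key) for one Lamport one-time signature."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(_h(a), _h(b)) for a, b in private]
    return private, public


def _bits(message: bytes):
    """The 256 bits of SHA-256(message), most significant bit first."""
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def lamport_sign(private, message: bytes):
    # Reveal one secret per digest bit; a key pair must sign only one message.
    return [private[i][bit] for i, bit in enumerate(_bits(message))]


def lamport_verify(public, message: bytes, signature) -> bool:
    return len(signature) == 256 and all(
        _h(sig) == public[i][bit]
        for i, (sig, bit) in enumerate(zip(signature, _bits(message)))
    )


def issue_claim(private, issuer: str, model_version: str) -> dict:
    """Build a claim binding the training institution to a model version."""
    body = {"issuer": issuer, "model_version": model_version}
    payload = json.dumps(body, sort_keys=True).encode()
    return {"body": body, "signature": lamport_sign(private, payload)}


def verify_claim(public, claim: dict) -> bool:
    """Check the claim with the issuer's public key (assumed read from its DID Doc)."""
    payload = json.dumps(claim["body"], sort_keys=True).encode()
    return lamport_verify(public, payload, claim["signature"])


private_key, public_key = lamport_keygen()
claim = issue_claim(private_key, "institution-2", "risk-model-v1")
tampered = {
    "body": {**claim["body"], "model_version": "risk-model-v2"},
    "signature": claim["signature"],
}
```

In this sketch `verify_claim(public_key, claim)` succeeds while the tampered copy is rejected, which mirrors the integrity check described above. A production system would presumably use a standard, reusable signature scheme rather than a one-time one.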
It should be noted that there is interaction between each mechanism and the blockchain system, and each mechanism may upload the trained risk assessment model and the generated other result data to the blockchain system for storage, and may also acquire the uploaded result data of the other mechanisms stored in the blockchain system.
Privacy protection can be achieved by a variety of techniques, such as cryptographic techniques (e.g., homomorphic encryption or zero-knowledge proofs), hardware privacy techniques and network isolation techniques. Hardware privacy protection techniques typically include a Trusted Execution Environment (TEE).
For example, each blockchain node may implement a secure execution environment for blockchain transactions through the TEE. The TEE is a trusted execution environment that is based on a secure extension of the CPU hardware and is completely isolated from the outside. The industry pays close attention to TEE solutions, and almost all mainstream chip vendors and software alliances have their own TEE solutions, such as TPM (Trusted Platform Module) on the software side, and Intel SGX (Software Guard Extensions), ARM TrustZone and AMD PSP (Platform Security Processor) on the hardware side. The TEE can function as a hardware black box: code and data executed in the TEE cannot be inspected even at the operating system layer, and can be operated on only through interfaces predefined in the code. In terms of efficiency, owing to the black-box nature of the TEE, the operations inside the TEE are performed on plaintext data rather than through the complex cryptographic operations of homomorphic encryption, so there is almost no loss of computational efficiency. Therefore, deploying a TEE on the blockchain nodes can largely satisfy the privacy requirements of the blockchain scenario at a relatively small performance cost, guaranteeing the privacy of the data.
In the embodiment of the description, when each organization locally trains the risk assessment model by using the data in the organization, the risk assessment model can be processed by using a privacy calculation method, so that the safe and reliable calculation can be performed without revealing the organization data. Therefore, the training process of the risk assessment model may be performed in the TEE, thereby ensuring the safety and reliability of the model training process.
By the above method, the training process of the risk assessment model is combined with the blockchain system, which ensures the safety and reliability of the training process, the credibility of the training result, and the security of the data during training.
Optionally, the risk assessment model may be a logistic regression model; the obtaining of the risk assessment model trained by the second server of the second organization may specifically include:
determining a first parameter in the risk assessment model sent by a second institution;
the updating a first parameter in the risk assessment model according to second sample data in the first institution to obtain an updated risk assessment model may specifically include:
and training a logistic regression model according to the first parameter and the second sample data to obtain an updated risk assessment model.
It should be noted that the Logistic Regression (LR) model can be understood as applying a logistic function on top of linear regression; it corresponds to y = f(x) and expresses the relationship between the independent variable x and the dependent variable y. Taking a credit scenario as an example, x is the relevant data of the user, such as the user's basic information, credit data and historical risk data, and y is the observed value, which indicates whether the assessed institution has a risk. By constructing a logistic regression model, whether the data in the institution carries a risk can be predicted from the input user data.
Optionally, the determining a first parameter in the risk assessment model sent by the second entity specifically may include:
determining a first parameter of the risk assessment model using maximum likelihood estimation based on the first sample data in the second institution.
Optionally, the updating the first parameter in the risk assessment model according to the second sample data in the first institution may specifically include:
and updating the first parameter by adopting a gradient ascent method according to the second sample data.
The Maximum Likelihood Method (ML) is also called Maximum Likelihood estimation, and is a theoretical point estimation method. When n sets of sample observations are randomly drawn from the population of models, the most reasonable parameter estimator should maximize the probability of extracting the n sets of sample observations from the model.
In machine learning algorithms, when a loss function is to be minimized, the minimum of the loss function and the corresponding parameter values can be obtained by gradient descent; conversely, when an objective function (such as a log-likelihood) is to be maximized, its maximum and the corresponding parameter values can be obtained by gradient ascent.
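For logistic regression, the relationship between maximum likelihood estimation and gradient ascent can be written out explicitly. The notation below is assumed for illustration and is not taken from the patent's formula images: $\beta$ is the coefficient vector, $x_j$ is the $j$-th sample with a leading 1 for the intercept, $y_j \in \{0, 1\}$ is its label, and $h$ is the step size.

```latex
\begin{aligned}
\pi_j &= \frac{1}{1 + e^{-\beta^{\top} x_j}}, \\
\ell(\beta) &= \sum_{j=1}^{n} \bigl[\, y_j \log \pi_j + (1 - y_j) \log (1 - \pi_j) \,\bigr], \\
\nabla_{\beta}\, \ell(\beta) &= \sum_{j=1}^{n} (y_j - \pi_j)\, x_j, \\
\beta &\leftarrow \beta + h \, \nabla_{\beta}\, \ell(\beta).
\end{aligned}
```

Maximizing $\ell(\beta)$ by gradient ascent is the same computation as minimizing the loss $-\ell(\beta)$ by gradient descent; only the sign convention differs.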
It should be noted that, in the present embodiment, modeling is performed based on incremental learning, and therefore, any model that supports the incremental learning algorithm may be applied to the embodiment of the present embodiment. For example: deep learning models, neural network models, and the like. And sequentially carrying out local training in samples of different organizations by using a specific algorithm supporting incremental learning, and finally obtaining a risk assessment model fusing sample information of each organization. In each training, the current mechanism sample can be substituted into the formula of model parameter estimation to further optimize the existing parameters.
The construction modes of the logistic regression model and the neural network model are taken as an example for explanation:
Mode 1: constructing a logistic regression model.
The model solves the optimal parameters by maximizing the log-likelihood function by using a gradient ascent method. When the log-likelihood function is optimized, the gradient of the log-likelihood function on the parameter to be updated can be decomposed into the contribution of a single sample, so that the current parameter can be updated by using a new sample independently in the parameter updating process.
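The decomposition of the gradient into per-sample contributions can be checked numerically. The sketch below computes the log-likelihood gradient of a logistic regression model on two disjoint sample groups and on their union. The function names and the toy data are assumptions made for the example.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_likelihood_gradient(beta, samples):
    """Sum over samples of (y - pi) * x, with a leading 1 for the intercept."""
    grad = [0.0] * len(beta)
    for x, y in samples:
        row = [1.0] + list(x)
        residual = y - sigmoid(sum(b * v for b, v in zip(beta, row)))
        for k, v in enumerate(row):
            grad[k] += residual * v
    return grad


# Hypothetical samples held by two different institutions: (features, label).
institution_a = [([0.5], 0), ([2.0], 1)]
institution_b = [([1.0], 1), ([3.0], 0), ([1.5], 1)]
beta = [0.1, -0.2]

grad_union = log_likelihood_gradient(beta, institution_a + institution_b)
grad_parts = [
    a + b
    for a, b in zip(
        log_likelihood_gradient(beta, institution_a),
        log_likelihood_gradient(beta, institution_b),
    )
]
```

Up to floating-point rounding, `grad_union` and `grad_parts` agree, which is why a parameter update can be computed from one institution's samples at a time without pooling the raw data.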
For example, solving the logistic regression model parameters with the maximum likelihood estimation method and solving feedforward neural network parameters with the back-propagation method are both based on gradient ascent (descent), and the gradient can be decomposed into the sum of the components of any grouping of the samples, that is, the gradient is additive over samples. Using this additivity, incremental-learning-based modeling across different institutions can be realized. The specific steps can be as follows:
Suppose that N institutions participate in modeling, and their sample sets are respectively [Figure DEST_PATH_IMAGE001], where the sample set of the i-th institution is [Figure DEST_PATH_IMAGE002] [Figure DEST_PATH_IMAGE003]. Here [Figure DEST_PATH_IMAGE004] and [Figure DEST_PATH_IMAGE005] are respectively the independent-variable data set and the dependent-variable data set provided by the i-th institution, [Figure DEST_PATH_IMAGE006] indicates that the i-th institution has [Figure DEST_PATH_IMAGE006] samples in total, and [Figure DEST_PATH_IMAGE007] indicates the number of independent variables.
The incremental learning process based on the logistic regression model is as follows:
Step 1: Based on the sample [Figure DEST_PATH_IMAGE008] of the first institution, construct a logistic regression model to obtain the coefficient estimates [Figure DEST_PATH_IMAGE009] of the regression model, where [Figure DEST_PATH_IMAGE010] denotes the estimate, obtained by training on the sample of the first institution, of the coefficient of the b-th variable [Figure DEST_PATH_IMAGE011].
Step 2: For each institution from the second to the N-th, let [Figure DEST_PATH_IMAGE012] traverse the values from 1 to K, applying the following formulas:
[Figure DEST_PATH_IMAGE013]
[Figure DEST_PATH_IMAGE014]
When ||[Figure DEST_PATH_IMAGE015]|| < [Figure DEST_PATH_IMAGE016], terminate the traversal of [Figure DEST_PATH_IMAGE012] and set [Figure DEST_PATH_IMAGE017].
Here, in the training at the i-th institution, the initial parameter [Figure DEST_PATH_IMAGE018] of the MLE optimization method is the [Figure DEST_PATH_IMAGE019] obtained by the model on the sample of the (i-1)-th institution; b traverses from 0 to p; K is set to a large constant, for example 1000000; [Figure DEST_PATH_IMAGE016] is set to a very small constant, for example [Figure DEST_PATH_IMAGE020]; [Figure DEST_PATH_IMAGE021] is the learning step size, which can be set to a small constant, for example [Figure DEST_PATH_IMAGE022], and the optimal h can also be obtained by cross-validation or other parameter tuning methods; the operator || · || denotes the 2-norm of a vector, for example [Figure DEST_PATH_IMAGE023], where [Figure DEST_PATH_IMAGE024] is the i-th component of the vector [Figure DEST_PATH_IMAGE025].
Step 3: Repeat Step 2 until [Figure DEST_PATH_IMAGE026] stabilizes, where each time Step 2 is repeated i runs from 1 to N, that is, the sample of the first institution must also be included in the iteration.
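Steps 1 to 3 can be sketched in pure Python as below. Because the patent's formula images are not available here, the update rule is assumed to be plain gradient ascent on the log-likelihood, beta <- beta + h * gradient, and the stopping test is assumed to be the 2-norm of the step falling below eps. The function names, the toy data and the default values of `h`, `K`, `eps`, `max_passes` and `tol` are likewise assumptions. The sketch deliberately uses a small `K` per visit, unlike the large constant suggested above, so that each institution only nudges the shared coefficients and the repeated passes of Step 3 blend information from all institutions.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gradient(beta, samples):
    """Log-likelihood gradient; each sample is (features, label in {0, 1})."""
    grad = [0.0] * len(beta)
    for x, y in samples:
        row = [1.0] + list(x)  # leading 1 pairs with the intercept beta[0]
        residual = y - sigmoid(sum(b * v for b, v in zip(beta, row)))
        for k, v in enumerate(row):
            grad[k] += residual * v
    return grad


def local_update(beta, samples, h=0.05, K=5, eps=1e-8):
    """Step 2 at one institution: at most K gradient-ascent steps of size h."""
    beta = list(beta)
    for _ in range(K):
        step = [h * g for g in gradient(beta, samples)]
        beta = [b + s for b, s in zip(beta, step)]
        if math.sqrt(sum(s * s for s in step)) < eps:
            break
    return beta


def incremental_fit(institutions, p, max_passes=500, tol=1e-6, **kwargs):
    """Relay the coefficients through every institution until they stabilize."""
    beta = [0.0] * (p + 1)
    for _ in range(max_passes):
        start = beta
        for samples in institutions:  # only beta moves; samples stay local
            beta = local_update(beta, samples, **kwargs)
        if math.sqrt(sum((a - b) ** 2 for a, b in zip(beta, start))) < tol:
            break
    return beta


def log_likelihood(beta, samples):
    total = 0.0
    for x, y in samples:
        pi = sigmoid(sum(b * v for b, v in zip(beta, [1.0] + list(x))))
        total += math.log(pi) if y == 1 else math.log(1.0 - pi)
    return total


# Hypothetical one-feature samples held by two institutions.
institution_1 = [([0.5], 0), ([1.0], 0), ([1.5], 1), ([2.0], 0), ([2.5], 1), ([3.0], 1)]
institution_2 = [([0.0], 0), ([1.2], 0), ([1.4], 1), ([1.6], 0), ([2.2], 1), ([2.8], 1)]
beta_hat = incremental_fit([institution_1, institution_2], p=1)
```

Since the log-likelihood is concave, running each local update to full convergence would leave the result determined mainly by the last institution visited, which is the reason for the small `K` chosen here.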
Mode 2: constructing a neural network model.
The neural network model can be a model whose parameters are estimated by the back-propagation method, and it is used for incremental-learning modeling of the risk control model. The underlying principle is that the loss function can be decomposed into the sum of the loss functions over different sub-datasets.
For example: the fully-connected feedforward neural network with 3 input layer nodes, 1 hidden layer (2 nodes) and 1 output layer node is taken as an example for explanation:
the relationship of the input layer, the hidden layer and the output layer may be:
Figure DEST_PATH_IMAGE027
Figure DEST_PATH_IMAGE028
Figure 753918DEST_PATH_IMAGE029
wherein,
Figure DEST_PATH_IMAGE030
Figure 709236DEST_PATH_IMAGE031
Figure DEST_PATH_IMAGE032
is the independent variable of the number of the variable,
Figure DEST_PATH_IMAGE033
Figure DEST_PATH_IMAGE034
is a hidden layer node and accepts data from
Figure 672644DEST_PATH_IMAGE030
Figure 611781DEST_PATH_IMAGE031
Figure 545102DEST_PATH_IMAGE032
Is converted by the activation function and then input to the inputEgress node
Figure DEST_PATH_IMAGE035
And then, the conversion is carried out through an activation function in one step and then the output is carried out.
Figure 620505DEST_PATH_IMAGE033
Figure 283043DEST_PATH_IMAGE034
Figure 771793DEST_PATH_IMAGE035
Has an activation function of
Figure DEST_PATH_IMAGE036
Figure DEST_PATH_IMAGE037
And
Figure DEST_PATH_IMAGE038
the index i indicates the ith individual sample,
Figure DEST_PATH_IMAGE039
a bias term is represented.
Assuming an activation function
Figure DEST_PATH_IMAGE040
Can be derived and has a derivative function of
Figure DEST_PATH_IMAGE041
. In the context of a regression task (e.g., predicting the amount of loss due to a default), the mean square error function may be used as the loss function, including:
Figure DEST_PATH_IMAGE042
in such a model, a gradient descent method in combination with a back propagation (BP algorithm) method may be used for the parametric solution.
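A network of this shape (3 inputs, 2 hidden nodes, 1 output) trained by gradient descent with back-propagation on a mean squared error loss can be sketched as follows. The patent's own formulas are not reproduced here, so several choices are assumptions: the sigmoid is used as the activation at every node, targets are assumed to be scaled into (0, 1), and the learning rate, initialization, parameter names and toy samples are illustrative only.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic activation; its derivative is s * (1 - s)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_params(seed: int = 0) -> dict:
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)],  # hidden weights, 2 x 3
        "b1": [0.0, 0.0],                                                      # hidden biases
        "W2": [rng.uniform(-0.5, 0.5) for _ in range(2)],                      # output weights
        "b2": 0.0,                                                             # output bias
    }


def forward(params: dict, x):
    hidden = [
        sigmoid(sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(params["W1"], params["b1"])
    ]
    output = sigmoid(sum(w * h for w, h in zip(params["W2"], hidden)) + params["b2"])
    return hidden, output


def mse(params: dict, samples) -> float:
    return sum((forward(params, x)[1] - y) ** 2 for x, y in samples) / len(samples)


def backprop_step(params: dict, samples, lr: float = 0.5) -> None:
    """One full-batch gradient-descent step on the MSE loss, updating params in place."""
    n = len(samples)
    g_w1 = [[0.0] * 3 for _ in range(2)]
    g_b1 = [0.0, 0.0]
    g_w2 = [0.0, 0.0]
    g_b2 = 0.0
    for x, y in samples:
        hidden, out = forward(params, x)
        delta_out = 2.0 * (out - y) / n * out * (1.0 - out)  # dLoss/d(output pre-activation)
        g_b2 += delta_out
        for j in range(2):
            g_w2[j] += delta_out * hidden[j]
            delta_h = delta_out * params["W2"][j] * hidden[j] * (1.0 - hidden[j])
            g_b1[j] += delta_h
            for k in range(3):
                g_w1[j][k] += delta_h * x[k]
    for j in range(2):
        params["W2"][j] -= lr * g_w2[j]
        params["b1"][j] -= lr * g_b1[j]
        for k in range(3):
            params["W1"][j][k] -= lr * g_w1[j][k]
    params["b2"] -= lr * g_b2


# Hypothetical local samples: (three features, target scaled into (0, 1)).
samples = [([0, 0, 1], 0.1), ([1, 0, 0], 0.9), ([0, 1, 1], 0.2), ([1, 1, 0], 0.8)]
params = init_params(seed=0)
loss_before = mse(params, samples)
for _ in range(200):
    backprop_step(params, samples)
loss_after = mse(params, samples)
```

Because the batch loss is a sum over samples, the accumulated gradients `g_w1`, `g_w2`, `g_b1` and `g_b2` can equally be accumulated from one institution's samples at a time, which is the property the incremental scheme relies on.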
By the above method, a model supporting an incremental learning algorithm can be constructed on the samples of institution 1, determining the estimate of each model parameter and the gradient of the loss function or log-likelihood function with respect to each parameter; the other institutions then update the model parameters on the basis of the model obtained by that institution, completing the construction of the risk assessment model.
Optionally, in an actual application scenario, the scheme may be applied wherever risk control is required, for example risk control in a credit institution or risk control in a financial insurance institution. Thus, the first institution and the second institution in the above embodiments may be credit institutions; the sample data collected from each credit institution may include at least the user basic data, credit data and risk data in the first institution. The user basic data may include information such as the user's name, gender, age, occupation, place of origin and identification number. The credit data may be the user's loan records, and may specifically include information such as loan time, mortgage information, guarantee information, loan amount, loan term and number of loans. The risk data may include historical risk labels, such as the user's credit risk, macro-policy risk, internal operational risk and liquidity risk. It should be noted that all of these data are user-related; to ensure compliance, they can only be used within the institution that stores the user data and cannot be transmitted to other institutions over a network or by other means. Similarly, the first institution and the second institution may also be financial insurance institutions; in this case, the user basic data in the institution may include the user's gender, age, date of birth, location, occupation type, disposable income, social security status, loan status, living habits, family situation and the like. The insurance data may include policy information, insurance type, insurance term, type of insurance liability, insurance premiums and the like. The risk data may include historical risk labels, such as insurance fraud risk.
Taking financial insurance institutions as an example, the first institution, which trains the initial risk assessment model, may iterate continuously over the user data, insurance data and risk data within the institution to obtain the initial risk assessment model; the other financial insurance institutions then train this model locally on their own data, thereby updating the risk assessment model. A single iteration of model training relies only on data inside one institution and requires no frequent data exchange over the network, so there is no strong requirement on network performance and the time overhead of model training can be greatly reduced. Meanwhile, model developers can intervene deeply in the training process and fully exploit the data. For algorithms that support incremental learning, the model in one or more embodiments of this specification is no less accurate than existing models, while the compliance risk caused by data leaving its domain during model training is avoided.
In addition, in the actual model building process, because different mechanisms may store data with different data formats, when building a model according to the data, in order to improve the efficiency of building the model, the data format of training data may be preset, and the preset data format may be a data format required in the risk assessment model training. When each mechanism performs local training according to the data of the mechanism, the data can be converted into a required data format, and then model construction is performed, specifically, the following steps can be adopted:
before the updating the first parameter in the risk assessment model according to the second sample data in the first institution and obtaining the updated risk assessment model, the method may further include:
acquiring a data format of second sample data;
and converting the data format of the second sample data according to a preset data format.
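The conversion step can be sketched as a mapping from an institution's own field names onto a preset schema. The preset field list, the field names in the example record and the choice to convert every value to `float` are all hypothetical; the patent does not specify a concrete data format.

```python
from typing import Dict, Mapping

# Hypothetical preset format expected by the risk assessment model's training code.
PRESET_FIELDS = ("age", "loan_amount", "loan_count", "risk_label")


def convert_record(record: Mapping[str, object], field_map: Mapping[str, str]) -> Dict[str, float]:
    """Convert one local record into the preset format.

    `field_map` maps each preset field name to the institution's own field
    name; preset fields missing from the map are assumed to keep their name.
    """
    converted = {}
    for target in PRESET_FIELDS:
        source = field_map.get(target, target)
        if source not in record:
            raise KeyError(f"local record has no field {source!r} for preset field {target!r}")
        converted[target] = float(record[source])
    return converted


# A record as one institution might store it, with its own names and string values.
local_record = {"Age": "35", "amt": "1200.5", "n_loans": 2, "label": 1}
local_map = {"age": "Age", "loan_amount": "amt", "loan_count": "n_loans", "risk_label": "label"}
preset_record = convert_record(local_record, local_map)
```

Each institution would keep its own `field_map`, so the conversion runs locally and only the shared schema, not the data, needs to be agreed on in advance.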
The method in the above embodiment can be explained with reference to fig. 3:
FIG. 3 is a schematic swim-lane diagram of a method for constructing a risk assessment model provided in the embodiments of the present disclosure.
As shown in FIG. 3, taking the training of a risk assessment model for assessing credit institutions as an example, the parties involved in the method include a first institution, a second institution and a third institution. In practice the steps are executed by the servers of the respective institutions, and the number of institutions is chosen according to the actual situation. To simplify the description, FIG. 3 treats the institutions themselves as the executing parties of each flow and lists only the interaction among three institutions; this does not limit the scope of protection of the scheme. The specific implementation process is as follows:
in the construction process of the risk assessment model, the input data is second sample data in the first institution, and the sample data can comprise user basic data, credit data and risk data.
And the first mechanism solves the parameters of the model according to the input second sample data to obtain a risk assessment model 1.
It is then judged whether the scoring accuracy of risk assessment model 1 meets the first preset accuracy corresponding to the first institution. If not, the parameters in the model continue to be updated with the sample data in the institution until the scoring accuracy meets the first preset accuracy corresponding to the first institution. If so, risk assessment model 1 is sent to the second institution.
The second institution updates the parameters of risk assessment model 1 according to the first sample data in that institution to obtain updated risk assessment model 2. The scoring accuracy of risk assessment model 2 must likewise meet the second preset accuracy corresponding to the second institution; once it does, risk assessment model 2 is sent to the third institution, and so on until all institutions have been traversed. For example, institution N trains model N on the basis of model N-1 passed on by the previous institution, and the scoring accuracy of model N meets the N-th preset accuracy corresponding to the N-th institution. Model N is then brought back to the first through N-th institutions in turn for local testing; after the tests succeed, the process ends and the final target risk assessment model is output.
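The relay in FIG. 3 can be sketched as a loop in which only the model object crosses institutional boundaries. The `Institution` container, the callback names and the toy single-weight model are hypothetical. Repeating the whole relay when a final local test fails is also an assumption of this sketch; the text above describes local tuning at the failing institution instead.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Model = Dict[str, float]


@dataclass
class Institution:
    name: str
    train_locally: Callable[[Model], Model]      # update on local data until the local target is met
    passes_local_test: Callable[[Model], bool]   # test against the institution's own preset accuracy


def relay(institutions: List[Institution], model: Model, max_rounds: int = 10) -> Tuple[Model, bool]:
    """Pass the model from institution to institution, then test it everywhere."""
    for _ in range(max_rounds):
        for institution in institutions:
            model = institution.train_locally(model)  # data stays put, model moves
        if all(inst.passes_local_test(model) for inst in institutions):
            return model, True
    return model, False


def make_toy_institution(name: str, target: float) -> Institution:
    """Toy institution: training adds 1 to the weight; the test needs w >= target."""
    return Institution(
        name=name,
        train_locally=lambda m: {"w": m["w"] + 1.0},
        passes_local_test=lambda m: m["w"] >= target,
    )


final_model, all_passed = relay(
    [make_toy_institution("first", 2.0), make_toy_institution("second", 3.0)],
    {"w": 0.0},
)
```

In a real deployment `train_locally` and `passes_local_test` would wrap each institution's own training and evaluation code, so the orchestrating loop never sees any sample data.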
By the method, the following technical effects can be achieved:
1) A single iteration of model training relies only on data inside one institution and requires no frequent data exchange over the network, so there is no strong requirement on network performance and the time overhead of model training can be greatly reduced.
2) Model developers can intervene deeply in the model training process and fully exploit the data.
3) The optimization objective function in model training is decomposed into the contributions of single samples or single batches of samples, which are provided by different institutions. The traditional "data moves, model stays" approach is converted into "data stays, model moves", and the high risk of data leaving its domain is replaced by the risk-free practice of the model leaving its domain.
4) The model itself does not contain any information of the user, and therefore the risk assessment model can be exported from the institution participating in the model training to the external environment on the basis of compliance with relevant laws and regulations.
Based on the same idea, the embodiment of the present specification further provides a device corresponding to the above method. The apparatus may include:
the risk assessment model acquisition module is used for acquiring a risk assessment model obtained by training of a second server of a second organization; the risk assessment model is obtained by training according to first sample data in the second institution; the first sample data is only allowed to be used in the device of the second institution;
the model updating module is used for updating a first parameter in the risk assessment model according to second sample data in the first mechanism to obtain an updated risk assessment model; said second sample data is only allowed to be used in devices of said first institution; the updated risk assessment model is used for risk assessment of the data of the first institution and the second institution.
Based on the above device, the embodiments of the present specification also provide some specific embodiments of the method, which are described below.
Optionally, the apparatus may further include:
the model sending module is used for sending the updated risk assessment model to a third institution; the third institution updates the second parameter in the updated risk assessment model according to third sample data in that institution to obtain the re-updated risk assessment model.
Optionally, the scoring accuracy of the risk assessment model updated again meets a preset accuracy.
Optionally, the apparatus may further include:
and the model testing module is used for inputting the updated risk assessment model into each mechanism for local testing.
Optionally, the model testing module may specifically include:
a prediction result determining unit, configured to, for any one mechanism, input data in the mechanism into the risk assessment model after being updated again, and obtain a prediction result of the risk assessment model after being updated again;
the prediction accuracy calculation unit is used for calculating the prediction accuracy of the risk assessment model after being updated again according to the prediction result;
a judging unit, configured to judge whether a prediction accuracy of the risk assessment model updated again reaches a preset accuracy corresponding to the any one mechanism, to obtain a judgment result;
and the test passing determination unit is used for determining that the updated risk assessment model passes the test of any mechanism when the judgment result shows that the prediction accuracy of the updated risk assessment model reaches the preset accuracy corresponding to any mechanism.
Optionally, the model testing module may further include:
and the test tuning unit is used for training the risk assessment model after being updated again by adopting sample data in any mechanism when the prediction accuracy of the risk assessment model after being updated again does not reach the preset accuracy corresponding to any mechanism, tuning parameters in the risk assessment model after being updated again until the prediction accuracy of the risk assessment model after being updated again meets the preset value corresponding to any mechanism, and obtaining the tuned risk assessment model.
Optionally, the risk assessment model is a logistic regression model; the risk assessment model obtaining module may specifically include:
the first parameter determining unit is used for determining a first parameter in the risk assessment model sent by a second organization;
the model updating module specifically comprises:
and the updating unit is used for training the logistic regression model according to the first parameter and the second sample data to obtain an updated risk assessment model.
Optionally, the apparatus may further include:
the data format acquisition module is used for acquiring the data format of the second sample data;
and the data format conversion module is used for converting the data format of the second sample data according to a preset data format.
Optionally, the first parameter determining unit may be specifically configured to:
determining a first parameter of the risk assessment model using maximum likelihood estimation based on the first sample data in the second institution.
Optionally, the model updating module may be specifically configured to:
and updating the first parameter by adopting a gradient ascent method according to the second sample data.
Optionally, the risk assessment model obtaining module may specifically include:
and the risk evaluation model acquisition unit is used for acquiring the encrypted risk evaluation model from the block chain system and decrypting the encrypted risk evaluation model.
Optionally, the apparatus may further include:
and the risk evaluation model storage module is used for encrypting the updated risk evaluation model and uploading the encrypted risk evaluation model to the block chain system for storage.
Optionally, the apparatus may further include:
a first verifiable statement acquisition module for acquiring a first verifiable statement for indicating that the risk assessment model is trained by the second organization using the first sample data; the first verifiable statement at least comprises a digital signature of the second organization and version information of the risk assessment model, and a data identifier of the first sample data corresponding to the version information is also stored in the blockchain system.
Optionally, the apparatus may further include:
a second verifiable statement generation module, configured to generate a second verifiable statement, where the second verifiable statement is used to indicate that the updated risk assessment model is trained by the first mechanism using the second sample data; the second verifiable claim includes at least a digital signature of the first mechanism and version information of the updated risk assessment model.
Optionally, the apparatus may further include:
and the storage module is used for sending the second verifiable statement and the data identifier of the second sample data corresponding to the updated version information of the risk assessment model to the blockchain system for storage.
Optionally, the apparatus may further include:
and the trusted operating system loading module is used for loading the trusted operating system in advance so as to provide a running environment for the first server and the second server to execute the training model process.
Based on the same idea, the embodiment of the present specification further provides a device corresponding to the above method.
Fig. 4 is a schematic diagram of an apparatus for constructing a risk assessment model according to an embodiment of the present disclosure. As shown in fig. 4, the apparatus 400 may include:
at least one processor 410; and
a memory 430 communicatively coupled to the at least one processor; wherein,
the memory 430 stores instructions 420 executable by the at least one processor 410 to enable the at least one processor 410 to:
acquiring a risk assessment model obtained by training of a second server of a second organization; the risk assessment model is obtained by training according to first sample data in the second institution; the first sample data is only allowed to be used in the device of the second institution;
updating a first parameter in the risk assessment model according to second sample data in the first mechanism to obtain an updated risk assessment model; said second sample data is only allowed to be used in devices of said first institution; the updated risk assessment model is used for risk assessment of the data of the first institution and the second institution.
Based on the same idea, the embodiment of the present specification further provides a computer-readable medium corresponding to the above method. The computer-readable medium has computer-readable instructions stored thereon that are executable by a processor to implement the following method:
acquiring a risk assessment model trained by a second server of a second institution; the risk assessment model is trained on first sample data in the second institution; the first sample data is only allowed to be used in devices of the second institution;
updating a first parameter in the risk assessment model according to second sample data in the first institution to obtain an updated risk assessment model; the second sample data is only allowed to be used in devices of the first institution; the updated risk assessment model is used for risk assessment of the data of the first institution and the second institution.
The embodiments in the present specification are described in a progressive manner; for parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the apparatus for constructing a risk assessment model shown in Fig. 4 is substantially similar to the method embodiment, it is described only briefly; for relevant details, refer to the corresponding description of the method embodiment.
In the 1990s, an improvement to a technology could be clearly classified as either a hardware improvement (for example, an improvement to a circuit structure such as a diode, transistor, or switch) or a software improvement (an improvement to a method flow). As technology has advanced, however, many improvements to method flows can now be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. It therefore cannot be said that an improvement to a method flow cannot be implemented with hardware modules. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic functions are determined by the user programming the device. A designer programs a digital system onto a single PLD without asking a chip manufacturer to design and fabricate a dedicated integrated circuit chip. Moreover, instead of manually fabricating integrated circuit chips, this programming is now mostly done with "logic compiler" software, which is similar to the software compilers used in program development; the source code to be compiled must be written in a specific programming language called a hardware description language (HDL). There are many HDLs, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used at present.
It will also be apparent to those skilled in the art that a hardware circuit implementing a logical method flow can be readily obtained with only a little logic programming of the method flow into an integrated circuit using the hardware description languages described above.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, besides implementing the controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functionality in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for performing the various functions may be regarded as structures within the hardware component. The means for performing the various functions may even be regarded both as software modules implementing the method and as structures within the hardware component.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided by function into various units, which are described separately. Of course, when implementing the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (28)

1. A method of constructing a risk assessment model, the method being applied to a first server of a first institution, the method comprising:
acquiring, from a blockchain system, an encrypted risk assessment model trained by a second server of a second institution, and decrypting the encrypted risk assessment model; the risk assessment model is trained on first sample data in the second institution; the first sample data is only allowed to be used in devices of the second institution; the blockchain system comprises a first verifiable statement; the first verifiable statement indicates that the risk assessment model was trained by the second institution using the first sample data; the first verifiable statement comprises at least a digital signature of the second institution and version information of the risk assessment model, and a data identifier of the first sample data corresponding to the version information is also stored in the blockchain system;
acquiring a public key in a document of the distributed digital identity from the blockchain system;
verifying the first verifiable statement using the public key;
after the first verifiable statement passes verification, updating a first parameter in the risk assessment model according to second sample data in the first institution to obtain an updated risk assessment model; the second sample data is only allowed to be used in devices of the first institution; the updated risk assessment model is used for risk assessment of the data of the first institution and the second institution; the blockchain system comprises a second verifiable statement; the second verifiable statement indicates that the updated risk assessment model was trained by the first institution using the second sample data; the second verifiable statement comprises at least a digital signature of the first institution and version information of the updated risk assessment model;
encrypting the updated risk assessment model, and uploading the encrypted risk assessment model to the blockchain system for storage.
2. The method of claim 1, further comprising:
sending the updated risk assessment model to a third institution, wherein the third institution updates a second parameter in the updated risk assessment model according to third sample data in the third institution to obtain a re-updated risk assessment model.
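The relay described in claims 1 and 2, where each institution in turn updates the model using only its own data, can be illustrated with a minimal sketch. Everything named here (`make_institution`, `relay_update`, the toy "move toward the local mean" update rule) is hypothetical and stands in for real local training; the point is only that parameters travel between institutions while sample data stays inside each closure.

```python
from typing import Callable, List, Sequence

Params = List[float]
# Hypothetical interface: a local update receives model parameters and
# returns updated parameters; the institution's samples are never exposed.
LocalUpdate = Callable[[Params], Params]


def make_institution(local_samples: Sequence[float], rate: float = 0.5) -> LocalUpdate:
    """Build a toy local update that nudges each parameter toward the
    mean of the institution's private samples (a stand-in for training)."""
    local_mean = sum(local_samples) / len(local_samples)

    def update(params: Params) -> Params:
        return [p + rate * (local_mean - p) for p in params]

    return update


def relay_update(initial: Params, institutions: Sequence[LocalUpdate]) -> Params:
    """Pass the parameters through each institution in order."""
    params = list(initial)
    for update in institutions:
        params = update(params)
    return params
```

In this sketch the second institution would come first in the list, followed by the first institution and then the third, matching the order in which the claims hand the model along.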
3. The method of claim 2, wherein the scoring accuracy of the re-updated risk assessment model satisfies a preset accuracy.
4. The method of claim 2, after obtaining the re-updated risk assessment model, further comprising:
inputting the re-updated risk assessment model into each institution for local testing.
5. The method according to claim 4, wherein inputting the re-updated risk assessment model into each institution for local testing specifically comprises:
for any institution, inputting data in that institution into the re-updated risk assessment model to obtain a prediction result of the re-updated risk assessment model;
calculating the prediction accuracy of the re-updated risk assessment model according to the prediction result;
judging whether the prediction accuracy of the re-updated risk assessment model reaches the preset accuracy corresponding to that institution, to obtain a judgment result;
when the judgment result indicates that the prediction accuracy of the re-updated risk assessment model reaches the preset accuracy corresponding to that institution, determining that the re-updated risk assessment model passes the test of that institution.
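The local test in claim 5 can be sketched as follows. The claims do not define how prediction accuracy is computed, so this example assumes the plain fraction of correctly predicted labels; the function names, the `predict` callable, and the sample layout are illustrative assumptions.

```python
from typing import Callable, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (features, true label); assumed layout


def prediction_accuracy(predict: Callable[[Sequence[float]], int],
                        samples: Sequence[Sample]) -> float:
    """Fraction of local samples whose predicted label matches the true label."""
    if not samples:
        raise ValueError("at least one local sample is required")
    correct = sum(1 for features, label in samples if predict(features) == label)
    return correct / len(samples)


def passes_local_test(predict: Callable[[Sequence[float]], int],
                      samples: Sequence[Sample],
                      preset_accuracy: float) -> bool:
    """Judge whether the model reaches this institution's preset accuracy."""
    return prediction_accuracy(predict, samples) >= preset_accuracy
```

Each institution would call `passes_local_test` with its own data and its own threshold, so a model can pass at one institution and fail at another.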
6. The method of claim 1, after generating the second verifiable statement, further comprising:
sending the second verifiable statement and the data identifier of the second sample data corresponding to the version information of the updated risk assessment model to the blockchain system for storage.
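A minimal sketch of the record that pairs version information with a data identifier is given below. The claims do not say how the data identifier is derived; this example assumes a SHA-256 digest over canonically serialized rows, and the field names (`issuer`, `model_version`, `data_id`) are hypothetical. Signing the statement and submitting it to a blockchain are deliberately left out, because they depend on the key material and chain client in use.

```python
import hashlib
import json
from typing import Any, Dict, Iterable


def data_identifier(sample_rows: Iterable[Dict[str, Any]]) -> str:
    """Assumed scheme: hash each row in a canonical JSON form so the
    identifier refers to the data without revealing it."""
    digest = hashlib.sha256()
    for row in sample_rows:
        digest.update(json.dumps(row, sort_keys=True).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def build_storage_record(issuer: str, model_version: str,
                         sample_rows: Iterable[Dict[str, Any]]) -> Dict[str, str]:
    """Unsigned payload linking a model version to the data it was trained on."""
    return {
        "issuer": issuer,
        "model_version": model_version,
        "data_id": data_identifier(sample_rows),
    }
```

Because only a digest is stored, other institutions can check which data set a given model version refers to without ever seeing the rows themselves.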
7. The method of claim 1, further comprising:
pre-loading a trusted operating system to provide a running environment in which the first server and the second server execute the model training process.
8. The method according to claim 5, wherein the judging whether the prediction accuracy of the re-updated risk assessment model reaches the preset accuracy corresponding to that institution further comprises:
when the prediction accuracy of the re-updated risk assessment model does not reach the preset accuracy corresponding to that institution, training the re-updated risk assessment model using sample data in that institution and optimizing the parameters in the re-updated risk assessment model until the prediction accuracy of the re-updated risk assessment model meets the preset value corresponding to that institution, so as to obtain an optimized risk assessment model.
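The tune-until-accurate loop in claim 8 might look like the sketch below. The claim says training continues "until" the preset value is met; the `max_rounds` cap is an added assumption so the loop always terminates, and `train_step` and `evaluate` are hypothetical callables supplied by the institution.

```python
from typing import Callable, Tuple, TypeVar

P = TypeVar("P")  # whatever type the model parameters have


def tune_until_accurate(params: P,
                        train_step: Callable[[P], P],
                        evaluate: Callable[[P], float],
                        preset_accuracy: float,
                        max_rounds: int = 100) -> Tuple[P, bool]:
    """Repeat local training until the accuracy reaches the preset value.

    Returns the final parameters and whether the preset value was met.
    """
    for _ in range(max_rounds):
        if evaluate(params) >= preset_accuracy:
            return params, True
        params = train_step(params)  # one round of training on local data
    return params, evaluate(params) >= preset_accuracy
```

Returning the success flag alongside the parameters lets the caller distinguish a tuned model from one that simply ran out of rounds.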
9. The method of claim 1, wherein the risk assessment model is a logistic regression model; the acquiring of the risk assessment model trained by the second server of the second institution specifically comprises:
determining a first parameter in the risk assessment model sent by the second institution;
the updating of a first parameter in the risk assessment model according to second sample data in the first institution to obtain an updated risk assessment model specifically comprises:
training the logistic regression model according to the first parameter and the second sample data to obtain the updated risk assessment model.
10. The method of claim 1, wherein the first institution and the second institution are credit institutions;
the second sample data includes at least user profile data, credit data, and risk data in the first institution.
11. The method of claim 1, wherein the first institution and the second institution are financial insurance institutions;
the second sample data includes at least user profile data, insurance data and risk data in the second institution.
12. The method of claim 1, wherein, before updating the first parameter in the risk assessment model according to the second sample data in the first institution to obtain the updated risk assessment model, the method further comprises:
acquiring a data format of the second sample data;
converting the data format of the second sample data according to a preset data format.
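The format conversion in claim 12 can be illustrated with a small sketch. The claims do not specify the preset data format, so the schema below (`user_id`, `credit_score`, `is_risky`) and the alias mapping are purely hypothetical; the idea shown is that each institution maps its own field names and types onto one shared layout before training.

```python
from typing import Any, Callable, Dict

# Hypothetical preset format: target field name -> type used to coerce values.
PRESET_SCHEMA: Dict[str, Callable[[Any], Any]] = {
    "user_id": str,
    "credit_score": float,
    "is_risky": int,
}


def convert_record(raw: Dict[str, Any], field_aliases: Dict[str, str]) -> Dict[str, Any]:
    """Rename and coerce one locally formatted record into the preset format.

    field_aliases maps a preset field name to the institution's own name
    for that field; fields without an alias are assumed to share the name.
    """
    converted = {}
    for target_field, cast in PRESET_SCHEMA.items():
        source_field = field_aliases.get(target_field, target_field)
        converted[target_field] = cast(raw[source_field])
    return converted
```

A record with a missing field raises `KeyError`, which is a deliberate choice here: silently filling defaults into risk data would be harder to notice than a failed conversion.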
13. The method according to claim 9, wherein the determining of the first parameter in the risk assessment model sent by the second institution specifically comprises:
determining a first parameter of the risk assessment model using maximum likelihood estimation based on the first sample data in the second institution.
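For reference, maximum likelihood estimation for a logistic regression model is conventionally written as below. The notation is introduced here and does not come from the claims: $w$ is the parameter vector, $x_i$ and $y_i \in \{0, 1\}$ are the features and label of the $i$-th of $n$ samples, and $\sigma$ is the logistic function.

$$
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
\ell(w) = \sum_{i=1}^{n} \Big[ y_i \log \sigma(w^{\top} x_i) + (1 - y_i) \log\big(1 - \sigma(w^{\top} x_i)\big) \Big], \qquad
\hat{w} = \arg\max_{w} \ell(w)
$$

Under this reading, the first parameter would be the $\hat{w}$ obtained from the first sample data in the second institution.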
14. The method of claim 1, wherein the updating of a first parameter in the risk assessment model according to second sample data in the first institution comprises:
updating the first parameter by a gradient ascent method according to the second sample data.
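A minimal sketch of the gradient ascent update for a logistic regression model follows. It assumes batch gradient ascent on the log-likelihood, starting from the parameter received from the second institution; the learning rate, step count, and function names are illustrative choices rather than values from the patent.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (features, label in {0, 1}); assumed layout


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(weights: Sequence[float], features: Sequence[float]) -> float:
    return sum(w * x for w, x in zip(weights, features))


def log_likelihood(weights: Sequence[float], samples: Sequence[Sample]) -> float:
    """Sum over samples of y*z - log(1 + exp(z)), with z = w . x."""
    total = 0.0
    for features, label in samples:
        z = dot(weights, features)
        softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
        total += label * z - softplus
    return total


def gradient_ascent_update(first_parameter: Sequence[float],
                           second_samples: Sequence[Sample],
                           learning_rate: float = 0.1,
                           steps: int = 200) -> List[float]:
    """Warm-start from the received parameter and climb the log-likelihood
    of the local samples; the gradient is sum_i (y_i - sigmoid(w . x_i)) * x_i."""
    weights = list(first_parameter)
    for _ in range(steps):
        gradient = [0.0] * len(weights)
        for features, label in second_samples:
            error = label - sigmoid(dot(weights, features))
            for j, x in enumerate(features):
                gradient[j] += error * x
        weights = [w + learning_rate * g for w, g in zip(weights, gradient)]
    return weights
```

Starting from the received parameter rather than from zero is what makes this an update of the second institution's model instead of a fresh fit on the first institution's data alone.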
15. An apparatus for constructing a risk assessment model, the apparatus being applied to a first server of a first institution, the apparatus comprising:
a risk assessment model acquisition module, configured to acquire, from a blockchain system, an encrypted risk assessment model trained by a second server of a second institution, and to decrypt the encrypted risk assessment model; the risk assessment model is trained on first sample data in the second institution; the first sample data is only allowed to be used in devices of the second institution; the blockchain system comprises a first verifiable statement; the first verifiable statement indicates that the risk assessment model was trained by the second institution using the first sample data; the first verifiable statement comprises at least a digital signature of the second institution and version information of the risk assessment model, and a data identifier of the first sample data corresponding to the version information is also stored in the blockchain system;
acquiring a public key in a document of the distributed digital identity from the blockchain system;
verifying the first verifiable statement using the public key;
a model update module, configured to, after the first verifiable statement passes verification, update a first parameter in the risk assessment model according to second sample data in the first institution to obtain an updated risk assessment model; the second sample data is only allowed to be used in devices of the first institution; the updated risk assessment model is used for risk assessment of the data of the first institution and the second institution; the blockchain system comprises a second verifiable statement; the second verifiable statement indicates that the updated risk assessment model was trained by the first institution using the second sample data; the second verifiable statement comprises at least a digital signature of the first institution and version information of the updated risk assessment model;
encrypting the updated risk assessment model, and uploading the encrypted risk assessment model to the blockchain system for storage.
16. The apparatus of claim 15, the apparatus further comprising:
a model sending module, configured to send the updated risk assessment model to a third institution, wherein the third institution updates a second parameter in the updated risk assessment model according to third sample data in the third institution to obtain a re-updated risk assessment model.
17. The apparatus of claim 16, wherein the scoring accuracy of the re-updated risk assessment model satisfies a preset accuracy.
18. The apparatus of claim 16, the apparatus further comprising:
a model testing module, configured to input the re-updated risk assessment model into each institution for local testing.
19. The apparatus of claim 18, wherein the model testing module specifically comprises:
a prediction result determining unit, configured to, for any institution, input data in that institution into the re-updated risk assessment model to obtain a prediction result of the re-updated risk assessment model;
a prediction accuracy calculation unit, configured to calculate the prediction accuracy of the re-updated risk assessment model according to the prediction result;
a judging unit, configured to judge whether the prediction accuracy of the re-updated risk assessment model reaches the preset accuracy corresponding to that institution, to obtain a judgment result;
a test passing determination unit, configured to determine that the re-updated risk assessment model passes the test of that institution when the judgment result indicates that the prediction accuracy of the re-updated risk assessment model reaches the preset accuracy corresponding to that institution.
20. The apparatus of claim 15, further comprising:
a storage module, configured to send the second verifiable statement and the data identifier of the second sample data corresponding to the version information of the updated risk assessment model to the blockchain system for storage.
21. The apparatus of claim 15, further comprising:
a trusted operating system loading module, configured to load a trusted operating system in advance so as to provide a running environment in which the first server and the second server execute the model training process.
22. The apparatus of claim 19, wherein the model testing module further comprises:
a test tuning unit, configured to, when the prediction accuracy of the re-updated risk assessment model does not reach the preset accuracy corresponding to that institution, train the re-updated risk assessment model using sample data in that institution and tune the parameters in the re-updated risk assessment model until the prediction accuracy of the re-updated risk assessment model meets the preset value corresponding to that institution, so as to obtain a tuned risk assessment model.
23. The apparatus of claim 15, wherein the risk assessment model is a logistic regression model; the risk assessment model acquisition module specifically comprises:
a first parameter determining unit, configured to determine a first parameter in the risk assessment model sent by the second institution;
the model update module specifically comprises:
an updating unit, configured to train the logistic regression model according to the first parameter and the second sample data to obtain the updated risk assessment model.
24. The apparatus of claim 15, further comprising:
the data format acquisition module is used for acquiring the data format of the second sample data;
and the data format conversion module is used for converting the data format of the second sample data according to a preset data format.
25. The apparatus of claim 23, wherein the first parameter determining unit is specifically configured to:
determining a first parameter of the risk assessment model using maximum likelihood estimation based on the first sample data in the second institution.
26. The apparatus of claim 15, wherein the model update module is specifically configured to:
and updating the first parameter by adopting a gradient ascent method according to the second sample data.
27. An apparatus for constructing a risk assessment model, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform:
acquiring, from a blockchain system, an encrypted risk assessment model trained by a second server of a second institution, and decrypting the encrypted risk assessment model; the risk assessment model is trained on first sample data in the second institution; the first sample data is only allowed to be used in devices of the second institution; the blockchain system comprises a first verifiable statement; the first verifiable statement indicates that the risk assessment model was trained by the second institution using the first sample data; the first verifiable statement comprises at least a digital signature of the second institution and version information of the risk assessment model, and a data identifier of the first sample data corresponding to the version information is also stored in the blockchain system;
acquiring a public key in a document of the distributed digital identity from the blockchain system;
verifying the first verifiable statement using the public key;
after the first verifiable statement passes verification, updating a first parameter in the risk assessment model according to second sample data in a first institution to obtain an updated risk assessment model; the second sample data is only allowed to be used in devices of the first institution; the updated risk assessment model is used for risk assessment of the data of the first institution and the second institution; the blockchain system comprises a second verifiable statement; the second verifiable statement indicates that the updated risk assessment model was trained by the first institution using the second sample data; the second verifiable statement comprises at least a digital signature of the first institution and version information of the updated risk assessment model;
encrypting the updated risk assessment model, and uploading the encrypted risk assessment model to the blockchain system for storage.
28. A computer readable medium having computer readable instructions stored thereon which are executable by a processor to implement the method of constructing a risk assessment model according to any one of claims 1 to 12.
CN202011559506.8A 2020-12-25 2020-12-25 Method, device and equipment for constructing risk assessment model Active CN112288573B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111171644.3A CN113947471B (en) 2020-12-25 2020-12-25 Method, device and equipment for constructing risk assessment model
CN202011559506.8A CN112288573B (en) 2020-12-25 2020-12-25 Method, device and equipment for constructing risk assessment model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011559506.8A CN112288573B (en) 2020-12-25 2020-12-25 Method, device and equipment for constructing risk assessment model

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202111171644.3A Division CN113947471B (en) 2020-12-25 2020-12-25 Method, device and equipment for constructing risk assessment model

Publications (2)

Publication Number Publication Date
CN112288573A CN112288573A (en) 2021-01-29
CN112288573B true CN112288573B (en) 2021-09-21

Family

ID=74426315

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202111171644.3A Active CN113947471B (en) 2020-12-25 2020-12-25 Method, device and equipment for constructing risk assessment model
CN202011559506.8A Active CN112288573B (en) 2020-12-25 2020-12-25 Method, device and equipment for constructing risk assessment model

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202111171644.3A Active CN113947471B (en) 2020-12-25 2020-12-25 Method, device and equipment for constructing risk assessment model

Country Status (1)

Country Link
CN (2) CN113947471B (en)

Also Published As

Publication number Publication date
CN112288573A (en) 2021-01-29
CN113947471A (en) 2022-01-18
CN113947471B (en) 2024-09-27

Similar Documents

Publication Publication Date Title
CN112288573B (en) Method, device and equipment for constructing risk assessment model
Li et al. Trustworthy AI: From principles to practices
US20220382713A1 (en) Management of erasure or retention of user data stored in data stores
US11907403B2 (en) Dynamic differential privacy to federated learning systems
US11693634B2 (en) Building segment-specific executable program code for modeling outputs
US20240265255A1 (en) Machine-learning techniques involving monotonic recurrent neural networks
US20220198054A1 (en) Rights management regarding user data associated with data lifecycle discovery platform
US20220198044A1 (en) Governance management relating to data lifecycle discovery and management
CN109478263A (en) System and equipment for architecture assessment and strategy execution
US11894971B2 (en) Techniques for prediction models using time series data
US20220327541A1 (en) Systems and methods of generating risk scores and predictive fraud modeling
US20180253737A1 (en) Dynamically Evaluating Fraud Risk
US11507291B2 (en) Data block-based system and methods for predictive models
CA3154647C (en) Maintaining data privacy in a shared detection model system
US12061671B2 (en) Data compression techniques for machine learning models
US20190340614A1 (en) Cognitive methodology for sequence of events patterns in fraud detection using petri-net models
EP4085332A1 (en) Creating predictor variables for prediction models from unstructured data using natural language processing
CN114402301B (en) System and method for maintaining data privacy in a shared detection model system
CN113614726A (en) Dynamic differential privacy for federated learning systems
US20220345323A1 (en) Method, computer program and system for enabling a verification of a result of a computation
US20240152926A1 (en) Preventing digital fraud utilizing a fraud risk tiering system for initial and ongoing assessment of risk
US20240056441A1 (en) Decentralized Identity Management for Web3
Roszel Towards Trustworthy Artificial Intelligence in Privacy-Preserving Collaborative Machine Learning
Deep et al. AI-Driven Data Security in Healthcare: Safeguarding Data and Financial Transactions
CN117876102A (en) Method and platform for calculating real estate financial risk through federal learning supported privacy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
REG Reference to a national code: Ref country code: HK; Ref legal event code: DE; Ref document number: 40045814; Country of ref document: HK