CN116167868A

CN116167868A - Risk identification method, apparatus, device and storage medium based on privacy calculation

Info

Publication number: CN116167868A
Application number: CN202211683941.0A
Authority: CN
Inventors: 马新悦; 王玉; 李�昊; 张书涵; 方帅
Original assignee: Picc Information Technology Co ltd
Current assignee: Picc Information Technology Co ltd
Priority date: 2022-12-27
Filing date: 2022-12-27
Publication date: 2023-05-26

Abstract

The application discloses a risk identification method based on privacy calculation, which addresses the problem in the prior art that risk identification results are inaccurate because only a single data source is used for risk identification. The method comprises the following steps: acquiring first characteristic data corresponding to a service request to be identified according to the received service request to be identified; processing the user identifier according to an oblivious transfer protocol to obtain first private data, and sending the first private data to a second encryption server so that the second encryption server obtains second characteristic data according to the first private data; inputting, by the first encryption server, the first characteristic data into a pre-trained first risk assessment model to obtain a first risk value; acquiring a second risk value sent by the second encryption server; and determining a risk assessment result corresponding to the service request to be identified according to the first risk value and the second risk value.

Description

Risk identification method, apparatus, device and storage medium based on privacy calculation

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a risk identification method, apparatus, device, and storage medium based on privacy calculation.

Background

With the development of computer technology and internet technology, big data analysis has become an important technical means for business processing in the related field. In the process of informatization, various data sources (such as various e-commerce platforms, various financial institutions, various insurance institutions, medical institutions and the like) accumulate massive business data, and the business data gradually become important strategic resources of related enterprises.

Taking an insurance institution as an example, when the insurance institution performs related business (such as claim risk assessment or verification), a large amount of data is required for business risk assessment. Although the insurance institution has accumulated a large amount of business data, that data is mainly collected on the basis of its own insurance attributes and lacks sufficient feature dimensions, so it is difficult to obtain an accurate assessment result using the institution's own business data alone. Under such circumstances, the insurance institution needs to obtain data of more dimensions from other data sources to optimize the risk model and provide more accurate services for different users.

However, the user service data owned by each data source party is a core data asset of that party. Traditional plaintext data sharing and API data interface sharing schemes carry a risk of privacy leakage, and out of privacy protection concerns each data source party is unwilling and afraid to share data externally. A data island phenomenon thus forms among the data source parties, which directly restricts service parties from using the data of the data source parties, so that the value of the data is difficult to fully mine.

Therefore, how to share data in multiple directions under the condition of ensuring data security and privacy security is a problem to be solved by related technicians in the field at present.

Disclosure of Invention

The embodiment of the application provides a risk identification method based on privacy calculation, which is used for solving the problems that a service user cannot acquire multi-dimensional data and only uses a single data source to perform risk identification due to the fact that the existing data source party is unwilling to perform data sharing due to privacy safety, and further, a risk identification result is inaccurate.

The embodiment of the application also provides a risk identification device based on privacy calculation, which is used for solving the problems that the existing data source party is unwilling to carry out data sharing due to privacy safety, so that a service user cannot acquire multi-dimensional data, only a single data source is used for risk identification, and further, the risk identification result is inaccurate.

The embodiment of the application also provides risk identification equipment based on privacy calculation, which is used for solving the problems that the existing data source party is unwilling to carry out data sharing due to privacy safety, so that a service user cannot acquire multi-dimensional data, only a single data source is used for risk identification, and further, the risk identification result is inaccurate.

The embodiment of the application also provides a computer readable storage medium, which is used for solving the problems that the existing data source party is unwilling to share data due to privacy security, so that a service user cannot acquire multi-dimensional data, only a single data source is used for risk identification, and further, the risk identification result is inaccurate.

The embodiment of the application adopts the following technical scheme:

A risk identification method based on privacy calculation, comprising: acquiring, by a first encryption server, first characteristic data corresponding to a service request to be identified according to the received service request to be identified, wherein the service request to be identified carries a user identifier, and the first characteristic data comprises user characteristic data; processing the user identifier according to an oblivious transfer protocol to obtain first private data, and sending the first private data to a second encryption server so that the second encryption server obtains second characteristic data according to the first private data; inputting, by the first encryption server, the first characteristic data into a pre-trained first risk assessment model to obtain a first risk value; acquiring a second risk value sent by the second encryption server, wherein the second risk value is calculated by the second encryption server by inputting the second characteristic data into a pre-trained second risk assessment model; and determining a risk assessment result corresponding to the service request to be identified according to the first risk value and the second risk value.

A risk identification apparatus based on privacy calculation, comprising: a feature data acquisition unit, configured to acquire first feature data corresponding to a service request to be identified according to the received service request to be identified, wherein the service request to be identified carries a user identifier, and the first feature data comprises user feature data; a data alignment unit, configured to process the user identifier according to an oblivious transfer protocol to obtain first private data, and to send the first private data to a second encryption server so that the second encryption server obtains second feature data according to the first private data; a risk value acquisition unit, configured to input the first feature data into a pre-trained first risk assessment model to obtain a first risk value, and further configured to acquire a second risk value sent by the second encryption server, wherein the second risk value is calculated by the second encryption server by inputting the second feature data into a pre-trained second risk assessment model; and a risk assessment unit, configured to determine a risk assessment result corresponding to the service request to be identified according to the first risk value and the second risk value.

A privacy computation-based risk identification device, comprising:

a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the following operations: acquiring, by a first encryption server, first characteristic data corresponding to a service request to be identified according to the received service request to be identified, wherein the service request to be identified carries a user identifier, and the first characteristic data comprises user characteristic data; processing the user identifier according to an oblivious transfer protocol to obtain first private data, and sending the first private data to a second encryption server so that the second encryption server obtains second characteristic data according to the first private data; inputting, by the first encryption server, the first characteristic data into a pre-trained first risk assessment model to obtain a first risk value; acquiring a second risk value sent by the second encryption server, wherein the second risk value is calculated by the second encryption server by inputting the second characteristic data into a pre-trained second risk assessment model; and determining a risk assessment result corresponding to the service request to be identified according to the first risk value and the second risk value.

A computer-readable storage medium storing one or more programs that, when executed by an electronic device comprising a plurality of application programs, cause the electronic device to perform the following operations: acquiring, by a first encryption server, first characteristic data corresponding to a service request to be identified according to the received service request to be identified, wherein the service request to be identified carries a user identifier, and the first characteristic data comprises user characteristic data; processing the user identifier according to an oblivious transfer protocol to obtain first private data, and sending the first private data to a second encryption server so that the second encryption server obtains second characteristic data according to the first private data; inputting, by the first encryption server, the first characteristic data into a pre-trained first risk assessment model to obtain a first risk value; acquiring a second risk value sent by the second encryption server, wherein the second risk value is calculated by the second encryption server by inputting the second characteristic data into a pre-trained second risk assessment model; and determining a risk assessment result corresponding to the service request to be identified according to the first risk value and the second risk value.

The above-mentioned at least one technical scheme that this application embodiment adopted can reach following beneficial effect:

By adopting the risk identification method based on privacy calculation provided by the embodiment of the application, when a service user needs to perform risk identification on a service request, a first encryption server corresponding to the service user can acquire the corresponding first characteristic data according to the received service request to be identified, and calculate a first risk value corresponding to the service request by inputting the first characteristic data into a pre-trained first risk assessment model. Meanwhile, to avoid inaccurate risk identification caused by using only a single data source, the first encryption server can process the user identifier through an oblivious transfer protocol to obtain first private data and send the first private data to the second encryption server, so that the second encryption server obtains second characteristic data according to the first private data, calculates a second risk value by inputting the second characteristic data into a pre-trained second risk assessment model, and sends the second risk value to the first encryption server. The first encryption server can then complete the risk assessment of the service request according to the first risk value and the second risk value. Because the first risk value and the second risk value are calculated from data provided by two different data sources, the risk identification result provided by the embodiment of the application is more accurate than a result calculated from a single data source. In addition, although data sharing among multiple data sources is involved, the privacy calculation scheme keeps the shared data invisible to all data sources, so the security of the data owned by each data source is ensured. More data sources can therefore participate in data sharing and use the data for data mining, risk analysis and other services while data security is preserved, which eliminates the data island effect, allows the value of the data to be fully mined, and improves the accuracy of risk analysis results obtained from the data.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:

Fig. 1 is a flow diagram of a specific implementation of a risk identification method based on privacy calculation according to an embodiment of the present application;

Fig. 2 is a schematic structural diagram of a risk identification apparatus based on privacy calculation according to an embodiment of the present application;

Fig. 3 is a schematic structural diagram of a risk identification device based on privacy calculation according to an embodiment of the present application.

Detailed Description

To make the purposes, technical solutions and advantages of the present application clearer, the technical solutions of the present application are described clearly and completely below with reference to specific embodiments of the present application and the corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art based on the present disclosure without creative effort fall within the scope of the present disclosure.

The following describes in detail the technical solutions provided by the embodiments of the present application with reference to the accompanying drawings.

The risk identification method based on privacy calculation is used for solving the problems that due to the fact that an existing data source party is unwilling to conduct data sharing due to privacy safety, a service user cannot acquire multi-dimensional data, risk identification is conducted only by using a single data source, and further a risk identification result is inaccurate.

The execution subject of the risk identification method based on privacy calculation provided in the embodiment of the present application may be, but is not limited to, at least one of an insurance server, a policy processing server, a vehicle insurance claim settlement server, a medical insurance claim settlement server, a claim risk identification server, and the like.

For convenience of description, embodiments of the method are described below taking a claim risk identification server as the execution subject of the method. It will be appreciated that taking the claim risk identification server as the execution subject is merely an exemplary illustration and should not be construed as limiting the method.

The specific implementation flow diagram of the risk identification method based on privacy calculation provided by the application is shown in fig. 1, and mainly comprises the following steps:

Step 11, the claim risk identification server acquires, according to the received service request to be identified, first characteristic data corresponding to that service request;

For example, in addition to the user identifier of the insurance applicant, the service request to be identified may also carry a corresponding service identifier. For instance, if the specific insurance type of the insurance service is health insurance, the service request may also carry the service identifier of the health insurance in the insurance system.

In the embodiment of the present application, the obtained first feature data may include user feature data corresponding to the user identifier and service data corresponding to the service identifier. Specifically, after receiving the claim settlement request, the claim settlement risk identification server may search corresponding user characteristic data, such as applicant basic information, applicant health information, and the like, and corresponding service characteristic data, such as purchased insurance types, claim settlement amounts, validity periods, policy numbers, and the like, in the corresponding background database according to the user identifier and the service identifier.

Step 12, the claim risk identification server processes the user identifier according to an oblivious transfer protocol to obtain first private data, and sends the first private data to a second encryption server;

By executing step 11, the claim risk recognition server may obtain feature data related to the service request to be recognized, but the feature data are all derived from a background database of the insurance company, and although a large amount of service feature data are stored in the database of the insurance company, the service feature data are mainly collected depending on self service correlation (such as based on insurance attribute), and the data feature dimension is insufficient, so that if risk recognition is performed only by using the feature data collected in step 11, it is difficult to obtain an accurate risk evaluation result. Therefore, in the process of risk assessment, there is a need to fuse related data of multiple data sources, but at the same time, there is a need to solve the problem of data security faced in the process of data sharing.

In order to solve the above problem, in the embodiment of the present application, after receiving a service request to be identified, the claim risk identification server may be used as an initiator to initiate a data fusion sharing request to other data sources, so that the other data sources are used as participants to perform data fusion interaction with the risk identification server. It should be noted that, for convenience of description, the risk identification method provided in the embodiments of the present application will be described below with reference to one participant as an example, and a health management platform is taken as an example, where it is to be understood that one participant "health management platform" is only an exemplary illustration, and should not be construed as limiting the method.

In the embodiment of the application, the claim risk identification server can realize data sharing with the participants based on a pre-trained federated learning model. Through federated learning, distributed model training can be carried out among a plurality of data sources that each hold local data. With the trained federated learning model, a global model based on virtually fused data can be constructed by exchanging only model parameters or intermediate results, without exchanging local individual or sample data, thereby balancing data privacy protection against shared data computation so that the data remains usable. In one embodiment, the risk assessment model used by the risk identification server for risk identification is trained based on the federated learning approach; the specific training method of the risk assessment model is not described in detail here.

It should be noted that, before using the risk assessment model constructed by federated learning, the claim risk identification server first performs private set intersection (Private Set Intersection, PSI) on the data; that is, before data fusion and joint calculation are performed among the multi-party data sources, the data common to the data sources is determined without exposing the data unique to each data source.

In one embodiment, the claim risk identification server may perform private set intersection implemented through the oblivious transfer (Oblivious Transfer, OT) protocol. Specifically, in the embodiment of the present application, the claim risk identification server may perform private set intersection according to the following method: processing the user identifier according to an oblivious pseudo-random function to obtain first private data; constructing an encrypted transmission channel between the first encryption server and the second encryption server according to an oblivious transfer protocol; and sending the first private data to the second encryption server through the encrypted transmission channel, so that the second encryption server obtains, according to a PSI algorithm, feature data that intersects with the first private data as the second feature data.

An oblivious pseudo-random function (Oblivious Pseudo-Random Function, OPRF) is a protocol between an initiator and a participant, in which the two parties act as sender and receiver respectively. Assume that the claim risk identification server acts as sender A in the protocol and the health management platform acts as receiver B. Through the OPRF, sender A constructs the seeds $k_i$ of $n$ oblivious pseudo-random functions. Receiver B applies the corresponding OPRF to each item of data in its database to obtain set $H_B$, and sender A applies the corresponding OPRF to each item of data in its database to obtain set $H_A$. Sender A sends set $H_A$ to receiver B through the encrypted channel, and receiver B calculates the intersection of set $H_A$ and set $H_B$. The feature data held in common with sender A can thereby be determined and obtained as the second feature data for later use in the joint calculation.
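The exchange described above can be illustrated with a toy sketch. The construction used here (a blinded exponentiation in the multiplicative group modulo a prime) is an assumption chosen for brevity and is not stated in the text; the modulus is far too small for real use, and all function names are hypothetical. Sender A holds the key and evaluates the function directly on its own identifiers to build $H_A$, while receiver B obtains its values $H_B$ by blinding each identifier so that A never sees it.

```python
import hashlib
import math
import secrets

# Toy modulus: the Mersenne prime 2**127 - 1 (illustrative only, not secure).
P = 2**127 - 1


def random_exponent() -> int:
    """Pick a random exponent that is invertible modulo P - 1."""
    while True:
        e = secrets.randbelow(P - 3) + 2
        if math.gcd(e, P - 1) == 1:
            return e


def hash_to_group(item: str) -> int:
    """Map an identifier to an element of Z_P^* in the range [2, P - 1]."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 2) + 2


def prf(key: int, item: str) -> int:
    """Sender A evaluates the keyed function directly on its own data."""
    return pow(hash_to_group(item), key, P)


def blind(item: str) -> tuple[int, int]:
    """Receiver B hides its identifier behind a random exponent r."""
    r = random_exponent()
    return r, pow(hash_to_group(item), r, P)


def evaluate(key: int, blinded: int) -> int:
    """Sender A applies its key to a blinded value without learning the item."""
    return pow(blinded, key, P)


def unblind(r: int, evaluated: int) -> int:
    """Receiver B removes r, leaving the same value prf(key, item) would give."""
    return pow(evaluated, pow(r, -1, P - 1), P)


def private_intersection(ids_a: set[str], ids_b: set[str]) -> set[str]:
    key = random_exponent()                      # known to sender A only
    h_a = {prf(key, x) for x in ids_a}           # set H_A, sent to B
    h_b = {}
    for x in ids_b:
        r, blinded = blind(x)
        h_b[unblind(r, evaluate(key, blinded))] = x   # set H_B, kept by B
    return {x for tag, x in h_b.items() if tag in h_a}
```

For example, `private_intersection({"u1", "u2", "u3"}, {"u2", "u3", "u4"})` yields the two shared identifiers, and B learns nothing about `"u1"` beyond an opaque value.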

It should be noted that the embodiment of the present application does not limit which method the claim risk identification server uses to perform private set intersection with the participant.

Step 13, inputting the first characteristic data obtained by executing the step 11 into a first risk assessment model obtained by training in advance to obtain a first risk value;

In this embodiment of the present application, the first risk assessment model is obtained through joint training, according to a federated learning algorithm, by the claim risk identification server serving as the initiator and the health management platform serving as the participant.

It should be noted that, before the risk assessment model is trained based on the federated learning algorithm, private set intersection needs to be performed on the data of the initiator and the participant. Specifically, the claim risk identification server may obtain a corresponding first data set and the health management platform may obtain a corresponding second data set; a private set intersection (PSI) algorithm (such as an OT-based algorithm) is then used to calculate and determine the intersection data of the first data set and the second data set, that is, the users common to the claim risk identification server and the health management platform.

In the embodiment of the application, the same user may transact different types of services, and the corresponding risk assessment modes may differ between service types. To ensure the accuracy of the risk assessment result, the intersection data may therefore be grouped according to its data type, and two models may be trained, one using the grouped data and one using the overall data; the two models together form the risk assessment model.

In one implementation, the embodiment of the present application may train the risk assessment model as follows: grouping the intersection data according to the service characteristic data to obtain at least one group of grouped intersection data; training and updating on the grouped intersection data based on a vertical logistic regression (LR) algorithm until a preset convergence condition is reached, to obtain a first initial evaluation model; training and updating on the intersection data based on the gradient boosting XGBoost algorithm until a preset convergence condition is reached, to obtain a second initial evaluation model; and constructing the first risk assessment model according to the first initial evaluation model and the second initial evaluation model.
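The grouping step can be pictured with a minimal sketch. The record layout and the field name `insurance_type` are hypothetical; the text only says that the intersection data is grouped according to the service characteristic data.

```python
from collections import defaultdict


def group_by_service_type(records, key="insurance_type"):
    """Group intersection records by one service characteristic field."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return dict(groups)


# Hypothetical intersection data: users common to both parties.
records = [
    {"user": "u1", "insurance_type": "health", "claim_amount": 1200},
    {"user": "u2", "insurance_type": "vehicle", "claim_amount": 5400},
    {"user": "u3", "insurance_type": "health", "claim_amount": 300},
]
grouped = group_by_service_type(records)
```

Each group would then feed the per-group logistic regression training, while the ungrouped `records` would feed the XGBoost training.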

The specific training process of the first initial evaluation model and the second initial evaluation model is described in detail below:

In one embodiment, the claim risk identification server and the health management platform may jointly train the first initial evaluation model as follows: initializing a first model parameter on the first encryption server and a second model parameter on the second encryption server according to the intersection data; calculating the model inner products of the first encryption server and the second encryption server according to the first model parameter and the second model parameter; determining a model residual value according to the model inner products, and determining a loss function based on the vertical logistic regression (LR) algorithm according to the model inner products and the model residual value; calculating, by the first encryption server, a model gradient according to the model residual value, and iteratively updating the first model parameter according to the model gradient, each parameter in the first model parameter being updated in each iteration, until the loss function meets a preset threshold value, thereby completing logistic regression model training and obtaining the first initial evaluation model. The method specifically comprises the following sub-steps:

Sub-step 1301, the claim risk identification server creates an encryption key pair and sends the public key to the participant health management platform;

A homomorphic encryption method (Paillier encryption) can be employed to create an encryption key pair (p, s) comprising a public key p and a private key s. The homomorphic encryption algorithm is a provably secure encryption system that allows addition operations to be performed on ciphertext without decryption, so that computation can be carried out on data by parties without access to the plaintext. This provides a high level of security for data sharing among multiple parties and protects the data privacy and model security of each party.
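The additive property can be seen in a toy Paillier sketch. The primes below are two small Mersenne primes chosen only so the example runs instantly; real keys are far larger, and the function names are hypothetical. In the notation of the text, the public key p corresponds to `n` and the private key s to the pair `(lam, mu)`.

```python
import math
import secrets


def keygen(prime_a: int = 2**31 - 1, prime_b: int = 2**61 - 1):
    """Return (public key n, private key (lam, mu)) for toy-sized primes."""
    n = prime_a * prime_b
    lam = (prime_a - 1) * (prime_b - 1) // math.gcd(prime_a - 1, prime_b - 1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to n + 1
    return n, (lam, mu)


def encrypt(n: int, m: int) -> int:
    """Encrypt integer m with generator g = n + 1: c = (1 + m*n) * r^n mod n^2."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + (m % n) * n) % n2 * pow(r, n, n2) % n2


def decrypt(n: int, private_key, c: int) -> int:
    lam, mu = private_key
    return (pow(c, lam, n * n) - 1) // n * mu % n


def add(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % (n * n)


def scalar_mul(n: int, c: int, k: int) -> int:
    """Raising a ciphertext to k multiplies the underlying plaintext by k."""
    return pow(c, k % n, n * n)
```

With `n, key = keygen()`, decrypting `add(n, encrypt(n, 5), encrypt(n, 7))` gives 12 even though whoever performed the addition never saw 5 or 7.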

In addition, for convenience of description, the claim risk recognition server will be hereinafter referred to as an initiator server, and the health management platform will be hereinafter referred to as a participant server.

Sub-step 1302, the participant server and the initiator server each initialize the model parameters on their respective servers to obtain a first model parameter and a second model parameter;

Sub-step 1303, the initiator server calculates the inner product of the first model parameter and its own feature values;

In an embodiment of the present application, the initiator server may calculate the inner product according to the following equation [1]:

$\theta_A^{T} x_A^{(i)}$ [1]

wherein $\theta_A$ represents the first model parameter of the initiator server, and $x_A^{(i)}$ represents the ith sample of the initiator server.

Sub-step 1304, the participant server calculates the inner product of the second model parameter and its own feature values, and sends the calculation result to the initiator server in plaintext form;

In an embodiment of the present application, the participant server may calculate the inner product according to the following equation [2]:

$\theta_B^{T} x_B^{(i)}$ [2]

wherein $\theta_B$ represents the second model parameter of the participant server, and $x_B^{(i)}$ represents the ith sample of the participant server.

Sub-step 1305, the initiator server sums the inner products obtained by performing sub-steps 1303 and 1304;

Sub-step 1306, the initiator server calculates a model residual value according to the calculated inner product sum, and sends the encrypted residual value to the participant;

In an embodiment of the present application, the initiator server may calculate the model residual value according to the following equation [3]:

$d^{(i)} = h_\theta(x^{(i)}) - y^{(i)}$ [3]

wherein $h_\theta(x^{(i)})$ represents the result obtained by substituting the ith sample data into the model, and $y^{(i)}$ represents the actual result of the ith sample data.

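In the embodiment of the present application, the result $h_\theta(x^{(i)})$ of substituting the ith sample data into the model may be calculated using the following formula [4]:

$h_\theta(x^{(i)}) = \mathrm{sigmoid}\left(\theta_A^{T} x_A^{(i)} + \theta_B^{T} x_B^{(i)}\right)$ [4]

wherein sigmoid represents an activation function, which can be represented by the following formula [5]:

$\mathrm{sigmoid}(z) = \dfrac{1}{1 + e^{-z}}$ [5]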
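As an illustration of sub-steps 1303 to 1306, the following sketch computes the two inner products, their sum, the sigmoid output and the residual for a single sample, all in plaintext. The sign convention (prediction minus label) and the function names are assumptions; encryption of the residual before it is sent to the participant is omitted here.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def inner_product(theta, x) -> float:
    return sum(t * v for t, v in zip(theta, x))


def residual(theta_a, x_a, theta_b, x_b, y) -> float:
    """Residual of one sample whose features are split across two parties."""
    u_a = inner_product(theta_a, x_a)   # sub-step 1303, on the initiator
    u_b = inner_product(theta_b, x_b)   # sub-step 1304, on the participant
    prediction = sigmoid(u_a + u_b)     # sub-step 1305 and the sigmoid step
    return prediction - y               # sub-step 1306 (assumed sign)
```

With all parameters initialized to zero the prediction is 0.5, so a positive sample (`y = 1`) has residual -0.5.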

Sub-step 1307, the participant server calculates a gradient from the encrypted residual values;

In an embodiment of the present application, the gradient may be calculated according to the following equation [6]:

The participant server then generates noise and transmits the sum of the encrypted noise and the encrypted gradient to the initiator.

Sub-step 1308, the initiator server calculates the loss function and the gradient, decrypts the gradient data sent by the participant and returns the decrypted gradient data;

In embodiments of the present application, the gradient may be calculated according to the following equation [7]:

In an embodiment of the present application, the loss function may be determined according to the following equation [8]:

where m represents the number of samples.
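The interplay of sub-steps 1307 and 1308 can be sketched with toy Paillier arithmetic. Everything below is an assumption made for illustration: the small primes, the fixed-point scale, the integer feature values, and the unnormalized gradient (a sum of residual times feature, without any averaging factor). The point is only that the participant computes on ciphertexts and hides its gradient behind noise, so the initiator decrypts a masked value.

```python
import math
import secrets

# Toy Paillier parameters (two Mersenne primes); real keys are far larger.
PRIME_A, PRIME_B = 2**31 - 1, 2**61 - 1
N = PRIME_A * PRIME_B
N2 = N * N
LAM = (PRIME_A - 1) * (PRIME_B - 1) // math.gcd(PRIME_A - 1, PRIME_B - 1)
MU = pow(LAM, -1, N)
SCALE = 10**6  # fixed-point scale for encoding floats as integers


def encrypt(m: int) -> int:
    """Encrypt a (possibly negative) integer under the public key N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (1 + (m % N) * N) % N2 * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    """Decrypt with the private key (LAM, MU), returning a signed integer."""
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m


# Initiator (sub-step 1306): encrypt the residuals and send them over.
residuals = [0.25, -0.5, 0.125]
enc_residuals = [encrypt(round(d * SCALE)) for d in residuals]

# Participant (sub-step 1307): gradient on ciphertexts, then add noise.
x_b = [2, 1, -4]                        # participant's own feature column
enc_gradient = 1                        # a trivial encryption of zero
for c, x in zip(enc_residuals, x_b):
    enc_gradient = enc_gradient * pow(c, x % N, N2) % N2
noise = secrets.randbelow(10**9)
masked = enc_gradient * encrypt(noise) % N2

# Initiator (sub-step 1308): decrypts and returns the masked value only.
masked_plain = decrypt(masked)

# Participant: removes its own noise to recover the gradient.
gradient_b = (masked_plain - noise) / SCALE
```

Here `gradient_b` equals 0.25·2 + (−0.5)·1 + 0.125·(−4) = −0.5, and the initiator only ever sees that value shifted by the participant's noise.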

Sub-step 1309, the initiator server and the participant server update the first model parameter and the second model parameter according to the respective calculated gradients;

Sub-step 1310, sub-steps 1303 to 1309 are executed repeatedly until the model converges, completing the logistic regression model training and obtaining the first initial evaluation model.
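The loop formed by sub-steps 1302 to 1310 can be simulated in plaintext to show the data flow between the two parties. This sketch deliberately leaves out encryption and noise masking, and it assumes zero initialization, a cross-entropy loss averaged over the m samples, plain gradient descent, and a loss threshold as the convergence test; none of these details is fixed by the text, and the function and parameter names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dot(theta, x) -> float:
    return sum(t * v for t, v in zip(theta, x))


def train_vertical_lr(x_a, x_b, y, learning_rate=0.5,
                      max_iter=2000, loss_threshold=0.05):
    """x_a / x_b hold the initiator's / participant's columns of the same rows."""
    m = len(y)
    theta_a = [0.0] * len(x_a[0])                        # sub-step 1302
    theta_b = [0.0] * len(x_b[0])
    loss = float("inf")
    for _ in range(max_iter):                            # sub-step 1310
        u_a = [dot(theta_a, row) for row in x_a]         # sub-step 1303
        u_b = [dot(theta_b, row) for row in x_b]         # sub-step 1304
        h = [sigmoid(a + b) for a, b in zip(u_a, u_b)]   # sub-step 1305
        d = [h_i - y_i for h_i, y_i in zip(h, y)]        # sub-step 1306
        grad_b = [sum(d_i * row[j] for d_i, row in zip(d, x_b)) / m
                  for j in range(len(theta_b))]          # sub-step 1307
        grad_a = [sum(d_i * row[j] for d_i, row in zip(d, x_a)) / m
                  for j in range(len(theta_a))]          # sub-step 1308
        loss = -sum(y_i * math.log(h_i + 1e-12)
                    + (1 - y_i) * math.log(1 - h_i + 1e-12)
                    for h_i, y_i in zip(h, y)) / m
        if loss < loss_threshold:
            break
        theta_a = [t - learning_rate * g for t, g in zip(theta_a, grad_a)]  # 1309
        theta_b = [t - learning_rate * g for t, g in zip(theta_b, grad_b)]
    return theta_a, theta_b, loss


# Hypothetical toy data: the first column of x_a is a constant bias term.
x_a = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
x_b = [[0.0], [1.0], [0.0], [1.0]]
y = [0, 0, 1, 1]
theta_a, theta_b, final_loss = train_vertical_lr(x_a, x_b, y)
```

Neither party's raw feature columns cross the boundary in this flow; only inner products, residuals and gradients do, which is what the encrypted protocol then protects.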

In one embodiment, the model used in the examples of the present application is shown in the following equation [9]:

$h_\theta(x) = \dfrac{1}{1 + e^{-\theta^{T} x}}$ [9]

wherein $\theta^{T} x$ may be determined according to the following formula [10]:

$\theta^{T} x = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots$ [10]

in one embodiment, the claim risk identification server and the health management platform may jointly train the second initial assessment model as follows: creating a first decision tree from the intersection data; the first encryption server generates a key pair comprising a public key and a private key and sends the public key to the second encryption server; determining a first derivative and a second derivative corresponding to the intersection data according to the intersection data; encrypting the first derivative and the second derivative according to the private key to obtain an encrypted derivative, and transmitting the encrypted derivative to a second encryption server; receiving a second encryption derivative obtained after the second encryption server calculates the encrypted data; determining a feature split point score corresponding to a second encryption server according to the second encryption derivative; and determining a current splitting threshold according to the characteristic splitting point score, and sequentially establishing decision trees by the first encryption server and the second encryption server according to the splitting threshold until a preset stopping condition is reached, so as to finish the establishment of an mth decision tree and obtain a second initial evaluation model.

In particular, in one embodiment, the second initial assessment model may be trained in the following sub-steps:

Sub-step 13-a, the initiator server creates an encryption key pair and sends the public key to the participant, i.e. the health management platform;

Specifically, for the manner in which the encryption key pair is created, refer to the description of sub-step 1301; it is not repeated here.

Sub-step 13-b, the initiator server creates a first decision tree according to the intersection data;

Specifically, the initiator server may select an optimal feature from the q features it holds as the root node of the mth decision tree and divide the N feature data into two child nodes, forming two child-node sample spaces G1 and G2 that satisfy G1 ∪ G2 = G, thereby completing the binning of the feature data.

Sub-step 13-c, determining a first derivative and a second derivative corresponding to the intersection data according to the intersection data;

Specifically, the initiator server may calculate the first derivative of each item of the intersection data according to the following equation [11]:

where $y_i$ represents the true value of the i-th feature data in the intersection data.

The initiator server may calculate the second derivative of each item of the intersection data according to the following equation [12]:
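As a point of reference for equations [11] and [12], and assuming the logistic loss that XGBoost commonly uses for binary classification (the application's own equations may differ), the first derivative $g_i$ and second derivative $h_i$ of the loss with respect to the previous round's prediction can be written as follows. The symbol $p_i$ is introduced here only for illustration and denotes the predicted probability for the i-th item.

$$p_i = \frac{1}{1 + e^{-F_{m-1}(x_i)}}, \qquad g_i = p_i - y_i, \qquad h_i = p_i\,(1 - p_i)$$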

Sub-step 13-d, encrypting the first derivative and the second derivative according to the private key to obtain an encrypted derivative, and transmitting the encrypted derivative to the participant server;

Sub-step 13-e, the participant server calculates a second encrypted derivative from the encrypted derivative and returns the calculated second encrypted derivative to the initiator server;

In one embodiment, the participant server may calculate the second encrypted derivatives corresponding to the first and second derivatives according to the following equations [13] and [14], respectively:

$$G_L = \sum g_i \quad [13]$$

$$H_L = \sum h_i \quad [14]$$
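The sums in equations [13] and [14] are computed over encrypted values, which requires an additively homomorphic scheme. The following is a toy sketch that assumes a Paillier-style scheme; the choice of scheme, the tiny key size, the fixed-point scale and all function names are assumptions made for illustration, and the parameters are far too small for real use. In this sketch encryption uses only the public modulus and decryption uses the private values, which is the usual Paillier arrangement.

```python
import math
import secrets

# Toy key material built from two Mersenne primes (illustrative only).
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q                       # public modulus
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private: lcm(p-1, q-1)
MU = pow(LAM, -1, N)            # private: valid because the generator is N + 1
SCALE = 10**6                   # fixed-point scale for real-valued derivatives

def encrypt(value):
    """Encrypt a real number after fixed-point encoding (negatives wrap mod N)."""
    m = round(value * SCALE) % N
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (1 + m * N) % N2 * pow(r, N, N2) % N2

def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % N2

def decrypt(c):
    """Recover the real number; values above N // 2 decode as negatives."""
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    if m > N // 2:
        m -= N
    return m / SCALE

def aggregate(encrypted_values, indices):
    """Participant side: sum the ciphertexts of the samples in one bucket."""
    total = 1  # the ciphertext 1 decrypts to 0, so it acts as a neutral element
    for i in indices:
        total = add_encrypted(total, encrypted_values[i])
    return total
```

A hypothetical round trip: the initiator encrypts each $g_i$, the participant calls `aggregate` on the indices that fall to the left of a candidate split, and the initiator decrypts the result to obtain $G_L$ without the participant ever seeing an individual derivative.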

Sub-step 13-f, the initiator server decrypts the second encrypted derivative obtained in sub-step 13-e and determines the feature split point score corresponding to the participant server;

In one embodiment, the initiator server may calculate the feature split point score corresponding to the participant server according to the following equation [15]:

where $G = G_L + G_R$, $H = H_L + H_R$, and $\lambda$ is the regularization term; the three terms of equation [15] represent, respectively, the value of the left leaf node, the value of the right leaf node, and the value before the node is split.
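A split score of this shape can be sketched as below. The sketch assumes the structure-score form commonly used by XGBoost, in which each of the three terms is a squared gradient sum divided by the Hessian sum plus $\lambda$; any constant factor or complexity penalty that equation [15] may contain is omitted, and the function names are hypothetical.

```python
def split_score(g_left, h_left, g_right, h_right, lam=1.0):
    """Left-leaf value plus right-leaf value minus the value before splitting."""
    g_total = g_left + g_right          # G = G_L + G_R
    h_total = h_left + h_right          # H = H_L + H_R
    left = g_left ** 2 / (h_left + lam)
    right = g_right ** 2 / (h_right + lam)
    parent = g_total ** 2 / (h_total + lam)
    return left + right - parent

def best_split(candidates, lam=1.0):
    """Pick the candidate (name, G_L, H_L, G_R, H_R) with the highest score."""
    return max(candidates, key=lambda c: split_score(c[1], c[2], c[3], c[4], lam))
```

A split that separates samples with opposite-signed gradients scores higher than one that mixes them, which is why the highest-scoring candidate is taken as the threshold of the current node.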

Sub-step 13-g, determining a current splitting threshold according to the feature split point score, and sequentially establishing decision trees by the first encryption server and the second encryption server according to the splitting threshold until a preset stopping condition is reached, which completes the establishment of the mth decision tree and yields the second initial evaluation model;

Specifically, the initiator server selects the split point with the highest score as the threshold of the current node, calculates the weight, and synchronizes the weight to the participant server. Sub-steps 13-c to 13-g are repeated until the preset stopping condition is reached, which completes the creation of the mth decision tree and yields the second initial evaluation model.

In one embodiment, the mth decision tree may be represented by the following equation [16]:

$$F_m(x_i) = F_{m-1}(x_i) + f_m(x_i) \quad [16]$$

where $F_m(x_i)$ represents the predicted value at step m, $F_{m-1}(x_i)$ represents the predicted value at step m-1, and $f_m(x_i)$ represents the mth decision tree.
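The additive structure of equation [16] can be illustrated with a few lines of Python. This is only a sketch: trees are modelled as plain callables, the initial prediction $F_0$ is assumed to be zero, and the function name is hypothetical.

```python
def boosted_prediction(trees, x, m):
    """Return F_m(x) as the sum of the first m tree outputs, with F_0 = 0."""
    score = 0.0
    for tree in trees[:m]:
        score += tree(x)  # F_k(x) = F_{k-1}(x) + f_k(x)
    return score
```

Each new tree therefore only has to model what the previous m-1 trees have not yet captured, which is why the derivatives in sub-step 13-c are evaluated at the prediction of step m-1.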

After the first initial evaluation model and the second initial evaluation model are obtained through the above training steps, the initiator server can set a corresponding weight for each of them and obtain the first risk assessment model as their weighted combination.

The first feature data obtained in step 11 is then input into the pre-trained first risk assessment model to obtain the first risk value.

Step 14, obtaining a second risk value sent by the participant server;

the second risk value is calculated by inputting second characteristic data into a second risk assessment model which is obtained through pre-training by the participant server. Specifically, the training manner of the second risk assessment model is the same as that of the first risk assessment model, and will not be described herein.

Step 15, determining a risk assessment result corresponding to the service request to be identified according to the first risk value calculated in step 13 and the second risk value obtained in step 14.

Specifically, in the embodiment of the present application, a weight may be determined for each of the first risk value and the second risk value, and the two values may then be summed according to those weights to calculate the risk assessment result corresponding to the service request to be identified.
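The weighted summation of step 15 amounts to the following. The sketch assumes, for illustration only, that the two weights are non-negative and sum to 1 so that the result stays on the same scale as the inputs; the application does not state such a constraint, and the function name and default weights are hypothetical.

```python
def combine_risk(first_risk, second_risk, w_first=0.5, w_second=0.5):
    """Weighted sum of the two parties' risk values."""
    # Assumption of this sketch: weights are non-negative and sum to 1.
    if w_first < 0 or w_second < 0 or abs(w_first + w_second - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return w_first * first_risk + w_second * second_risk
```

For example, `combine_risk(0.8, 0.4, 0.75, 0.25)` gives more influence to the first encryption server's model than to the second's.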

In one implementation, the embodiment of the application further provides a risk identification device based on privacy calculation, which addresses the problem that data source parties are unwilling to share data because of privacy and security concerns, so that a service user cannot acquire multidimensional data, can perform risk identification with only a single data source, and consequently obtains inaccurate risk identification results. The structure of the risk identification device based on privacy calculation is shown in fig. 2 and includes: a feature data acquisition unit 21, a data alignment unit 22, a risk value acquisition unit 23, and a risk assessment unit 24.

The feature data obtaining unit 21 is configured to obtain, according to a received service request to be identified, first feature data corresponding to the service request to be identified, where the service request to be identified carries a user identifier, and the first feature data includes user feature data;

the data alignment unit 22 is configured to process the user identifier according to an oblivious transfer protocol to obtain first private data, and to send the first private data to a second encryption server, so that the second encryption server obtains second feature data according to the first private data;

A risk value obtaining unit 23, configured to input the first feature data into a first risk assessment model that is trained in advance, to obtain a first risk value;

a risk value obtaining unit 23, configured to obtain a second risk value sent by the second encryption server, where the second risk value is calculated by the second encryption server by inputting the second feature data into a second risk assessment model obtained by training in advance;

and the risk assessment unit 24 is configured to determine a risk assessment result corresponding to the service request to be identified according to the first risk value and the second risk value.

In one embodiment, the data alignment unit 22 is specifically configured to: process the user identifier according to an oblivious pseudo-random function (OPRF) to obtain first private data; construct an encrypted transmission channel between the first encryption server and the second encryption server according to an oblivious transfer protocol; and send the first private data to the second encryption server through the encrypted transmission channel, so that the second encryption server obtains, according to a private set intersection (PSI) algorithm, the feature data that intersects with the first private data as the second feature data.
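The idea of matching identifiers without revealing them can be illustrated with a toy private set intersection. Note that the sketch below uses Diffie-Hellman-style commutative blinding, which is a different and simpler construction than the oblivious-transfer-based pseudo-random function named in this embodiment. The group modulus, function names and sample identifiers are assumptions, both parties are simulated in one process, and the parameters are not suitable for real use.

```python
import hashlib
import secrets

PRIME = 2**127 - 1  # Mersenne prime used as a toy group modulus

def hash_to_group(identifier):
    """Map an identifier to a group element via SHA-256."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % PRIME

def blind(element, secret):
    """Raise to a secret exponent; blinding by two parties commutes."""
    return pow(element, secret, PRIME)

def private_intersection(ids_a, ids_b):
    """Return the identifiers of party A that party B also holds."""
    key_a = secrets.randbelow(PRIME - 3) + 2   # known only to party A
    key_b = secrets.randbelow(PRIME - 3) + 2   # known only to party B
    # Party A blinds its identifiers and sends them to party B.
    a_once = [blind(hash_to_group(i), key_a) for i in ids_a]
    # Party B blinds A's values again and blinds its own identifiers once.
    a_twice = [blind(v, key_b) for v in a_once]
    b_once = [blind(hash_to_group(i), key_b) for i in ids_b]
    # Party A completes the blinding of B's values and compares.
    b_twice = {blind(v, key_a) for v in b_once}
    return {i for i, v in zip(ids_a, a_twice) if v in b_twice}
```

Only doubly blinded values are compared, so neither side learns identifiers outside the intersection; the second encryption server would then look up the second feature data for the matched identifiers.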

In one embodiment, the first risk assessment model is obtained through joint training by the first encryption server and the second encryption server according to a federated learning algorithm.

In one embodiment, the device further includes a risk assessment model training unit, specifically configured to: acquire a first data set and a second data set, wherein the first data set is acquired by the first encryption server, the second data set is acquired by the second encryption server, and each data set comprises user feature data and service feature data; acquire intersection data of the first data set and the second data set according to a private set intersection (PSI) algorithm; group the intersection data according to the service feature data to obtain at least one group of grouped intersection data; train and update on the grouped intersection data based on a vertical logistic regression (LR) algorithm until a preset convergence condition is reached, to obtain a first initial evaluation model; train and update on the intersection data based on the gradient boosting XGBoost algorithm until a preset convergence condition is reached, to obtain a second initial evaluation model; and construct the first risk assessment model according to the first initial evaluation model and the second initial evaluation model.

In one embodiment, the risk assessment model training unit is specifically configured to: initialize a first model parameter on the first encryption server and a second model parameter on the second encryption server according to the intersection data; calculate and determine the model inner products of the first encryption server and the second encryption server according to the first model parameters and the second model parameters; determine a model residual value according to the model inner products, and determine a loss function based on the vertical logistic regression (LR) algorithm according to the model inner products and the model residual value; wherein the first encryption server calculates model gradients according to the model residual value, iteratively updates the first model parameters according to the model gradients, and in each iteration updates each parameter in the first model parameters according to the model gradients until the loss function meets a preset threshold value, which completes the logistic regression model training and yields the first initial evaluation model.

In one embodiment, the risk assessment model training unit is specifically configured to: creating a first decision tree from the intersection data; the first encryption server generates a key pair comprising a public key and a private key and sends the public key to the second encryption server; determining a first derivative and a second derivative corresponding to the intersection data according to the intersection data; encrypting the first derivative and the second derivative according to the private key to obtain an encrypted derivative, and transmitting the encrypted derivative to a second encryption server; receiving a second encryption derivative obtained after the second encryption server calculates the encrypted data; determining a feature split point score corresponding to a second encryption server according to the second encryption derivative; and determining a current splitting threshold according to the characteristic splitting point score, and sequentially establishing decision trees by the first encryption server and the second encryption server according to the splitting threshold until a preset stopping condition is reached, so as to finish the establishment of an mth decision tree and obtain a second initial evaluation model.

In one embodiment, the risk assessment unit 24 is specifically configured to determine weights corresponding to the first risk value and the second risk value respectively; and carrying out weighted summation on the first risk value and the second risk value according to the weight, and calculating to obtain a risk assessment result corresponding to the service request to be identified.

With the risk identification device based on privacy calculation provided by the embodiment of the application, when a service user needs to perform risk identification on a service request, the first encryption server corresponding to the service user can acquire the corresponding first characteristic data according to the received service request to be identified, and calculate a first risk value for the service request by inputting the first characteristic data into a pre-trained first risk assessment model. Meanwhile, to avoid inaccurate risk identification caused by relying on a single data source, the first encryption server can process the user identifier through an oblivious transfer protocol to obtain first private data and send the first private data to the second encryption server. The second encryption server obtains second characteristic data according to the first private data, calculates a second risk value by inputting the second characteristic data into a pre-trained second risk assessment model, and sends the second risk value to the first encryption server, so that the first encryption server can complete the risk assessment of the service request according to the first risk value and the second risk value. Because the first risk value and the second risk value are calculated from data provided by two different data sources, the risk identification result provided by the embodiment of the application is more accurate than one calculated from a single data source. In addition, although data from multiple data sources is shared, the privacy calculation scheme keeps the shared data invisible to every data source, which protects the security of the data each source owns. More data sources can therefore take part in data sharing and use the data for data mining, risk analysis and other services while data security is preserved, which removes data silos, makes fuller use of the data's value, and improves the accuracy of risk analysis results obtained from the data.

Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Referring to fig. 3, at the hardware level, the electronic device includes a processor, and optionally an internal bus, a network interface, and a memory. The memory may include internal memory, such as random-access memory (RAM), and may further include non-volatile memory, such as at least one disk memory. Of course, the electronic device may also include hardware required for other services.

The processor, network interface, and memory may be interconnected by an internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. Buses may be classified as address buses, data buses, control buses, etc. For ease of illustration, only one bidirectional arrow is shown in fig. 3, but this does not mean that there is only one bus or one type of bus.

The memory is used for storing programs. In particular, a program may include program code, and the program code includes computer operating instructions. The memory may include internal memory and non-volatile storage, and provides instructions and data to the processor.

The processor reads the corresponding computer program from the nonvolatile memory into the memory and then runs the computer program to form the risk identification device based on privacy calculation on a logic level. The processor is used for executing the programs stored in the memory and is specifically used for executing the following operations:

a first encryption server obtains first characteristic data corresponding to a service request to be identified according to the received service request to be identified, wherein the service request to be identified carries a user identifier, and the first characteristic data comprises user characteristic data; processing the user identifier according to an oblivious transfer protocol to obtain first private data, and sending the first private data to a second encryption server so that the second encryption server obtains second characteristic data according to the first private data; the first encryption server inputs the first characteristic data into a pre-trained first risk assessment model to obtain a first risk value; acquiring a second risk value sent by the second encryption server, wherein the second risk value is calculated by the second encryption server by inputting the second characteristic data into a pre-trained second risk assessment model; and determining a risk assessment result corresponding to the service request to be identified according to the first risk value and the second risk value.

The method performed by the privacy-calculation-based risk identification electronic device disclosed in the embodiment shown in fig. 3 of the present application may be applied to, or implemented by, a processor. The processor may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. Such a processor may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method.

Of course, other implementations, such as a logic device or a combination of hardware and software, are not excluded from the electronic device of the present application, that is, the execution subject of the following processing flow is not limited to each logic unit, but may be hardware or a logic device.

The present embodiments also provide a computer readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a portable electronic device comprising a plurality of application programs, enable the portable electronic device to perform the method of the embodiment of fig. 1, and in particular to:

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in computer-readable media, such as random-access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.


The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims

1. A risk identification method based on privacy computation, comprising:

a first encryption server obtains first characteristic data corresponding to a service request to be identified according to the received service request to be identified, wherein the service request to be identified carries a user identifier, and the first characteristic data comprises user characteristic data;

processing the user identifier according to an oblivious transfer protocol to obtain first private data, and sending the first private data to a second encryption server so that the second encryption server obtains second characteristic data according to the first private data;

the first encryption server inputs the first characteristic data into a first risk assessment model which is obtained through pre-training, and a first risk value is obtained;

acquiring a second risk value sent by the second encryption server, wherein the second risk value is calculated by the second encryption server by inputting the second characteristic data into a pre-trained second risk assessment model;

and determining a risk assessment result corresponding to the service request to be identified according to the first risk value and the second risk value.

2. The method according to claim 1, wherein the processing the user identifier according to the oblivious transfer protocol to obtain first private data, and sending the first private data to a second encryption server so that the second encryption server obtains second characteristic data according to the first private data, specifically includes:

processing the user identifier according to an oblivious pseudo-random function (OPRF) to obtain the first private data;

constructing an encrypted transmission channel between the first encryption server and the second encryption server according to the oblivious transfer protocol;

and sending the first private data to the second encryption server through the encrypted transmission channel, so that the second encryption server obtains, according to a private set intersection (PSI) algorithm, the feature data that intersects with the first private data as the second feature data.

3. The method of claim 1, wherein the first risk assessment model is obtained through joint training by the first encryption server and the second encryption server according to a federated learning algorithm.

4. The method according to claim 1, characterized in that the first risk assessment model is pre-trained, in particular comprising:

acquiring a first data set and a second data set, wherein the first data set is acquired by the first encryption server, the second data set is acquired by the second encryption server, and the data set comprises user characteristic data and service characteristic data;

Acquiring intersection data of a first data set and a second data set according to a private set intersection PSI algorithm;

grouping the intersection data according to the service characteristic data to obtain at least one group of grouping intersection data;

training and updating on the grouped intersection data based on a vertical logistic regression (LR) algorithm until a preset convergence condition is reached, and acquiring a first initial evaluation model;

training and updating on the intersection data based on the gradient boosting XGBoost algorithm until a preset convergence condition is reached, and acquiring a second initial evaluation model;

and constructing a first risk assessment model according to the first initial assessment model and the second initial assessment model.

5. The method of claim 4, wherein the training and updating on the grouped intersection data based on the vertical logistic regression (LR) algorithm until a preset convergence condition is reached, and acquiring a first initial evaluation model, specifically includes:

initializing a first model parameter on the first encryption server and a second model parameter on the second encryption server according to the intersection data;

calculating and determining the model inner products of the first encryption server and the second encryption server according to the first model parameters and the second model parameters;

determining a model residual value according to the model inner products, and determining a loss function based on the vertical logistic regression (LR) algorithm according to the model inner products and the model residual value;

the first encryption server calculates model gradients according to the model residual values, iteratively updates the first model parameters according to the model gradients, and updates each parameter in the first model parameters according to the model gradients in each iteration until the loss function meets a preset threshold value, so that logistic regression model training is completed, and a first initial evaluation model is obtained.

6. The method of claim 4, wherein the training and updating on the intersection data based on the gradient boosting XGBoost algorithm until a preset convergence condition is reached, and acquiring a second initial evaluation model, specifically includes:

creating a first decision tree from the intersection data;

the first encryption server generates a key pair comprising a public key and a private key and sends the public key to the second encryption server;

determining a first derivative and a second derivative corresponding to the intersection data according to the intersection data;

encrypting the first derivative and the second derivative according to the private key to obtain an encrypted derivative, and transmitting the encrypted derivative to a second encryption server;

Receiving a second encryption derivative obtained after the second encryption server calculates the encrypted data;

determining a feature split point score corresponding to a second encryption server according to the second encryption derivative;

and determining a current splitting threshold according to the characteristic splitting point score, and sequentially establishing decision trees by the first encryption server and the second encryption server according to the splitting threshold until a preset stopping condition is reached, so as to finish the establishment of an mth decision tree and obtain a second initial evaluation model.

7. The method of claim 1, wherein the determining, according to the first risk value and the second risk value, a risk assessment result corresponding to the service request to be identified specifically includes:

determining weights corresponding to the first risk value and the second risk value respectively;

and performing a weighted summation of the first risk value and the second risk value according to the weights to obtain the risk assessment result corresponding to the service request to be identified.
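As a non-limiting illustration, the weighted summation can be sketched as follows. The function name `combine_risk` and the default weights are hypothetical; the claims do not specify how the weights are chosen or whether they sum to one.

```python
def combine_risk(first_risk, second_risk, w1=0.5, w2=0.5):
    """Weighted sum of the two risk values (the weights are illustrative)."""
    if w1 < 0 or w2 < 0:
        raise ValueError("weights must be non-negative")
    return w1 * first_risk + w2 * second_risk
```

For instance, with weights 0.75 and 0.25, a first risk value of 0.8 and a second risk value of 0.4 combine to approximately 0.7.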

8. A privacy computation-based risk identification apparatus, comprising:

a feature data acquisition unit, configured to acquire, according to a received service request to be identified, first feature data corresponding to the service request to be identified, wherein the service request to be identified carries a user identifier and the first feature data comprises user feature data;

a data alignment unit, configured to process the user identifier according to an oblivious transfer protocol to obtain first private data, and to send the first private data to a second encryption server so that the second encryption server obtains second feature data according to the first private data;

a risk value acquisition unit, configured to input the first feature data into a pre-trained first risk assessment model to obtain a first risk value;

the risk value acquisition unit being further configured to acquire a second risk value sent by the second encryption server, wherein the second risk value is obtained by the second encryption server by inputting the second feature data into a pre-trained second risk assessment model;

a risk assessment unit, configured to determine a risk assessment result corresponding to the service request to be identified according to the first risk value and the second risk value.

9. A privacy computation-based risk identification device, comprising:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the method of any one of claims 1 to 7.

10. A computer readable storage medium, characterized in that instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the method of any one of claims 1 to 7.